Counting datasources in a GAMSearch result tree
I saw this in the Studio GUI, but I think we have had it for a long time. I'm not sure if it's from CS or the CL.
Let's say we have N text documents and some entities present in them. We load with PER_GRAPH for entities.
We ask for connections between entity A and entity B.
Entity A may have been created as part of doc1.txt, in which case that is the only node for Entity A. Similarly, entity B may have been created as part of doc2.txt.
If A and B co-occur in doc3.txt, this is reported as a "result from 3 datasets", because we count the datasets {doc1, doc2, doc3}. That is misleading: a single dataset (doc3) is sufficient to find this result.
The simple fix would be to count the number of edge datasets instead of the number of node datasets. I think this would work in all cases, regardless of how many entities we create.
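To make the difference concrete, here is a minimal sketch of the two counting strategies on the doc1/doc2/doc3 example. It is not the actual GAMSearch code: the `Node` and `Edge` classes, the two counting functions, and the intermediate "co-occurrence" node created in doc3.txt are all hypothetical, and assume that every node and edge records the dataset it was loaded from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    dataset: str  # dataset in which this node was first created (assumed field)


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    dataset: str  # dataset that contributed this edge (assumed field)


def count_node_datasets(path):
    """Current behaviour as described above: distinct datasets of the nodes."""
    return len({n.dataset for e in path for n in (e.source, e.target)})


def count_edge_datasets(path):
    """Proposed behaviour: distinct datasets of the edges only."""
    return len({e.dataset for e in path})


# Entity A was created in doc1, entity B in doc2; they co-occur in doc3.
a = Node("Entity A", "doc1.txt")
b = Node("Entity B", "doc2.txt")
cooc = Node("co-occurrence", "doc3.txt")  # hypothetical intermediate node
path = [Edge(a, cooc, "doc3.txt"), Edge(cooc, b, "doc3.txt")]

node_count = count_node_datasets(path)  # 3: doc1, doc2, doc3
edge_count = count_edge_datasets(path)  # 1: doc3 only
print(node_count, edge_count)
```

With edge-based counting, the result is attributed only to doc3.txt, which is the one dataset actually needed to find the connection.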
What do you think, @mmohanty? For instance, on these three texts, if you search for "Engelmann type:Location", one answer with Engelmann France is reported as being "from 3 data sources".