# Download
This repository includes:
1. The `core` folder, which provides:
   - the jar file `connection-lens-core-full-1.1-SNAPSHOT.jar`;
   - the Python `scripts` folder, which implements entity extraction based on Flair;
   - the `lib` folders, which provide the linguistic models used by the StanfordNLP and TreeTagger tools we build upon.
2. The `gui` folder, with the file `gui.war` used to run the web app.
3. The `data` folder, with a few sample datasets (RDF, JSON, XML, etc.).
# ConnectionLens
ConnectionLens is a tool for finding connections between user-specified search terms across heterogeneous data sources. It treats a set of heterogeneous, independently authored data sources as a single virtual graph, in which nodes represent fine-grained data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction, which detects the people, organizations, etc. mentioned in the text, whether full text or text snippets found, e.g., in RDF or XML.

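
To make the graph model concrete, here is a toy Python sketch of how two JSON sources could become nodes with structural edges, plus sameAs links between them. It is not ConnectionLens's implementation: the `Graph` class and helpers are hypothetical, the two input documents are made up, and exact string equality stands in for the real similarity computation.

```python
import json
from itertools import combinations


class Graph:
    """Toy stand-in for the virtual graph: nodes are data items, edges are links."""

    def __init__(self):
        self.labels = []   # node id -> text ("" for JSON objects and arrays)
        self.sources = []  # node id -> data source the node comes from
        self.edges = []    # (from node, to node, edge label)

    def add_node(self, label, source):
        self.labels.append(label)
        self.sources.append(source)
        return len(self.labels) - 1


def add_json(graph, value, source, parent=None, edge_label=""):
    """One node per JSON value, plus a structural edge from its parent."""
    is_leaf = not isinstance(value, (dict, list))
    node = graph.add_node(str(value) if is_leaf else "", source)
    if parent is not None:
        graph.edges.append((parent, node, edge_label))
    if isinstance(value, dict):
        for key, child in value.items():
            add_json(graph, child, source, node, key)
    elif isinstance(value, list):
        for child in value:
            add_json(graph, child, source, node)
    return node


def add_same_as_edges(graph):
    """Toy similarity: link non-empty leaves from different sources with equal text."""
    for a, b in combinations(range(len(graph.labels)), 2):
        if (graph.labels[a] and graph.labels[a] == graph.labels[b]
                and graph.sources[a] != graph.sources[b]):
            graph.edges.append((a, b, "sameAs"))


# Two made-up sources mentioning the same value
g = Graph()
add_json(g, json.loads('{"name": "Alice", "city": "Paris"}'), "people.json")
add_json(g, json.loads('{"author": "Alice", "text": "Hello"}'), "posts.json")
add_same_as_edges(g)
print(g.edges)
# → [(0, 1, 'name'), (0, 2, 'city'), (3, 4, 'author'), (3, 5, 'text'), (1, 4, 'sameAs')]
```

The sameAs edge `(1, 4)` is what lets a search connect the two otherwise unrelated sources.
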
ConnectionLens is currently available as a command-line application. It supports many parameters, illustrated in `core/src/main/resources/local.settings` (for instance, `default_locale` controls the language). Each parameter has a default value built into the JAR. You can change parameter values to your liking in the `core/src/main/resources/local.settings` file; **to make sure your settings are used, add `-c core/src/main/resources/local.settings` to the launch command.** A description of the parameters used in this file is given [here](https://gitlab.inria.fr/cedar/connectionlens/-/blob/master/docs/Parameters%20description.md).
# Full installation
## Software prerequisites
Required:
- Java 11 (tested with OpenJDK 11.0.11)
- PostgreSQL (tested with v12.6)
- Python 3.6.5
- Tomcat 9.x or later (tested with v9.0.52 and v9.0.54)
Optional:
- [Graphviz (DOT)](https://www.graphviz.org/)
## Installation instructions
ConnectionLens can be run in two modes: *text* (command line), using the **jar**; and *graphical* (through a GUI), by deploying the **war** in a Web server (we tested with Tomcat).
The respective installation instructions are:
- [ConnectionLens-Core installation instructions](docs/core_install.md)
- [ConnectionLens-Gui installation instructions](docs/gui_install.md)
## Example
The example below ingests 5 small data sources of different formats into a graph. It also shows how to query the graph and visualize the results.
### Creating a small graph (command line)
From the **core** folder, run:
```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMS_DBName=cl_myinstance -i ../data/poc/2/deputes.json,../data/poc/2/fb-etienne-chouard.txt,../data/poc/2/medias.txt,../data/poc/2/tweet-Ruffin.json,../data/poc/2/rt-wikipedia.txt
```
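
If you want to script the ingestion step, the command above can be assembled in Python and passed to `subprocess.run`. This is a minimal sketch: `build_ingest_command` is a hypothetical helper, and it only uses the `-DRDBMS_DBName`, `-i` and `-c` options that appear in this README.

```python
JAR = "connection-lens-core-full-1.1-SNAPSHOT.jar"


def build_ingest_command(db_name, inputs, config=None):
    """Argument vector for the ingestion command; run it from the core folder."""
    cmd = ["java", "-jar", JAR, f"-DRDBMS_DBName={db_name}", "-i", ",".join(inputs)]
    if config is not None:
        # -c: use this settings file to set the default parameter values
        cmd += ["-c", config]
    return cmd


inputs = [
    "../data/poc/2/deputes.json",
    "../data/poc/2/fb-etienne-chouard.txt",
    "../data/poc/2/medias.txt",
    "../data/poc/2/tweet-Ruffin.json",
    "../data/poc/2/rt-wikipedia.txt",
]
cmd = build_ingest_command("cl_myinstance", inputs)
print(" ".join(cmd))
```

Keeping the command as a list (rather than one shell string) avoids quoting problems with unusual file names; from the repository root, something like `subprocess.run(cmd, cwd="core", check=True)` should then launch it, given the prerequisites above.
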

For more options, run `java -jar connection-lens-core-full-1.1-SNAPSHOT.jar --help`, which describes all applicable parameters and options:
```
Usage: java -jar connection-lens-full-.jar [options]
  Options:
    -qs, --collect-query-stats
      If set, logs query-related statistics
      Default: false
    -rs, --collect-registration-stats
      If set, logs registration-related statistics
      Default: false
    -ss, --collect-similarity-stats
      If set, logs similarity-related statistics
      Default: false
    -c, --config
      Path to configuration file. The file will be used to set all default
      values. If the option is not set, default parameter files will searched
      in the current directory. If no such file is found, build-in default
      will be used.
    -json, --export CL Graph to json
      Use to export the CL graph in a json file.
      Default: false
    --force-similarity-computation
      For the similarity computation to be run, even if no know datasource was
      registered.
      Default: false
    -h, --help
      Displays this help message.
      Default: false
    -lateIdx, --index-later
      Is true, create necessary indexes at the beginning of the loading and
      delay the creation of some tables and indexes to the last loading.
      Default: false
    -i, --input
      A comma-separated list of files or directory paths. If a directory is specified, all descendant files are used as inputs.
      Default: []
    -a, --interactive-mode
      If true, read incoming query from STDIN after the registration phase,
      until EOF is reached.
      Default: false
    -last, --last
      Is true, this will be the last loading after multiple loadings.
      Default: false
    -n, --noreset-at-start
      Do NOT reset the data structures upon starting.
      Default: false
    -ou, --orignal-uri
      Comma-separated list of original uris.
      Default: []
    -o, --output
      Path to output file. Default: STDOUT
      Default: java.io.PrintStream@61e717c2
    -f, --output-format
      The format in which to output the statistics.
      Default: DEFAULT
      Possible Values: [DEFAULT, MARKDOWN, LATEX]
    -q, --query
      A(single) keyword query to execute.
    -Q, --query-file
      A file containing one input query per line.
    -v, --verbose
      Use verbose mode.
      Default: true
```
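
The `-i` description above says that a directory stands for all its descendant files. The Python sketch below shows one way to read that rule, assuming "descendant" means a recursive walk; `expand_inputs` is a hypothetical helper, not part of ConnectionLens, and the tool's own traversal order and any file-type filtering may differ.

```python
import os


def expand_inputs(spec):
    """Turn a comma-separated -i value into a flat list of input files."""
    files = []
    for path in (p.strip() for p in spec.split(",")):
        if not path:
            continue  # tolerate empty entries such as a trailing comma
        if os.path.isdir(path):
            # Assumed reading of "all descendant files": a recursive walk
            for root, dirs, names in os.walk(path):
                dirs.sort()  # sorting in place makes os.walk visit subfolders in order
                files.extend(os.path.join(root, name) for name in sorted(names))
        else:
            files.append(path)
    return files
```

Under this reading, `-i ../data/poc/2` would register every file below that folder, while plain file paths are passed through unchanged.
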
### Querying
A ConnectionLens query is a set of keywords; an answer is a subtree of the graph that connects one node matching each keyword. To require that a single node match several keywords, enclose those keywords in quotes (e.g., `"François Ruffin"`).
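
To make the quoting rule concrete, here is a Python sketch of keyword parsing and of the coverage condition an answer must satisfy. It only checks that every keyword is matched by some node of a candidate tree; it does not search the graph or verify that the nodes form a connected subtree. The helpers `parse_query`, `matches` and `is_answer` are hypothetical, and the case-insensitive substring matching is an assumption, not ConnectionLens's actual matching.

```python
import re


def parse_query(query):
    """Split a query into keywords; a double-quoted phrase is kept as one keyword."""
    return [quoted or bare for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', query)]


def matches(label, keyword):
    """Assumed matching rule: every word of the keyword occurs in the node label."""
    text = label.lower()
    return all(word in text for word in keyword.lower().split())


def is_answer(tree_labels, keywords):
    """Coverage check only: each keyword must be matched by at least one node."""
    return all(any(matches(label, kw) for label in tree_labels) for kw in keywords)


print(parse_query('Assemblée "François Ruffin"'))  # → ['Assemblée', 'François Ruffin']
# Unquoted, the two words may be matched by two different nodes...
print(is_answer(["François", "Ruffin"], parse_query("François Ruffin")))  # → True
# ...but quoted, a single node must match both.
print(is_answer(["François", "Ruffin"], parse_query('"François Ruffin"')))  # → False
```
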
#### Querying (command-line)
First, we can query the graph using an interactive command-line interface.
After loading the graph as explained above, run the following command:
```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMS_DBName=cl_myinstance -n -v -a
```
The `query>` prompt indicates that the shell is ready to accept queries; queries are read from standard input until EOF is reached.

#### Gathering statistics about queries (command-line)
Given a query file with one query per line, the following call:
```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMS_DBName=cl_myinstance -n -qs -Q ../data/poc/2/demo.queries
```
yields statistics for each query: how long it took, how many answers were found, how long it took to find the first answer, etc.
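
A query file can also be generated programmatically. This Python sketch writes one query per line and builds the statistics command shown above; `write_query_file`, `query_stats_command`, the file name `my.queries` and the UTF-8 encoding are assumptions, not documented ConnectionLens behavior.

```python
from pathlib import Path

JAR = "connection-lens-core-full-1.1-SNAPSHOT.jar"


def write_query_file(path, queries):
    """Write one query per line; UTF-8 is an assumed encoding."""
    Path(path).write_text("".join(q + "\n" for q in queries), encoding="utf-8")


def query_stats_command(db_name, query_file):
    """-n keeps the graph loaded earlier, -qs logs statistics, -Q names the query file."""
    return ["java", "-jar", JAR, f"-DRDBMS_DBName={db_name}", "-n", "-qs", "-Q", str(query_file)]


write_query_file("my.queries", ["Russie", "Ruffin", '"François Ruffin"'])
cmd = query_stats_command("cl_myinstance", "my.queries")
```

Quoted phrases are written as-is, so each line keeps the same meaning it would have at the interactive `query>` prompt.
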
#### Sample queries on the small example
Number of answer trees (ATs) found for each query:

| Query | Answer trees (ATs) |
|---|---|
| Russie | 3 |
| Ruffin | 8 |
| "François Ruffin" | 1 |
| Assemblée Chouard | 4392 |
| Assemblée RussiaToday | 4177 |
#### Visualizing the graph and querying through the GUI
Follow the [GUI installation instructions](docs/gui_install.md). Queries can then be asked, and their results visualized, through the GUI. For instance, the screenshot below corresponds to the query "Briand Halluin Tonolli".

# Contributing to ConnectionLens
If you find a bug or other issue in ConnectionLens, please let us know.
You can report bugs on the [issue](https://gitlab.inria.fr/cedar/connectionlens/-/issues) tracker:
- Log in to https://gitlab.inria.fr. If you do not work at Inria and need an account, please email `connection-lens-admin@inria.fr` and we will help you set one up.
- Create a new issue on https://gitlab.inria.fr/cedar/connectionlens/-/issues, giving as much information as possible (what you managed to do, what is not working, etc.).
## About
ConnectionLens development started in 2018. See the [about us](about_us.md) page for a list of all authors.