# Download

This repository includes: 

  1. The `core` folder, which provides:

     - the JAR file `connection-lens-core-full-1.1-SNAPSHOT.jar`;
     - the Python `scripts` folder (these scripts implement entity extraction based on Flair);
     - a `settings/local.settings` file, which allows controlling multiple parameters of the execution.

  2. The `gui` folder, with the file `gui.war`, which allows running the web application.
   
  3. The `data` folder, with a few sample datasets (RDF, JSON, XML, etc.).
   
  4. The `models` folder which provides linguistic models used by the TreeTagger and StanfordNLP tools we build upon.

# ConnectionLens

ConnectionLens is a tool for finding connections between user-specified search terms across heterogeneous data sources. ConnectionLens treats a set of heterogeneous, independently authored data sources as a single virtual graph, where nodes represent fine-granularity data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction, which enables us to detect the people, organizations, etc. mentioned in the text, whether in full-text documents or in text snippets found, e.g., in RDF or XML.


![image_9.png](image_9.png) 

<!-- ConnectionLens is available as a web application or as a command-line application. We provide **two installation options**:

* a beginner-friendly installation through a virtual image (Docker) that will give access only to the web application; 
* a full installation in which both command line and web application are installed.-->

ConnectionLens is currently available as a command-line application. It allows customizing many parameters, illustrated in `core/settings/local.settings` (for instance, `default_locale` controls the language). Each parameter has a default value built into the JAR. You can change parameter values to your liking in the `core/settings/local.settings` file; **to make sure your settings are used, add `-c core/settings/local.settings` to the launch command.**
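As a minimal sketch, assuming you launch it from the main folder of the repository, the settings file can be passed with `-c` as follows. The `CL_JAR` and `CL_SETTINGS` variables are our own shorthand, and loading only `deputes.json` is just an example.

```shell
# Shorthand variables (illustrative names, not read by ConnectionLens itself)
CL_JAR=core/connection-lens-core-full-1.1-SNAPSHOT.jar
CL_SETTINGS=core/settings/local.settings

# Show the line(s) of the settings file that mention the language parameter
grep default_locale "$CL_SETTINGS"

# -c: read parameter values from CL_SETTINGS; without it, ConnectionLens looks for a
# settings file in the current directory, then falls back to the defaults built into the JAR
java -jar "$CL_JAR" -DRDBMSDBName=cl_myinstance -c "$CL_SETTINGS" -i data/poc/2/deputes.json
```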

<!--# Installation using Docker

Docker is a platform that facilitates installing software on any operating system. Before proceeding, please install [Docker](https://docs.docker.com/get-docker/). Note that on Linux you might also need to install Docker Compose.

Please download the current repository, if you haven't done so: ```git clone --depth 1 https://gitlab.inria.fr/cedar/connectionlens.git```.
If you are using Windows, please go to Docker -> Settings -> Resources -> File Sharing -> Resources and add the folder connectionlens.
Only *the first time* you start the web application run the command: ```docker-compose build```.
Depending on your machine and internet connection this step might take 10-15 minutes.

To start the web application, run ```docker-compose up```. The web application will be available at `http://localhost:8080/gui/`.
We provide a [tutorial](https://gitlab.inria.fr/cedar/connectionlens/-/wikis/Using-the-GUI) on how to use the interface. 

If you want to change the default language, French, and use English instead, modify and save the file. To use the web application with the new language, you need to restart it: ```docker-compose down && docker-compose up```.-->

# Full installation

## Software prerequisites
Required: 
- Java >= 1.8
- PostgreSQL (tested with v12.6)
- Python 3.6.5
- Tomcat >= 8 (tested with v9.0)

Optional: 

- [Graphviz (DOT)](https://www.graphviz.org/) 


## Installation instructions

ConnectionLens can be run in two modes: *text* (command-line), using the **jar**, and *graphical* (through a GUI), by deploying the **war** in a web server (we tested with Tomcat).
The respective installation instructions are: 

- [ConnectionLens-Core installation instructions](core_install.md)

- [ConnectionLens-Gui installation instructions](gui_install.md)



## Example
The example below ingests 5 small data sources of different formats into a graph. It also shows how to query the graph and visualize the results. 


### Creating a small graph (command line)

From the main folder, call the jar in the `core` folder with the following options:

```
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -i data/poc/2/deputes.json,data/poc/2/fb-etienne-chouard.txt,data/poc/2/medias.txt,data/poc/2/tweet-Ruffin.json,data/poc/2/rt-wikipedia.txt
```

![image_3.png](./image_3.png)
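When the list of inputs grows, keeping one path per line may be easier to maintain. The bash sketch below (the array and variable names are illustrative) builds the same comma-separated `-i` value as the command above and passes it to the jar.

```shell
# Requires bash (arrays). The five sample sources, one per line.
inputs=(
  data/poc/2/deputes.json
  data/poc/2/fb-etienne-chouard.txt
  data/poc/2/medias.txt
  data/poc/2/tweet-Ruffin.json
  data/poc/2/rt-wikipedia.txt
)

# -i expects a single comma-separated list: "${inputs[*]}" joins the elements with
# the first character of IFS, which is set to ',' only inside this subshell.
input_list=$(IFS=,; echo "${inputs[*]}")

java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -i "$input_list"
```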

For more options, the following command will provide further details about applicable parameters and options:

	java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar --help

```
Usage: java -jar connection-lens-full-<version>.jar [options]
  Options:
    -qs, --collect-query-stats
      If set, logs query-related statistics
      Default: false
    -rs, --collect-registration-stats
      If set, logs registration-related statistics
      Default: false
    -ss, --collect-similarity-stats
      If set, logs similarity-related statistics
      Default: false
    -c, --config
      Path to configuration file. The file will be used to set all default 
      values. If the option is not set, default parameter files will searched 
      in the current directory. If no such file is found, build-in default 
      will be used.
    -json, --export CL Graph to json
      Use to export the CL graph in a json file.
      Default: false
    --force-similarity-computation
      For the similarity computation to be run, even if no know datasource was 
      registered. 
      Default: false
    -h, --help
      Displays this help message.
      Default: false
    -lateIdx, --index-later
      Is true, create necessary indexes at the beginning of the loading and 
      delay the creation of some tables and indexes to the last loading.
      Default: false
    -i, --input
      A comma-separated list of files or directory paths. If a directory is specified, all descendant files are used as inputs.
      Default: []
    -a, --interactive-mode
      If true, read incoming query from STDIN after the registration phase, 
      until EOF is reached.
      Default: false
    -last, --last
      Is true, this will be the last loading after multiple loadings.
      Default: false
    -n, --noreset-at-start
      Do NOT reset the data structures upon starting.
      Default: false
    -ou, --orignal-uri
      Comma-separated list of original uris.
      Default: []
    -o, --output
      Path to output file. Default: STDOUT
      Default: java.io.PrintStream@61e717c2
    -f, --output-format
      The format in which to output the statistics.
      Default: DEFAULT
      Possible Values: [DEFAULT, MARKDOWN, LATEX]
    -q, --query
      A(single) keyword query to execute.
    -Q, --query-file
      A file containing one input query per line.
    -v, --verbose
      Use verbose mode.
      Default: true
```
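Options can be combined. The sketch below assumes that `-n`, which keeps the existing data structures, makes a later run add a new source to the graph already stored in `cl_myinstance` rather than rebuilding it, and uses `-rs` to log registration statistics for that loading. The file `data/my-new-source.json` is hypothetical, to be replaced with your own; the `-lateIdx` and `-last` options, meant for sequences of loadings, are left out of this sketch.

```shell
CL_JAR=core/connection-lens-core-full-1.1-SNAPSHOT.jar
# Hypothetical input file: replace it with the source you want to add
NEW_SOURCE=data/my-new-source.json

# -n : do NOT reset the data structures, so the graph in cl_myinstance is kept
#      (assumption: the new source is then added to that graph)
# -rs: log registration-related statistics for this loading
java -jar "$CL_JAR" -DRDBMSDBName=cl_myinstance -n -rs -i "$NEW_SOURCE"
```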


### Querying 

A ConnectionLens query is a set of keywords; an answer is a subtree of the graph that connects one node matching each keyword.
To require that a single node match several keywords, enclose those keywords in quotes.
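For instance, a single query can be passed with `-q` on the graph loaded in the example above. This sketch assumes that `-q` accepts the same quoting as the other query interfaces: the single quotes are for the shell, while the double quotes belong to the ConnectionLens query. The keyword grouping is only an illustration of the syntax, not a claim about its answers.

```shell
# "Briand Halluin" must be matched by ONE node; Tonolli is a separate keyword,
# matched by a node of the same answer subtree.
query='"Briand Halluin" Tonolli'

# -n: reuse the graph stored in cl_myinstance instead of resetting it
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -q "$query"
```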

#### Querying (command-line)

First, we can query the graph using an interactive, command-line interface.
After having loaded the graph as explained above, call the jar with the following options:

```
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -v -a
```

The `query>` prompt indicates that the shell is ready to accept queries.

![image_2.png](./image_2.png)
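Since `-a` reads queries from STDIN until EOF is reached (per the `--help` output above), the same mode can presumably also be fed from a pipe, for instance in a script. A minimal sketch, assuming one query per line as in the interactive shell; the `queries` variable is illustrative.

```shell
# Two queries from the sample list, one per line
queries=$(printf '%s\n' 'Russie' 'Poutine')

# The pipe closes STDIN after the last query, which ends the interactive mode (EOF)
printf '%s\n' "$queries" | java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -v -a
```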


#### Gathering statistics about queries (command-line)

Assuming a set of queries is written in a query file (one query per line), the following call:

```
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -qs -Q data/poc/2/demo.queries
```

will yield a set of statistics on each query: how long it took, how many answers were found, how long it took to find the first answer, etc.
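The help output above also lists `-f` (statistics format: `DEFAULT`, `MARKDOWN` or `LATEX`) and `-o` (output file, STDOUT by default). Assuming `-o` also receives the statistics, a Markdown report could be written to a file as sketched below; the file name `query-stats.md` is our own choice.

```shell
# Illustrative output file name
STATS_FILE=query-stats.md

# -qs: collect query statistics; -Q: run every query of the file
# -f : statistics format (DEFAULT, MARKDOWN or LATEX)
# -o : write the output to STATS_FILE instead of STDOUT
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -qs -Q data/poc/2/demo.queries -f MARKDOWN -o "$STATS_FILE"
```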

 
#### Sample queries on the small example 


- Russie - 1 answer
- Chouard - 2 answers
- Ruffin - 13 answers
- Poutine - 1 answer
- Assemblée - 3 answers
- Soral Toulon - 1 answer
- Briand Halluin Tonolli - 1 answer
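To compare your own run with the answer counts above, the sample queries can be written to a query file and run in one go with `-Q`, as in the statistics example. A minimal sketch; the file name `my-sample.queries` is illustrative.

```shell
# Write the sample queries to a query file, one query per line
# (quoted 'EOF' so that the shell does not expand anything inside)
cat > my-sample.queries <<'EOF'
Russie
Chouard
Ruffin
Poutine
Assemblée
Soral Toulon
Briand Halluin Tonolli
EOF

# Run them all on the graph loaded earlier (-n) and collect per-query statistics (-qs)
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -qs -Q my-sample.queries
```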


#### Visualizing the graph and querying through the GUI

Follow the [GUI installation instructions](https://gitlab.inria.fr/cedar/connectionlens/-/blob/master/gui_install.md). 
Then, queries can be asked and their results visualized through the GUI. For instance, the screenshot below corresponds to the query "Briand Halluin Tonolli".

![image_4.png](./image_4.png)

# Contributing to ConnectionLens

If you find a bug or another issue with ConnectionLens, please let us know.
You can report it on the [issue tracker](https://gitlab.inria.fr/cedar/connectionlens/-/issues):

- Log in to https://gitlab.inria.fr. If you do not work at Inria and need to create an account, please send an email to `connection-lens-admin@inria.fr` so that we can help you set one up.

- Create a new issue on https://gitlab.inria.fr/cedar/connectionlens/-/issues, giving as much information as possible (what you succeeded in doing, what is not working, etc.).

## About

ConnectionLens development started in 2018. See the [about us](about_us.md) page for a list of all authors.