# Download

A clone of this repository includes: 

1. The `core` folder, which provides:
   - the jar file `connection-lens-core-full-1.1-SNAPSHOT.jar`;
   - the Python `scripts` folder, which implements entity extraction based on Flair;
   - a `settings/properties` file, which controls multiple parameters related to the execution.
2. The `gui` folder, with the file `gui.war` used to run the web application.
3. The `data` folder, with a few sample datasets (RDF, JSON, XML, etc.).
4. The `models` folder, which provides the linguistic models used by the TreeTagger and StanfordNLP tools we build upon.

# ConnectionLens

ConnectionLens is a tool for finding connections between user-specified search terms across heterogeneous data sources. ConnectionLens treats a set of heterogeneous, independently authored data sources as a single virtual graph, where nodes represent fine-granularity data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction, which detects the people, organizations, etc. mentioned in the text, whether full text or text snippets found, e.g., in RDF or XML.

ConnectionLens is available as a web application or as a command-line application. We provide **two installation options**:

* a beginner-friendly installation through a Docker image, which gives access only to the web application;
* a full installation, in which both the command-line and the web application are installed.

# Installation using Docker

Docker is a platform that facilitates installing software on any operating system. Before proceeding, please install [Docker](https://docs.docker.com/get-docker/). On Linux, you may also need to install Docker Compose separately.

Download the current repository, if you haven't done so already: ```git clone --depth 1 https://gitlab.inria.fr/cedar/connectionlens.git```.
If you are using Windows, go to Docker -> Settings -> Resources -> File Sharing and add the `connectionlens` folder.
The first time only, before starting the web application, run ```docker-compose build```.
Depending on your machine and internet connection, this step might take 10-15 minutes.

To start the web application, run ```docker-compose up```. The web application will then be available at `http://localhost:8080/gui/`.
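
The first build and startup can take a while. You may want to wait until the GUI actually responds before opening it. The Python sketch below polls a URL until it returns a successful HTTP response. The function name `wait_for_url` and its timeouts are our own illustrative choices and are not part of ConnectionLens.

```python
import time
import urllib.request


def wait_for_url(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers successfully or `timeout` seconds elapse.

    Returns True on success, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen raises HTTPError (a subclass of OSError) on 4xx/5xx codes,
            # so entering the with-block means the server answered successfully.
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except OSError:
            # Connection refused, reset or timed out: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_url("http://localhost:8080/gui/", timeout=600)` returns `True` once the web application answers. It returns `False` if the application is still unreachable after ten minutes.
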
We provide a [tutorial](https://gitlab.inria.fr/cedar/connectionlens/-/wikis/Using-the-GUI) on how to use the interface.
ConnectionLens has a property file, `core/settings/properties`. You can use it to specify the language in which documents are written, via the parameter `default_locale`.
The default language is French. To use English instead, modify this parameter and save the file. For the web application to use the new language, restart it: ```docker-compose down && docker-compose up```.
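
If you prefer to script this change, the Python sketch below rewrites a single entry in a properties file. It assumes the file uses simple one-line Java-style `key=value` (or `key: value`) entries. `set_property` is a hypothetical helper. The value `en` in the usage example is also an assumption, so check which locale codes your copy of `core/settings/properties` expects.

```python
import re
from pathlib import Path


def set_property(path: str, key: str, value: str) -> bool:
    """Set `key` to `value` in a simple Java-style .properties file.

    Only one-line entries of the form `key=value` or `key: value` are handled.
    Returns True if an existing entry was rewritten, False if the key was
    missing and a new `key=value` line was appended.
    """
    # The key must start the line (after optional indentation), so commented-out
    # entries such as "#default_locale=fr" are left untouched.
    pattern = re.compile(r"^(\s*" + re.escape(key) + r"\s*[=:]\s*).*$")
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines()
    replaced = False
    for i, line in enumerate(lines):
        match = pattern.match(line)
        if match:
            # Keep the original key and separator, swap in the new value.
            lines[i] = match.group(1) + value
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return replaced
```

For example, call `set_property("core/settings/properties", "default_locale", "en")`, then restart the web application as described above.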

# Full installation

## Software prerequisites
Required: 
- Java >= 1.8
- PostgreSQL (tested with v9.6)
- Python 3
- Tomcat >=8.*

Optional: 

- [Graphviz (DOT)](https://www.graphviz.org/) 
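
Before installing, you can check which of these tools are reachable from your `PATH`. The Python sketch below only checks that the executables exist, not their versions. The executable names (`java`, `psql`, `python3`, `dot`) are assumptions about a typical setup. Tomcat is not checked, because it is usually started through its own scripts rather than through a command on the `PATH`.

```python
import shutil
from typing import Dict, Optional

# Executable names are assumptions about a typical installation.
REQUIRED = {"java": "Java >= 1.8", "psql": "PostgreSQL", "python3": "Python 3"}
OPTIONAL = {"dot": "Graphviz (DOT)"}


def check_tools(tools: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def report() -> bool:
    """Print the status of every tool; return False if a required one is missing."""
    all_required_found = True
    for group, tools in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, path in check_tools(tools).items():
            print(f"[{group}] {tools[name]} ({name}): {path or 'NOT FOUND'}")
            if group == "required" and path is None:
                all_required_found = False
    return all_required_found
```

Calling `report()` prints one line per tool and returns `False` if a required executable is missing. Run `java -version` and `psql --version` to confirm the versions listed above.
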


## Installation instructions

ConnectionLens can be run in two modes: *text* (command-line), using the **jar**, and *graphical* (through a GUI), by deploying the **war** in a Web server (we tested with Tomcat).
The respective installation instructions are:

- [ConnectionLens-Core installation instructions](core_install.md)

- [ConnectionLens-Gui installation instructions](gui_install.md)



## Example
The example below ingests 5 small data sources of different formats into a graph. It also shows how to query the graph and visualize the results. 


### Creating a small graph (command line)

From the main folder, call the jar in the `core` folder with the following options:

```
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -i data/poc/2/deputes.json,data/poc/2/fb-etienne-chouard.txt,data/poc/2/medias.txt,data/poc/2/tweet-Ruffin.json,data/poc/2/rt-wikipedia.txt
```
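
To script the ingestion, for instance from a notebook, you can build the same call as an argument list and run it with Python's `subprocess` module. This is a minimal sketch: the helper names are hypothetical, and it assumes you run it from the main folder, as above. Passing a list instead of a shell string avoids quoting problems. Because `-i` expects a comma-separated list, file names containing commas are rejected.

```python
import subprocess
from typing import List, Sequence

# Jar path used throughout this README, relative to the main folder.
JAR = "core/connection-lens-core-full-1.1-SNAPSHOT.jar"


def build_ingest_command(db_name: str, inputs: Sequence[str]) -> List[str]:
    """Build the argument list for loading `inputs` into the database `db_name`."""
    if any("," in path for path in inputs):
        # -i takes a comma-separated list, so commas inside paths are ambiguous.
        raise ValueError("input paths must not contain commas")
    return ["java", "-jar", JAR, f"-DRDBMSDBName={db_name}", "-i", ",".join(inputs)]


def ingest(db_name: str, inputs: Sequence[str]) -> None:
    """Run the ingestion; raises CalledProcessError if the jar exits with an error."""
    subprocess.run(build_ingest_command(db_name, inputs), check=True)
```

For example, `ingest("cl_myinstance", ["data/poc/2/deputes.json", "data/poc/2/medias.txt"])` loads two of the sample files.
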

![image_3.png](./image_3.png)

For more options, run the following command, which prints details about all applicable parameters and options:

```
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar --help
```

```
Usage: java -jar connection-lens-full-<version>.jar [options]
  Options:
    -qs, --collect-query-stats
      If set, logs query-related statistics
      Default: false
    -rs, --collect-registration-stats
      If set, logs registration-related statistics
      Default: false
    -ss, --collect-similarity-stats
      If set, logs similarity-related statistics
      Default: false
    -c, --config
      Path to configuration file. The file will be used to set all default 
      values. If the option is not set, default parameter files will searched 
      in the current directory. If no such file is found, build-in default 
      will be used.
    -E, --eager-extractor
      Is false, lazy extraction results are cached on disk for future similar 
      calls. 
      Default: false
    -json, --export CL Graph to json
      Use to export the CL graph in a json file.
      Default: false
    --force-similarity-computation
      For the similarity computation to be run, even if no know datasource was 
      registered. 
      Default: false
    -h, --help
      Displays this help message.
      Default: false
    -lateIdx, --index-later
      Is true, create necessary indexes at the beginning of the loading and 
      delay the creation of some tables and indexes to the last loading.
      Default: false
    -i, --input
      A comma-separated list of files or directory paths. If a directory is specified, all descendant files are used as inputs.
      Default: []
    -a, --interactive-mode
      If true, read incoming query from STDIN after the registration phase, 
      until EOF is reached.
      Default: false
    -last, --last
      Is true, this will be the last loading after multiple loadings.
      Default: false
    -n, --noreset-at-start
      Do NOT reset the data structures upon starting.
      Default: false
    -ou, --orignal-uri
      Comma-separated list of original uris.
      Default: []
    -o, --output
      Path to output file. Default: STDOUT
      Default: java.io.PrintStream@61e717c2
    -f, --output-format
      The format in which to output the statistics.
      Default: DEFAULT
      Possible Values: [DEFAULT, MARKDOWN, LATEX]
    -q, --query
      A(single) keyword query to execute.
    -Q, --query-file
      A file containing one input query per line.
    -v, --verbose
      Use verbose mode.
      Default: true
```

### Querying 

A ConnectionLens query is a set of keywords; an answer is a subtree of the graph that connects one node matching each keyword.
To require a single node to match several keywords, enclose those keywords in quotes.
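
The Python sketch below illustrates these rules. It splits a query string into groups of keywords, where each group must be matched by a single node. It is our own illustration of the syntax described above, not the parser ConnectionLens uses. For example, it treats an unbalanced quote as an ordinary character.

```python
import re
from typing import List

# A token is either a double-quoted phrase or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def keyword_groups(query: str) -> List[List[str]]:
    """Split a keyword query into groups; each group must be matched by one node.

    An unquoted word forms a group on its own; the words of a quoted phrase
    form a single group that one node must match in full.
    """
    groups = []
    for match in _TOKEN.finditer(query):
        if match.group(1) is not None:  # quoted phrase
            words = match.group(1).split()
        else:  # bare keyword
            words = [match.group(2)]
        if words:  # ignore empty quotes such as ""
            groups.append(words)
    return groups
```

Under this reading, `Briand Halluin Tonolli` asks for three nodes, one per keyword. By contrast, `"Briand Halluin" Tonolli` asks for one node matching both `Briand` and `Halluin`, plus one node matching `Tonolli`.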

#### Querying (command-line)

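After loading the graph as explained above, you can query it through an interactive command-line interface. Call the jar with the following options:
- `-n` keeps the data loaded earlier, by disabling the reset performed at startup;
- `-v` turns on verbose mode;
- `-a` reads queries from standard input until EOF.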

```
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -v -a
```

The `query>` prompt indicates that the shell is ready to accept queries.

![image_2.png](./image_2.png)
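
Since `-a` reads queries from standard input until EOF, you can also pipe queries in non-interactively. The Python sketch below does this with `subprocess`. It assumes one query per line on standard input, as typed at the `query>` prompt. The helper names are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Jar path used throughout this README, relative to the main folder.
JAR = "core/connection-lens-core-full-1.1-SNAPSHOT.jar"


def build_interactive_command(db_name: str) -> List[str]:
    # -n: do not reset (keep the graph loaded earlier); -v: verbose;
    # -a: read queries from STDIN until EOF.
    return ["java", "-jar", JAR, f"-DRDBMSDBName={db_name}", "-n", "-v", "-a"]


def run_queries(db_name: str, queries: Sequence[str]) -> str:
    """Send the queries on STDIN (assumed: one per line), close it, return STDOUT."""
    completed = subprocess.run(
        build_interactive_command(db_name),
        input="".join(q + "\n" for q in queries),  # STDIN is closed afterwards (EOF)
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, `print(run_queries("cl_myinstance", ["Russie", "Soral Toulon"]))` prints what the tool writes to standard output for these two queries.
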


#### Gathering statistics about queries (command-line)

Assuming a set of queries is written in a query file (one query per line), the following call:

```
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -qs -Q core/data/poc/2/demo.queries
```

will yield statistics on each query: how long it took, how many answers were found, how long it took to find the first answer, etc. Here, `-qs` enables the collection of query statistics, and `-Q` gives the query file.
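
You can also generate a query file programmatically. The Python sketch below writes one query per line and builds the statistics call shown above. The helper names are hypothetical. The optional `-o` (output file) and `-f` (`DEFAULT`, `MARKDOWN` or `LATEX`) arguments are taken from the `--help` output; we assume they apply to these statistics, as that output suggests.

```python
from pathlib import Path
from typing import List, Optional, Sequence

# Jar path used throughout this README, relative to the main folder.
JAR = "core/connection-lens-core-full-1.1-SNAPSHOT.jar"


def write_query_file(path: str, queries: Sequence[str]) -> None:
    """Write one query per line, as expected by the -Q option."""
    for query in queries:
        if "\n" in query:
            raise ValueError(f"a query cannot span several lines: {query!r}")
    Path(path).write_text("".join(q + "\n" for q in queries), encoding="utf-8")


def build_stats_command(db_name: str, query_file: str,
                        output: Optional[str] = None,
                        fmt: Optional[str] = None) -> List[str]:
    """Build the -qs/-Q call shown above, with optional -o and -f arguments."""
    cmd = ["java", "-jar", JAR, f"-DRDBMSDBName={db_name}", "-n", "-qs", "-Q", query_file]
    if output is not None:
        cmd += ["-o", output]  # write to this file instead of STDOUT
    if fmt is not None:
        cmd += ["-f", fmt]  # DEFAULT, MARKDOWN or LATEX according to --help
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)`, as in the ingestion sketch.
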

#### Sample queries on the small example 

- Russie - 1 answer
- Chouard - 2 answers
- Ruffin - 13 answers
- Poutine - 1 answer
- Assemblée - 3 answers
- Soral Toulon - 1 answer
- Briand Halluin Tonolli - 1 answer


#### Visualizing the graph and querying through the GUI

Follow the [GUI installation instructions](https://gitlab.inria.fr/cedar/connectionlens/-/blob/master/gui_install.md). 
Then, you can ask queries and visualize their results through the GUI. For instance, the screenshot below corresponds to the query "Briand Halluin Tonolli".

![image_4.png](./image_4.png)

# Contributing to ConnectionLens

If you find a bug or another issue with ConnectionLens, please let us know.
You can report it on the [issue tracker](https://gitlab.inria.fr/cedar/connectionlens/-/issues):

- Log in to https://gitlab.inria.fr. If you do not work at Inria and need an account, please send an email to `connection-lens-admin@inria.fr` and we will help you set one up.

- Create a new issue on https://gitlab.inria.fr/cedar/connectionlens/-/issues, giving as much information as possible (what worked, what is not working, etc.).

## About

ConnectionLens development started in 2018. See the [about us](about_us.md) page for a list of all authors.