# Download

This repository includes: 

1. The `core` folder, which provides:
   - the jar file `connection-lens-core-full-1.1-SNAPSHOT.jar`;
   - the Python `scripts` folder, which implements entity extraction based on Flair;
   - the `lib` folder, which provides the linguistic models used by the StanfordNLP and TreeTagger tools we build upon.

2. The `gui` folder, with the file `gui.war` needed to run the web application.
3. The `data` folder, with a few sample datasets (RDF, JSON, XML, etc.).

# ConnectionLens

ConnectionLens is a tool for finding connections between user-specified search terms across heterogeneous data sources. It treats a set of heterogeneous, independently authored data sources as a single virtual graph, where nodes represent fine-granularity data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, ConnectionLens also applies entity extraction, which detects the people, organizations, etc. mentioned in the text, whether in full-text documents or in text snippets found, e.g., in RDF or XML.



<img src="./docs/images/CL.png" width="450" height="400" align="center">

<!-- ConnectionLens is available as a web application or as a command line application. We provide **two installations options**: 
* a beginner-friendly installation through a virtual image (Docker) that will give access only to the web application; 
* a full installation in which both command line and web application are installed.-->

ConnectionLens is currently available as a command-line application. Many of its parameters can be customized; they are illustrated in `core/src/main/resources/local.settings` (for instance, `default_locale` controls the language). Each parameter has a default value built into the JAR. You can change parameter values to your liking in the `core/src/main/resources/local.settings` file; **to make sure your settings are used, add `-c core/src/main/resources/local.settings` to the launch command.** A description of the parameters used in this file is given [here](https://gitlab.inria.fr/cedar/connectionlens/-/blob/master/docs/Parameters%20description.md).
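The path given to `-c` is presumably resolved like any file path, relative to the directory you launch the jar from. The commands in the rest of this README are run from the `core` folder, where the same file is reached as `src/main/resources/local.settings`. Below is a small sketch under that assumption; `cl_run` is a hypothetical helper, not part of ConnectionLens, that wraps the launch command so `-c` is never forgotten.

```shell
# Assumption: run from the core folder, like the other commands in this README.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar
SETTINGS=src/main/resources/local.settings

# cl_run (hypothetical helper, not part of ConnectionLens): launch the jar with
# the customized settings file and forward every other argument unchanged.
cl_run() {
  if [ ! -f "$SETTINGS" ]; then
    echo "settings file not found: $SETTINGS (are you in the core folder?)" >&2
    return 1
  fi
  java -jar "$JAR" -c "$SETTINGS" "$@"
}
```

With the function defined, `cl_run -DRDBMS_DBName=cl_myinstance -n -v -a` starts the interactive query mode (`-a`) using your settings.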

<!--# Installation using Docker

Docker is a platform that facilitates installing software on any operating system. Before proceeding, please install [Docker](https://docs.docker.com/get-docker/). Note that on Linux you might need to install also Docker Compose.

Please download the current repository, if you haven't done so: ```git clone --depth 1 https://gitlab.inria.fr/cedar/connectionlens.git```.
If you are using Windows, please go to Docker -> Settings -> Resources -> File Sharing -> Resources and add the folder connectionlens.
Only *the first time* you start the web application run the command: ```docker-compose build```.
Depending on your machine and internet connection this step might take 10-15 minutes.

To start the web application:```docker-compose up```. The web application will be available at `http://localhost:8080/gui/`.
We provide a [tutorial](https://gitlab.inria.fr/cedar/connectionlens/-/wikis/Using-the-GUI) on how to use the interface. 

If you want to change the default language, French, and use English instead modify and save the file. To use the web application with the new language you need to restart it: ```docker-compose down && docker-compose up```.-->

# Full installation

## Software prerequisites
Required: 
- Java 11 (tested with OpenJDK 11.0.11)
- PostgreSQL (tested with v.12.6)
- Python 3.6.5
- Tomcat 9 or later (tested with v.9.0.52 and v.9.0.54)

Optional: 

- [Graphviz (DOT)](https://www.graphviz.org/) 
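Before installing, it can help to check which of these tools are already available. The following sketch (POSIX shell; `check_tool` is a hypothetical helper) prints the version of each tool it finds, so you can compare it with the tested versions above. It assumes Python 3 is installed as `python3` and that `CATALINA_HOME` points to your Tomcat installation, as is customary for Tomcat; `psql --version` reports the client version, which may differ from the server's.

```shell
# check_tool (hypothetical helper): report whether a command is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Required tools; compare the printed versions with the tested ones listed above.
check_tool java    && java -version       # tested with OpenJDK 11.0.11
check_tool psql    && psql --version      # PostgreSQL client; tested with 12.6
check_tool python3 && python3 --version   # tested with 3.6.5
# Tomcat ships a version script in its bin folder (assumes CATALINA_HOME is set).
[ -n "$CATALINA_HOME" ] && "$CATALINA_HOME/bin/version.sh"

# Optional: Graphviz
check_tool dot && dot -V
```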


## Installation instructions

ConnectionLens can be run in two modes: *text* (command-line), using the **jar**, and *graphical* (through a GUI), by deploying the **war** in a Web server (we tested with Tomcat).
The respective installation instructions are:

- [ConnectionLens-Core installation instructions](docs/core_install.md)

- [ConnectionLens-Gui installation instructions](docs/gui_install.md)



## Example
The example below ingests 5 small data sources of different formats into a graph. It also shows how to query the graph and visualize the results. 


#### Creating a small graph (command line)

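Run the following command from the **core** folder. `-i` takes the comma-separated list of files to ingest, and `-DRDBMS_DBName=cl_myinstance` sets the name of the database in which the graph is stored: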

```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMS_DBName=cl_myinstance -i ../data/poc/2/deputes.json,../data/poc/2/fb-etienne-chouard.txt,../data/poc/2/medias.txt,../data/poc/2/tweet-Ruffin.json,../data/poc/2/rt-wikipedia.txt
```

![image_3.png](./docs/images/image_3.png)

For more options, the following command lists all applicable parameters and options:

```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar --help
```

```
Usage: java -jar connection-lens-full-<version>.jar [options]
  Options:
    -qs, --collect-query-stats
      If set, logs query-related statistics
      Default: false
    -rs, --collect-registration-stats
      If set, logs registration-related statistics
      Default: false
    -ss, --collect-similarity-stats
      If set, logs similarity-related statistics
      Default: false
    -c, --config
      Path to configuration file. The file will be used to set all default 
      values. If the option is not set, default parameter files will searched 
      in the current directory. If no such file is found, build-in default 
      will be used.
    -json, --export CL Graph to json
      Use to export the CL graph in a json file.
      Default: false
    --force-similarity-computation
      For the similarity computation to be run, even if no know datasource was 
      registered. 
      Default: false
    -h, --help
      Displays this help message.
      Default: false
    -lateIdx, --index-later
      Is true, create necessary indexes at the beginning of the loading and 
      delay the creation of some tables and indexes to the last loading.
      Default: false
    -i, --input
      A comma-separated list of files or directory paths. If a directory is specified, all descendant files are used as inputs.
      Default: []
    -a, --interactive-mode
      If true, read incoming query from STDIN after the registration phase, 
      until EOF is reached.
      Default: false
    -last, --last
      Is true, this will be the last loading after multiple loadings.
      Default: false
    -n, --noreset-at-start
      Do NOT reset the data structures upon starting.
      Default: false
    -ou, --orignal-uri
      Comma-separated list of original uris.
      Default: []
    -o, --output
      Path to output file. Default: STDOUT
      Default: java.io.PrintStream@61e717c2
    -f, --output-format
      The format in which to output the statistics.
      Default: DEFAULT
      Possible Values: [DEFAULT, MARKDOWN, LATEX]
    -q, --query
      A(single) keyword query to execute.
    -Q, --query-file
      A file containing one input query per line.
    -v, --verbose
      Use verbose mode.
      Default: true
```
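According to the option list above, `-i` takes a comma-separated list of inputs and `-n` skips the reset of the data structures at start-up. Read together, they suggest that a graph can be grown over several runs instead of being registered in one go. The sketch below follows that reading and is run from the `core` folder; it leaves aside `-lateIdx` and `-last`, which appear to target the same multi-load scenario. `join_inputs` is a hypothetical helper that builds the comma-separated list.

```shell
# join_inputs (hypothetical helper): join paths with commas, the separator -i expects.
join_inputs() {
  local IFS=,
  echo "$*"
}

# Assumption: run from the core folder; the database name matches the example above.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar
DB=cl_myinstance
FIRST=$(join_inputs ../data/poc/2/deputes.json ../data/poc/2/tweet-Ruffin.json)
LATER=$(join_inputs ../data/poc/2/medias.txt ../data/poc/2/rt-wikipedia.txt)

if [ -f "$JAR" ]; then
  # Run 1: no -n, so the data structures are reset before FIRST is registered.
  java -jar "$JAR" -DRDBMS_DBName="$DB" -i "$FIRST"
  # Run 2: -n skips the reset, so LATER is presumably added to the graph of run 1.
  java -jar "$JAR" -DRDBMS_DBName="$DB" -n -i "$LATER"
else
  echo "jar not found: run this from the core folder" >&2
fi
```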


### Querying 

A ConnectionLens query is a set of keywords; an answer is a subtree of the graph that connects one node matching each keyword. To require that a single node match several keywords, enclose those keywords in double quotes, as in `"François Ruffin"`.
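Outside the interactive shell, a single query can be passed with `-q` (see the option list above). The only subtlety is shell quoting: the keyword group must reach ConnectionLens with its double quotes intact, otherwise `François` and `Ruffin` become two separate keywords that different nodes may match. A sketch, assuming the small graph built above is stored in `cl_myinstance` and the command is run from the `core` folder; the second keyword, `Assemblée`, is only an illustration.

```shell
# Assumption: run from the core folder after building the graph in cl_myinstance.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar

# Two keywords: the quoted group "François Ruffin" (one node must match both
# words) and Assemblée. Single quotes keep the inner double quotes for Java.
QUERY='"François Ruffin" Assemblée'

if [ -f "$JAR" ]; then
  # -n keeps the loaded graph; -q runs this single keyword query.
  java -jar "$JAR" -DRDBMS_DBName=cl_myinstance -n -q "$QUERY"
else
  echo "jar not found: run this from the core folder" >&2
fi
```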

#### Querying (command-line)

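The graph can be queried through an interactive command-line interface. After loading the graph as explained above, run the jar again with `-n` (do not reset the data structures, so the loaded graph is kept), `-v` (verbose mode) and `-a` (read queries from the standard input):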

```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMS_DBName=cl_myinstance -n -v -a
```

The `query>` prompt indicates that the shell is ready to accept queries. Queries are read until the end of input is reached (e.g., Ctrl-D in a Unix terminal).

![image_2.png](./docs/images/image_2.png)
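Since `-a` reads queries from standard input until end of file (see the option list above), it should also accept queries piped in, one per line, which is handy for scripting. A sketch under that assumption, run from the `core` folder; the two queries come from the sample list in this README.

```shell
# Assumption: run from the core folder after building the graph in cl_myinstance.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar

# One query per line (printf reuses its format for each argument).
QUERIES=$(printf '%s\n' 'Russie' '"François Ruffin"')

if [ -f "$JAR" ]; then
  # The pipe closes after the last query, which is the EOF that ends -a mode.
  printf '%s\n' "$QUERIES" | java -jar "$JAR" -DRDBMS_DBName=cl_myinstance -n -v -a
else
  echo "jar not found: run this from the core folder" >&2
fi
```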


#### Gathering statistics about queries (command-line)

Given a file containing a set of queries (one query per line), the following call:

```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMS_DBName=cl_myinstance -n -qs -Q ../data/poc/2/demo.queries
```

reports statistics on each query: how long it took, how many answers were found, how long it took to find the first answer, etc. Here `-qs` turns on query statistics and `-Q` points to the query file.
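A query file is plain text with one query per line; quoted keyword groups are presumably written just as they would be typed at the `query>` prompt. The sketch below writes the sample queries of this README into a new file, `my.queries` (a hypothetical name; `../data/poc/2/demo.queries` already ships with the repository), and asks for the statistics in Markdown through `-f`, whose possible values are `DEFAULT`, `MARKDOWN` and `LATEX`. Run it from the `core` folder.

```shell
# Assumption: run from the core folder after building the graph in cl_myinstance.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar

# my.queries is a hypothetical file name: one query per line, taken from the
# sample queries of this README. The quoted 'EOF' disables shell expansion.
cat > my.queries <<'EOF'
Russie
Ruffin
"François Ruffin"
Assemblée Chouard
Assemblée RussiaToday
EOF

if [ -f "$JAR" ]; then
  # -qs collects query statistics; -f MARKDOWN formats them as Markdown.
  java -jar "$JAR" -DRDBMS_DBName=cl_myinstance -n -qs -Q my.queries -f MARKDOWN
else
  echo "jar not found: run this from the core folder" >&2
fi
```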

#### Sample queries on the small example 

Number of answer trees (ATs) returned by each query on this graph:

- `Russie`: 3 ATs
- `Ruffin`: 8 ATs
- `"François Ruffin"`: 1 AT
- `Assemblée Chouard`: 4392 ATs
- `Assemblée RussiaToday`: 4177 ATs


#### Visualizing the graph and querying through the GUI

Follow the [GUI installation instructions](docs/gui_install.md). You can then ask queries and visualize their results in the GUI. For instance, the screenshot below corresponds to the query "Briand Halluin Tonolli".

![image_4.png](./docs/images/image_4.png)
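The GUI installation instructions remain the authoritative steps; the sketch below only covers the Tomcat side and relies on standard Tomcat behavior: a `.war` copied into `webapps/` is deployed under a context path named after the file, so `gui.war` should end up at `/gui/`. It assumes `CATALINA_HOME` points to a Tomcat 9+ installation listening on the default port 8080, and that the command runs from the repository root. Any database or settings configuration the GUI needs is not covered here.

```shell
# Assumptions: Tomcat 9+ installed at $CATALINA_HOME on the default port 8080,
# and this script is run from the repository root, where gui/gui.war lives.
WAR=gui/gui.war
GUI_URL=http://localhost:8080/gui/   # context path derived from the file name gui.war

if [ -n "$CATALINA_HOME" ] && [ -f "$WAR" ]; then
  # Tomcat auto-deploys any .war placed in webapps/ (default Host settings).
  cp "$WAR" "$CATALINA_HOME/webapps/"
  "$CATALINA_HOME/bin/startup.sh"
  echo "once deployment finishes, open $GUI_URL"
else
  echo "set CATALINA_HOME and run this from the repository root" >&2
fi
```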

# Contributing to ConnectionLens

If you find a bug or another issue with ConnectionLens, please let us know.
You can report it on the [issue tracker](https://gitlab.inria.fr/cedar/connectionlens/-/issues):

- Log in to https://gitlab.inria.fr. If you do not work at Inria and need an account, please send an email to `connection-lens-admin@inria.fr` and we will help you set one up.

- Create a new issue on https://gitlab.inria.fr/cedar/connectionlens/-/issues, giving as much information as possible (what you managed to do, what is not working, etc.).

## About

ConnectionLens development started in 2018. See the [about us](about_us.md) page for a list of all authors.