# Download

This repository includes:

  1. The `core` folder, which provides:

     - the jar file `connection-lens-core-full-1.1-SNAPSHOT.jar`,
     - the Python `scripts` folder (these implement entity extraction based on Flair),
     - the `models` and `lib` folders, which provide the linguistic models used by the StanfordNLP and TreeTagger tools we build upon.

  2. The `gui` folder with the file `gui.war`, which allows us to run the web app.

  3. The `data` folder with a few sample datasets (RDF, JSON, XML, etc.)

# ConnectionLens

ConnectionLens is a tool for finding connections between user-specified search terms across heterogeneous data sources. ConnectionLens treats a set of heterogeneous, independently authored data sources as a single virtual graph, where nodes represent fine-granularity data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction, which enables us to detect the people, organizations, etc. mentioned in the text, whether full-text or text snippets found e.g. in RDF or XML.

![image_9.png](./docs/images/image_9.png) 

<!-- ConnectionLens is available as a web application or as a command line application. We provide **two installation options**:

* a beginner-friendly installation through a virtual image (Docker) that will give access only to the web application;
* a full installation in which both command line and web application are installed.-->

ConnectionLens is currently available as a command line application. Many parameters can be customized; they are illustrated in `core/src/main/resources/local.settings` (for instance, `default_locale` controls the language). Each parameter has a default value built into the JAR. You can change parameter values to your liking in the `core/src/main/resources/local.settings` file; **to make sure your settings are used, add `-c core/src/main/resources/local.settings` to the launch command.** A description of the parameters used in this file is given [here](https://gitlab.inria.fr/cedar/connectionlens/-/blob/master/docs/Parameters%20description.md).
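
As a rough illustration of the `-c` option, the sketch below launches the jar with an explicit settings file. It assumes the command is run from the repository root, so that both the jar (in `core`) and the settings path quoted above resolve; the variable names, the `[ -f ... ]` checks, and the choice of `-n -a` (reopen an existing graph interactively) are illustrative, and `cl_myinstance` is simply the database name used in this README's example.

```shell
# Sketch: launch ConnectionLens with an explicit settings file.
# Assumption: run from the repository root; adjust both paths otherwise.
JAR=core/connection-lens-core-full-1.1-SNAPSHOT.jar
SETTINGS=core/src/main/resources/local.settings

# Warn if the settings file is not where we expect it.
if [ ! -f "$SETTINGS" ]; then
  echo "settings file not found: $SETTINGS" >&2
fi

# Only start the tool when both the jar and the settings file are present.
if [ -f "$JAR" ] && [ -f "$SETTINGS" ]; then
  java -jar "$JAR" -c "$SETTINGS" -DRDBMSDBName=cl_myinstance -n -a
fi
```

If the warning is printed, the built-in defaults would apply instead of your edited values, so it is worth fixing the path before launching.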

<!--# Installation using Docker

Docker is a platform that facilitates installing software on any operating system. Before proceeding, please install [Docker](https://docs.docker.com/get-docker/). Note that on Linux you might also need to install Docker Compose.

Please download the current repository, if you haven't done so: ```git clone --depth 1 https://gitlab.inria.fr/cedar/connectionlens.git```.
If you are using Windows, please go to Docker -> Settings -> Resources -> File Sharing -> Resources and add the folder connectionlens.
Only *the first time* you start the web application run the command: ```docker-compose build```.
Depending on your machine and internet connection this step might take 10-15 minutes.

To start the web application: ```docker-compose up```. The web application will be available at `http://localhost:8080/gui/`.
We provide a [tutorial](https://gitlab.inria.fr/cedar/connectionlens/-/wikis/Using-the-GUI) on how to use the interface.

If you want to change the default language, French, and use English instead modify and save the file. To use the web application with the new language you need to restart it: ```docker-compose down && docker-compose up```.-->

# Full installation

## Software prerequisites

Required:

- Java >= 1.8
- PostgreSQL (tested with v.12.6)
- Python 3.6.5
- Tomcat >= 8.* (tested with v.9.0)

Optional:

- [Graphviz (DOT)](https://www.graphviz.org/)

## Installation instructions

ConnectionLens can be run in two modes: *text* (command-line like), using the **jar**; and *graphical* (with the help of a GUI), by deploying the **war** in a Web server (we tested with Tomcat).
The respective installation instructions are:

- [ConnectionLens-Core installation instructions](docs/core_install.md)

- [ConnectionLens-Gui installation instructions](docs/gui_install.md)

## Example
The example below ingests 5 small data sources of different formats into a graph. It also shows how to query the graph and visualize the results.

#### Creating a small graph (command line)

Run the following command from the **core** folder:

```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -i ../data/poc/2/deputes.json,../data/poc/2/fb-etienne-chouard.txt,../data/poc/2/medias.txt,../data/poc/2/tweet-Ruffin.json,../data/poc/2/rt-wikipedia.txt
```

![image_3.png](./docs/images/image_3.png)
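
Since `-i` takes a comma-separated list, a longer list can be assembled in the shell rather than typed by hand. The following is a minimal sketch, assuming it is run from the `core` folder; the `DATA_DIR` and `INPUTS` variable names and the `[ -f "$JAR" ]` guard are illustrative choices, not part of ConnectionLens.

```shell
# Sketch: build the comma-separated value expected by -i from a list of files.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar
DATA_DIR=../data/poc/2

INPUTS=""
for f in deputes.json medias.txt tweet-Ruffin.json; do
  # Prepend a comma only when INPUTS already holds at least one path.
  INPUTS="${INPUTS:+$INPUTS,}$DATA_DIR/$f"
done
echo "$INPUTS"

# Run the ingestion only if the jar is present in the current folder.
if [ -f "$JAR" ]; then
  java -jar "$JAR" -DRDBMSDBName=cl_myinstance -i "$INPUTS"
fi
```

Keeping the whole list in one quoted variable ensures it reaches the jar as a single argument, with no spaces around the commas.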

For more options, run the following command, which lists the applicable parameters and options:

	java -jar connection-lens-core-full-1.1-SNAPSHOT.jar --help

```
Usage: java -jar connection-lens-full-<version>.jar [options]
  Options:
    -qs, --collect-query-stats
      If set, logs query-related statistics
      Default: false
    -rs, --collect-registration-stats
      If set, logs registration-related statistics
      Default: false
    -ss, --collect-similarity-stats
      If set, logs similarity-related statistics
      Default: false
    -c, --config
      Path to configuration file. The file will be used to set all default 
      values. If the option is not set, default parameter files will searched 
      in the current directory. If no such file is found, build-in default 
      will be used.
    -json, --export CL Graph to json
      Use to export the CL graph in a json file.
      Default: false
    --force-similarity-computation
      For the similarity computation to be run, even if no know datasource was 
      registered. 
      Default: false
    -h, --help
      Displays this help message.
      Default: false
    -lateIdx, --index-later
      Is true, create necessary indexes at the beginning of the loading and 
      delay the creation of some tables and indexes to the last loading.
      Default: false
    -i, --input
      A comma-separated list of files or directory paths. If a directory is specified, all descendant files are used as inputs.
      Default: []
    -a, --interactive-mode
      If true, read incoming query from STDIN after the registration phase, 
      until EOF is reached.
      Default: false
    -last, --last
      Is true, this will be the last loading after multiple loadings.
      Default: false
    -n, --noreset-at-start
      Do NOT reset the data structures upon starting.
      Default: false
    -ou, --orignal-uri
      Comma-separated list of original uris.
      Default: []
    -o, --output
      Path to output file. Default: STDOUT
      Default: java.io.PrintStream@61e717c2
    -f, --output-format
      The format in which to output the statistics.
      Default: DEFAULT
      Possible Values: [DEFAULT, MARKDOWN, LATEX]
    -q, --query
      A(single) keyword query to execute.
    -Q, --query-file
      A file containing one input query per line.
    -v, --verbose
      Use verbose mode.
      Default: true
```

### Querying 

A ConnectionLens query is a set of keywords; an answer is a subtree of the graph that connects one node matching each keyword. To require that a single node matches more than one keyword, enclose those keywords in quotes.
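
To make the quoting rule concrete, here is a small sketch using the `-q` option listed in the help text. It assumes the graph has already been loaded, that the command is run from the `core` folder, and that `-q` receives the whole query as one shell argument; the variable names are hypothetical.

```shell
# Sketch: two single-shot queries passed with -q.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar

# Two keywords: an answer connects one node matching each of them.
Q_TWO='Assemblée Chouard'
# The inner double quotes ask for a single node matching both words.
Q_ONE='"François Ruffin"'

# -n keeps the already loaded graph instead of resetting it at start.
if [ -f "$JAR" ]; then
  java -jar "$JAR" -DRDBMSDBName=cl_myinstance -n -q "$Q_TWO"
  java -jar "$JAR" -DRDBMSDBName=cl_myinstance -n -q "$Q_ONE"
fi
```

The outer single quotes are consumed by the shell, so the double quotes inside `Q_ONE` are passed through to the tool unchanged.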

#### Querying (command-line)

First, we can query the graph using an interactive, command-line interface.
After having loaded the graph as explained above, run the jar with the following options:

```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -v -a
```

The `query>` prompt indicates that the shell is ready to accept queries.

![image_2.png](./docs/images/image_2.png)
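
According to the help text, interactive mode reads queries from STDIN until EOF, so queries can presumably also be piped in rather than typed. The sketch below assumes that piped input is handled like typed input and that it is run from the `core` folder; the `QUERIES` variable is a hypothetical name.

```shell
# Sketch: feed several queries to interactive mode through a pipe.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar

# One query per line; quotes group keywords that one node must match.
QUERIES='Russie
Ruffin
"François Ruffin"'

# The pipe closes after the last line, which sends EOF to the tool.
if [ -f "$JAR" ]; then
  printf '%s\n' "$QUERIES" | java -jar "$JAR" -DRDBMSDBName=cl_myinstance -n -a
fi
```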


#### Gathering statistics about queries (command-line)

Assuming a set of queries is written in a query file (one query per line), the following call:

```
java -jar connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -qs -Q ../data/poc/2/demo.queries
```

will yield a set of statistics on each query: how long it took, how many answers were found, how long before the first answer was found, etc.
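
A query file of your own can be produced with a few lines of shell. This sketch writes one query per line and then asks for the statistics in Markdown; the file names `my.queries` and `query-stats.md` are hypothetical, and combining `-f MARKDOWN` with `-o` is an assumption based only on the option descriptions in the help text.

```shell
# Sketch: create a query file (one query per line) and collect statistics.
JAR=connection-lens-core-full-1.1-SNAPSHOT.jar
QUERY_FILE=my.queries        # hypothetical file name

printf '%s\n' 'Russie' 'Ruffin' '"François Ruffin"' > "$QUERY_FILE"

# -qs logs query statistics; -f and -o choose the output format and file.
if [ -f "$JAR" ]; then
  java -jar "$JAR" -DRDBMSDBName=cl_myinstance -n -qs \
    -f MARKDOWN -o query-stats.md -Q "$QUERY_FILE"
fi
```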

#### Sample queries on the small example

Each line below gives a query followed by the number of answer trees (ATs) it returns:

Russie - 2 ATs

Ruffin - 8 ATs

"François Ruffin" - 1 AT

Assemblée Chouard - 4392 ATs

Assemblée RussiaToday - 4177 ATs


#### Visualizing the graph and querying through the GUI

Follow the [GUI installation instructions](https://gitlab.inria.fr/cedar/connectionlens/-/blob/master/gui_install.md). Queries can then be asked, and their results visualized, through the GUI. For instance, the screenshot below corresponds to the query "Briand Halluin Tonolli".

![image_4.png](./docs/images/image_4.png)

# Contributing to ConnectionLens

If you find a bug or issue with ConnectionLens, please let us know.
You can report bugs on the [issue tracker](https://gitlab.inria.fr/cedar/connectionlens/-/issues):

- Log in to https://gitlab.inria.fr. If you need to create an account and do not work at Inria, please send an email to `connection-lens-admin@inria.fr` so that we can help you set one up.

- Create a new issue on https://gitlab.inria.fr/cedar/connectionlens/-/issues, giving as much information as possible (what you succeeded in doing, what is not working, etc.)

## About

ConnectionLens development started in 2018. See the [about us](about_us.md) page for a list of all authors.