Commit a9c75600 authored by UPADHYAY Prajna Devi

Updating public repository with latest development

parent 8978bf1c
# Download
A clone of this repository includes:
This repository includes:
1. The `core` folder which provides:
......@@ -16,16 +16,19 @@ A clone of this repository includes:
# ConnectionLens
ConnectionLens is a tool for finding connections between user-specified search terms across heterogeneous data sources. ConnectionLens treats a set of heterogeneous, independently authored data sources as a single virtual graph, whereas nodes represent fine-granularity data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction which enables us to detect the people, organizations, etc. mentioned in the text, whether full-text or text snippets found e.g. in RDF or XML.
ConnectionLens is a tool for finding connections between user-specified search terms across heterogeneous data sources. ConnectionLens treats a set of heterogeneous, independently authored data sources as a single virtual graph, whereas nodes represent fine-granularity data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction which enables us to detect the people, organizations, etc. mentioned in the text, whether full-text or text snippets found e.g. in RDF or XML.
ConnectionLens is available as a web application or as a command line application. We provide **two installations options**:
* a beginner-friendly installation trough a virtual image (Docker) that will give access only to the web application;
* a full installation in which both command line and web application are installed.
![image_9.png](image_9.png)
ConnectionLens allows customizing many parameters, illustrated in `core/settings/local.settings` (for instance: default_locale controls the language etc.). Each parameter has a default value built in the JAR. You can change parameter values to your liking in the `core/settings/local.settings` file; **to make sure your settings are used, add `-c core/settings/local.settings` to the launch command.**
<!-- ConnectionLens is available as a web application or as a command line application. We provide **two installations options**:
# Installation using Docker
* a beginner-friendly installation through a virtual image (Docker) that will give access only to the web application;
* a full installation in which both command line and web application are installed.-->
ConnectionLens is currently available as a command line application. It allows customizing many parameters, illustrated in `core/settings/local.settings` (for instance: default_locale controls the language etc.). Each parameter has a default value built in the JAR. You can change parameter values to your liking in the `core/settings/local.settings` file; **to make sure your settings are used, add `-c core/settings/local.settings` to the launch command.**
<!--# Installation using Docker
Docker is a platform that facilitates installing software on any operating system. Before proceeding, please install [Docker](https://docs.docker.com/get-docker/). Note that on Linux you might need to install also Docker Compose.
......@@ -37,7 +40,7 @@ Depending on your machine and internet connection this step might take 10-15 min
To start the web application:```docker-compose up```. The web application will be available at `http://localhost:8080/gui/`.
We provide a [tutorial](https://gitlab.inria.fr/cedar/connectionlens/-/wikis/Using-the-GUI) on how to use the interface.
If you want to change the default language, French, and use English instead modify and save the file. To use the web application with the new language you need to restart it: ```docker-compose down && docker-compose up```.
If you want to change the default language, French, and use English instead modify and save the file. To use the web application with the new language you need to restart it: ```docker-compose down && docker-compose up```.-->
# Full installation
......@@ -170,7 +173,7 @@ The `query>` indicates that the shell is ready to accept queries.
Assuming a set of queries are written in a query file (one query per line), the following call:
```
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -qs -Q core/data/poc/2/demo.queries
java -jar core/connection-lens-core-full-1.1-SNAPSHOT.jar -DRDBMSDBName=cl_myinstance -n -qs -Q data/poc/2/demo.queries
```
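As a rough sketch of how the settings flag could be combined with the batch-query call above, the Python helper below only assembles the argument list and prints it; it does not start ConnectionLens. The helper name `build_launch_command` and its parameters are hypothetical. The JAR name and the `-DRDBMSDBName`, `-n`, `-qs`, `-Q` and `-c` arguments come from the README text in this diff, while appending `-c` at the end of the command is an assumption.

```python
import shlex

def build_launch_command(db_name, query_file, settings_file=None,
                         jar="core/connection-lens-core-full-1.1-SNAPSHOT.jar"):
    """Assemble (but do not run) the batch-query command from the README."""
    cmd = ["java", "-jar", jar, "-DRDBMSDBName=" + db_name,
           "-n", "-qs", "-Q", query_file]
    if settings_file is not None:
        # Assumption: -c can simply be appended after the other arguments.
        cmd += ["-c", settings_file]
    return cmd

cmd = build_launch_command("cl_myinstance", "data/poc/2/demo.queries",
                           settings_file="core/settings/local.settings")
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if a path contains spaces.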
......
......@@ -32,12 +32,12 @@ default_locale=fr
#
# The path where dot is installed
drawing.dot_installation=/usr/local/bin/dot
drawing.draw=true
drawing.draw=false
# drawing coarse edges
drawing.coarse_edge=false
#
# Parameters for plotting solution numbers as a function of the search time
drawing.solution_times=true
drawing.solution_times=false
# The same directory as for drawing (parameter above) will be used.
# However, one can plot solutions or not, and draw trees or not, independently.
drawing.gnuplot_installation=/usr/local/bin/gnuplot
......@@ -135,7 +135,7 @@ flavor=lattice
# NODE COMPARISON AND SIMILARITY MEASURES
#
# Whether or not to perform node comparisons (true or false)
compare_nodes=true
compare_nodes=false
# Threshold above which same as are considered relevant
# similarity_threshold_hamming=-1
# similarity_threshold_jaro=-1
......
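To check which switches a given `local.settings` actually turns on after a change like the one above, a small reader can help. This is a minimal sketch that assumes the file is made of plain `key=value` lines with `#` comments, as the hunk suggests; the function name `parse_settings` is hypothetical and ConnectionLens's own loader may behave differently.

```python
def parse_settings(text):
    """Parse `key=value` lines, skipping blank lines and `#` comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an `=` sign
            settings[key.strip()] = value.strip()
    return settings

# Values copied from the new side of the settings hunk above.
sample = """\
# The path where dot is installed
drawing.dot_installation=/usr/local/bin/dot
drawing.draw=false
drawing.solution_times=false
compare_nodes=false
# similarity_threshold_jaro=-1
"""

settings = parse_settings(sample)
enabled = [key for key, value in settings.items() if value == "true"]
print(enabled)
```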
## PostgreSQL
A PostgreSQL server must be running locally, and you must have access to an account with the ability to create users. By default, the username is `kwsearch`, therefore this user must be
created beforehand. This username can be overridden by modifying the parameter `RDBMSUser` in
the file `core/setting/local.settings`.
the file `core/settings/local.settings`.
## TreeTagger
......@@ -45,9 +45,13 @@ Before using this extractor you need to follow these installation instructions o
***Install python:***
Install **python 3.***.
Install **python 3.6.5**. If it cannot be installed alongside your existing Python because of version conflicts, create a Python 3.6 virtual environment with the following command:
In the **`core/setting/local.settings`** file update the `PYTHONPath` parameter with the path to your python installation.
```
python3.6 -m venv cl_env
```
In the **`core/settings/local.settings`** file, update the `PYTHONPath` parameter with the path to your system or virtual-environment Python installation.
By default the path is: `/usr/local/bin/python3.7`
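Before writing a path into `PYTHONPath`, it can be worth confirming which version that interpreter really reports. The sketch below is one possible check, not part of ConnectionLens; the function names are hypothetical, and `sys.executable` merely stands in for the path you intend to configure (for a virtual environment created as above on Linux or macOS, that would typically be `cl_env/bin/python`).

```python
import subprocess
import sys

def parse_python_version(output):
    """Turn text such as 'Python 3.6.5' into a tuple of ints."""
    version = output.strip().split()[-1]
    return tuple(int(part) for part in version.split(".") if part.isdigit())

def interpreter_version(python_path):
    """Ask the interpreter at python_path for its version."""
    result = subprocess.run([python_path, "--version"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,  # some old versions print to stderr
                            universal_newlines=True,
                            check=True)
    return parse_python_version(result.stdout)

# sys.executable is a stand-in for the value of PYTHONPath.
is_python_36 = interpreter_version(sys.executable)[:2] == (3, 6)
print(is_python_36)
```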
......@@ -67,7 +71,7 @@ export PATH="$HOME/.cargo/bin:$PATH"
or adding it to your .tcshrc or .bashrc file.
***Install the python libraries required for Flair NLP NER*** by typing:
***Install the python libraries required for Flair NLP NER and PDF integration*** by typing:
`pip3 install -r requirements.txt`
......@@ -77,15 +81,6 @@ If you want to switch to `Flair NER`, you need to change the `extractor` paramet
`extractor=FLAIR_NER`
# PDF integration
The registration of PDF files into ConnectionLens is different compared to other formats already supported by the tool. For this, we used a Python tool (https://gitlab.inria.fr/cedar/pdf-integration) to translate the original PDF into `.json` and/or `n-triples`. This tool was integrated into ConnectionLens and need to install some python's libraries found in https://gitlab.inria.fr/cedar/connectionlens/-/tree/master/core/scripts/pdf_cripts/requirements.txt :
- `Download the requirement.txt` from https://gitlab.inria.fr/cedar/connectionlens/-/tree/master/core/scripts/pdf_scripts/requirements.txt .
- `pip3 install -r requirement.txt` .
# Locations of the Flair and PDF integration scripts
The folder [scripts](https://gitlab.inria.fr/cedar/connectionlens/-/tree/master/core/scripts/) contains Python scripts for Flair NER and content extraction from PDF files. By default, ConnectionLens expects these two folders to be under `/var/connectionlens/scripts`; you can chose another directory and indicate it in `core/setting/local.settings` as the value of the parameter `python_script_location`. You need to move the directories `Flair_NER_tool` and `pdf_scripts` under `/var/connectionlens/scripts/` (or the alternative directory you chose for the scripts).
......
......@@ -7,7 +7,7 @@ The GUI can be run through a Web server or from a J2EE Web application container
Install a Web server, such as [Tomcat 9.0](https://tomcat.apache.org/download-90.cgi)
In the Tomcat installation directory, there is a directory called `webapps`.
Copy `connection-lens/gui/target/gui.war` in Tomcat's `webapps` directory.
Copy `connectionlens/gui/target/gui.war` in Tomcat's `webapps` directory.
## Running from a J2EE Web application container
Drop the `gui.war` into your favorite J2EE web application container,
......@@ -33,4 +33,4 @@ Details on using the GUI are provided [here](https://gitlab.inria.fr/cedar/conne
set CATALINA_OPTS=-Dfile.encoding="UTF-8"
2. The Web navigators that are part of J2EE environments such as Eclipse Enterprise Edition are not as reliable and robust as standalone navigators. Try using a major browser such as Chrome, Safari or Firefox (the latter seems the most robust).
\ No newline at end of file
2. The Web navigators that are part of J2EE environments such as Eclipse Enterprise Edition are not as reliable and robust as standalone navigators. Try using a major browser such as Chrome, Safari or Firefox (the latter seems the most robust).
image_5.png (1.08 MB)
image_6.png (1010 KB)
image_9.png (48.5 KB)

attrs==19.3.0
boto==2.49.0
boto3==1.14.30
botocore==1.17.30
bpemb==0.3.2
camelot-py==0.8.2
certifi==2020.6.20
cffi==1.14.1
chardet==3.0.4
click==7.1.2
cloudpickle==1.5.0
cryptography==3.0
cycler==0.10.0
dataclasses==0.6
decorator==4.4.2
Deprecated==1.2.10
docutils==0.15.2
Elixir==0.7.1
et-xmlfile==1.0.1
filelock==3.0.12
flair==0.4.5
Flask==1.1.2
future==0.18.2
gensim==3.8.3
hyperopt==0.2.4
idna==2.10
importlib-metadata==1.7.0
iniconfig==1.0.0
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
Jinja2==2.11.2
jmespath==0.10.0
joblib==0.16.0
kiwisolver==1.2.0
langdetect==1.0.8
lxml==4.5.2
MarkupSafe==1.1.1
matplotlib==3.3.0
more-itertools==8.4.0
mpld3==0.3
networkx==2.4
numpy==1.19.1
opencv-python==4.3.0.36
openpyxl==3.0.4
packaging==20.4
pandas==1.1.0
pbr==5.4.5
pdfminer.six==20200726
pikepdf==1.17.3
Pillow==7.2.0
plac==1.2.0
pluggy==0.13.1
protobuf==3.12.2
psutil==5.7.2
py==1.9.0
pycparser==2.20
pyparsing==2.4.7
PyPDF2==1.26.0
pytest==6.0.0
python-dateutil==2.8.1
python-dotenv==0.14.0
pytz==2020.1
rdflib==5.0.0
regex==2020.7.14
requests==2.24.0
s3transfer==0.3.3
sacremoses==0.0.43
scikit-learn==0.23.1
scipy==1.5.2
segtok==1.5.10
sentencepiece==0.1.91
six==1.15.0
smart-open==2.1.0
sortedcontainers==2.2.2
SQLAlchemy==0.9.6
sqlalchemy-migrate==0.13.0
sqlitedict==1.6.0
sqlparse==0.3.1
stanfordnlp==0.2.0
tabula==1.0.5
tabulate==0.8.7
Tempita==0.5.2
threadpoolctl==2.1.0
tika==1.24
tokenizers==0.8.1rc1
toml==0.10.1
torch==1.3.0
tqdm==4.48.0
transformers==3.0.2
urllib3==1.20
Werkzeug==1.0.1
wrapt==1.12.1
xlrd==0.7.1
xlwt==0.7.2
zipp==3.1.0
attrs==20.3.0
bio==0.0.1
biopython==1.77
bpemb==0.3.2
camelot-py==0.8.2
certifi==2020.11.8
cffi==1.14.2
chardet==3.0.4
click==7.1.2
cloudpickle==1.6.0
cryptography==3.0
cycler==0.10.0
dataclasses==0.6
decorator==4.4.2
Deprecated==1.2.10
et-xmlfile==1.0.1
filelock==3.0.12
flair==0.4.5
Flask==1.1.2
future==0.18.2
gensim==3.8.3
hyperopt==0.2.5
idna==2.10
importlib-metadata==3.1.0
iniconfig==1.1.1
intervaltree==3.1.0
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
Jinja2==2.11.2
joblib==0.17.0
kiwisolver==1.3.1
langdetect==1.0.8
lxml==4.5.2
MarkupSafe==1.1.1
matplotlib==3.3.3
mpld3==0.3
networkx==2.5
numpy==1.19.4
opencv-python==4.4.0.42
openpyxl==3.0.4
packaging==20.4
pandas==1.1.1
pdfminer.six==20200726
pikepdf==1.19.0
Pillow==8.0.1
plac==1.2.0
pluggy==0.13.1
protobuf==3.14.0
psutil==5.7.3
py==1.9.0
pycparser==2.20
pyparsing==2.4.7
PyPDF2==1.26.0
pytest==6.1.2
python-dateutil==2.8.1
python-dotenv==0.15.0
pytz==2020.1
rdflib==5.0.0
regex==2020.11.13
requests==2.25.0
sacremoses==0.0.43
scikit-learn==0.23.2
scipy==1.5.4
segtok==1.5.10
sentencepiece==0.1.91
six==1.15.0
smart-open==4.0.1
sortedcontainers==2.2.2
sqlitedict==1.7.0
stanfordnlp==0.2.0
tabulate==0.8.7
threadpoolctl==2.1.0
tika==1.24
tokenizers==0.9.3
toml==0.10.2
torch==1.3.0
tqdm==4.53.0
transformers==3.5.1
typing-extensions==3.7.4.3
urllib3==1.24.3
Werkzeug==1.0.1
wrapt==1.12.1
zipp==3.4.0
\ No newline at end of file
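Because the two pinned lists above are long, a quick comparison makes the changes easier to review. The following is a minimal sketch that assumes every line has the form `name==version` and that the first list is the old file and the second the new one; the function names are hypothetical, and the sample pins are a small excerpt of the lists above.

```python
def parse_pins(text):
    """Map package name -> pinned version for `name==version` lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, _, version = line.partition("==")
            pins[name.lower()] = version
    return pins

def diff_pins(old, new):
    """Return (removed names, added names, {name: (old, new)} for changed pins)."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = {name: (old[name], new[name])
               for name in sorted(set(old) & set(new))
               if old[name] != new[name]}
    return removed, added, changed

# Excerpt of the two lists above (assumed old, then new).
old = parse_pins("attrs==19.3.0\nboto==2.49.0\nflair==0.4.5\nurllib3==1.20\n")
new = parse_pins("attrs==20.3.0\nbio==0.0.1\nflair==0.4.5\nurllib3==1.24.3\n")

removed, added, changed = diff_pins(old, new)
print("removed:", removed)
print("added:", added)
print("changed:", changed)
```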
## POM version
project.version=1.1
## The branch and its version used to generate the jar and the war of the project.
project.git.branch=develop
project.git.version=e697451e
project.git.branch=bug_password
project.git.version=cdc6238f