Commit a7d2a998 authored by Oana Balalau

dockerfile and how to install

parent 4e041489
FROM ubuntu:20.04
LABEL maintainer="Oana Balalau <oana.balalau@inria.fr>"
# installing all necessary system libraries
RUN apt-get update -yqq &&\
apt-get install --no-install-recommends software-properties-common -yqq &&\
add-apt-repository ppa:deadsnakes/ppa -y &&\
apt-get update -yqq &&\
apt-get install --no-install-recommends python3.7 \
vim \
python3-pip \
openjdk-8-jdk \
graphviz \
wget -yqq &&\
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1 &&\
rm -rf /var/lib/apt/lists/*
# copying CL into container
COPY core /connectionlens/core
RUN mkdir /var/connectionlens && mv /connectionlens/core/settings/properties /connectionlens/properties &&\
mv /connectionlens/core/scripts /var/connectionlens/scripts
COPY models /var/connectionlens/models
COPY requirements.txt /connectionlens/requirements.txt
COPY gui /connectionlens/gui
# installing all python libraries and moving resources
# installing Treetagger
# installing Tomcat
# changing the properties file
# putting models in default location for Stanford NER
RUN pip3 install -r /connectionlens/requirements.txt
RUN wget https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.3.tar.gz -P /var/temp/ &&\
mkdir /var/connectionlens/treetagger &&\
mkdir /var/connectionlens/treetagger/models &&\
tar -xf /var/temp/tree-tagger-linux-3.2.3.tar.gz -C /var/connectionlens/treetagger/ &&\
mv /var/connectionlens/models/french.par /var/connectionlens/treetagger/models/ &&\
mv /var/connectionlens/models/english-utf8.par /var/connectionlens/treetagger/models/ &&\
wget https://miroir.univ-lorraine.fr/apache/tomcat/tomcat-9/v9.0.37/bin/apache-tomcat-9.0.37.tar.gz -P /var/temp/ &&\
mkdir /var/connectionlens/tomcat/ &&\
tar -xf /var/temp/apache-tomcat-9.0.37.tar.gz -C /var/connectionlens/tomcat &&\
mkdir /var/connectionlens/tomcat/apache-tomcat-9.0.37/webapps/gui &&\
mv /connectionlens/gui/gui.war /var/connectionlens/tomcat/apache-tomcat-9.0.37/webapps/gui &&\
cd /var/connectionlens/tomcat/apache-tomcat-9.0.37/webapps/gui/ && jar -xvf gui.war &&\
sed -i 's/RDBMSHost =.*/RDBMSHost=db/' /connectionlens/properties &&\
sed -i 's/RDBMSUser =.*/RDBMSUser = kwsearch/' /connectionlens/properties &&\
sed -i 's/RDBMSPassword =.*/RDBMSPassword = kwsearch/' /connectionlens/properties &&\
sed -i 's/PYTHONPath =.*/PYTHONPath =\/usr\/bin\/python3.7/' /connectionlens/properties &&\
cp /connectionlens/properties /var/connectionlens/ &&\
cp /connectionlens/properties /var/connectionlens/tomcat/apache-tomcat-9.0.37/webapps/gui/WEB-INF/ && rm -r /connectionlens
ENV TREETAGGER_HOME=/var/connectionlens/treetagger
ENV CATALINA_HOME=/var/connectionlens/tomcat/apache-tomcat-9.0.37
ENV JAVA_OPTS='-Xmx2g'
ENTRYPOINT cp /settings/properties /var/connectionlens/ && sed -i 's/RDBMSHost =.*/RDBMSHost=db/' /var/connectionlens/properties &&\
sed -i 's/RDBMSPassword =.*/RDBMSPassword = kwsearch/' /var/connectionlens/properties &&\
sed -i 's/RDBMSUser =.*/RDBMSUser = kwsearch/' /var/connectionlens/properties &&\
sed -i 's/PYTHONPath =.*/PYTHONPath =\/usr\/bin\/python3.7/' /var/connectionlens/properties &&\
sed -i 's/RDBMSPort =.*/RDBMSPort = 5432/' /var/connectionlens/properties &&\
cp /var/connectionlens/properties /var/connectionlens/tomcat/apache-tomcat-9.0.37/webapps/gui/WEB-INF/ &&\
/var/connectionlens/tomcat/apache-tomcat-9.0.37/bin/catalina.sh run
EXPOSE 8080
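
The effect of the `sed` substitutions in the `ENTRYPOINT` can be checked outside the container. The sketch below is not part of the commit: it applies the same five expressions to a throwaway file. The key names and `create_abstract_graph=false` come from the diff, while the right-hand values are made up for illustration, and GNU `sed` is assumed for `-i`.

```shell
#!/bin/sh
# Throwaway stand-in for /var/connectionlens/properties.
# The values on the right-hand side are made up for this demo.
props="$(mktemp)"
cat > "$props" <<'EOF'
RDBMSHost = localhost
RDBMSPort = 5433
RDBMSUser = someuser
RDBMSPassword = somepassword
PYTHONPath = /usr/bin/python3
create_abstract_graph=false
EOF

# Same expressions as in the ENTRYPOINT (GNU sed assumed for -i).
sed -i 's/RDBMSHost =.*/RDBMSHost=db/' "$props"
sed -i 's/RDBMSPassword =.*/RDBMSPassword = kwsearch/' "$props"
sed -i 's/RDBMSUser =.*/RDBMSUser = kwsearch/' "$props"
sed -i 's/PYTHONPath =.*/PYTHONPath =\/usr\/bin\/python3.7/' "$props"
sed -i 's/RDBMSPort =.*/RDBMSPort = 5432/' "$props"

# Print the rewritten file; keys that match no pattern stay untouched.
cat "$props"
```

Each pattern expects a space before `=`, so a key written as `RDBMSHost=...` in the mounted file would be left unchanged.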
@@ -6,15 +6,35 @@ A clone of this repository includes:
- the jar file `connection-lens-core-full-1.1-SNAPSHOT.jar`,
- the python `scripts` folder (these implement an entity extraction based on Flair),
- a `properties` file which allow controlling multiple parameters related to the execution.
- a `properties` file which allows controlling multiple parameters related to the execution.
2. The `gui` folder with the file `gui.war` that allows to run the web app.
2. The `gui` folder with the file `gui.war` that allows us to run the web app.
3. The `data` folder with a few sample datasets (RDF, JSON, XML etc.)
3. The `data` folder with a few sample datasets (RDF, JSON, XML, etc.)
4. The `models` folder which provides linguistic models used by the TreeTagger and StanfordNLP tools we build upon.
# Install
# ConnectionLens
ConnectionLens is a tool for finding connections between user-specified search terms across heterogeneous data sources. ConnectionLens treats a set of heterogeneous, independently authored data sources as a single virtual graph, where nodes represent fine-granularity data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction, which enables us to detect the people, organizations, etc. mentioned in the text, whether full-text or text snippets found e.g. in RDF or XML.
ConnectionLens is available as a web application or as a terminal application. We provide two installation options: a beginner-friendly installation that gives access only to the web application, and an advanced installation in which both the terminal and web applications are installed.
# Installation using Docker
Docker is a platform that facilitates installing software on any operating system. Before proceeding, please install [Docker](https://docs.docker.com/get-docker/). Note that on Linux you might also need to install Docker Compose.
Please download the current repository, if you haven't done so: ```git clone --depth 1 https://gitlab.inria.fr/cedar/connectionlens.git```.
If you are using Windows, please go to Docker -> Settings -> Resources -> File Sharing -> Resources and add the folder connectionlens.
Only *the first time* you start the web application, run the command: ```docker-compose build```.
Depending on your machine and internet connection, this step might take 10-15 minutes.
To start the web application: ```docker-compose up```.
We provide a [tutorial](https://gitlab.inria.fr/cedar/connectionlens/-/wikis/Using-the-GUI) on how to use the interface.
We note that ConnectionLens has a properties file in `core/settings/properties`. This file can be used to specify the language in which documents are written, via the parameter `default_locale`.
If you want to change the default language, French, and use English instead, modify and save the file. To use the web application with the new language, you need to restart it: ```docker-compose down && docker-compose up```.
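
As a rough sketch of that edit done from the command line: the helper below rewrites one key in a properties file. `set_property` is a hypothetical helper, not something shipped with ConnectionLens, and the locale values `fr` and `en` are assumptions about how `default_locale` is written. The demo runs against a temporary file rather than `core/settings/properties`, and GNU `sed` is assumed for `-i`.

```shell
#!/bin/sh
# set_property FILE KEY VALUE
# Hypothetical helper: replaces the line "KEY = ..." (spaces optional)
# with "KEY=VALUE", or appends the pair if the key is absent.
# KEY and VALUE must not contain '|' or '&' (they are pasted into sed).
set_property() {
  file="$1"; key="$2"; value="$3"
  if grep -q "^${key}[[:space:]]*=" "$file"; then
    sed -i "s|^${key}[[:space:]]*=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Demo on a temporary file; "fr" and "en" are assumed locale values.
demo="$(mktemp)"
printf 'default_locale=fr\nread_abstract_graph=false\n' > "$demo"
set_property "$demo" default_locale en
cat "$demo"
```

Pointing the helper at `core/settings/properties` and then restarting with `docker-compose down && docker-compose up` would follow the steps described above.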
# Advanced installation
## Software prerequisites
Required:
@@ -43,7 +63,7 @@ The respective installation instructions are:
The example below ingests 5 small data sources of different formats into a graph. It also shows how to query the graph and visualize the results.
#### Creeting a small graph (command line)
#### Creating a small graph (command line)
From the main folder, call the jar in the `core` folder with the following options:
@@ -89,12 +109,11 @@ Usage: java -jar connection-lens-full-<version>.jar [options]
Displays this help message.
Default: false
-lateIdx, --index-later
Is true, create necessary indexes at the begining of the loading and
delay the creation of some tables and index to the last loading.
If true, create necessary indexes at the beginning of the loading and
delay the creation of some tables and indexes to the last loading.
Default: false
-i, --input
Comma-separated list of file or directory paths. If a directory is
specified, all descendant files are used as inputs.
A comma-separated list of files or directory paths. If a directory is specified, all descendant files are used as inputs.
Default: []
-a, --interactive-mode
If true, read incoming query from STDIN after the registration phase,
@@ -172,7 +191,7 @@ will yield a set of statistics on each query: how long it took, how many answers
#### Visualizing the graph and querying through the GUI
Follow the [GUI installation instructions](https://gitlab.inria.fr/cedar/connectionlens/-/blob/master/gui_install.md).
Then, queries can be asked through the GUI and results can be visualized in the GUI. For instance, the screen shot below
Then, queries can be asked through the GUI and results can be visualized in the GUI. For instance, the screenshot below
corresponds to the query "Briand Halluin Tonolli".
![image_4.png](./image_4.png)
@@ -184,7 +203,7 @@ You can report bugs on the [issue](https://gitlab.inria.fr/cedar/connectionlens/
- Log in to https://gitlab.inria.fr. If you need to create an account and do not work at Inria, please send an email to `connection-lens-admin@inria.fr` to help you establish an account.
- Create a new issue on https://gitlab.inria.fr/cedar/connectionlens/-/issues, giving as much information as possible (what did you succeed in doing, what is not working etc.)
- Create a new issue on https://gitlab.inria.fr/cedar/connectionlens/-/issues, giving as much information as possible (what did you succeed in doing, what is not working, etc.)
## About
@@ -274,3 +274,4 @@ create_abstract_graph=false
read_abstract_graph=false
query_only_specific_edge=false
version: '3.3'
services:
  db:
    image: postgres:9.6
    restart: always
    environment:
      POSTGRES_USER: kwsearch
      POSTGRES_PASSWORD: kwsearch
      POSTGRES_DB: kwsearch
    ports:
      - 5400:5432
    volumes:
      - pgdata:/var/lib/postgresql/data
  core:
    image: connectionlens
    build:
      context: .
      dockerfile: Dockerfile
    restart: always
    depends_on:
      - db
    ports:
      - 8080:8080
    volumes:
      - ./core/settings:/settings
      - tomcat:/var/connectionlens/tomcat/apache-tomcat-9.0.37/
volumes:
  pgdata:
  tomcat:
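
`depends_on` only orders container start-up; it does not wait for Tomcat to finish deploying the web application. A minimal polling sketch follows, under stated assumptions: `wait_for_url` is a hypothetical helper, `curl` is assumed to be installed on the host, and the URL `http://localhost:8080/gui/` is inferred from the `8080:8080` mapping and the `webapps/gui` directory rather than confirmed by the commit.

```shell
#!/bin/sh
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: returns 0 as soon as URL answers without an HTTP
# error, and 1 if every attempt fails.
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors (>= 400) as failure; -s: quiet;
    # --max-time bounds each individual try.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

After `docker-compose up -d`, a call such as `wait_for_url http://localhost:8080/gui/ && echo "web application is up"` would block until the page answers or the attempts run out.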