Linkex
Linkex is a tool for discovering link key candidates from two RDF datasets. Link keys generalise the combination of keys and ontology alignments for data interlinking. A link key is a set of pairs of properties that uniquely identifies the instances of two classes of two RDF datasets. For example, the link key {(hasCreator, aAuteur), (hasTitle, aTitre)} for (Book, Livre) states that, if an instance of Book has the same values for hasCreator and hasTitle as an instance of Livre has for aAuteur and aTitre, the two instances are the same.
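To make the definition concrete, here is a minimal Python sketch of how the example link key could generate links on toy data. It is only an illustration of the idea, not Linkex's implementation: the instance identifiers, the values, and the function name generate_links are all made up, and "same values" is read here as "identical, non-empty value sets".

```python
# Toy data: each instance maps property names to sets of values.
# All identifiers and values below are made up for illustration.
books = {
    "b1": {"hasCreator": {"Victor Hugo"}, "hasTitle": {"Les Misérables"}},
    "b2": {"hasCreator": {"Jules Verne"}, "hasTitle": {"Michel Strogoff"}},
}
livres = {
    "l1": {"aAuteur": {"Victor Hugo"}, "aTitre": {"Les Misérables"}},
    "l2": {"aAuteur": {"Victor Hugo"}, "aTitre": {"Notre-Dame de Paris"}},
}

# The link key from the text: pairs of (property in dataset 1, property in dataset 2).
link_key = [("hasCreator", "aAuteur"), ("hasTitle", "aTitre")]


def generate_links(instances1, instances2, key):
    """Return the pairs of instances that agree on every property pair of the key."""
    links = set()
    for id1, props1 in instances1.items():
        for id2, props2 in instances2.items():
            # A pair matches when both sides have the same non-empty value set
            # for every (p, q) in the key.
            if all(props1.get(p) and props1.get(p) == props2.get(q) for p, q in key):
                links.add((id1, id2))
    return links


links = generate_links(books, livres, link_key)
print(links)  # {('b1', 'l1')}
```

Note that b1 and l2 share an author but not a title, so the creator pair alone would not identify instances uniquely; the two pairs together do on this data.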
This tool can extract link key candidates and evaluate them using discriminability and coverage. It can also evaluate them against a reference set of links given as input. It can extract candidates with composed properties and inverse properties.
Linkex is free software distributed under the terms of the GNU Lesser General Public License.
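This README does not define the measures, so the following Python sketch rests on assumptions. It takes discriminability to measure how close the generated links are to a one-to-one mapping, and coverage to be the share of instances that appear in at least one link, which is how these measures are usually described in the link key literature. Linkex's own computation may differ in detail. Precision and recall against reference links use the standard definitions. All function names and data are hypothetical.

```python
def discriminability(links):
    """Assumed definition: min(#linked left, #linked right) / #links.

    Equals 1.0 when the links form a one-to-one mapping.
    """
    if not links:
        return 0.0
    left = {a for a, _ in links}
    right = {b for _, b in links}
    return min(len(left), len(right)) / len(links)


def coverage(links, instances1, instances2):
    """Assumed definition: linked instances / all instances of both classes.

    The two instance sets are assumed to use disjoint identifiers.
    """
    left = {a for a, _ in links}
    right = {b for _, b in links}
    total = len(instances1) + len(instances2)
    return (len(left) + len(right)) / total if total else 0.0


def precision_recall(links, reference):
    """Standard precision and recall of generated links against reference links."""
    correct = links & reference
    precision = len(correct) / len(links) if links else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Made-up instances and links.
instances1 = {"b1", "b2", "b3", "b4"}
instances2 = {"l1", "l2", "l3", "l4"}
candidate_links = {("b1", "l1"), ("b2", "l2"), ("b2", "l3")}
reference_links = {("b1", "l1"), ("b2", "l2"), ("b4", "l4")}

print(discriminability(candidate_links))                    # 2/3: b2 is linked twice
print(coverage(candidate_links, instances1, instances2))    # 5 of 8 instances: 0.625
print(precision_recall(candidate_links, reference_links))   # (2/3, 2/3)
```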
If you use this software and want to give it credit, please cite:
Manuel Atencia, Jérôme David, Jérôme Euzenat, Data interlinking through robust link key extraction, Proc. 21st ECAI, Prague (CZ), pp. 15-20, 2014.
Installation
- You need to have Maven installed.
- Download and unzip the Alignment API.
Use these commands to manually install the Alignment API:
wget ftp://ftp.inrialpes.fr/pub/exmo/software/ontoalign/align-4.10.zip
unzip align-4.10.zip -d alignapi
mvn install:install-file -Dfile=alignapi/lib/procalign.jar -DgroupId=fr.inrialpes.exmo.align -DartifactId=procalign -Dversion=4.10 -Dpackaging=jar
mvn install:install-file -Dfile=alignapi/lib/ontowrap.jar -DgroupId=fr.inrialpes.exmo.ontowrap -DartifactId=ontowrap -Dversion=4.10 -Dpackaging=jar
mvn install:install-file -Dfile=alignapi/lib/align.jar -DgroupId=org.semanticweb.owl.align -DartifactId=align -Dversion=4.10 -Dpackaging=jar
- Clone this repository:
git clone git@gitlab.inria.fr:moex/linkex.git
- Move to the linkex directory:
cd linkex
- Compile and package into a jar:
mvn package
This should create the file target/LinkkeyDiscovery-1.0-SNAPSHOT-jar-with-dependencies.jar
Run the extraction tool
The link key extraction tool can be run from the command line.
From the linkex directory, you can get the following help message:
java -jar target/LinkkeyDiscovery-1.0-SNAPSHOT-jar-with-dependencies.jar -help
usage: java fr.inrialpes.exmo.linkkey.LinkkeyDiscoveryAlgorithm [options] dataset1 dataset2
 -b <b>                     find links between blank nodes (true by default)
 -c <composition length>    compose properties
 -c1 <uri1>                 Uri of the first class (if omitted, all instances are considered)
 -c2 <uri2>                 Uri of the second class (if omitted, all instances are considered)
 -classes                   extracts link keys candidates with classes
 -classesfull               extracts link keys candidates with classes full (may be very expensive)
 -d <d>                     property discriminability threshold
 -e <e>                     use the given reference links for precision and recall evaluation.the links are given in RDF (i.e. a list of triples with predicate owl:sameAs)
 -f,--format <format>       Format of the output: txt (default), edoal, html, bin, dot, txt2 (txt with links)
 -help                      print this message
 -i                         considers inverse of properties (only useful with -c)
 -l                         Lazy mode, data will be loaded when needed (only available for bin)
 -o,--output <outputfile>   output filename. Default files: standard output for txt and edoal, 'result' for html and bin
 -p1 <uriprefix1>           prefix of classes that have to be considered
 -p2 <uriprefix1>           prefix of classes that have to be considered
 -s <s>                     a support threshold between [0;1] for properties (default:0)
 -sparql                    if given the datasets are considered as sparql endpoints
 -t <eq or in>              types of extracted keys: eq or/and in (eq and in by default)
Example command line:
java -mx5000m -jar LinkkeyDiscovery-1.0-SNAPSHOT-jar-with-dependencies.jar -s 0.01 -i -c 4 -c1 "http://xmlns.com/foaf/0.1/Person" -c2 "http://xmlns.com/foaf/0.1/Person" -t in -f html -o mycandidates dataset1.ttl dataset2.ttl
This will extract link key candidates between the classes foaf:Person
(options -c1 and -c2) of files dataset1.ttl and dataset2.ttl.
The extraction algorithm will extract intersection link key candidates (-t in).
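The difference between the two key types of the -t option can be sketched in Python as follows. This is an assumption-laden illustration: the text above only says that "in" stands for intersection, and the sketch further assumes that "eq" requires the two instances to have identical value sets for a property pair. The function names and the data are hypothetical.

```python
def matches_in(values1, values2):
    """'in' condition (assumed): the two value sets share at least one value."""
    return bool(values1 & values2)


def matches_eq(values1, values2):
    """'eq' condition (assumed): the two value sets are non-empty and identical."""
    return bool(values1) and values1 == values2


# Made-up values of a name property for one person in each dataset.
names1 = {"J. Smith", "John Smith"}
names2 = {"John Smith"}

print(matches_in(names1, names2))  # True: "John Smith" is shared
print(matches_eq(names1, names2))  # False: the sets are not identical
```

Under these assumptions, every pair of instances that satisfies the eq condition also satisfies the in condition, so in is the more permissive of the two.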
It will consider only properties that are instantiated for at least 1% of the instances of foaf:Person (-s 0.01).
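As a rough illustration of such a support threshold, the Python sketch below keeps only the properties that have a value for at least a given fraction of the instances. It is a sketch of the idea under stated assumptions, not Linkex's code: the data, the function names, and the use of a "greater than or equal" comparison are assumptions.

```python
def property_support(instances, prop):
    """Fraction of instances that have at least one value for prop."""
    if not instances:
        return 0.0
    having = sum(1 for props in instances.values() if props.get(prop))
    return having / len(instances)


def supported_properties(instances, threshold):
    """Properties whose support reaches the threshold (a value in [0, 1])."""
    all_props = {p for props in instances.values() for p in props}
    return {p for p in all_props if property_support(instances, p) >= threshold}


# Made-up instances of a Person class: "name" is always present, "nick" rarely.
persons = {
    "p1": {"name": {"Ann"}, "nick": {"annie"}},
    "p2": {"name": {"Bob"}},
    "p3": {"name": {"Cid"}},
    "p4": {"name": {"Dee"}},
}

print(property_support(persons, "nick"))   # 0.25
print(supported_properties(persons, 0.5))  # {'name'}
```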
It will consider inverses of properties and compositions of them up to a maximum path length of 4 (-i and -c 4).
The result will be rendered as a set of html files (-f html) located in the directory "mycandidates" (-o mycandidates).
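To illustrate what composed and inverse properties mean, here is a small Python sketch that follows a property path through a set of triples. A path is a list of steps, each step being a property and a flag saying whether it is traversed in the inverse direction. The triples, the property names, and the functions are hypothetical and only show the idea; they say nothing about how Linkex enumerates or stores paths.

```python
# Made-up triples (subject, property, object).
triples = [
    ("p1", "wrote", "book1"),
    ("book1", "title", "Les Misérables"),
    ("p2", "knows", "p1"),
]


def step(triples, nodes, prop, inverse=False):
    """Follow prop from every node in nodes, forwards or backwards."""
    if inverse:
        return {s for s, p, o in triples if p == prop and o in nodes}
    return {o for s, p, o in triples if p == prop and s in nodes}


def path_values(triples, start, path):
    """Values reached from start by following each (property, inverse) step in turn."""
    nodes = {start}
    for prop, inverse in path:
        nodes = step(triples, nodes, prop, inverse)
    return nodes


# Composition of length 2: the titles of what p1 wrote.
print(path_values(triples, "p1", [("wrote", False), ("title", False)]))  # {'Les Misérables'}

# Inverse property: who knows p1.
print(path_values(triples, "p1", [("knows", True)]))  # {'p2'}
```

With a maximum path length of 4, a composed property in this sketch would be any such list of one to four steps.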
The option -mx5000m sets the maximum heap size of the Java virtual machine to 5000 MB (about 5 GB).
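For the -e option, the help message says that reference links are given in RDF as triples with predicate owl:sameAs. The Python sketch below writes such a file in N-Triples syntax. The file name reference.nt and the example URIs are made up, and whether Linkex accepts N-Triples specifically is an assumption, since the help message only says "RDF".

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def write_reference_links(links, path):
    """Write one owl:sameAs triple per link, in N-Triples syntax."""
    with open(path, "w", encoding="utf-8") as out:
        for left, right in links:
            out.write(f"<{left}> <{OWL_SAME_AS}> <{right}> .\n")


# Made-up reference links between instances of the two datasets.
reference = [
    ("http://example.org/d1/person/1", "http://example.org/d2/personne/a"),
    ("http://example.org/d1/person/2", "http://example.org/d2/personne/b"),
]

write_reference_links(reference, "reference.nt")
```

The resulting file could then be passed on the command line with -e reference.nt, alongside the other options shown in the example above.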