
A small tutorial on the Alignment API

Here is a small tutorial for the Alignment API. Since the API has no dedicated GUI, most of the tutorial is based on command-line invocations. Of course, this is not the natural way to use the API: it is made to be embedded in an application program, and we are working towards an alignment server that will help programs use the API remotely.

Preparation

To run the Alignment API, you must have a Java interpreter available. We will call it java.

Download the latest version of the Alignment API from http://gforge.inria.fr/frs/?group_id=117. Unzip it and go to the created directory:

$ mkdir alignapi
$ cd alignapi
$ unzip align*.zip

You can check that everything works by simply typing:

$ java -jar lib/procalign.jar --help
usage: Procalign [options] URI1 URI2
options are:
        --impl=className -i classname           Use the given alignment implementation.
        --renderer=className -r className       Specifies the alignment renderer
        --output=filename -o filename   Output the alignment in filename
        --params=filename -p filename   Reads parameters from filename
        --alignment=filename -a filename Start from an XML alignment file
        --threshold=double -t double    Filters the similarities under threshold
        --cutmethod=hard|perc|prop|best|span -T hard|perc|prop|best|span        method for computing the threshold
        --debug[=n] -d [n]              Report debug info at level n
        -Dparam=value                   Set parameter
        --help -h                       Print this message

This should output the command-line usage of the Procalign class. We do not detail it here; the rest of this tutorial presents it in full.

Then go to the tutorial directory:

$ cd html/tutorial

The goal of this tutorial is only to help you realize the possibilities of the Alignment API and its implementation. It can be followed by typing each command line into the command-line interpreter. The examples use tcsh syntax, but the only shell-specific command is the first one:

$ setenv CWD `pwd`
which puts the name of the current directory in the variable $CWD. In a Bourne-style shell such as bash, the equivalent is export CWD=`pwd`.

Besides a Java interpreter, an XSLT 1.0 processor such as xsltproc is needed if you want to generate HTML versions of the alignments. Hence:

$ xsltproc ../form-align.xsl file.rdf > file.html
generates file.html from the alignment file file.rdf.

The data

Your mission, if you accept it, will be to find the best alignment between two bibliographic ontologies. They are the files edu.umbc.ebiquity.publication.owl and edu.mit.visus.bibtex.owl used in the commands below.

Aligning

Let's try to align these two ontologies:

$ java -jar ../../lib/procalign.jar file://localhost$CWD/edu.umbc.ebiquity.publication.owl file://localhost$CWD/edu.mit.visus.bibtex.owl

The result is displayed on the standard output in the Alignment format. This format, expressed in RDF/XML, is made of a header containing "metadata" about the alignment and the corresponding set of correspondences. Each correspondence is made of two references to the aligned entities, the relation holding between the entities (=) and a confidence measure (1.0) in this correspondence.

Here, because the default method used for aligning the ontologies is so simple (it only compares the labels of the entities and finds a correspondence if their labels are equal), the correspondences are always that simple. This is too simple, so we will use a more sophisticated method based on an edit distance:

$ java -jar ../../lib/procalign.jar file://localhost$CWD/edu.umbc.ebiquity.publication.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance

This is achieved by specifying the class of alignment to be used (through the -i switch) and the distance function to be used (-DstringFunction=levenshteinDistance).
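
To give an intuition of what an edit distance measures, here is a minimal, self-contained sketch of the Levenshtein distance. It is not the code of StringDistAlignment: the class name EditDistanceSketch and the normalisation used in similarity() (one minus the distance divided by the length of the longer string) are assumptions made for illustration only.

```java
// Illustrative sketch only: NOT the implementation used by StringDistAlignment.
final class EditDistanceSketch {

    /** Number of single-character insertions, deletions or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // The row just computed becomes the previous row of the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalisation into [0, 1]: 1.0 means identical strings. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Article", "Articles"));
        System.out.println(similarity("Book", "Booklet"));
    }
}
```

With such a measure, two labels that differ by a few characters still obtain a confidence between 0 and 1 instead of being rejected outright.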

Look at the results: how are they different from before?

We can see that the correspondences now contain confidence factors different from 1.0. They also match strings which are not the same, and far more correspondences are available.

Because this result is too long to analyze, it will be sent to the file raw.rdf (through the -o switch).

$ java -jar ../../lib/procalign.jar file://localhost$CWD/edu.umbc.ebiquity.publication.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance -o raw.rdf
See the output in RDF/XML or HTML.

More work: you can apply other available alignment classes. Look in the directory for more simple alignment methods. Also look in the StringDistAlignment class for the possible values of stringFunction.

Advanced: You can also look at the instructions for installing WordNet and its Java interface, and use a WordNet-based distance provided with the API implementation:

$ java -jar ../../lib/alignwn.jar file://localhost$CWD/edu.umbc.ebiquity.publication.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.ling.JWNLAlignment -o jwnl.rdf
See the output in RDF/XML or HTML.

Manipulating

As can be seen, some correspondences do not really make sense. Fortunately, they also have very low confidence levels. It is thus useful to apply a threshold to eliminate them. Let's try a threshold of 0.5 over the alignment:

$ java -jar ../../lib/procalign.jar file://localhost$CWD/edu.umbc.ebiquity.publication.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance -t 0.5 -o dist.rdf
See the output in RDF/XML or HTML.

As expected, we have suppressed some of these correspondences.
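
The effect of the threshold can be pictured with the following sketch. It assumes the behaviour stated in the usage message ("Filters the similarities under threshold"), so a correspondence is kept when its confidence is greater than or equal to the threshold. The ThresholdSketch and Correspondence classes are hypothetical stand-ins, not classes of the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: Correspondence is a hypothetical stand-in, not an API class.
final class ThresholdSketch {

    static final class Correspondence {
        final String entity1;
        final String entity2;
        final double confidence;

        Correspondence(String entity1, String entity2, double confidence) {
            this.entity1 = entity1;
            this.entity2 = entity2;
            this.confidence = confidence;
        }
    }

    /** Returns a new list without the correspondences whose confidence is under the threshold. */
    static List<Correspondence> cut(List<Correspondence> alignment, double threshold) {
        List<Correspondence> kept = new ArrayList<>();
        for (Correspondence c : alignment) {
            if (c.confidence >= threshold) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up entity names and confidences, for illustration only.
        List<Correspondence> alignment = Arrays.asList(
                new Correspondence("Book", "Book", 1.0),
                new Correspondence("Article", "Particle", 0.875),
                new Correspondence("Author", "School", 0.17));
        for (Correspondence c : cut(alignment, 0.5)) {
            System.out.println(c.entity1 + " = " + c.entity2 + " (" + c.confidence + ")");
        }
    }
}
```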

We can also apply this treatment to other methods available:

$ java -jar ../../lib/procalign.jar file://localhost$CWD/edu.umbc.ebiquity.publication.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=smoaDistance -t 0.5 -o dist2.rdf
See the output in RDF/XML or HTML.

Other manipulations: It is possible to invert an alignment with the following command:

$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter -i dist.rdf -o tsid.rdf
See the output in RDF/XML or HTML. The result is the inverted alignment, in which the source and target ontologies are swapped.
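
Conceptually, inverting an alignment amounts to swapping the two entities of every correspondence. The sketch below shows this for the symmetric relation "=" only, which is the only relation shown in this tutorial. InvertSketch and Correspondence are hypothetical names, and the way ParserPrinter handles other relations is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: handles the symmetric "=" relation and nothing else.
final class InvertSketch {

    static final class Correspondence {
        final String entity1;
        final String entity2;
        final String relation;
        final double confidence;

        Correspondence(String entity1, String entity2, String relation, double confidence) {
            this.entity1 = entity1;
            this.entity2 = entity2;
            this.relation = relation;
            this.confidence = confidence;
        }
    }

    /** Swaps entity1 and entity2 in every correspondence; relation and confidence are unchanged. */
    static List<Correspondence> invert(List<Correspondence> alignment) {
        List<Correspondence> inverted = new ArrayList<>();
        for (Correspondence c : alignment) {
            if (!"=".equals(c.relation)) {
                throw new IllegalArgumentException("this sketch only handles the '=' relation");
            }
            inverted.add(new Correspondence(c.entity2, c.entity1, c.relation, c.confidence));
        }
        return inverted;
    }

    public static void main(String[] args) {
        List<Correspondence> alignment = Arrays.asList(
                new Correspondence("onto1#Book", "onto2#Book", "=", 1.0));
        Correspondence c = invert(alignment).get(0);
        System.out.println(c.entity1 + " " + c.relation + " " + c.entity2);
    }
}
```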

More work: There is another switch (-T) in Procalign that specifies the way a threshold is applied (hard|perc|prop|best|span), the default being "hard". The curious reader can apply these and see the difference in results. How they work is explained in the Alignment API documentation.

More work: What is the best threshold for

Output

The alignment can also be rendered in other formats, for instance as OWL axioms:

$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter dist.rdf -r fr.inrialpes.exmo.align.impl.renderer.OWLAxiomsRendererVisitor
We can also generate SWRL rules:
$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter dist.rdf -r fr.inrialpes.exmo.align.impl.renderer.SWRLRendererVisitor
or XSLT transformations:
$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter dist.rdf -r fr.inrialpes.exmo.align.impl.renderer.XSLTRendererVisitor -o dist.xsl
This transformation can be applied to the data in data.xml:
$ xsltproc dist.xsl data.xml > data2.xml
which produces the data2.xml file.

Evaluating

We will evaluate alignments by comparing them to some reference alignment which is supposed to express what is expected from an alignment of these two ontologies. The reference alignment is refalign.rdf (or HTML).

For evaluating, we use a class other than Procalign. It is called EvalAlign, and it must be specified explicitly to java. By default, it computes precision, recall and associated measures. It can be invoked this way:

$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://localhost$CWD/refalign.rdf file://localhost$CWD/dist.rdf

Look at the results: what to expect from the evaluation of this alignment?

Since it returns more correspondences by loosening the constraints for being a correspondence, it is expected that the recall will increase at the expense of precision.
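
As a reminder of what these two measures mean, here is a minimal sketch that computes them over sets of correspondences, each correspondence being represented by a plain string. It is not the code of PRecEvaluator, which may compare correspondences differently. The class name PrecisionRecallSketch and the string encoding are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: NOT the implementation of PRecEvaluator.
final class PrecisionRecallSketch {

    /** Number of found correspondences that also appear in the reference alignment. */
    static int correct(Set<String> found, Set<String> reference) {
        Set<String> common = new HashSet<>(found);
        common.retainAll(reference);
        return common.size();
    }

    /** Proportion of found correspondences that are correct. */
    static double precision(Set<String> found, Set<String> reference) {
        return found.isEmpty() ? 0.0 : (double) correct(found, reference) / found.size();
    }

    /** Proportion of reference correspondences that have been found. */
    static double recall(Set<String> found, Set<String> reference) {
        return reference.isEmpty() ? 0.0 : (double) correct(found, reference) / reference.size();
    }

    public static void main(String[] args) {
        // Made-up correspondences, for illustration only.
        Set<String> reference = new HashSet<>(Arrays.asList("Book=Book", "Article=Article", "Thesis=PhdThesis"));
        Set<String> found = new HashSet<>(Arrays.asList("Book=Book", "Article=Article", "Author=School", "Date=Data"));
        System.out.println("precision: " + precision(found, reference));
        System.out.println("recall: " + recall(found, reference));
    }
}
```

Adding more candidate correspondences can only keep or raise the number of correct ones, hence recall, while every wrong addition lowers precision.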

We can see the results for the other alignments:

$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://localhost$CWD/refalign.rdf file://localhost$CWD/dist2.rdf
$ java -jar ../../lib/procalign.jar file://localhost$CWD/rdf/edu.umbc.ebiquity.publication.owl file://localhost$CWD/rdf/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance -DprintMatrix=1 -o /dev/null > matrix.tex
$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://localhost$CWD/refalign.rdf file://localhost$CWD/jwnl.rdf

More work: Use F-measure
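
For reference, the usual (balanced) F-measure is the harmonic mean of precision and recall. The following sketch computes it from two numbers such as those reported by the evaluation. The class name FMeasureSketch is hypothetical, and the convention of returning 0 when both inputs are 0 is an assumption.

```java
// Illustrative sketch only: balanced F-measure from a precision and a recall value.
final class FMeasureSketch {

    /** Harmonic mean of precision and recall; 0 by convention when both are 0. */
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        System.out.println(fMeasure(0.5, 2.0 / 3.0));
    }
}
```

Because it is a harmonic mean, the F-measure stays low as soon as either precision or recall is low, which makes it a convenient single number for comparing alignments.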

More work: If you want to compare several algorithms, there is another class, GroupAlign, that ...

Embedding

Of course, the goal of this API is not to be used from the command line (even if this is very often useful). So, if you are ready for it, you can develop your own Java application that takes advantage of the API.

It should be possible to invoke it through:

$ java -cp ../../lib/procalign.jar MyApp file://localhost$CWD/rdf/edu.umbc.ebiquity.publication.owl file://localhost$CWD/rdf/edu.mit.visus.bibtex.owl

More work: Can you add a switch, like the -i switch of Procalign, so that the main class of the application can be passed on the command line?
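
One possible starting point for this exercise is standard Java reflection: read the class name that follows the switch and instantiate it through its no-argument constructor. The sketch below uses only the Java standard library. The SwitchSketch class, its method names and the default class used in main are hypothetical, and casting the result to the relevant API interface is left to the reader.

```java
// Illustrative sketch only: loads a class named on the command line by reflection.
final class SwitchSketch {

    /** Returns the argument following "-i", or defaultClass when the switch is absent. */
    static String implementationName(String[] args, String defaultClass) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equals("-i")) {
                return args[i + 1];
            }
        }
        return defaultClass;
    }

    /** Instantiates className through its public no-argument constructor. */
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is only a placeholder default so that the sketch runs on its own.
        String name = implementationName(args, "java.util.ArrayList");
        System.out.println("loaded " + instantiate(name).getClass().getName());
    }
}
```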

You can develop a specialized matching algorithm by subclassing the Java programs provided in

More work: Implement the F-measure optimization rule presented above so that you automatically select the threshold that maximizes F-measure.
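
A possible outline for this exercise, independent of the API classes: try each confidence value occurring in the alignment as a threshold, compute the F-measure of the correspondences kept, and retain the threshold giving the highest value. In the sketch below, correspondences are plain strings mapped to their confidence, and the F-measure is computed as 2 x correct / (found + expected), which is equal to the harmonic mean of precision and recall. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: exhaustive search of the threshold maximizing F-measure.
final class BestThresholdSketch {

    /** F-measure of the correspondences whose confidence is at least threshold. */
    static double fMeasureAt(Map<String, Double> scored, Set<String> reference, double threshold) {
        int found = 0;
        int correct = 0;
        for (Map.Entry<String, Double> e : scored.entrySet()) {
            if (e.getValue() >= threshold) {
                found++;
                if (reference.contains(e.getKey())) {
                    correct++;
                }
            }
        }
        int total = found + reference.size();
        return total == 0 ? 0.0 : 2.0 * correct / total;
    }

    /** Tries every distinct confidence as a threshold; ties go to the lowest threshold. */
    static double bestThreshold(Map<String, Double> scored, Set<String> reference) {
        double best = 0.0;
        double bestF = -1.0;
        for (double t : new TreeSet<Double>(scored.values())) {
            double f = fMeasureAt(scored, reference, t);
            if (f > bestF) {
                bestF = f;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up correspondences and confidences, for illustration only.
        Map<String, Double> scored = new HashMap<>();
        scored.put("Book=Book", 0.9);
        scored.put("Article=Article", 0.8);
        scored.put("Author=School", 0.4);
        Set<String> reference = new HashSet<>();
        reference.add("Book=Book");
        reference.add("Article=Article");
        System.out.println(bestThreshold(scored, reference));
    }
}
```

Only the confidence values present in the alignment need to be tried, since the set of kept correspondences does not change between two consecutive values.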

Further exercises

More info: http://alignapi.gforge.inria.fr

Planning:
- Alignment server (incl. DB storage, agents, WSDL service)
- Extensive composition operators (with comp. tables)
- Expressive alignment language (with SEKT/François Sharffe)

Acknowledgements

The format of this tutorial has been shamelessly borrowed from Sean Bechhofer's OWL tutorial.


http://alignapi.gforge.inria.fr/tutorial
