diff --git a/html/tutorial/index.html b/html/tutorial/index.html index 96e04a1990f0a173ab79b7dde1fcb17e1a78a73d..7ac39bf6ced571fb6c77c73a4766245e9a2bf6ac 100644 --- a/html/tutorial/index.html +++ b/html/tutorial/index.html @@ -2,640 +2,555 @@ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> - <head> - - <title>A small tutorial on the Alignment API</title> - <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> - <meta name="Contributor" content="Antoine Zimmermann" /> - <link rel="stylesheet" type="text/css" href="../base.css" /> - <link rel="stylesheet" type="text/css" href="../style.css" /> - <script type="text/javascript"> - <!-- - function show(id) { - var element = document.getElementById(id); - element.style.display = "block"; - } - function hide(id) { - var element = document.getElementById(id); - element.style.display = "none"; - } - --> - </script> - <style type="text/css"> - <!-- - div.logic { - padding-left: 5px; - padding-right: 5px; - margin-top: 10px; - margin-bottom: 10px; - } - --> - </style> - +<title>A small tutorial on the Alignment API</title> +<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> +<meta name="Contributor" content="Antoine Zimmermann" /> +<link rel="stylesheet" type="text/css" href="../base.css" /> +<link rel="stylesheet" type="text/css" href="../style.css" /> +<script type="text/javascript"> +<!-- +function show(id) { + var element = document.getElementById(id); + element.style.display = "block"; +} +function hide(id) { + var element = document.getElementById(id); + element.style.display = "none"; +} +--> +</script> +<style type="text/css"> +<!-- +div.logic { + padding-left: 5px; + padding-right: 5px; + margin-top: 10px; + margin-bottom: 10px; +} +--> +</style> </head> - <body style="background-color: #FFFFFF;"> - <h1>A small tutorial on the Alignment <abbr 
title="Application Programming Interface">API</abbr></h1> - - <dl> - <dt>This version:</dt> - <dd> - http://alignapi.gforge.inria.fr/tutorial/ - </dd> - <dt>Author:</dt> - <dd> - <a href="http://exmo.inrialpes.fr/people/euzenat">Jérôme Euzenat</a>, INRIA Rhône-Alpes - </dd> - </dl> - - <p style="border-bottom: 2px solid #AAAAAA; border-top: 2px solid #AAAAAA; padding-top: 15px; padding-bottom: 15px;"> - Here is a small tutorial for the alignment <abbr>API</abbr>. Since the <abbr>API</abbr> has no dedicated <abbr title="Graphical User Interface">GUI</abbr>, most of the tutorial is based on command-lines invocations. Of course, it is not the natural way to use this <abbr>API</abbr>: it is made for being embedded in some application programme and we are working towards implementing an alignment server that can help programmes to use the <abbr>API</abbr> remotely. The complete tutorial is also available as a self-contained <a href="script.sh" title="script for UNIX systems">script.sh</a> or <a href="script.bat" title="script for Windows systems">script.bat</a> - </p> - - <h2>Preparation</h2> - - <p> - For running the alignment <abbr>API</abbr>, you must have a Java interpreter available. We wil call it <tt>java</tt>. - </p><p> - Download the last version of the Alignment <abbr>API</abbr> from <a href="http://gforge.inria.fr/frs/?group_id=117">http://gforge.inria.fr/frs/?group_id=117</a>. 
Unzip it and go to the created directory:</p> - <div class="fragment"><pre> - $ mkdir alignapi - $ cd alignapi - $ unzip align*.zip - </pre></div> - <p> - You can check that everything works by only typing: - </p> - <div class="fragment"><pre> - $ java -jar lib/procalign.jar --help - </pre></div> - <div class="button"> - <button onclick="show('qu3')">Show output</button> - <button onclick="hide('qu3')">Hide output</button> - </div> - <div class="explain" id="qu3"><pre> - usage: Procalign [options] URI1 URI2 - options are: - --impl=className -i classname Use the given alignment implementation. - --renderer=className -r className Specifies the alignment renderer - --output=filename -o filename Output the alignment in filename - --params=filename -p filename Reads parameters from filename - --alignment=filename -a filename Start from an XML alignment file - --threshold=double -t double Filters the similarities under threshold - --cutmethod=hard|perc|prop|best|span -T hard|perc|prop|best|span method for computing the threshold - --debug[=n] -d [n] Report debug info at level n - -Dparam=value Set parameter - --help -h Print this message - </pre></div> - <p> - The above command outputs the command line usage of the Procalign class. We do not detail it here, this tutorial will present it entirelly. - </p><p> - You can <a href="../align.html">modify the Alignment <abbr>API</abbr> and its implementation</a>. In this tutorial, we will simply learn how to use it. - </p><p> - You will then go to the tutorial directory by doing: - </p> - <div class="fragment"><pre> - $ cd html/tutorial - </pre></div> - <p> - You can clean up previous trials by: - </p> - <div class="fragment"><pre> - $ rm results/ - </pre></div> - <p> - The goal of this tutorial is only to help you realize the possibilities of the Alignment <abbr>API</abbr> and implementation. It can be played by invoking each command line from the command-line interpreter. 
In this example we use the <tt>tcsh</tt> syntax but the main specific syntax is the first one:</p> - <div class="fragment"><pre> - $ setenv CWD `pwd` - </pre></div> - <p> - which puts in variable <tt>$CWD</tt> the name of the current directory. - </p><p> - Beside a Java interpreter, if one wants to generate the <abbr title="HyperText Markup Language">HTML</abbr> translations of the alignments, this can be done with the help of an <abbr title="XML Stylesheet Language Trasnformation">XSLT</abbr> 1.0 processor like <tt>xsltproc</tt>. Hence: - </p> - <div class="fragment"><pre> - $ xsltproc ../form-align.xsl results/file.rdf > results/file.html - </pre></div> - <p> - generates <tt>results/file.html</tt> from the alignment file <tt>results/file.rdf</tt>. - </p> - - <h2>The data</h2> - - <p> - Your mission, if you accept it, will be to find the best alignment between two bibliographic ontologies. They can be seen here: - </p> - <dl> - <dt>edu.mit.visus.bibtex.owl</dt> - <dd> - is a relatively faithfull transcription of BibTeX as an ontology. It can be seen here in <a href="edu.mit.visus.bibtex.owl"><abbr title="Ressource Description Framework">RDF</abbr>/<abbr title="eXtansible Markup Language">XML</abbr></a> or <a href="edu.mit.visus.bibtex.html"><abbr>HTML</abbr></a>. - </dd> - <dt>myOnto.owl</dt> - <dd> - is an extension of the previous one that contains a number of supplementary concepts. It can be seen here in <a href="myOnto.owl"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="myOnto.html"><abbr>HTML</abbr></a>. - </dd> - </dl> - <p> - These two ontologies have been used for a few years in the <a href="oaei.ontologymatching.org">Ontology Alignment Evaluation Initiative</a>. 
- </p> - - <h2>Matching</h2> - - <p> - For demonstrating the use of our implementation of the Alignment <abbr>API</abbr>, we implemented a particular processor (<tt>fr.inrialpes.exmo.align.util.Procalign</tt>) which:</p> - <ul> - <li>Reads two <acronym title="Web Ontology Language">OWL</acronym>/<abbr>RDF</abbr> ontologies;</li> - <li>Creates an alignment object;</li> - <li>Computes the alignment between these ontologies;</li> - <li>Displays the result.</li> - </ul> - <p> - Let's try to match these two ontologies: - </p> - <div class="fragment"><pre> - $ java -jar ../../lib/procalign.jar file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl - </pre></div> - <p> - Additionaly a number of options are available: - </p> - <ul> - <li>displaying debug information (-d);</li> - <li>controling the way of rendering the output (-r);</li> - <li>deciding the implementation of the alignment method (-i);</li> - <li>providing an input alignment (-a) [implemented but not used by most methods].</li> - </ul> - <p> - The result is displayed on the standard output. Since the output is too long we send it to a file by using the <tt>-o</tt> flag: - </p> - <div class="fragment"><pre> - $ java -jar ../../lib/procalign.jar file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -o results/equal.rdf - </pre></div> - <p> - See the output in <a href="results/equal.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/equal.html"><abbr>HTML</abbr></a>. - </p><p> - The result is expressed in the Alignment format. 
This format, in <abbr>RDF</abbr>/<abbr>XML</abbr>, is made of a header containing "metadata" about the alignment: - </p> - <div class="owl"><pre> - <?xml version="1.0" encoding="utf-8" standalone="no"?> - <rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment" - xml:base="http://knowledgeweb.semanticweb.org/heterogeneity/alignment" - xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" - xmlns:xsd="http://www.w3.org/2001/XMLSchema#"> - <Alignment> - <xml>yes</xml> - <level>0</level> - <type>**</type> - <time>66</time> - <owl:Class rdf:about="Techreport"> - <owl:equivalentClass rdf:resource="Techreport"/> - </owl:Class> - - <method>fr.inrialpes.exmo.align.impl.method.StringDistAlignment</method> - <onto1>file://localhost/JAVA/alignapi/html/tutorial/myOnto.owl</onto1> - <onto2>file://localhost/JAVA/alignapi/html/tutorial/edu.mit.visus.bibtex.owl</onto2> - <uri1>http://alignapi.gforge.inria.fr/tutorial/myOnto.owl</uri1> - <uri2>http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl</uri2> - </pre></div> - <p> - and the corresponding set of correspondences: - </p> - <div class="owl"><pre> - <map> - <Cell> - <entity1 rdf:resource="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#Article"/> - <entity2 rdf:resource="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#Article"/> - <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.0</measure> - <relation>=</relation> - </Cell> - </map> - </pre></div> - <p> - each correspondence is made of two references to the aligned entities, the relation holding between the entities (<tt>=</tt>) and a confidence measure (<tt>1.0</tt>) in this correspondence. Here, because the default method that has been used for aligning the ontologies is so simple (it only compares the labels of the entities and find that there is a correspondence if their labels are equal), the correspondences are always that simple. 
But it is too simple so we will use a more sophisticated method based on an edit distance: - </p> - <div class="fragment"><pre> - $ java -jar ../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -o results/levenshtein.rdf - </pre></div> - <p> - See the output in <a href="results/levenshtein.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/levenshtein.html"><abbr>HTML</abbr></a>. - </p><p> - This is achieved by specifying the class of Alignment to be used (through the <tt>-i</tt> switch) and the distance function to be used (<tt>-DstringFunction=levenshteinDistance</tt>). - </p><p> - Look at the results: how are they different from before? - </p> - <div class="button"> - <button onclick="show('qu1')">Show Discussion</button> - <button onclick="hide('qu1')">Hide Discussion</button> - </div> - <div class="explain" id="qu1"><p> - We can see that the correspondences now contain confidence factors different than <tt>1.0</tt>, they also match strings which are not the same and indeed far more correspondences are available. - </p></div> - <p> - We do the same with another measure (smoaDistance):</p> - <div class="fragment"><pre> - $ java -jar ../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=smoaDistance file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -o results/SMOA.rdf - </pre></div> - <div class="logic"><p> - <b>More work:</b> you can apply other available alignments classes. Look in the <a href="../../src/fr/inrialpes/exmo/align/impl/method">../../src/fr/inrialpes/exmo/align/impl/method</a> directory for more simple alignment methods. 
Also look in the <tt>StringDistances</tt> class the possible values for <tt>stringFunction</tt> (they are the names of methods).</p> - </div><div class="logic"><p> - <b>Advanced:</b> You can also look at the instructions for installing WordNet and its Java interface and use a WordNet based distance provided with the <abbr>API</abbr> implementation by: - </p><div class="fragment"><pre> - $ java -jar ../../lib/alignwn.jar -i fr.inrialpes.exmo.align.ling.JWNLAlignment file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -o results/jwnl.rdf - </pre></div><p> - See the output in <a href="jwnl.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="jwnl.html"><abbr>HTML</abbr></a>. - </p> - </div> - - <h2>Manipulating</h2> - - <p> - As can be seen there are some correspondences that do not really make sense. Fortunately, they also have very low confidence values. It is thus interesting to use a threshold for eliminating these values. Let's try a threshold of <tt>.33</tt> over the alignment (with the <tt>-t</tt> switch): - </p> - <div class="fragment"><pre> - $ java -jar ../../lib/procalign.jar file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance -t 0.33 -o results/levenshtein33.rdf - </pre></div> - <p> - See the output in <a href="results/levenshtein33.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/levenshtein33.html"><abbr>HTML</abbr></a>. - </p><p> - As expected we have suppressed some of these inaccurate correspondences. But did we also suppressed accurate ones? - </p> - <div class="button"> - <button onclick="show('qu4')">Show Discussion</button> - <button onclick="hide('qu4')">Hide Discussion</button> - </div> - <div class="explain" id="qu4"><p> - This operation has contributed eliminating a number of innacurate correspondences like Journal-Conference or Composite-Conference. 
However, there remains some unaccurate correspondences like Institution-InCollection and Published-UnPublished! - </p></div> - <p> - We can also apply this treatment to other methods available: - </p> - <div class="fragment"><pre> - $ java -jar ../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=smoaDistance file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -t 0.5 -o results/SMOA5.rdf - </pre></div> - <p> - See the output in <a href="results/SMOA5.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/SMOA5.html"><abbr>HTML</abbr></a>. - </p><p> - <b>Other manipulations:</b> It is possible to invert an alignment with the following command: - </p> - <div class="fragment"><pre> - $ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter -i results/SMOA5.rdf -o results/AOMS5.rdf - </pre></div> - <p> - See the output in <a href="results/AOMS5.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/AOMS5.html"><abbr>HTML</abbr></a>. The results is an alignment from the source to the target. Inverting alignment is only the exchange of the order of the elements in the alignment file. This can be useful when you have an alignment of <i>A</i> to <i>B</i>, an alignment from <i>C</i> to <i>B</i> and you want to go from <i>A</i> to <i>C</i>. The solution is then to invert the second alignment and to compose them. - </p> - <div class="logic"><p> - <b>More work:</b> There is another switch (<tt>-T</tt>) in Procalign that specifies the way a threshold is applied (hard|perc|prop|best|span) the default being "hard". The curious reader can apply these and see the difference in results. How they work is explained in the Alignment <abbr>API</abbr> documentation. - </p></div> - <div class="logic"><p> - <b>More work:</b> Try to play with the thresholds in order to find the best values for levenshteinDistance and smoaDistance. 
- </p></div> - - <h2>Output</h2> - - <p> - Once a good alignment has been found, only half of the work has been done. In order to actually use our result it is necessary to transform it into some processable format. For instance, if one wants to merge two OWL ontologies, the alignment can be changed into as set of <acronym>OWL</acronym> "bridging" axioms. This is achieved by "rendering" the alignment in <acronym>OWL</acronym> (through the <tt>-r</tt> switch): - </p> - <div class="fragment"><pre> - $ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.OWLAxiomsRendererVisitor - </pre></div> - <p> - The result is a set of OWL assertions of the form: - </p> - <div class="owl"><pre> - <owl:Class rdf:about="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#Techreport"> - <owl:equivalentClass rdf:resource="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#Techreport"/> - </owl:Class> - - <owl:ObjectProperty rdf:about="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#copyright"> - <owl:equivalentProperty rdf:resource="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#hasCopyright"/> - </owl:ObjectProperty> - </pre></div> - <p> - If one wants to use the alignments only for infering on instances without actually merging the classes, she can generate SWRL rules: - </p> - <div class="fragment"><pre> - $ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.SWRLRendererVisitor - </pre></div> - <p> - which brings for the same assertions: - </p> - <div class="owl"><pre> - <ruleml:imp> - <ruleml:_body> - <swrl:classAtom> - <owllx:Class owllx:name="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#Techreport"/> - <ruleml:var>x</ruleml:var> - </swrl:classAtom> - </ruleml:_body> - <ruleml:_head> - <swrlx:classAtom> - <owllx:Class 
owllx:name="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#Techreport"/> - <ruleml:var>x</ruleml:var> - </swrl:classAtom> - </ruleml:_head> - </ruleml:imp> - - <ruleml:imp> - <ruleml:_body> - <swrl:individualPropertyAtom swrlx:property="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#copyright"/> - <ruleml:var>x</ruleml:var> - <ruleml:var>y</ruleml:var> - </swrl:individualPropertyAtom> - </ruleml:_body> - <ruleml:_head> - <swrl:datavaluedPropertyAtom swrlx:property="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#hasCopyright"/> - <ruleml:var>x</ruleml:var> - <ruleml:var>y</ruleml:var> - </swrl:datavaluedPropertyAtom> - </ruleml:_head> - </ruleml:imp> - </pre></div> - - <p> - Exchanging data can also be achieved more simply through <abbr>XLST</abbr> transformations which will transform the <acronym>OWL</acronym> instance files from one ontology to another: - </p> - <div class="fragment"><pre> - $ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.XSLTRendererVisitor -o results/SMOA5.xsl - </pre></div> - <p> - this transformation can be applied to the data of <a href="data.xml">data.xml</a>: - </p> - <div class="fragment"><pre> - $ xsltproc results/SMOA5.xsl data.xml > results/data.xml - </pre></div> - <p> - for giving the <a href="results/data.xml">results/data.xml</a> file. - </p> - - <h2>Evaluating</h2> - - <p> - We will evaluate alignments by comparing them to some reference alignment which is supposed to express what is expected from an alignment of these two ontologies. The reference alignment is <a href="refalign.rdf">refalign.rdf</a> (or <a href="results/refalign.html"><abbr>HTML</abbr></a>). - </p><p> - For evaluating we use another class than <tt>Procalign</tt>. It is called <tt>EvalAlign</tt> we should specify this to <tt>java</tt>. By default, it computes precision, recall and associated measures. 
It can be invoked this way: - </p> - <div class="fragment"><pre> - $ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://localhost$CWD/refalign.rdf file://localhost$CWD/results/equal.rdf - </pre></div> - <p> - The first argument is always the reference alignment, the second one is the alignment to be evaluated. The result is given here: - </p> - <div class="owl"><pre> - <?xml version="1.0" encoding="utf-8" standalone="yes"?> - <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" - xmlns:map="http://www.atl.external.lmco.com/projects/ontology/ResultsOntology.n3#"> - <map:output rdf:about=""> - <map:reference rdf:resource="file://localhost/JAVA/alignapi/html/tutorial/results/refalign.rdf"> - <map:input rdf:resource="file://localhost/JAVA/alignapi/html/tutorial/results/equal.rdf"> - <map:precision>1.0</map:precision> - <map:recall>0.22916666666666666</map:recall> - <fallout>0.0</fallout> - <map:fMeasure>0.37288135593220334</map:fMeasure> - <map:oMeasure>0.22916666666666666</map:oMeasure> - <result>0.22916666666666666</result> - </map:output> - </rdf:RDF> - </pre></div> - <p> - Of course, since that method only match objects with the same name, it is accurate, yielding a high precision. However, it has poor recall. - </p><p> - We can now evaluate the edit distance. What to expect from the evaluation of this alignment? - </p> - <div class="button"> - <button onclick="show('qu5')">Show Discussion</button> - <button onclick="hide('qu5')">Hide Discussion</button> - </div> - <div class="explain" id="qu5"><p> - Since it returns more correspondences by loosening the constraints for being a correspondence, it is expected that the recall will increase at the expense of precision. 
- </p></div> - <p> - We can see the results of: - </p> - <div class="fragment"><pre> - $ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://localhost$CWD/refalign.rdf file://localhost$CWD/results/levenshtein33.rdf - </pre></div> - <div class="button"> - <button onclick="show('qu6')">Show result</button> - <button onclick="hide('qu6')">Hide result</button> - </div> - <div class="explain" id="qu6"><pre> - <?xml version="1.0" encoding="utf-8" standalone="yes"?> - <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" - xmlns:map="http://www.atl.external.lmco.com/projects/ontology/ResultsOntology.n3#"> - <map:output rdf:about=""> - <map:reference rdf:resource="file://localhost/JAVA/alignapi/html/tutorial/results/refalign.rdf"> - <map:input rdf:resource="file://localhost/JAVA/alignapi/html/tutorial/results/levenshtein33.rdf"> - <map:precision>0.6811594202898551</map:precision> - <map:recall>0.9791666666666666</map:recall> - <fallout>0.3188405797101449</fallout> - <map:fMeasure>0.8034188034188035</map:fMeasure> - <map:oMeasure>0.5208333333333333</map:oMeasure> - <result>1.4374999999999998</result> - </map:output> - </rdf:RDF> - </pre></div> - <p> - It is possible to summarize these results by comparing them to each others. This can be achieved by the <tt>GroupEval</tt> class. This class can output several formats (by default html) and takes all the alignments in the subdirectories of the current directory. 
Here we only have the <tt>results</tt> directory: - </p> - <div class="fragment"><pre> - $ cp refalign.rdf results - $ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.GroupEval -r refalign.rdf -l "refalign,equal,SMOA,SMOA5,levenshtein,levenshtein33" -c prf -o results/eval.html - </pre></div> - <p> - The results are displayed in the <a href="results/eval.html">results/eval.html</a> file whose main content is the table: - </p> - <table border="2" frame="box" rules="groups"> - <colgroup /> - <colgroup span="2" /> - <colgroup span="2" /> - <colgroup span="2" /> - <colgroup span="2" /> - <colgroup span="2" /> - <colgroup span="2" /> - <thead valign="top"><tr> - <th>algo</th> - <th colspan="2">refalign</th> - <th colspan="2">equal</th> - <th colspan="2">SMOA</th> - <th colspan="2">SMOA5</th> - <th colspan="2">levenshtein</th> - <th colspan="2">levenshtein33</th> - </tr></thead><tbody><tr> - <td>test</td> - <td>Prec.</td> - <td>Rec.</td> - <td>Prec.</td> - <td>Rec.</td> - <td>Prec.</td> - <td>Rec.</td> - <td>Prec.</td> - <td>Rec.</td> - <td>Prec.</td> - <td>Rec.</td> - <td>Prec.</td> - <td>Rec.</td> - </tr></tbody><tbody><tr> - <td>results</td> - <td>1.00</td> - <td>1.00</td> - <td>1.00</td> - <td>0.23</td> - <td>0.56</td> - <td>0.98</td> - <td>0.69</td> - <td>0.96</td> - <td>0.53</td> - <td>1.00</td> - <td>0.68</td> - <td>0.98</td> - </tr><tr style="background-color: #FFFF00;"> - <td>H-mean</td><td>1.00</td> - <td>1.00</td> - <td>1.00</td> - <td>0.23</td> - <td>0.56</td> - <td>0.98</td> - <td>0.69</td> - <td>0.96</td> - <td>0.53</td> - <td>1.00</td> - <td>0.68</td> - <td>0.98</td> - </tr></tbody> - </table> +<h1>A small tutorial on the Alignment <abbr title="Application Programming Interface">API</abbr></h1> + +<dl> +<dt>This version:</dt> +<dd>http://alignapi.gforge.inria.fr/tutorial/</dd> +<dt>Author:</dt> +<dd><a href="http://exmo.inrialpes.fr/people/euzenat">Jérôme Euzenat</a>, INRIA Rhône-Alpes +</dd> +</dl> + +<p style="border-bottom: 2px solid 
#AAAAAA; border-top: 2px solid #AAAAAA; padding-top: 15px; padding-bottom: 15px;">Here is a small tutorial for the alignment <abbr>API</abbr>. Since the <abbr>API</abbr> has no dedicated <abbr title="Graphical User Interface">GUI</abbr>, most of the tutorial is based on command-line invocations. Of course, this is not the natural way to use this <abbr>API</abbr>: it is designed to be embedded in application programmes, and we are working towards implementing an alignment server that can help programmes to use the <abbr>API</abbr> remotely. The complete tutorial is also available as a self-contained <a href="script.sh" title="script for UNIX systems">script.sh</a> or <a href="script.bat" title="script for Windows systems">script.bat</a>. +</p> - <!--div class="fragment"><pre> - $ java -jar ../../lib/Procalign.jar file://localhost$CWD/rdf/edu.umbc.ebiquity.publication.owl file://localhost$CWD/rdf/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance -DprintMatrix=1 -o /dev/null > matrix.tex - </pre></div--> +<h2>Preparation</h2> - <div class="logic"><p> - <b>More work:</b> As you can see, the <tt>PRecEvaluator</tt> does not only provide precision and recall but also provides F-measure. F-measure is usually used as an "absolute" trade-off between precision and recall (i.e., the optimum F-measure is considered the best precision and recall). Can you establish this point for <acronym>SMOA</acronym> and levenshtein and tell which algorithm is more adapted? - </p></div> +<p>To run the alignment <abbr>API</abbr>, you must have a Java + interpreter available. We will call it <tt>java</tt>.</p> + +<p>Download the latest version of the Alignment <abbr>API</abbr> from <a href="http://gforge.inria.fr/frs/?group_id=117">http://gforge.inria.fr/frs/?group_id=117</a>. 
Unzip it and go to the created directory:</p> +<div class="fragment"><pre> +$ mkdir alignapi +$ cd alignapi +$ unzip align*.zip +</pre></div> + +<p>You can check that everything works by simply typing:</p> +<div class="fragment"><pre> +$ java -jar lib/procalign.jar --help +</pre></div> +<!--div class="button"><form><input type="button" value="Show +output" onclick="show('qu3')"/><input type="button" value="Hide +output" onclick="hide('qu3')"/></form></div--> + +<div class="button"> + <input type="button" onclick="show('qu3')" value="Show output"/> + <input type="button" onclick="hide('qu3')" value="Hide output"/> +</div> +<div class="explain" id="qu3"><pre> +usage: Procalign [options] URI1 URI2 +options are: + --impl=className -i classname Use the given alignment implementation. + --renderer=className -r className Specifies the alignment renderer + --output=filename -o filename Output the alignment in filename + --params=filename -p filename Reads parameters from filename + --alignment=filename -a filename Start from an XML alignment file + --threshold=double -t double Filters the similarities under threshold + --cutmethod=hard|perc|prop|best|span -T hard|perc|prop|best|span method for computing the threshold + --debug[=n] -d [n] Report debug info at level n + -Dparam=value Set parameter + --help -h Print this message +</pre></div> + +<p>The above command outputs the command-line usage of the Procalign class. We do not detail it here, as this tutorial will present it entirely.</p> + +<p>You can <a href="../align.html">modify the Alignment <abbr>API</abbr> and its implementation</a>. 
In this tutorial, we will simply learn how to use it.</p> + +<p>You will then go to the tutorial directory by doing:</p> +<div class="fragment"><pre> +$ cd html/tutorial +</pre></div> +<p>You can clean up previous trials by:</p> +<div class="fragment"><pre> +$ rm results/* +</pre></div> + +<p>The goal of this tutorial is only to help you realize the possibilities of the Alignment <abbr>API</abbr> and its implementation. It can be followed by running each command line in a command-line interpreter. In this example we use the <tt>tcsh</tt> syntax, but the only shell-specific command is the first one:</p> +<div class="fragment"><pre> +$ setenv CWD `pwd` +</pre></div> +<p>which puts in variable <tt>$CWD</tt> the name of the current directory.</p> + +<p>Besides a Java interpreter, if one wants to generate the <abbr title="HyperText Markup Language">HTML</abbr> translations of the alignments, this can be done with the help of an <abbr title="Extensible Stylesheet Language Transformations">XSLT</abbr> 1.0 processor like <tt>xsltproc</tt>. Hence:</p> +<div class="fragment"><pre> +$ xsltproc ../form-align.xsl results/file.rdf > results/file.html +</pre></div> +<p>generates <tt>results/file.html</tt> from the alignment file <tt>results/file.rdf</tt>.</p> + +<h2>The data</h2> + +<p>Your mission, if you accept it, will be to find the best alignment between two bibliographic ontologies. They can be seen here:</p> +<dl> + <dt>edu.mit.visus.bibtex.owl</dt> + <dd>is a relatively faithful transcription of BibTeX as an ontology. It can be seen here in <a href="edu.mit.visus.bibtex.owl"><abbr title="Resource Description Framework">RDF</abbr>/<abbr title="eXtensible Markup Language">XML</abbr></a> or <a href="edu.mit.visus.bibtex.html"><abbr>HTML</abbr></a>.</dd> + <dt>myOnto.owl</dt> + <dd>is an extension of the previous one that contains a number of supplementary concepts. 
It can be seen here in <a href="myOnto.owl"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="myOnto.html"><abbr>HTML</abbr></a>.</dd> +</dl> + +<p>These two ontologies have been used for a few years in the <a href="http://oaei.ontologymatching.org">Ontology Alignment Evaluation Initiative</a>.</p> - <!--div class="logic"><p><b>More work:</b>If you want to compare several algorithms, there - is another class, GroupAlign, that allows to run an - </p></div--> +<h2>Matching</h2> - <h2>Embedding</h2> +<p>To demonstrate the use of our implementation of the Alignment <abbr>API</abbr>, we implemented a particular processor (<tt>fr.inrialpes.exmo.align.util.Procalign</tt>) which:</p> +<ul> +<li>Reads two <acronym title="Web Ontology Language">OWL</acronym>/<abbr>RDF</abbr> ontologies;</li> + <li>Creates an alignment object;</li> + <li>Computes the alignment between these ontologies;</li> + <li>Displays the result.</li> +</ul> + +<p>Let's try to match these two ontologies:</p> +<div class="fragment"><pre> +$ java -jar ../../lib/procalign.jar file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl +</pre></div> + +<p>Additionally, a number of options are available:</p> +<ul> +<li>displaying debug information (-d);</li> +<li>controlling how the output is rendered (-r);</li> +<li>choosing the implementation of the alignment method (-i);</li> +<li>providing an input alignment (-a) [implemented but not used by most methods].</li> +</ul> + +<p>The result is displayed on the standard output. Since the output is long, we send it to a file by using the <tt>-o</tt> flag:</p> +<div class="fragment"><pre> +$ java -jar ../../lib/procalign.jar file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -o results/equal.rdf +</pre></div> + +<p>See the output in <a href="results/equal.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/equal.html"><abbr>HTML</abbr></a>.</p> +<p>The result is expressed in the Alignment format. 
This format, in <abbr>RDF</abbr>/<abbr>XML</abbr>, is made of a header containing "metadata" about the alignment: +</p> +<div class="owl"><pre> +<?xml version="1.0" encoding="utf-8" standalone="no"?> +<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment" + xml:base="http://knowledgeweb.semanticweb.org/heterogeneity/alignment" + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" + xmlns:xsd="http://www.w3.org/2001/XMLSchema#"> +<Alignment> + <xml>yes</xml> + <level>0</level> + <type>**</type> + <time>66</time> + <owl:Class rdf:about="Techreport"> + <owl:equivalentClass rdf:resource="Techreport"/> + </owl:Class> + + <method>fr.inrialpes.exmo.align.impl.method.StringDistAlignment</method> + <onto1>file://localhost/JAVA/alignapi/html/tutorial/myOnto.owl</onto1> + <onto2>file://localhost/JAVA/alignapi/html/tutorial/edu.mit.visus.bibtex.owl</onto2> + <uri1>http://alignapi.gforge.inria.fr/tutorial/myOnto.owl</uri1> + <uri2>http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl</uri2> +</pre></div> +<p>and the corresponding set of correspondences:</p> +<div class="owl"><pre> + <map> + <Cell> + <entity1 rdf:resource="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#Article"/> + <entity2 rdf:resource="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#Article"/> + <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.0</measure> + <relation>=</relation> + </Cell> + </map> +</pre></div> +<p>each correspondence is made of two references to the aligned entities, the relation holding between the entities (<tt>=</tt>) and a confidence measure (<tt>1.0</tt>) in this correspondence. Here, because the default method that has been used for aligning the ontologies is so simple (it only compares the labels of the entities and find that there is a correspondence if their labels are equal), the correspondences are always that simple. 
This method is too simple, so we will use a more sophisticated one based on an edit distance:</p> +<div class="fragment"><pre> +$ java -jar ../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -o results/levenshtein.rdf +</pre></div> + +<p>See the output in <a href="results/levenshtein.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/levenshtein.html"><abbr>HTML</abbr></a>.</p> + +<p>This is achieved by specifying the class of Alignment to be used (through the <tt>-i</tt> switch) and the distance function to be used (<tt>-DstringFunction=levenshteinDistance</tt>).</p> + +<p>Look at the results: how are they different from before?</p> +<div class="button"> + <input type="button" onclick="show('qu1')" value="Show Discussion"/> + <input type="button" onclick="hide('qu1')" value="Hide Discussion"/> +</div> +<div class="explain" id="qu1"> + +<p>We can see that the correspondences now contain confidence factors different from <tt>1.0</tt>. They also match strings which are not the same, and as a result far more correspondences are returned.</p></div> + +<p>We do the same with another measure (smoaDistance):</p> +<div class="fragment"><pre> +$ java -jar ../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=smoaDistance file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -o results/SMOA.rdf +</pre></div> +<div class="logic"><p><b>More work:</b> You can apply other available alignment classes. Look in the <a href="../../src/fr/inrialpes/exmo/align/impl/method">../../src/fr/inrialpes/exmo/align/impl/method</a> directory for other simple alignment methods. 
Also look in the <tt>StringDistances</tt> class for the possible values of <tt>stringFunction</tt> (they are the names of its methods).</p></div> +<div class="logic"><p><b>Advanced:</b> You can also look at the instructions for installing WordNet and its Java interface and use a WordNet-based distance provided with the <abbr>API</abbr> implementation by:</p> +<div class="fragment"><pre> +$ java -jar ../../lib/alignwn.jar -i fr.inrialpes.exmo.align.ling.JWNLAlignment file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -o results/jwnl.rdf +</pre></div> + +<p>See the output in <a href="results/jwnl.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/jwnl.html"><abbr>HTML</abbr></a>.</p></div> - <p> - Of course, the goal of this <abbr>API</abbr> is not to be used at the command line level (even if it can be very useful). So if you are ready for it, you can develop in Java your own application that takes advantage of the <abbr>API</abbr>. - </p><p> - A skeleton of program using the Alignment <abbr>API</abbr> is <a href="Skeleton.java">Skeleton.java</a>. 
It can be compiled by invoking: - </p> - <div class="fragment"><pre> - $ javac -classpath ../../lib/api.jar:../../lib/rdfparser.jar:../../lib/align.jar:../../lib/procalign.jar -d results Skeleton.java - </pre></div> - <p> - and run by: - </p> - <div class="fragment"><pre> - $ java -cp ../../lib/Procalign.jar:results Skeleton myOnto.owl edu.mit.visus.bibtex.owl - </pre></div> - <p> - Now considering the <abbr>API</abbr> (that can be consulted through its thin <a href="../../javadoc/org/semanticweb/owl/align/Alignment.html">Javadoc</a> for instance), can you modify the Skeleton program in order for it performs the following: - </p> - <ul> - <li>Run two different alignment methods (e.g., ngram distance and smoa);</li> - <li>Merge the two results;</li> - <li>Trim at various thresholds;</li> - <li>Evaluate them against the reference alignment and choose the one with the best F-Measure;</li> - <li>Displays it as <acronym title="Semantic Web Rule Language">SWRL</acronym> Rules.</li> - </ul> - <p>Of course, you can do it progressively.</p> - <div class="fragment"><pre> - $ javac -classpath ../../lib/api.jar:../../lib/rdfparser.jar:../../lib/align.jar:../../lib/procalign.jar -d results MyApp.java - $ java -cp ../../lib/Procalign.jar:results MyApp myOnto.owl edu.mit.visus.bibtex.owl > results/MyApp.owl - </pre></div> - <p> - Do you want to see a possible solution? 
- </p> - <div class="button"> - <button onclick="show('qu7')">Cheat</button> - <button onclick="hide('qu7')">Teacher is comming</button> - </div> - <div class="explain" id="qu7"><p> - The main piece of code in Skeleton.java is replaced by: - </p><pre> - // Run two different alignment methods (e.g., ngram distance and smoa) - AlignmentProcess a1 = new StringDistAlignment( onto1, onto2 ); - params.setParameter("stringFunction","smoaDistance"); - a1.align( (Alignment)null, params ); - AlignmentProcess a2 = new StringDistAlignment( onto1, onto2 ); - params = new BasicParameters(); - params.setParameter("stringFunction","ngramDistance"); - a1.align( (Alignment)null, params ); +<h2>Manipulating</h2> - // Merge the two results. - ((BasicAlignment)a1).ingest(a2); +<p>As can be seen, there are some correspondences that do not really make sense. Fortunately, they also have very low confidence values. It is thus worthwhile to use a threshold to eliminate these correspondences. Let's try a threshold of <tt>0.33</tt> over the alignment (with the <tt>-t</tt> switch):</p> +<div class="fragment"><pre> +$ java -jar ../../lib/procalign.jar file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance -t 0.33 -o results/levenshtein33.rdf +</pre></div> + +<p>See the output in <a href="results/levenshtein33.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/levenshtein33.html"><abbr>HTML</abbr></a>.</p> + +<p>As expected, we have suppressed some of these inaccurate correspondences. But did we also suppress accurate ones?</p> +<div class="button"> + <input type="button" onclick="show('qu4')" value="Show Discussion"/> + <input type="button" onclick="hide('qu4')" value="Hide Discussion"/> +</div> +<div class="explain" id="qu4"><p>This operation has helped eliminate a number of inaccurate correspondences like Journal-Conference or Composite-Conference. 
However, there remain some inaccurate correspondences like Institution-InCollection and Published-UnPublished!</p></div> + +<p>We can also apply this treatment to other methods available:</p> +<div class="fragment"><pre> +$ java -jar ../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=smoaDistance file://localhost$CWD/myOnto.owl file://localhost$CWD/edu.mit.visus.bibtex.owl -t 0.5 -o results/SMOA5.rdf +</pre></div> + +<p>See the output in <a href="results/SMOA5.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/SMOA5.html"><abbr>HTML</abbr></a>.</p> + +<p><b>Other manipulations:</b> It is possible to invert an alignment with the following command:</p> +<div class="fragment"><pre> +$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter -i results/SMOA5.rdf -o results/AOMS5.rdf +</pre></div> +<p>See the output in <a href="results/AOMS5.rdf"><abbr>RDF</abbr>/<abbr>XML</abbr></a> or <a href="results/AOMS5.html"><abbr>HTML</abbr></a>. The result is an alignment in the reverse direction, from the original target ontology to the original source ontology. Inverting an alignment only exchanges the order of the two ontologies' elements in the alignment file. This can be useful when you have an alignment from <i>A</i> to <i>B</i>, an alignment from <i>C</i> to <i>B</i>, and you want to go from <i>A</i> to <i>C</i>. The solution is then to invert the second alignment and to compose the two.</p> + +<div class="logic"><p><b>More work:</b> There is another switch (<tt>-T</tt>) in Procalign that specifies the way a threshold is applied (hard|perc|prop|best|span), the default being "hard". The curious reader can apply these and see the difference in results. 
How they work is explained in the Alignment <abbr>API</abbr> documentation.</p></div> + +<div class="logic"><p><b>More work:</b> Try to play with the thresholds in order to find the best values for levenshteinDistance and smoaDistance.</p></div> - // Threshold at various thresholds - // Evaluate them against the references - // and choose the one with the best F-Measure - AlignmentParser aparser = new AlignmentParser(0); - Alignment reference = aparser.parse( "file://localhost"+(new File ( "refalign.rdf" ) . getAbsolutePath()), loaded ); - Evaluator evaluator = new PRecEvaluator( reference, a1 ); +<h2>Output</h2> - double best = 0.; - Alignment result = null; - for ( int i = 0; i <= 10 ; i = i+2 ){ - a1.cut( ((double)i)/10 ); - evaluator.eval( new BasicParameters() ); - System.err.println("Threshold "+(((double)i)/10)+" : "+((PRecEvaluator)evaluator).getFmeasure()); - if ( ((PRecEvaluator)evaluator).getFmeasure() > best ) { - result = (BasicAlignment)((BasicAlignment)a1).clone(); - best = ((PRecEvaluator)evaluator).getFmeasure(); - } +<p>Once a good alignment has been found, only half of the work has been done. In order to actually use our result, it is necessary to transform it into some processable format. For instance, if one wants to merge two <acronym>OWL</acronym> ontologies, the alignment can be changed into a set of <acronym>OWL</acronym> "bridging" axioms. 
This is achieved by "rendering" the alignment in <acronym>OWL</acronym> (through the <tt>-r</tt> switch):</p> +<div class="fragment"><pre> +$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.OWLAxiomsRendererVisitor +</pre></div> + +<p>The result is a set of <acronym>OWL</acronym> assertions of the form:</p> +<div class="owl"><pre> + <owl:Class rdf:about="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#Techreport"> + <owl:equivalentClass rdf:resource="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#Techreport"/> + </owl:Class> + + <owl:ObjectProperty rdf:about="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#copyright"> + <owl:equivalentProperty rdf:resource="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#hasCopyright"/> + </owl:ObjectProperty> +</pre></div> + +<p>If one wants to use the alignments only for inferring on instances without actually merging the classes, one can generate <acronym>SWRL</acronym> rules:</p> +<div class="fragment"><pre> +$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.SWRLRendererVisitor +</pre></div> +<p>which yields, for the same correspondences:</p> +<div class="owl"><pre> + <ruleml:imp> + <ruleml:_body> + <swrl:classAtom> + <owllx:Class owllx:name="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#Techreport"/> + <ruleml:var>x</ruleml:var> + </swrl:classAtom> + </ruleml:_body> + <ruleml:_head> + <swrl:classAtom> + <owllx:Class owllx:name="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#Techreport"/> + <ruleml:var>x</ruleml:var> + </swrl:classAtom> + </ruleml:_head> + </ruleml:imp> + + <ruleml:imp> + <ruleml:_body> + <swrl:individualPropertyAtom swrlx:property="http://alignapi.gforge.inria.fr/tutorial/myOnto.owl#copyright"> + <ruleml:var>x</ruleml:var> + <ruleml:var>y</ruleml:var> + </swrl:individualPropertyAtom> + </ruleml:_body> + 
<ruleml:_head> + <swrl:datavaluedPropertyAtom swrlx:property="http://alignapi.gforge.inria.fr/tutorial/edu.mit.visus.bibtex.owl#hasCopyright"> + <ruleml:var>x</ruleml:var> + <ruleml:var>y</ruleml:var> + </swrl:datavaluedPropertyAtom> + </ruleml:_head> + </ruleml:imp> +</pre></div> + +<p>Exchanging data can also be achieved more simply through <abbr>XSLT</abbr> transformations which will transform the <acronym>OWL</acronym> instance files from one ontology to another:</p> +<div class="fragment"><pre> +$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.ParserPrinter results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.XSLTRendererVisitor -o results/SMOA5.xsl +</pre></div> +<p>This transformation can be applied to the data of <a href="data.xml">data.xml</a>:</p> +<div class="fragment"><pre> +$ xsltproc results/SMOA5.xsl data.xml > results/data.xml +</pre></div> +<p>to produce the <a href="results/data.xml">results/data.xml</a> file.</p> + +<h2>Evaluating</h2> + +<p>We will evaluate alignments by comparing them to a reference alignment, which is supposed to express what is expected from an alignment of these two ontologies. The reference alignment is <a href="refalign.rdf">refalign.rdf</a> (or <a href="results/refalign.html"><abbr>HTML</abbr></a>).</p> + +<p>For evaluating, we use a class other than <tt>Procalign</tt>. It is called <tt>EvalAlign</tt>, and its name must be passed explicitly to <tt>java</tt>. By default, it computes precision, recall and associated measures. It can be invoked this way:</p> +<div class="fragment"><pre> +$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://localhost$CWD/refalign.rdf file://localhost$CWD/results/equal.rdf +</pre></div> + +<p>The first argument is always the reference alignment; the second one is the alignment to be evaluated. 
The result is given here:</p> +<div class="owl"><pre> +<?xml version="1.0" encoding="utf-8" standalone="yes"?> +<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" + xmlns:map="http://www.atl.external.lmco.com/projects/ontology/ResultsOntology.n3#"> + <map:output rdf:about=""> + <map:reference rdf:resource="file://localhost/JAVA/alignapi/html/tutorial/results/refalign.rdf"> + <map:input rdf:resource="file://localhost/JAVA/alignapi/html/tutorial/results/equal.rdf"> + <map:precision>1.0</map:precision> + <map:recall>0.22916666666666666</map:recall> + <fallout>0.0</fallout> + <map:fMeasure>0.37288135593220334</map:fMeasure> + <map:oMeasure>0.22916666666666666</map:oMeasure> + <result>0.22916666666666666</result> + </map:output> +</rdf:RDF> +</pre></div> + +<p>Of course, since that method only matches entities with the same name, it is accurate, yielding a precision of 1.0. However, its recall is poor (about 0.23).</p> + +<p>We can now evaluate the edit distance. What should we expect from the evaluation of this alignment?</p> +<div class="button"> + <input type="button" onclick="show('qu5')" value="Show Discussion"/> + <input type="button" onclick="hide('qu5')" value="Hide Discussion"/> +</div> +<div class="explain" id="qu5"><p>Since it returns more correspondences by loosening the constraints for being a correspondence, it is expected that the recall will increase at the expense of precision.</p></div> +<p>We can see the results of:</p> +<div class="fragment"><pre> +$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://localhost$CWD/refalign.rdf file://localhost$CWD/results/levenshtein33.rdf +</pre></div> +<div class="button"> + <input type="button" onclick="show('qu6')" value="Show result"/> + <input type="button" onclick="hide('qu6')" value="Hide result"/> +</div> +<div class="explain" id="qu6"><pre> +<?xml version="1.0" encoding="utf-8" standalone="yes"?> +<rdf:RDF 
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" + xmlns:map="http://www.atl.external.lmco.com/projects/ontology/ResultsOntology.n3#"> + <map:output rdf:about=""> + <map:reference rdf:resource="file://localhost/JAVA/alignapi/html/tutorial/results/refalign.rdf"> + <map:input rdf:resource="file://localhost/JAVA/alignapi/html/tutorial/results/levenshtein33.rdf"> + <map:precision>0.6811594202898551</map:precision> + <map:recall>0.9791666666666666</map:recall> + <fallout>0.3188405797101449</fallout> + <map:fMeasure>0.8034188034188035</map:fMeasure> + <map:oMeasure>0.5208333333333333</map:oMeasure> + <result>1.4374999999999998</result> + </map:output> +</rdf:RDF> +</pre></div> +<p>It is possible to summarize these results by comparing them to each others. This can be achieved by the <tt>GroupEval</tt> class. This class can output several formats (by default html) and takes all the alignments in the subdirectories of the current directory. Here we only have the <tt>results</tt> directory:</p> +<div class="fragment"><pre> +$ cp refalign.rdf results +$ java -cp ../../lib/procalign.jar fr.inrialpes.exmo.align.util.GroupEval -r refalign.rdf -l "refalign,equal,SMOA,SMOA5,levenshtein,levenshtein33" -c prf -o results/eval.html +</pre></div> + +<p>The results are displayed in the <a href="results/eval.html">results/eval.html</a> file whose main content is the table:</p> +<table border="2" frame="box" rules="groups"> + <colgroup /> + <colgroup span="2" /> + <colgroup span="2" /> + <colgroup span="2" /> + <colgroup span="2" /> + <colgroup span="2" /> + <colgroup span="2" /> + <thead valign="top"><tr> + <th>algo</th> + <th colspan="2">refalign</th> + <th colspan="2">equal</th> + <th colspan="2">SMOA</th> + <th colspan="2">SMOA5</th> + <th colspan="2">levenshtein</th> + <th colspan="2">levenshtein33</th> + </tr></thead><tbody><tr> + <td>test</td> + <td>Prec.</td> + <td>Rec.</td> + <td>Prec.</td> + <td>Rec.</td> + <td>Prec.</td> + <td>Rec.</td> + <td>Prec.</td> + <td>Rec.</td> 
+ <td>Prec.</td> + <td>Rec.</td> + <td>Prec.</td> + <td>Rec.</td> + </tr></tbody><tbody><tr> + <td>results</td> + <td>1.00</td> + <td>1.00</td> + <td>1.00</td> + <td>0.23</td> + <td>0.56</td> + <td>0.98</td> + <td>0.69</td> + <td>0.96</td> + <td>0.53</td> + <td>1.00</td> + <td>0.68</td> + <td>0.98</td> + </tr><tr style="background-color: #FFFF00;"> + <td>H-mean</td><td>1.00</td> + <td>1.00</td> + <td>1.00</td> + <td>0.23</td> + <td>0.56</td> + <td>0.98</td> + <td>0.69</td> + <td>0.96</td> + <td>0.53</td> + <td>1.00</td> + <td>0.68</td> + <td>0.98</td> + </tr></tbody> +</table> + +<!--div class="fragment"><pre> +$ java -jar ../../lib/Procalign.jar file://localhost$CWD/rdf/edu.umbc.ebiquity.publication.owl file://localhost$CWD/rdf/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance -DprintMatrix=1 -o /dev/null > matrix.tex +</pre></div--> + +<div class="logic"><p><b>More work:</b> As you can see, the <tt>PRecEvaluator</tt> does not only provide precision and recall but also provides F-measure. F-measure is usually used as an "absolute" trade-off between precision and recall (i.e., the optimum F-measure is considered the best precision and recall). Can you establish this point for <acronym>SMOA</acronym> and levenshtein and tell which algorithm is more adapted?</p></div> + +<!--div class="logic"><p><b>More work:</b>If you want to compare several algorithms, there is another class, GroupAlign, that allows to run an</p></div--> + +<h2>Embedding</h2> + +<p>Of course, the goal of this <abbr>API</abbr> is not to be used at the command line level (even if it can be very useful). So if you are ready for it, you can develop in Java your own application that takes advantage of the <abbr>API</abbr>.</p> + +<p>A skeleton of program using the Alignment <abbr>API</abbr> is <a href="Skeleton.java">Skeleton.java</a>. 
It can be compiled by invoking:</p> +<div class="fragment"><pre> +$ javac -classpath ../../lib/api.jar:../../lib/rdfparser.jar:../../lib/align.jar:../../lib/procalign.jar -d results Skeleton.java +</pre></div> +<p>and run by:</p> +<div class="fragment"><pre> +$ java -cp ../../lib/procalign.jar:results Skeleton myOnto.owl edu.mit.visus.bibtex.owl +</pre></div> + +<p>Now, considering the <abbr>API</abbr> (which can be consulted through its thin <a href="../../javadoc/org/semanticweb/owl/align/Alignment.html">Javadoc</a> for instance), can you modify the Skeleton program so that it performs the following:</p> +<ul> + <li>Run two different alignment methods (e.g., ngram distance and smoa);</li> + <li>Merge the two results;</li> + <li>Trim at various thresholds;</li> + <li>Evaluate them against the reference alignment and choose the one with the best F-Measure;</li> + <li>Display it as <acronym title="Semantic Web Rule Language">SWRL</acronym> Rules.</li> +</ul> + +<p>Of course, you can do it progressively.</p> +<div class="fragment"><pre> +$ javac -classpath ../../lib/api.jar:../../lib/rdfparser.jar:../../lib/align.jar:../../lib/procalign.jar -d results MyApp.java +$ java -cp ../../lib/procalign.jar:results MyApp myOnto.owl edu.mit.visus.bibtex.owl > results/MyApp.owl +</pre></div> + +<p>Do you want to see a possible solution?</p> +<div class="button"> + <input type="button" onclick="show('qu7')" value="Cheat"/> + <input type="button" onclick="hide('qu7')" value="Teacher is coming"/> +</div> +<div class="explain" id="qu7"><p>The main piece of code in Skeleton.java is replaced by:</p> +<pre> +// Run two different alignment methods (e.g., ngram distance and smoa) +AlignmentProcess a1 = new StringDistAlignment( onto1, onto2 ); +params.setParameter("stringFunction","smoaDistance"); +a1.align( (Alignment)null, params ); +AlignmentProcess a2 = new StringDistAlignment( onto1, onto2 ); +params = new BasicParameters(); 
+params.setParameter("stringFunction","ngramDistance"); +a2.align( (Alignment)null, params ); + +// Merge the two results. +((BasicAlignment)a1).ingest(a2); + +// Trim at various thresholds +// Evaluate them against the reference +// and choose the one with the best F-Measure +AlignmentParser aparser = new AlignmentParser(0); +Alignment reference = aparser.parse( "file://localhost"+(new File ( "refalign.rdf" ) . getAbsolutePath()), loaded ); +Evaluator evaluator = new PRecEvaluator( reference, a1 ); + +double best = 0.; +Alignment result = null; +for ( int i = 0; i <= 10 ; i = i+2 ){ + a1.cut( ((double)i)/10 ); + evaluator.eval( new BasicParameters() ); + System.err.println("Threshold "+(((double)i)/10)+" : "+((PRecEvaluator)evaluator).getFmeasure()); + if ( ((PRecEvaluator)evaluator).getFmeasure() > best ) { + result = (BasicAlignment)((BasicAlignment)a1).clone(); + best = ((PRecEvaluator)evaluator).getFmeasure(); } +} + +// Display it as SWRL Rules +PrintWriter writer = new PrintWriter ( + new BufferedWriter( + new OutputStreamWriter( System.out, "UTF-8" )), true); +AlignmentVisitor renderer = new SWRLRendererVisitor(writer); +result.render(renderer); +writer.flush(); +writer.close(); +</pre></div> + +<p>A full working solution is <a href="MyApp.java">MyApp.java</a>.</p> + +<div class="logic"><p><b>More work:</b> Can you add a switch like the <tt>-i</tt> switch of <tt>Procalign</tt> so that the main class of the application can be passed on the command line?</p></div> + +<div class="logic"><p><b>Advanced:</b> You can develop a specialized matching algorithm by subclassing the Java classes provided in the Alignment <abbr>API</abbr> implementation (like BasicAlignment or DistanceAlignment).</p></div> + +<div class="logic"><p><b>Advanced:</b> What about writing an editor for the alignment <abbr>API</abbr>?</p></div> - // Displays it as SWRL Rules - PrintWriter writer = new PrintWriter ( - new BufferedWriter( - new OutputStreamWriter( System.out, "UTF-8" )), 
true); - AlignmentVisitor renderer = new SWRLRendererVisitor(writer); - result.render(renderer); - writer.flush(); - writer.close(); - </pre></div> - <p> - A full working solution is <a href="MyApp.java">MyApp.java</a>. - </p> - <div class="logic"><p> - <b>More work:</b> Can you add a switch like the <tt>-i</tt> switch of <tt>Procalign</tt> so that the main class of the application can be passed at commant-line. - </p></div> - <div class="logic"><p> - <b>Advanced:</b> You can develop a specialized matching algorithm by subclassing the Java programs provided in the Alignment <abbr>API</abbr> implementation (like BasicAlignment or DistanceAlignment). - </p></div> - <div class="logic"><p> - <b>Advanced:</b> What about writing an editor for the alignment <abbr>API</abbr>? - </p></div> - - <h2>Further exercises</h2> - - <p> - More info: <a href="http://alignapi.gforge.inria.fr">http://alignapi.gforge.inria.fr</a> - </p> - - <!-- - Planning: - - Alignment server (incl. DB storage, agents, WSDL service) - - Extensive composition operators (with comp. tables) - - Expressive alignment language (with SEKT/François Sharffe) - --> +<h2>Further exercises</h2> - <h2>Acknowledgements</h2> +<p>More info: <a href="http://alignapi.gforge.inria.fr">http://alignapi.gforge.inria.fr</a></p> - <p> - The format of this tutorial has been shamelessly borrowed from Sean Bechhofer's <a href="http://owl.man.ac.uk/2005/07/sssw/"><acronym>OWL</acronym> tutorial</a>.</p> +<!-- +Planning: +- Alignment server (incl. DB storage, agents, WSDL service) +- Extensive composition operators (with comp. 
tables) +- Expressive alignment language (with SEKT/François Sharffe) +--> - <hr /> - <p style="text-size: small; text-align: center;"> - http://alignapi.gforge.inria.fr/tutorial - </p> - <hr /> - <p>$Last modified: 29/09/2006 by Antoine Zimmermann.$</p> +<h2>Acknowledgements</h2> -</body> +<p>The format of this tutorial has been shamelessly borrowed from Sean Bechhofer's <a href="http://owl.man.ac.uk/2005/07/sssw/"><acronym>OWL</acronym> tutorial</a>.</p> +<hr /> +<small> +<p style="text-align: center;">http://alignapi.gforge.inria.fr/tutorial</p> +</small> +<hr /> +<p>$Id$</p> +</body> </html>