Embedding the Alignment API

This version:
http://alignapi.gforge.inria.fr/tutorial/tutorial3/embed.html
Author:
Jérôme Euzenat, INRIA & LIG

Here is a tutorial for embedding the Alignment API within your own applications.

Of course, the Alignment API is not meant to be used only from the command line (even if that can be very useful). If you are ready for it, you can develop your own Java application that takes advantage of the API.

Starting point

A skeleton program using the Alignment API is Skeleton.java. It can be compiled by invoking:

$ javac -classpath ../../lib/align.jar:../../lib/procalign.jar -d results Skeleton.java

and run by:

$ java -cp ../../lib/procalign.jar:results Skeleton file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl

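Using the API (which can be consulted through its Javadoc, for instance), can you modify the Skeleton program so that it performs the following:

- run two different alignment methods (e.g., ngram distance and smoa);
- merge the two results;
- trim the merged alignment at various thresholds;
- evaluate each result against a reference alignment and choose the one with the best F-measure;
- display the selected alignment as OWL axioms.

Of course, you can do it progressively.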

Call an alignment method

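Matching two ontologies is achieved in three steps:

- create an instance of a class implementing the AlignmentProcess interface (here StringDistAlignment);
- provide this instance with the two ontologies to match (init);
- call its matching method (align).

The matching method takes two arguments: an existing alignment to improve on, if any (it can be null), and a set of parameters.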

Below, two different methods (StringDistAlignment with two different stringFunction parameters) are instantiated and run:

  // Run two different alignment methods (e.g., ngram distance and smoa)
  AlignmentProcess a1 = new StringDistAlignment();
  params.setParameter("stringFunction","smoaDistance");
  a1.init ( onto1, onto2 );
  a1.align( (Alignment)null, params );
  AlignmentProcess a2 = new StringDistAlignment();
  a2.init ( onto1, onto2 );
  params = new BasicParameters();
  params.setParameter("stringFunction","ngramDistance");
  a2.align( (Alignment)null, params );

After this step, the two matching methods have been processed and the result is available within the alignment instances (a1 and a2).
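
To give an intuition of what a string-based matcher does, here is a minimal, self-contained sketch that compares entity labels with a trigram Dice coefficient and keeps, for each label, its best-scoring counterpart. It is not the code of StringDistAlignment, and the measure is not claimed to be the one behind the "ngramDistance" or "smoaDistance" parameters; the class LabelMatcherSketch and its methods are hypothetical names introduced for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: NOT the Alignment API's StringDistAlignment.
final class LabelMatcherSketch {

    // Character trigrams of a lower-cased label (hypothetical helper).
    static Set<String> trigrams(String label) {
        String s = label.toLowerCase();
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    // Dice coefficient over trigram sets, in [0, 1].
    // Labels shorter than three characters only match if they are equal (ignoring case).
    static double similarity(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        if (ga.isEmpty() || gb.isEmpty()) {
            return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    // For each label of the first list, keep the best-scoring label of the second one.
    // Labels with no counterpart scoring above 0 are left unmatched.
    static Map<String, String> match(List<String> labels1, List<String> labels2) {
        Map<String, String> best = new LinkedHashMap<>();
        for (String l1 : labels1) {
            String bestLabel = null;
            double bestScore = 0.0;
            for (String l2 : labels2) {
                double score = similarity(l1, l2);
                if (score > bestScore) {
                    bestScore = score;
                    bestLabel = l2;
                }
            }
            if (bestLabel != null) {
                best.put(l1, bestLabel);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> onto1Labels = Arrays.asList("Article", "Booklet");
        List<String> onto2Labels = Arrays.asList("Book", "article", "Person");
        System.out.println(match(onto1Labels, onto2Labels));
    }
}
```

In the API itself, the similarity values end up as the confidence of each correspondence, which is what the thresholds used later in this tutorial apply to.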

Manipulate alignments (merge and trim)

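Alignments offer methods to manipulate them. In particular, it is possible to:

- merge another alignment into an alignment (ingest);
- trim an alignment, discarding the correspondences under a given threshold (cut).

  // Merge the two results.
  ((BasicAlignment)a1).ingest(a2);

  // Threshold at various thresholds (done in the evaluation loop below)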
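
The effect of merging and trimming can be pictured on a toy model in which an alignment is simply a map from a correspondence to its confidence. This is only a sketch under stated assumptions: the class MergeTrimSketch is hypothetical, keeping the higher confidence when both alignments contain the same correspondence is a choice made here (the API may handle duplicates differently), and correspondences whose confidence equals the threshold are kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only: a correspondence is a "entity1=entity2" key with a confidence.
// This is NOT how BasicAlignment stores its cells.
final class MergeTrimSketch {

    // Adds every correspondence of 'other' to 'target'.
    // Assumption of this sketch: on duplicates, the higher confidence wins.
    static void ingest(Map<String, Double> target, Map<String, Double> other) {
        for (Map.Entry<String, Double> e : other.entrySet()) {
            target.merge(e.getKey(), e.getValue(), Double::max);
        }
    }

    // Removes the correspondences whose confidence is strictly under the threshold.
    static void cut(Map<String, Double> alignment, double threshold) {
        alignment.values().removeIf(confidence -> confidence < threshold);
    }

    public static void main(String[] args) {
        Map<String, Double> a1 = new LinkedHashMap<>();
        a1.put("Article=Article", 1.0);
        a1.put("Book=Booklet", 0.57);

        Map<String, Double> a2 = new LinkedHashMap<>();
        a2.put("Book=Booklet", 0.8);
        a2.put("Person=Author", 0.3);

        ingest(a1, a2);   // a1 now holds three correspondences
        cut(a1, 0.5);     // "Person=Author" (0.3) is discarded
        System.out.println(a1);
    }
}
```

As in the API, both operations modify the alignment in place, which is why the solution below clones the alignment before keeping it as the best result.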

Evaluating alignments

Alignments can also be evaluated. For that purpose, the API provides the Evaluator interface. Similarly to AlignmentProcess, this interface is used by:

- creating an instance of a class implementing Evaluator (here PRecEvaluator) from the two alignments to compare, the reference alignment first;
- calling its eval method with a set of parameters.

The code below first creates a parser for loading the reference alignment, then creates an instance of PRecEvaluator for computing the precision and recall of the alignment a1 above with respect to the reference alignment.

  // Evaluate them against the references
  // and choose the one with the best F-Measure
  AlignmentParser aparser = new AlignmentParser(0);
  Alignment reference = aparser.parse( (new File ( "refalign.rdf" ) ) . toURL() . toString());
  Evaluator evaluator = new PRecEvaluator( reference, a1 );
  Parameters p = new BasicParameters();
  evaluator.eval( p );

As previously, results are stored within the Evaluator object and are accessed through specific accessors.
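
For readers unfamiliar with these measures, the following self-contained sketch computes precision, recall and F-measure (their harmonic mean) on plain sets of correspondences. It is not the implementation of PRecEvaluator: the class PrecisionRecallSketch is a hypothetical name, correspondences are reduced to strings, and returning 0 for an empty set is a convention chosen here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: NOT the Alignment API's PRecEvaluator.
final class PrecisionRecallSketch {

    // Number of correspondences present in both sets.
    static int common(Set<String> reference, Set<String> found) {
        Set<String> both = new HashSet<>(found);
        both.retainAll(reference);
        return both.size();
    }

    // Share of the found correspondences that are in the reference.
    static double precision(Set<String> reference, Set<String> found) {
        return found.isEmpty() ? 0.0 : (double) common(reference, found) / found.size();
    }

    // Share of the reference correspondences that have been found.
    static double recall(Set<String> reference, Set<String> found) {
        return reference.isEmpty() ? 0.0 : (double) common(reference, found) / reference.size();
    }

    // Harmonic mean of precision and recall (0 when both are 0).
    static double fMeasure(Set<String> reference, Set<String> found) {
        double p = precision(reference, found);
        double r = recall(reference, found);
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        Set<String> reference = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> found = new HashSet<>(Arrays.asList("a", "b", "x"));
        System.out.println("precision = " + precision(reference, found));
        System.out.println("recall    = " + recall(reference, found));
        System.out.println("F-measure = " + fMeasure(reference, found));
    }
}
```

With two correct correspondences out of three found and four expected, precision is 2/3, recall is 1/2 and the F-measure is 4/7. Raising a threshold typically trades recall for precision, which is why the F-measure is used below to pick a compromise.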

As an exercise, one could try to trim the alignment a1 with thresholds of 0., .2, .4, .6, .8, and 1., evaluate each result for precision and recall, and select the one with the best F-measure.

  double best = 0.;
  Alignment result = null;
  for ( int i = 0; i <= 10 ; i += 2 ){
    a1.cut( ((double)i)/10 );
    evaluator = new PRecEvaluator( reference, a1 );
    evaluator.eval( p );
    System.err.println("Threshold "+(((double)i)/10)+" : "+((PRecEvaluator)evaluator).getFmeasure()+" over "+a1.nbCells()+" cells");
    if ( ((PRecEvaluator)evaluator).getFmeasure() > best ) {
       result = (BasicAlignment)((BasicAlignment)a1).clone();
       best = ((PRecEvaluator)evaluator).getFmeasure();
    }
  }

Displaying an alignment

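Finally, alignments can be displayed in a variety of formats through the AlignmentVisitor abstraction. An alignment is displayed by:

- creating a renderer (an AlignmentVisitor, here OWLAxiomsRendererVisitor) that writes to a PrintWriter;
- calling the render method of the alignment with this renderer.

For instance, it is possible to print on the standard output the alignment selected in the previous exercise as a set of OWL axioms.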

  // Displays it as OWL Rules
  PrintWriter writer = new PrintWriter (
		        new BufferedWriter(
	                 new OutputStreamWriter( System.out, "UTF-8" )), true);
  AlignmentVisitor renderer = new OWLAxiomsRendererVisitor(writer);
  result.render(renderer);
  writer.flush();
  writer.close();

Putting these together

The main piece of code in Skeleton.java is replaced by:

  // Run two different alignment methods (e.g., ngram distance and smoa)
  AlignmentProcess a1 = new StringDistAlignment();
  params.setParameter("stringFunction","smoaDistance");
  a1.init ( onto1, onto2 );
  a1.align( (Alignment)null, params );
  AlignmentProcess a2 = new StringDistAlignment();
  a2.init ( onto1, onto2 );
  params = new BasicParameters();
  params.setParameter("stringFunction","ngramDistance");
  a2.align( (Alignment)null, params );

  // Merge the two results.
  ((BasicAlignment)a1).ingest(a2);

  // Threshold at various thresholds
  // Evaluate them against the references
  // and choose the one with the best F-Measure
  AlignmentParser aparser = new AlignmentParser(0);
  Alignment reference = aparser.parse( (new File ( "refalign.rdf" ) ) . toURL() . toString());
  Evaluator evaluator = new PRecEvaluator( reference, a1 );

  double best = 0.;
  Alignment result = null;
  Parameters p = new BasicParameters();
  for ( int i = 0; i <= 10 ; i += 2 ){
    a1.cut( ((double)i)/10 );
    evaluator = new PRecEvaluator( reference, a1 );
    evaluator.eval( p );
    System.err.println("Threshold "+(((double)i)/10)+" : "+((PRecEvaluator)evaluator).getFmeasure()+" over "+a1.nbCells()+" cells");
    if ( ((PRecEvaluator)evaluator).getFmeasure() > best ) {
       result = (BasicAlignment)((BasicAlignment)a1).clone();
       best = ((PRecEvaluator)evaluator).getFmeasure();
    }
  }
  // Displays it as OWL Rules
  PrintWriter writer = new PrintWriter (
		        new BufferedWriter(
	                 new OutputStreamWriter( System.out, "UTF-8" )), true);
  AlignmentVisitor renderer = new OWLAxiomsRendererVisitor(writer);
  result.render(renderer);
  writer.flush();
  writer.close();

This can be compiled and used through:

$ javac -classpath ../../lib/align.jar:../../lib/procalign.jar -d results MyApp.java
$ java -cp ../../lib/procalign.jar:results MyApp file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl > results/MyApp.owl

The execution gives an insight into the best threshold (here 0.8):

Threshold 0.0 : 0.4693877551020408 over 148 cells
Threshold 0.2 : 0.5227272727272727 over 128 cells
Threshold 0.4 : 0.5476190476190476 over 120 cells
Threshold 0.6 : 0.6478873239436619 over 94 cells
Threshold 0.8 : 0.75 over 72 cells
Threshold 1.0 : 0.5151515151515151 over 18 cells

A full working solution is MyApp.java.

More work: Can you add a switch, like the -i switch of Procalign, so that the class implementing the matching method can be passed on the command line?
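
One possible approach to this exercise is to read the class name from the arguments and instantiate it by reflection. The sketch below only uses the Java standard library; the class ClassSwitchSketch, its methods and the default class name are hypothetical, and it assumes that the named class has a public no-argument constructor (as StringDistAlignment is instantiated in this tutorial). In a real application, the resulting object would be cast to AlignmentProcess.

```java
// Illustrative sketch only: option handling in Procalign itself may differ.
final class ClassSwitchSketch {

    // Placeholder default so that the sketch runs without the Alignment API on the classpath.
    static final String DEFAULT_CLASS = "java.util.ArrayList";

    // Returns the value following "-i" in the arguments, or the default name if absent.
    static String implementationName(String[] args, String defaultName) {
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-i".equals(args[i])) {
                return args[i + 1];
            }
        }
        return defaultName;
    }

    // Loads the class by name and calls its no-argument constructor.
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        String name = implementationName(args, DEFAULT_CLASS);
        Object matcher = instantiate(name);
        System.out.println("Loaded " + matcher.getClass().getName());
    }
}
```

Wrapping the reflection errors in an unchecked exception keeps the calling code short; an application may prefer to report a wrong class name to the user and exit instead.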

Advanced: You can develop a specialized matching algorithm by subclassing the Java classes provided in the Alignment API implementation (like BasicAlignment or DistanceAlignment).

Advanced: What about writing an editor for the alignment API?

Further exercises

More info: http://alignapi.gforge.inria.fr

