Commit 17467a95 authored by Bruno Guillaume

SUD and 0.48

parent bb3981de
+++
date = "2018-04-06T11:29:34+02:00"
title = "SUD"
Description = ""
menu = "main"
Categories = ["Development","GoLang"]
Tags = ["Development","golang"]
+++
# Conversion SUD --> UD and UD --> SUD
[GitLab project of the conversion system](http://gitlab.inria.fr/grew/SUD)
## How to read graphs?
* When the automatic conversion from the other format is identical to the gold annotation, everything is drawn in black.
* When the automatic conversion from the other format differs from the gold annotation:
    * common parts are drawn in black
    * parts found only in the gold annotation are drawn in yellow
    * parts produced only by the conversion are drawn in green
## Examples from the first meeting
| id | SUD | UD |
|:---:|:---:|:---:|
| `je_t_aime` | ![](/_sud_diff/je_t_aime.svg) | ![](/_ud_diff/je_t_aime.svg) |
| `j_aime_P` | ![](/_sud_diff/j_aime_P.svg) | ![](/_ud_diff/j_aime_P.svg) |
| `je_te_parle` | ![](/_sud_diff/je_te_parle.svg) | ![](/_ud_diff/je_te_parle.svg) |
| `je_parle_a_M` | ![](/_sud_diff/je_parle_a_M.svg) | ![](/_ud_diff/je_parle_a_M.svg) |
| `on_nomme_Jean_N` | ![](/_sud_diff/on_nomme_Jean_N.svg) | ![](/_ud_diff/on_nomme_Jean_N.svg) |
| `il_fait_VINF_P` | ![](/_sud_diff/il_fait_VINF_P.svg) | ![](/_ud_diff/il_fait_VINF_P.svg) |
| `il_oblige_P_a_VINF` | ![](/_sud_diff/il_oblige_P_a_VINF.svg) | ![](/_ud_diff/il_oblige_P_a_VINF.svg) |
| `difficile_a_lire` | ![](/_sud_diff/difficile_a_lire.svg) | ![](/_ud_diff/difficile_a_lire.svg) |
| `possible_que` | ![](/_sud_diff/possible_que.svg) | ![](/_ud_diff/possible_que.svg) |
| `difficile_a_expliquer` | ![](/_sud_diff/difficile_a_expliquer.svg) | ![](/_ud_diff/difficile_a_expliquer.svg) |
| `avoir_besoin` | ![](/_sud_diff/avoir_besoin.svg) | ![](/_ud_diff/avoir_besoin.svg) |
| `est_gentil` | ![](/_sud_diff/est_gentil.svg) | ![](/_ud_diff/est_gentil.svg) |
| `reste_gentil` | ![](/_sud_diff/reste_gentil.svg) | ![](/_ud_diff/reste_gentil.svg) |
| `one_aux` | ![](/_sud_diff/one_aux.svg) | ![](/_ud_diff/one_aux.svg) |
| `one_pass` | ![](/_sud_diff/one_pass.svg) | ![](/_ud_diff/one_pass.svg) |
| `aux_and_pass` | ![](/_sud_diff/aux_and_pass.svg) | ![](/_ud_diff/aux_and_pass.svg) |
| `etre_oblique` | ![](/_sud_diff/etre_oblique.svg) | ![](/_ud_diff/etre_oblique.svg) |
| `etre_cop` | ![](/_sud_diff/etre_cop.svg) | ![](/_ud_diff/etre_cop.svg) |
| `tres_difficile` | ![](/_sud_diff/tres_difficile.svg) | ![](/_ud_diff/tres_difficile.svg) |
| `difficilement_lisible` | ![](/_sud_diff/difficilement_lisible.svg) | ![](/_ud_diff/difficilement_lisible.svg) |
| `beaucoup_de` | ![](/_sud_diff/beaucoup_de.svg) | ![](/_ud_diff/beaucoup_de.svg) |
## Some other examples
| id | SUD | UD |
|:---:|:---:|:---:|
| `par_P` | ![](/_sud_diff/par_P.svg) | ![](/_ud_diff/par_P.svg) |
| `par_son_nom` | ![](/_sud_diff/par_son_nom.svg) | ![](/_ud_diff/par_son_nom.svg) |
| `one_aux_neg` | ![](/_sud_diff/one_aux_neg.svg) | ![](/_ud_diff/one_aux_neg.svg) |
| `one_pass_neg` | ![](/_sud_diff/one_pass_neg.svg) | ![](/_ud_diff/one_pass_neg.svg) |
| `aux_and_pass_neg` | ![](/_sud_diff/aux_and_pass_neg.svg) | ![](/_ud_diff/aux_and_pass_neg.svg) |
| `ccomp_obj` | ![](/_sud_diff/ccomp_obj.svg) | ![](/_ud_diff/ccomp_obj.svg) |
| `ccomp_obl` | ![](/_sud_diff/ccomp_obl.svg) | ![](/_ud_diff/ccomp_obl.svg) |
| `aux_mark` | ![](/_sud_diff/aux_mark.svg) | ![](/_ud_diff/aux_mark.svg) |
## Examples from corpora
* `Europar.550_00040`
![](/_sud_diff/Europar.550_00040.svg)
![](/_ud_diff/Europar.550_00040.svg)
* `fr-ud-train_09696`
![](/_sud_diff/fr-ud-train_09696.svg)
![](/_ud_diff/fr-ud-train_09696.svg)
* `fr-ud-train_11980`
![](/_sud_diff/fr-ud-train_11980.svg)
![](/_ud_diff/fr-ud-train_11980.svg)
* `fr-ud-train_09113`
{{< large file="_ud_diff/fr-ud-train_09113.svg" >}}
{{< large file="_sud_diff/fr-ud-train_09113.svg" >}}
* `fr-ud-dev_00204`
{{< large file="_ud_diff/fr-ud-dev_00204.svg" >}}
{{< large file="_sud_diff/fr-ud-dev_00204.svg" >}}
* `fr-ud-dev_00131`
{{< large file="_ud_diff/fr-ud-dev_00131.svg" >}}
{{< large file="_sud_diff/fr-ud-dev_00131.svg" >}}
......@@ -16,7 +16,7 @@ For the sentence:
- "*La souris a été mangée par le chat.*" ["*The mouse was eaten by the cat.*"].
the deep structure is: ![Deep dependency structure](/deep_syntax/test.deep.svg)
the deep structure (following Deep-sequoia guidelines) is: ![Deep dependency structure](/deep_syntax/test.deep.svg)
With **Grew**, this representation can be computed from the surface syntax in two steps:
......@@ -54,7 +54,6 @@ wget https://gitlab.inria.fr/sequoia/deep-sequoia/raw/master/tools/sequoia_proj.
The deep structure is then computed with the command:
`grew transform -grs sequoia_proj.grs -strat deep -i test.deep_and_surf.conll -o test.deep.conll`
`grew transform -grs sequoia_proj.grs -strat deep -i test.deep_and_surf.conll -o test.deep.conll`
The output [`test.deep.conll`](/deep_syntax/test.deep.conll) is given below (code and picture):
......
+++
date = "2018-06-05T11:16:30+02:00"
title = "features"
menu = "main"
Categories = ["Development","GoLang"]
Tags = ["Development","golang"]
Description = ""
+++
# CoNLL files
The most common way to store dependency structures is the CoNLL format.
Several extensions were proposed, and we describe here the one used by **Grew**, known as the [CoNLL-U](http://universaldependencies.org/format.html) format, defined in the Universal Dependencies project.
For each sentence, some metadata are given in lines beginning with `#`, followed by one line per lexical unit.
These lines contain 10 fields, separated by tabulations.
Here is an example of CoNLL-U data taken from the corpus `UD_English-PUD` (version 2.1).
```
# sent_id = n01118003
# text = Drop the mic.
1 Drop drop VERB VB VerbForm=Inf 0 root _ _
2 the the DET DT Definite=Def|PronType=Art 3 det _ _
3 mic mic NOUN NN Number=Sing 1 obj _ SpaceAfter=No
4 . . PUNCT . _ 1 punct _ _
```
We explain here how **Grew** deals with the 10 fields of CoNLL files:
1. **ID**. This field is a number used as an identifier for the corresponding lexical unit (LU).
In Grew, it is available as the feature `position` (most of the time this field is not needed: constraints on relative positions can be expressed with the `<` or `<<` syntax).
2. **FORM**. The phonological form of the LU.
In Grew, the value of this field is available through a feature named `form`
(for backward compatibility, the keyword `phon` can also be used instead of `form`).
3. **LEMMA**. The lemma of the LU. In Grew, this corresponds to the feature `lemma`.
4. **UPOS**. The field `upos` (for backward compatibility, `cat` can also be used to refer to this field).
5. **XPOS**. The field `xpos` (for backward compatibility, `pos` can also be used to refer to this field).
6. **FEATS**. List of morphological features.
7. **HEAD**. Head of the current word, which is either a value of ID or `0` for the root node.
8. **DEPREL**. Dependency relation to the HEAD (root iff HEAD = 0).
9. **DEPS**. Enhanced dependency graph in the form of a list of head-deprel pairs. In Grew, these relations are available with the prefix `E:`.
10. **MISC**. Any other annotation. In Grew, the annotations of this field are accessible with the prefix `_MISC_`.
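
As an illustration of the ten-field layout, the following Python sketch splits one token line into named fields. It is a minimal sketch and not part of Grew: the names `CONLLU_FIELDS`, `parse_token_line` and `parse_feats` are hypothetical, and the code assumes that fields are separated by real tab characters.

```python
# Standard CoNLL-U column names, in file order.
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_token_line(line):
    """Split one tab-separated CoNLL-U token line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(f"expected 10 tab-separated fields, got {len(values)}")
    return dict(zip(CONLLU_FIELDS, values))


def parse_feats(feats):
    """Turn a FEATS string such as 'Definite=Def|PronType=Art' into a dict ('_' means empty)."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


# Second token of the example sentence "Drop the mic."
token = parse_token_line("2\tthe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t3\tdet\t_\t_")
print(token["FORM"], token["UPOS"], parse_feats(token["FEATS"]))
```

All values are kept as strings here; converting `ID` and `HEAD` to integers is left out on purpose, since CoNLL-U also allows range and decimal identifiers.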
## Note about backward compatibility
In older versions of Grew (before the definition of the CoNLL-U format), the fields 2, 4 and 5 were accessible with the names `phon`, `cat` and `pos` respectively.
To ensure backward compatibility and uniform handling of these fields, the 3 names `phon`, `cat` and `pos` are replaced at parsing time by `form`, `upos` and `xpos`.
As a consequence, it is impossible to use both `phon` and `form` in the same system.
We highly recommend using only the `form` feature in this setting.
Of course, the same observation applies to `cat` and `upos` (`upos` should be used) and to `pos` and `xpos` (`xpos` should be chosen).
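
The renaming described in this note can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not Grew's actual implementation: the names `LEGACY_TO_CONLLU`, `normalize_feature_name` and `normalize_features` are hypothetical, and raising an error on a clash is a choice made for this example.

```python
# Legacy feature names and their CoNLL-U replacements, as listed in the note.
LEGACY_TO_CONLLU = {"phon": "form", "cat": "upos", "pos": "xpos"}


def normalize_feature_name(name):
    """Return the CoNLL-U name for a legacy feature name; other names pass through unchanged."""
    return LEGACY_TO_CONLLU.get(name, name)


def normalize_features(features):
    """Rename legacy keys in a feature dict.

    A legacy name used together with its replacement (e.g. `phon` and `form`)
    is rejected, since both would refer to the same field after renaming.
    """
    normalized = {}
    for name, value in features.items():
        new_name = normalize_feature_name(name)
        if new_name in normalized:
            raise ValueError(f"both a legacy name and '{new_name}' are used")
        normalized[new_name] = value
    return normalized


print(normalize_features({"phon": "mic", "cat": "NOUN", "lemma": "mic"}))
```

Because the renaming happens once, at parsing time, the rest of a system only ever sees `form`, `upos` and `xpos`, which is why mixing old and new names cannot be supported.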
\ No newline at end of file
+++
date = "2018-04-25"
title = "Gtk installation"
+++
A GTK interface is available separately (on Linux and Mac OS&nbsp;X; untested on Windows).
# Installation of the GTK interface
We suppose that the basic version ([see install page](../install)) is already installed.
## Linux
* Install the GUI
    * `apt-get install graphviz pkg-config librsvg2-dev libwebkitgtk-dev libglade2-dev libgtk2.0-dev`
    * `opam install grew_gui`
* Test
    * Run `grew gui` to start the GTK interface
    * In case of trouble, [file an issue](https://gitlab.inria.fr/grew/grew_doc/issues)
## On Mac OS&nbsp;X
* Prerequisite: a Mac application for running X11 GUIs
    * Install [XQuartz](http://www.xquartz.org/)
* Install the GUI
    * `sudo port install graphviz librsvg libglade2 webkit-gtk`
    * `opam install grew_gui`
* Test
    * Run `grew gui` to start the GTK interface
    * In case of trouble, [file an issue](https://gitlab.inria.fr/grew/grew_doc/issues)
......@@ -13,6 +13,8 @@ Categories = ["Development","GoLang"]
**Grew** is a Graph Rewriting tool dedicated to applications in Natural Language Processing (NLP). It can manipulate many kinds of linguistic representations. It has been used on POS-tagged sequences, surface dependency syntax, deep dependency syntax and semantic representations (AMR, DMRS), but it can be used to represent any graph-based structure.
## News
**2018/06/05:** New release of version **0.48**. See [What's new](../whats) for changes
**April 2018:** Publication of the book [*Application of Graph Rewriting to Natural Language Processing*](https://www.wiley.com/en-fr/Application+of+Graph+Rewriting+to+Natural+Language+Processing-p-9781119522348).
Chapter 1 is [available from the publisher's website](https://media.wiley.com/product_data/excerpt/66/17863009/1786300966-587.pdf).
......@@ -20,7 +22,6 @@ The chapter 1 is [available from the editor website](https://media.wiley.com/pro
<a href="https://www.wiley.com/en-fr/Application+of+Graph+Rewriting+to+Natural+Language+Processing-p-9781119522348"><img src="https://media.wiley.com/product_data/coverImage300/66/17863009/1786300966.jpg" alt="Book cover" style="width: 200px;"/></a>
</center>
**2018/03/18:** New release of version **0.47**. See [What's new](../whats) for changes
## A first taste of Grew
The easiest way to try and test **Grew** is to use one of the two online interfaces.
......
......@@ -23,7 +23,7 @@ The file with the code below: `lib_usage.ml` ([Download](/lib_usage/lib_usage.ml
`ocamlbuild -use-ocamlfind -pkgs 'libgrew, libgrew, conll, yojson, log, containers, str, ANSITerminal' lib_usage.native`
```ocaml
{{< input file="/static/lib_usage/lib_usage.ml" >}}
{{< input file="static/lib_usage/lib_usage.ml" >}}
```
......@@ -47,6 +47,6 @@ The converted version of `graph1` is `graph1__ssq_to_dsq`.
Again, the output of `ssq_to_dsq` must be converted before being used as an input to the next GRS `dsq_to_deep` (see below).
```ocaml
{{< input file="/static/lib_usage/lib_usage_0.46.ml" >}}
{{< input file="static/lib_usage/lib_usage_0.46.ml" >}}
```
......@@ -45,7 +45,7 @@ For instance, the following command:
produces the file [`test.melt`](/parsing/test.melt):
{{< input file="/static/parsing/test.melt" >}}
{{< input file="static/parsing/test.melt" >}}
## Parsing with the GRS
......@@ -55,13 +55,13 @@ With the file [`test.melt`](/parsing/test.melt) described above, the following c
The output file is [`test.surf.conll`](/parsing/test.surf.conll):
{{< input file="/static/parsing/test.surf.conll" >}}
{{< input file="static/parsing/test.surf.conll" >}}
which encodes the syntactic structure:
![Dependency structure](/parsing/test.svg)
![Dependency structure](/parsing/test.surf.svg)
It is also possible to runs a GTK interface in which you can explore step by step rewriting of the input sentence:
It is also possible to run a GTK interface in which you can explore step by step rewriting of the input sentence:
`grew gui -grs POStoSSQ/grs/surf_synt_main.grs -i test.melt`
......@@ -71,7 +71,7 @@ We will suppose here that the input file is already split in sentences (one by l
Suppose that the file [`tdm80_ch01.txt`](/parsing/tdm80_ch01.txt) contains the following data:
{{< input file="/static/parsing/tdm80_ch01.txt" >}}
{{< input file="static/parsing/tdm80_ch01.txt" >}}
The parsing can be done with the same two-step process:
......
......@@ -8,20 +8,25 @@ menu = "main"
+++
* The version numbers `x.y.z` are synchronized such that `x` and `y` are identical for the 3 main sub-projects (`grew`, `grew_gui`, `libcaml-grew`). `z` is linked to bug fixes and may vary.
* The version numbers `x.y.z` are synchronized such that `x` and `y` are identical for the 3 main sub-projects (`grew`, `grew_gui`, `libcaml-grew`). The third component `z` is linked to bug fixes and may vary across the 3 sub-projects.
* The symbol ":warning:" indicates changes that may break backward compatibility.
---
# **last release** Version 0.47 on March 13, 2018
* Add `grewpy` executable for Python library
* `-safe_commands` option
More detailed information is available in the `CHANGES.md` file of each sub-project:
[libcaml-grew](https://gitlab.inria.fr/grew/libcaml-grew/blob/master/CHANGES.md),
[grew](https://gitlab.inria.fr/grew/grew/blob/master/CHANGES.md),
[grew_gui](https://gitlab.inria.fr/grew/grew_gui/blob/master/CHANGES.md)
---
# [**last release**] Version 0.48 on June 5, 2018
* remove `conll_fields` mechanism (names of conll fields 2, 4 and 5 are `form`, `upos`, `xpos`)
---
# Version 0.47 on March 13, 2018
* Add `grewpy` executable for Python library
* `-safe_commands` option
---
......
......@@ -23,11 +23,12 @@
<li class="section">Available GRS</li>
<li><a href="/parsing">Dependency parsing</a></li>
<li><a href="/deep_syntax">Deep syntax</a></li>
<li><a href="/todo">DMRS</a></li>
<li><a href="/todo">Other GRS</a></li>
<!-- <li><a href="/todo">DMRS</a></li> -->
<!-- <li><a href="/todo">Other GRS</a></li> -->
<hr/>
<li class="section">GRS development</li>
<li class="section">Documentation</li>
<li><a href="/features">CoNLL files</a></li>
<li><a href="/pattern">Pattern syntax</a></li>
<li><a href="/commands">Command syntax</a></li>
<li><a href="/rule">Rule syntax</a></li>
......