Commit 76ef0625 authored by Bruno Guillaume

Add links to papers

parent 11273f14
......@@ -74,28 +74,26 @@ See the appendix for the ids of the removed sentences.
## References
### Deep syntactic annotation:
### Deep syntactic annotations
* **Marie Candito**, **Guy Perrier**, **Bruno Guillaume**, **Corentin Ribeyre**, **Karën Fort**, **Djamé Seddah** and **Éric de la Clergerie**. (2014) [*Deep Syntax Annotation of the Sequoia French Treebank.*](https://hal.inria.fr/hal-00969191v2/document) Proc. of LREC 2014, Reykjavik, Iceland.
* Marie Candito, Guy Perrier, Bruno Guillaume, Corentin Ribeyre, Karën Fort, Djamé Seddah and Éric de la Clergerie. (2014) Deep Syntax Annotation of the Sequoia French Treebank. Proc. of LREC 2014, Reykjavic, Iceland.
* Guy Perrier, Marie Candito, Bruno Guillaume, Corentin Ribeyre, Karën Fort and Djamé Seddah. (2014) Un schéma d’annotation en dépendances syntaxiques profondes pour le français. Proc. of TALN 2014, Marseille, France.
* **Guy Perrier**, **Marie Candito**, **Bruno Guillaume**, **Corentin Ribeyre**, **Karën Fort** and **Djamé Seddah**. (2014) [*Un schéma d’annotation en dépendances syntaxiques profondes pour le français.*](https://hal.inria.fr/hal-01054407/document) Proc. of TALN 2014, Marseille, France.
### MWE and named entities annotation:
* Candito M., Constant M., Ramisch C., Savary A., Parmentier Y., Pasquer C. et Antoine J.-Y. 2017, Annotation d'expressions polylexicales verbales en français, Actes de TALN 2017 - articles courts, Orléans, 2017.
* In preparation : A French corpus annotated for multi-word expressions and named entities.
* **Marie Candito**, **Mathieu Constant**, **Carlos Ramisch**, **Agata Savary**, **Yannick Parmentier**, **Caroline Pasquer** and **Jean-Yves Antoine**. (2017) [*Annotation d'expressions polylexicales verbales en français*](https://hal.archives-ouvertes.fr/hal-01537880/document), Actes de TALN 2017 - articles courts, Orléans.
### Original papers (surface syntactic annotation):
* **In preparation**: A French corpus annotated for multi-word expressions and named entities.
* Candito M. and Seddah D., 2012a : "Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical", Actes de TALN'2012, Grenoble, France
### Initial version (constituency trees + surface dependencies)
* **Marie Candito** and **Djamé Seddah**. (2012) [*Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical*](https://hal.inria.fr/hal-00698938/document), Proceedings of TALN'2012, Grenoble, France.
* Candito M. and Seddah D., 2012b : "Effectively long-distance dependencies in French : annotation and parsing evaluation", Proceedings of TLT'11, 2012, Lisbon, Portugal)
* **Marie Candito** and **Djamé Seddah**. (2012) [*Effectively long-distance dependencies in French: annotation and parsing evaluation*](https://hal.inria.fr/hal-00769625/document), Proceedings of TLT'11, 2012, Lisbon, Portugal.
## Content
The corpus contains 3,099 sentences.
### Number of sentences for each sub-domain :
### Number of sentences for each sub-domain:
* 561 sentences Europarl file= `Europar.550+fct.mrg`
* 529 sentences EstRepublicain file= `annodis.er+fct.mrg`
* 996 sentences French Wikipedia file= `frwiki_50.1000+fct.mrg`
......@@ -104,9 +102,9 @@ The corpus contains 3,099 sentences.
### Tokenization, multi-word expressions and named entities
* before version **8.0** : the corpus contained grammatical MWEs only, each treated as one token (components separated with an underscore, as in "parce_que")
* versions **8.xxx** : each grammatical MWE was then represented as separated tokens (with all non-first components attached to the first component with a `dep_cpd` arc.
* from version **9.0** : the MWE and named entities annotated within the PARSEME-FR project were integrated to the corpus, in a separate layer (11th column of CUPT files). MWEs were classified into syntactically regular versus irregular MWEs. Only irregular MWEs have a flat representation with dep_cpd arcs. The syntactic representation for named entities and regular MWEs uses regular syntactic dependencies (no dep_cpd).
* before version **8.0**: the corpus contained grammatical MWEs only, each treated as one token (components separated with an underscore, as in *parce_que*)
* versions **8.xxx**: each grammatical MWE was then represented as separate tokens (with all non-first components attached to the first component with a `dep_cpd` arc)
* from version **9.0**: the MWEs and named entities annotated within the PARSEME-FR project were integrated into the corpus, in a separate layer (11th column of CUPT files). MWEs were classified into syntactically regular versus irregular MWEs. Only irregular MWEs have a flat representation with `dep_cpd` arcs. The syntactic representation for named entities and regular MWEs uses regular syntactic dependencies (no `dep_cpd`).
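
As a rough illustration of the separate annotation layer, the following sketch reads the 11th column from tab-separated CUPT token lines. The function name `mwe_column` and the sample rows (including the tags, the dependency labels and the `1:MWE` value) are invented for the example and are not taken from the corpus; the only assumption about the format is that token lines are tab-separated and comment lines start with `#`.

```python
def mwe_column(cupt_lines):
    """Yield (form, annotation) pairs from tab-separated CUPT token lines.

    The annotation is whatever string sits in the 11th column; this
    sketch does not try to interpret it.
    """
    for line in cupt_lines:
        line = line.rstrip("\n")
        # Skip blank sentence separators and comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Ignore lines that do not carry the extra 11th column.
        if len(cols) < 11:
            continue
        yield cols[1], cols[10]


# Hypothetical rows, written only to exercise the function.
sample = [
    "# text = parce que",
    "1\tparce\tparce\tCS\t_\t_\t0\troot\t_\t_\t1:MWE",
    "2\tque\tque\tCS\t_\t_\t1\tdep_cpd\t_\t_\t1",
    "",
]
pairs = list(mwe_column(sample))
```

For a real file, the same function can be fed an open file handle, since it only iterates over lines.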
## Dependency formats
......@@ -201,7 +199,7 @@ With respect to FTB bracketed format, some grammatical functions have been speci
* `P_OBJ` has been split to `P_OBJ.O` and `P_OBJ.AGT`
* `MOD` has been split to `MOD`, `MOD.APP`, `MOD.INC`, `MOD.CLEFT`
#### non referential "il" :
#### non referential "il":
`CLS-SUJ##_@@void=y il` is used for a non referential "il", entering an impersonal alternation (meaning the verb alternates with a diathesis in which the subject is referential: *Il arrive trois personnes* <=> *Trois personnes arrivent*).
`CLS-SUJ##_@@void=y,intrinsimp=y il` is used for a non referential "il", without syntactic alternation (e.g. *il faut 3 personnes* does not alternate with a construction of the same verb with a referential subject: * *3 personnes faut*).
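
The two labels above can be told apart programmatically. The sketch below is a minimal illustration that assumes, based only on these two examples, that features follow `@@` as comma-separated `key=value` pairs and that the token follows a single space; the helper name `parse_features` is hypothetical and the full bracketed format may contain cases this does not cover.

```python
def parse_features(node):
    """Split 'LABEL##...@@k=v,k=v token' into (label, features, token).

    Assumption: the layout is inferred from the two examples in the
    text, not from a format specification.
    """
    head, _, token = node.partition(" ")
    label_part, _, feat_part = head.partition("@@")
    # Keep only the part before '##' as the category-function label.
    label = label_part.split("##", 1)[0]
    features = dict(
        item.split("=", 1) for item in feat_part.split(",") if "=" in item
    )
    return label, features, token


def is_intrinsic_impersonal(node):
    """True for a void 'il' that has no referential-subject alternation."""
    _, features, _ = parse_features(node)
    return features.get("void") == "y" and features.get("intrinsimp") == "y"
```

With this, `is_intrinsic_impersonal("CLS-SUJ##_@@void=y il")` is false while the `intrinsimp=y` variant is true.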
......@@ -220,7 +218,7 @@ The final subject is canonical "argc" (causer argument), and the final object (l
Data split (TALN 2012 experiments)
The "neutral" domain is made of EstRepublicain + Europarl + FrWiki,
and the split into dev and test sets is the following :
and the split into dev and test sets is the following:
```
head -265 annodis.er+fct.mrg >> sequoia-neutre-dev+fct.mrg
......