Commit e2fb5bf5 authored by bguillaum

A few corrections in README-distrib

git-svn-id: svn+ssh://scm.gforge.inria.fr/svnroot/deep-sequoia@61 4f834e7d-5b19-456f-8924-a42755b34a2b
parent 4892dcd2
@@ -4,7 +4,7 @@ Deep Sequoia corpus v7.0
november 2015
The corpus contains French sentences, from Europarl, Est Republicain newspaper,
-French Wikipedia and European Medicine Agency, with the following manual annotations :
+French Wikipedia and European Medicine Agency, with the following manual annotations:
- parts-of-speech and morphological features
- grammatical compound words (merged as one token)
@@ -29,7 +29,7 @@ French Wikipedia and European Medicine Agency, with the following manual annotat
------------------------------------------------------
The corpus is freely available under the free licence LGPL-LR
(Lesser General Public License For Linguistic Resources)
-cf. http://infolingu.univ-mlv.fr/DonneesLinguistiques/Lexiques-Grammaires/lgpllr.html
+cf. http://deep-sequoia.inria.fr/lgpl-lr/
------------------------------------------------------
2. History of the corpus
@@ -38,7 +38,7 @@ The corpus is freely available under the free licence LGPL-LR
The Sequoia corpus was first manually annotated for part-of-speech and phrase-structure, and automatically converted to surface syntactic dependency trees.
(Candito and Seddah, 2012a).
The phrase-structure annotation follows mainly the French Treebank guidelines
-( http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php ),
+(http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php),
modified in the context of conversion to dependencies:
- prepositions that dominate a infinitival VP do project a PP
- any sentence introduced by a complementizer (CS tag) is grouped into a Sint constituent
@@ -62,12 +62,12 @@ This led to a first release of the Deep Sequoia corpus (v 1.0)
Annotating the corpus for deep syntax has sometimes led to correct some surface dependencies.
A further step of systematic search for inconsistencies was carried out,
-using the Grew system (http://wikilligramme.loria.fr/doku.php/grew:grew).
+using the Grew system (http://grew.loria.fr).
This led to the current release (7.0).
(Note: the current release number (7.0) was chosen to get same version numbers for the surface and the deep syntactic annotations of the corpus)
The deep sequoia corpus and the surface sequoia corpus contain the same 3099 sentences,
-but note that the original surface corpus ( versions prior to 6.0) contained 101 more sentences, that turned out to be duplicates and were thus
+but note that the original surface corpus (versions prior to 6.0) contained 101 more sentences, that turned out to be duplicates and were thus
subsequently removed (from the EMEA-test part of the corpus).
See the appendix for the ids of the removed sentences.
@@ -75,14 +75,14 @@ See the appendix for the ids of the removed sentences.
3. References
------------------------------------------------------
-** Deep syntactic annotation :
+** Deep syntactic annotation:
- Marie Candito, Guy Perrier, Bruno Guillaume, Corentin Ribeyre, Karën Fort, Djamé Seddah and Éric de la Clergerie. (2014) Deep Syntax Annotation of the Sequoia French Treebank. Proc. of LREC 2014, Reykjavic, Iceland.
- Guy Perrier, Marie Candito, Bruno Guillaume, Corentin Ribeyre, Karën Fort and Djamé Seddah. (2014) Un schéma d’annotation en dépendances syntaxiques profondes pour le français. Proc. of TALN 2014, Marseille, France.
-** Original paper (surface syntactic annotation) :
+** Original paper (surface syntactic annotation):
Candito M. and Seddah D., 2012a : "Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical", Actes de TALN'2012, Grenoble, France