Commit 7883bc27 authored by Bruno Guillaume

Files moved to the new project paris_nancy

parent 4c3efa78
extract:
	conll_tool split ../miniref/sequoia.ud-trunk.conll sequoia-100.ids sequoia-100.conll
	conll_tool split ~/gitlab/UD_French/fr-ud-train.conllu fr-ud-train-100.ids fr-ud-train-100.conll
clean:
	rm -f sequoia-100.conll fr-ud-train-100.conll
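The `extract` target builds 100-sentence samples from a corpus and a file of sentence ids. The behaviour of `conll_tool split` is not documented here, so the following is only a hypothetical Python sketch of such an id-based extraction. It assumes that each sentence block carries a `# sent_id = ...` comment (ids such as `fr-ud-train_00135` look like sent_id values, but that is an assumption), that blocks are separated by blank lines, and that the ids file lists one id per line. The function names `read_blocks`, `sent_id`, `extract_sentences` and `load_ids` are hypothetical.

```python
"""Hypothetical sketch of an id-based CoNLL-U extraction.

Assumptions (not taken from conll_tool itself): every sentence block has a
"# sent_id = ..." comment, blocks are separated by blank lines, and the ids
file lists one sentence id per line. Sentences keep their corpus order.
"""
from typing import Iterable, Iterator, List, Optional, Set


def read_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield sentence blocks (lists of lines) separated by blank lines."""
    block: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:
        yield block


def sent_id(block: List[str]) -> Optional[str]:
    """Return the value of the block's "# sent_id = ..." comment, if any."""
    for line in block:
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "sent_id":
                return value.strip()
    return None


def extract_sentences(conll_text: str, ids: Set[str]) -> str:
    """Keep only the sentence blocks whose sent_id belongs to `ids`."""
    kept = [b for b in read_blocks(conll_text.splitlines()) if sent_id(b) in ids]
    return "".join("\n".join(b) + "\n\n" for b in kept)


def load_ids(path: str) -> Set[str]:
    """Read one sentence id per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}
```

For example, `extract_sentences(corpus_text, load_ids("fr-ud-train-100.ids"))` would return the selected blocks in corpus order; whether `conll_tool` preserves corpus order or id-file order is not known from this page.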
=======================
UD
=======================
Counts made after adjudication between Paris and Nancy
Infinitives:
adding the subject of infinitives:
- cases needing no addition, correct: 4
- cases without addition, wrong (missing): 7
- cases with addition, correct: 22
- cases with addition, wrong: ??? (1 in Nancy?) hard to tell whether there are others
List of cases without addition, wrong (missing):
- recours de le gouvernement pour s'approprier
- pour être affrétées
- débrouille pour trouver
- a réussi à s'enfuir
- et Secombe de répondre
- avant de le voir affaibli reprendre
- ne parvient pas à en observer un seul
Participles (past or present):
- cases needing no addition, correct: 34
- cases without addition, wrong (missing): 2
- cases with addition, correct: 35
- cases with addition, wrong: ??? I did not find any
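Assuming that "with addition, correct" counts true positives and "without addition, wrong (missing)" counts false negatives, the counts above support a recall figure for each construction. Precision cannot be derived, since the number of wrong additions is unknown ("???"). A minimal sketch of the arithmetic (the function name `recall` is just illustrative):

```python
"""Recall of subject addition, computed from the adjudicated counts above.

Precision is not computed: the number of wrong additions is unknown.
"""


def recall(correct_additions: int, missing: int) -> float:
    """Share of subjects that needed adding and were actually added."""
    return correct_additions / (correct_additions + missing)


# Counts copied from the notes above.
counts = {
    "infinitives": {"correct_additions": 22, "missing": 7},
    "participles": {"correct_additions": 35, "missing": 2},
}

for construction, c in counts.items():
    print(f"{construction}: recall = {recall(**c):.3f}")
```

This gives 22/29 ≈ 0.759 for infinitives and 35/37 ≈ 0.946 for participles, computed over the examined sample only.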
fr-ud-train_00135
fr-ud-train_00419
fr-ud-train_00649
fr-ud-train_00766
fr-ud-train_00843
fr-ud-train_00970
fr-ud-train_01271
fr-ud-train_01602
fr-ud-train_01795
fr-ud-train_01940
fr-ud-train_01953
fr-ud-train_01996
fr-ud-train_02027
fr-ud-train_02296
fr-ud-train_02441
fr-ud-train_02496
fr-ud-train_02536
fr-ud-train_02589
fr-ud-train_02591
fr-ud-train_02772
fr-ud-train_03015
fr-ud-train_03102
fr-ud-train_03133
fr-ud-train_03152
fr-ud-train_03488
fr-ud-train_03571
fr-ud-train_03675
fr-ud-train_03700
fr-ud-train_03711
fr-ud-train_03762
fr-ud-train_03800
fr-ud-train_03847
fr-ud-train_04016
fr-ud-train_04177
fr-ud-train_04561
fr-ud-train_04639
fr-ud-train_04858
fr-ud-train_05177
fr-ud-train_05362
fr-ud-train_05714
fr-ud-train_05875
fr-ud-train_05937
fr-ud-train_06248
fr-ud-train_06616
fr-ud-train_07215
fr-ud-train_07238
fr-ud-train_07633
fr-ud-train_07724
fr-ud-train_07786
fr-ud-train_07932
fr-ud-train_07970
fr-ud-train_08438
fr-ud-train_08570
fr-ud-train_08628
fr-ud-train_08681
fr-ud-train_08692
fr-ud-train_08848
fr-ud-train_08856
fr-ud-train_09204
fr-ud-train_09387
fr-ud-train_09488
fr-ud-train_10015
fr-ud-train_10118
fr-ud-train_10291
fr-ud-train_10316
fr-ud-train_10346
fr-ud-train_10447
fr-ud-train_10479
fr-ud-train_10613
fr-ud-train_10634
fr-ud-train_10896
fr-ud-train_10913
fr-ud-train_11026
fr-ud-train_11118
fr-ud-train_11497
fr-ud-train_11804
fr-ud-train_12030
fr-ud-train_12354
fr-ud-train_12407
fr-ud-train_12425
fr-ud-train_12434
fr-ud-train_12456
fr-ud-train_12682
fr-ud-train_12763
fr-ud-train_12997
fr-ud-train_13038
fr-ud-train_13266
fr-ud-train_13430
fr-ud-train_13595
fr-ud-train_13672
fr-ud-train_13950
fr-ud-train_13998
fr-ud-train_14026
fr-ud-train_14051
fr-ud-train_14186
fr-ud-train_14207
fr-ud-train_14258
fr-ud-train_14273
fr-ud-train_14364
fr-ud-train_14411
annodis.er_00019
annodis.er_00095
annodis.er_00160
annodis.er_00172
annodis.er_00218
annodis.er_00345
annodis.er_00353
annodis.er_00362
annodis.er_00441
annodis.er_00481
annodis.er_00492
annodis.er_00522
emea-fr-dev_00069
emea-fr-dev_00131
emea-fr-dev_00148
emea-fr-dev_00153
emea-fr-dev_00220
emea-fr-dev_00262
emea-fr-dev_00271
emea-fr-dev_00317
emea-fr-dev_00343
emea-fr-dev_00462
emea-fr-dev_00473
emea-fr-dev_00478
emea-fr-dev_00486
emea-fr-dev_00510
emea-fr-dev_00544
emea-fr-test_00112
emea-fr-test_00136
emea-fr-test_00138
emea-fr-test_00155
emea-fr-test_00178
emea-fr-test_00241
emea-fr-test_00251
emea-fr-test_00263
emea-fr-test_00264
emea-fr-test_00292
emea-fr-test_00410
emea-fr-test_00416
emea-fr-test_00492
emea-fr-test_00511
Europar.550_00005
Europar.550_00034
Europar.550_00078
Europar.550_00112
Europar.550_00147
Europar.550_00189
Europar.550_00205
Europar.550_00207
Europar.550_00208
Europar.550_00253
Europar.550_00275
Europar.550_00276
Europar.550_00301
Europar.550_00335
Europar.550_00341
Europar.550_00342
Europar.550_00374
Europar.550_00435
Europar.550_00455
Europar.550_00541
frwiki_50.1000_00001
frwiki_50.1000_00024
frwiki_50.1000_00037
frwiki_50.1000_00090
frwiki_50.1000_00106
frwiki_50.1000_00109
frwiki_50.1000_00129
frwiki_50.1000_00144
frwiki_50.1000_00155
frwiki_50.1000_00229
frwiki_50.1000_00233
frwiki_50.1000_00237
frwiki_50.1000_00265
frwiki_50.1000_00266
frwiki_50.1000_00272
frwiki_50.1000_00319
frwiki_50.1000_00339
frwiki_50.1000_00374
frwiki_50.1000_00396
frwiki_50.1000_00432
frwiki_50.1000_00503
frwiki_50.1000_00530
frwiki_50.1000_00542
frwiki_50.1000_00588
frwiki_50.1000_00608
frwiki_50.1000_00611
frwiki_50.1000_00718
frwiki_50.1000_00736
frwiki_50.1000_00740
frwiki_50.1000_00742
frwiki_50.1000_00790
frwiki_50.1000_00843
frwiki_50.1000_00849
frwiki_50.1000_00885
frwiki_50.1000_00886
frwiki_50.1000_00911
frwiki_50.1000_00933
frwiki_50.1000_00937
frwiki_50.1000_00972
# set latexfile to the name of the main file without the .tex
#latexfile = eacltest
latexfile = main
# put the names of figure files here. include the .eps
figures =
#includedfiles = Intro.tex State-of-the-art.tex Experiments.tex Discussion.tex
includedfiles = conversion.tex\
enhanced-diat.tex\
enhanced-ud.tex\
evaluation.tex\
extension-infinitives.tex\
intro.tex\
main.tex\
related.tex
#Algos.tex Discussion.tex Results.tex \
# eacltest.tex experiences.tex intro.tex lexical-inpact.tex \
# presentation-treebank.tex protocol.tex state-of-the-art-comparaison.tex\
# corpus-setup-experiment.tex TODO.tex Tagset.tex crossparsing.tex \
# new_intro.tex new_presTB.tex new_protocol.tex examples-marie.tex
TEX = latex
# *.fig files may be in ./Figs
vpath %.fig Figs
all: pdf
$(latexfile).dvi : $(figures) $(includedfiles) $(latexfile).tex
%.eps : %.fig
	fig2dev -L eps $< > $@
$(latexfile).pdf : $(latexfile).tex $(includedfiles) $(figures)
	pdflatex $(latexfile).tex
pdf : $(latexfile).pdf
bib : $(latexfile).tex
	pdflatex $(latexfile)
	bibtex $(latexfile)
	pdflatex $(latexfile)
	pdflatex $(latexfile)
$(latexfile).ps : $(latexfile).dvi
	dvips -Ppdf $(latexfile) -o $(latexfile).ps
ps : $(latexfile).ps
$(latexfile).tar.gz : $(figures) $(latexfile).tex
	tar -czvf $(latexfile).tar.gz $(figures) $(latexfile).tex Figs/*.fig
tarball: $(latexfile).tar.gz
clean:
	rm -f $(latexfile).bbl
	rm -f $(latexfile).log
	rm -f $(latexfile).dvi
	rm -f $(latexfile).blg
	rm -f $(latexfile).aux
	rm -f $(latexfile).ps
	rm -f $(latexfile).pdf
check:
	grep undef $(latexfile).log
dist-clean: clean
	rm -f $(latexfile).ps
	rm -f $(latexfile).pdf
viewX: pdf
	open $(latexfile).pdf
viewA: pdf
	open -a Preview.app $(latexfile).pdf
view: pdf
	open -a Skim.app $(latexfile).pdf
gv: ps
	gv $(latexfile).ps
edit:
	xemacs $(latexfile).tex &
up:
	git pull
# svn up ./
commit:
	git commit -m "typo abstract" ./
	git push
ebib:
	open emnlp2014.bib
\section{Converting French UD to French enhanced-UD and enhanced-UD-diat}
- rule-based approach, since no training set is available yet
(the result will in turn be usable as a training set for machine-learning techniques)
- clues from morphology and syntax, plus a few lexical lists (e.g. control verbs with subject controllers versus object controllers)
- tools (grew, ogre)
- some phenomena are not handled automatically because they cannot be predicted with enough precision from syntax alone:
-- expletive subjects ``il'' were marked manually
-- sharing of non-subject arguments
-- causative alternation (ambiguity of the canonical function of direct objects and of indirect objects introduced by the preposition ``\`a'')
-- status of the ``se'' clitic
- arbitrary control: some syntactic contexts are regular enough to be handled automatically (pour Vinf, ...)
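The outline mentions combining syntactic clues with lexical lists of subject-control and object-control verbs. The actual conversion is done with Grew rewriting rules, so the following is only a hypothetical Python illustration of that idea, not the project's rules. The verb lists, the edge representation and the function name `add_control_subjects` are all assumptions; verbs found in neither list get no new edge, in the spirit of leaving unpredictable cases to manual annotation.

```python
"""Hypothetical sketch: add enhanced subjects to controlled infinitives.

The actual conversion is written as Grew rewriting rules; this Python
version only illustrates the idea. Both verb lists are illustrative
assumptions, not the project's lexicons.
"""
from typing import Dict, List, Tuple

Edge = Tuple[int, str, int]  # (head, relation, dependent), tokens numbered from 1

SUBJECT_CONTROL = {"réussir", "parvenir", "essayer"}  # hypothetical list
OBJECT_CONTROL = {"forcer", "obliger", "autoriser"}  # hypothetical list


def add_control_subjects(
    edges: List[Edge], lemma: Dict[int, str], verbform: Dict[int, str]
) -> List[Edge]:
    """Return new nsubj edges from each xcomp infinitive to its controller.

    Verbs found in neither list get no new edge.
    """
    added: List[Edge] = []
    for head, rel, dep in edges:
        if rel != "xcomp" or verbform.get(dep) != "Inf":
            continue
        if lemma.get(head) in SUBJECT_CONTROL:
            controller_rel = "nsubj"
        elif lemma.get(head) in OBJECT_CONTROL:
            controller_rel = "obj"
        else:
            continue  # unknown verb: do not guess
        for h, r, d in edges:
            if h == head and r == controller_rel:
                added.append((dep, "nsubj", d))
    return added


# "Pierre a réussi à s'enfuir": 1 Pierre, 2 a, 3 réussi, 4 à, 5 s', 6 enfuir
edges = [(3, "nsubj", 1), (3, "aux", 2), (3, "xcomp", 6), (6, "mark", 4), (6, "expl", 5)]
print(add_control_subjects(edges, {3: "réussir", 6: "enfuir"}, {3: "Part", 6: "Inf"}))
# prints [(6, 'nsubj', 1)]
```

Returning only the added edges, rather than a modified graph, keeps the surface UD tree untouched and makes the enhanced layer easy to inspect separately.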
figures:
	find . -name "*.dep" -type f -print | sed "s/\.dep$$//" | xargs -I {} make "{}.pdf"
clean:
	find . -name "*.dep" -type f -print | sed "s/\.dep$$//" | xargs -I {} rm -f "{}.pdf"
.SUFFIXES: .pdf .dep
.dep.pdf:
	dep2pict $< $@
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_1 { word="l'"; subword="The" }
N_2 { word="accident"; subword="accident"}
N_3 { word="a"; subword="has"}
N_4 { word="été"; subword="been"}
N_5 { word="vu"; subword="seen"}
N_6 { word="par"; subword="by"}
N_7 { word="tous"; subword="all"}
}
[EDGES] {
N_2 -> N_1 { label="det" }
N_5 -> N_3 { label="aux" }
N_5 -> N_4 { label="aux:pass" }
N_5 -> N_7 { label="obl:agent@nsubj"; color=black; forecolor=red }
N_5 -> N_2 { label="nsubj:pass@obj"; color=black; forecolor=red }
}
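Several figures use red labels such as `obl:agent@nsubj` and `nsubj:pass@obj`. Assuming that the part before `@` is the final (surface) relation and the part after it is the canonical relation, as in the enhanced-UD-diat scheme outlined above, a small hypothetical helper can split these labels and recover the canonical reading of a graph. The names `DiatLabel`, `parse_label` and `to_canonical` are illustrative only.

```python
"""Split figure labels of the form "surface@canonical" (sketch).

Assumption: in labels such as "nsubj:pass@obj", the part before "@" is the
final (surface) relation and the part after it is the canonical relation;
a label without "@" is read as having identical surface and canonical
relations.
"""
from typing import List, NamedTuple, Tuple


class DiatLabel(NamedTuple):
    surface: str
    canonical: str


def parse_label(label: str) -> DiatLabel:
    surface, sep, canonical = label.partition("@")
    return DiatLabel(surface, canonical if sep else surface)


def to_canonical(edges: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Replace each label by its canonical part, keeping head and dependent."""
    return [(head, parse_label(label).canonical, dep) for head, label, dep in edges]


# Edges of the passive example "l'accident a été vu par tous"
passive = [
    ("N_2", "det", "N_1"),
    ("N_5", "aux", "N_3"),
    ("N_5", "aux:pass", "N_4"),
    ("N_5", "obl:agent@nsubj", "N_7"),
    ("N_5", "nsubj:pass@obj", "N_2"),
]
print(to_canonical(passive))
```

On the passive example, "tous" comes out as the canonical `nsubj` of "vu" and "accident" as its canonical `obj`, which is the neutralization of the passive diathesis that the red labels encode.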
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_1 { word="ceux"; subword="those" }
N_2 { word="apparus"; subword="appeared" }
N_3 { word="en"; subword="in" }
N_4 { word="2001"; subword="2001" }
N_5 { word="sont"; subword="are" }
N_6 { word="résolus"; subword="resolved" }
}
[EDGES] {
N_1 -> N_2 { label="acl" }
N_2 -> N_4 { label="obl" }
N_4 -> N_3 { label="case" }
N_6 -> N_1 { label="nsubj" }
N_6 -> N_5 { label="cop" }
N_2 -> N_1 { label="nsubj"; color=blue; forecolor=blue; bottom }
}
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_0 { word="(a)"; }
N_1 { word="ceux"; subword="those" }
N_1b { word="(étant)"; subword="being" }
N_2 { word="apparus"; subword="appeared" }
N_3 { word="en"; subword="in" }
N_4 { word="2001"; subword="2001" }
}
[EDGES] {
N_1 -> N_2 { label="acl" }
N_2 -> N_4 { label="obl" }
N_4 -> N_3 { label="case" }
N_2 -> N_1b { label="aux" }
N_2 -> N_1 { label="nsubj"; color=blue; forecolor=blue; bottom }
}
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_1 { word="ceux"; subword="those" }
N_2 { word="arrivant"; subword="arriving" }
N_3 { word="tôt"; subword="early" }
N_4 { word="partent"; subword="leave" }
N_5 { word="tôt"; subword="early" }
}
[EDGES] {
N_1 -> N_2 { label="advcl" }
N_2 -> N_3 { label="advmod" }
N_4 -> N_5 { label="advmod" }
N_4 -> N_1 { label="nsubj" }
N_2 -> N_1 { label="nsubj"; color=blue; forecolor=blue; bottom }
}
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_1 { word="arrivé"; subword="arrived" }
N_2 { word="hier"; subword="yesterday" }
N_3 { word=","; subword="," }
N_4 { word="Pierre"; subword="Peter" }
N_5 { word="repart"; subword="is_leaving" }
N_6 { word="demain"; subword="tomorrow" }
}
[EDGES] {
N_5 -> N_1 { label="advcl" }
N_1 -> N_2 { label="advmod" }
N_5 -> N_4 { label="nsubj" }
N_5 -> N_6 { label="advmod" }
N_1 -> N_4 { label="nsubj"; color=blue; forecolor=blue; bottom }
}
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_1 { word="un"; subword="a" }
N_2 { word="bandit"; subword="bandit"}
N_3 { word="prêt"; subword="ready"}
N_4 { word="à"; subword="to"}
N_5 { word="tuer"; subword="kill"}
}
[EDGES] {
N_2 -> N_1 { label="det" }
N_3 -> N_5 { label="xcomp" }
N_2 -> N_3 { label="amod" }
N_5 -> N_4 { label="mark" }
N_5 -> N_2 { label="nsubj"; bottom; color=blue; forecolor=blue }
}
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_1 { word="ce"; subword="this" }
N_2 { word="bandit"; subword="bandit"}
N_2b { word="est"; subword="is"}
N_3 { word="prêt"; subword="ready"}
N_4 { word="à"; subword="to"}
N_5 { word="tuer"; subword="kill"}
}
[EDGES] {
N_2 -> N_1 { label="det" }
N_3 -> N_5 { label="xcomp" }
N_3 -> N_2 { label="nsubj" }
N_3 -> N_2b { label="cop" }
N_5 -> N_4 { label="mark" }
N_5 -> N_2 { label="nsubj"; bottom; color=blue; forecolor=blue }
}
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_0 { word="(a)" }
N_1 { word="la"; subword="the" }
N_2 { word="branche"; subword="branch" }
N_3 { word="s'"; subword="SE" }
N_4 { word="est"; subword="is" }
N_5 { word="cassée"; subword="broken" }
}
[EDGES] {
N_2 -> N_1 { label="det" }
N_5 -> N_3 { label="expl" }
N_5 -> N_4 { label="aux" }
N_5 -> N_2 { label="nsubj@obj"; color=black; forecolor=red }
}
[GRAPH] {scale=200; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_0 { word="(b)" }
N_1 { word="une"; subword="a" }
N_2 { word="branche"; subword="branch" }
N_3 { word="se"; subword="SE" }
N_4 { word="casse"; subword="breaks" }
N_5 { word="à"; subword="at" }
N_6 { word="la"; subword="the" }
N_7 { word="main"; subword="hand" }
}
[EDGES] {
N_2 -> N_1 { label="det" }
N_4 -> N_3 { label="expl" }
N_7 -> N_6 { label="det" }
N_7 -> N_5 { label="case" }
N_4 -> N_7 { label="obl" }
N_4 -> N_2 { label="nsubj@obj"; color=black; forecolor=red }
}
[GRAPH] {scale=180; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10}
[WORDS] {
N_0 { word="(b)"; }
N_1 { word="ceux"; subword="those" }
N_1b { word="(ayant)"; subword="having" }
N_1t { word="été"; subword="been" }
N_2 { word="embauchés"; subword="hired" }
N_3 { word="en"; subword="in" }
N_4 { word="2007"; subword="2007" }
}
[EDGES] {
N_1 -> N_2 { label="acl" }
N_2 -> N_4 { label="obl" }
N_4 -> N_3 { label="case" }
N_2 -> N_1b { label="aux" }
N_2 -> N_1t { label="aux:pass" }
N_2 -> N_1 { label="nsubj:pass@obj"; color=blue; forecolor=red; bottom }
}
[GRAPH] {scale=200; fontname="Arial";edge_label_size=9; vspace=13}
[WORDS] {
N_1 { word="The" }
N_2 { word="charges" }
N_14 { word="are" }
N_15 { word="false" }
N_16 { word="and" }
N_17 { word="can" }
N_18 { word="be" }
N_19 { word="demonstrated" }
N_20 { word="by" }
N_21 { word="the" }
N_22 { word="historical" }
N_23 { word="record" }
N_24 { word="to" }
N_25 { word="be" }
N_26 { word="false" }
}
[EDGES] {
N_2 -> N_1 { label="det" }
N_15 -> N_2 { label="nsubj" }
N_15 -> N_14 { label="cop" }
N_19 -> N_16 { label="cc" }
N_19 -> N_17 { label="aux" }
N_19 -> N_18 { label="aux:pass" }
N_15 -> N_19 { label="conj" }
N_23 -> N_20 { label="case" }
N_23 -> N_21 { label="det" }
N_23 -> N_22 { label="amod" }
N_19 -> N_23 { label="obl:agent@nsubj"; color=black; forecolor=red }
N_26 -> N_24 { label="mark" }
N_26 -> N_25 { label="cop" }
N_19 -> N_26 { label="xcomp" }