Commit c06d14dd authored by Bruno Guillaume

submitted version

parent d15ee279
......@@ -2,10 +2,10 @@
%% http://bibdesk.sourceforge.net/
%% Created for Djamé Seddah at 2017-05-18 01:17:18 +0200
%% Created for Djamé Seddah at 2017-05-18 01:17:18 +0200
%% Saved with string encoding Unicode (UTF-8)
%% Saved with string encoding Unicode (UTF-8)
......@@ -1139,7 +1139,7 @@
@article{abeille-godard-causatives-1997,
title = {Les causatives en français, un cas de compétition syntaxique [in French]},
author ={Anne Abeille and Danielle Godard and Philip Miller},
author ={Anne Abeill{\'e} and Danielle Godard and Philip Miller},
journal = {Langue française},
volume = {115},
number = {1},
......
......@@ -13,7 +13,7 @@
N_2 -> N_1 { label="det" }
N_2 -> N_4 { label="nmod" }
N_4 -> N_3 { label="case" }
N_6 -> N_5 { label="mark" }
N_6 -> N_7 { label="mark" }
N_5 -> N_7 { label="xcomp" }
N_5 -> N_2 { label="nsubj" }
N_7 -> N_8 { label="advmod" }
......
[GRAPH] {scale=180; fontname="Arial"; edge_label_size=9; vspace=13; subword_size=10; word_spacing=30}
[WORDS] {
N_1 { word="Ils"; subword="they"; }
N_2 { word="semblent"; subword="seem"; }
N_3 { word="vouloir"; subword="to-want"; }
N_4 { word="partir"; subword="to-leave"; }
}
[EDGES] {
N_2 -> N_1 { label = "nsubj"}
N_2 -> N_3 { label = "xcomp"}
N_3 -> N_4 { label = "xcomp"}
N_3 -> N_1 { label="nsubj"; bottom; color=blue; forecolor=blue }
N_4 -> N_1 { label="nsubj"; bottom; color=blue; forecolor=blue }
}
......@@ -3,7 +3,7 @@
N_0 { word="(c)"; }
N_1 { word="ceux"; subword="those" }
N_2 { word="venus"; subword="come-PASTPART-pl"}
N_3 { word="vister"; subword="to_visit"}
N_3 { word="visiter"; subword="to_visit"}
N_4 { word="le"; subword="the"}
N_5 { word="musée"; subword="museum"}
}
......
......@@ -4,7 +4,7 @@ Syntactic alternations (like passive) are known to cause diversity in the observ
%The observed linking patterns in corpora, namely the grammatical functions born by the semantic arguments of a verb, are more varied
%While the semantic arguments of a given verb do generally have one ``canonical'' syntactic behavior, summarized as one grammatical function,
%syntactic alternations (like passive) come to break this uniformity. This obviously obfuscate the
Some at least of the existing syntactic alternations are very general and can be identified purely on syntactic grounds, without resorting to semantic disambiguation.
At least some of the existing syntactic alternations are very general and can be identified purely on syntactic grounds, without resorting to semantic disambiguation.
In this work, we advocate for neutralizing such variation in an ``enhanced-alt UD'' representation (enhanced UD representation augmented with syntactic alternation neutralization). Following \cite{candito:deepsequoia:2014,perrier:2014:taln}, we propose to distinguish {\em canonical} versus {\em final} grammatical functions, and to normalize syntactically alternated verb instances by making explicit the canonical grammatical functions of their arguments. The objective is to cluster observed subcategorization frames into possibly one canonical frame, with thus one linking pattern between canonical functions and semantic arguments.
We handle the French syntactic alternations for which morpho-syntactic clues are available, namely passive, medio-passive and causative\footnote{Impersonal constructions can also be viewed as syntactic alternations, as in {\em Il arrive trois personnes (It arrives 3 people)}. But their representation in current UD already makes explicit that the postverbal dependent corresponds to the subject.}. We detail these below, identifying for each what is feasible using morpho-syntactic and lexical clues only, and what requires semantic information.
......@@ -71,7 +71,7 @@ Passive is by far the most frequent syntactic alternation, and it is fortunately
%% \label{ex:passive_inf}
Although passive is identified unambiguously, correctly identifying the argument that is subject in the active form (the ``by-phrase'' in English) is more problematic given the UD scheme. In French, it is introduced by a PP with preposition {\em par} (fig. \ref{fig:passive}) or for certain verbs, with preposition {\em de}%(ex. \ref{ex:passive_clause})
. But both prepositions can also introduce adjuncts, and the current French version of UD scheme uses the same label \func{obl} in both cases, leading to an ambiguity concerning the argumental status of the PP. In the following, we use a more specific \func{obl:agent} label for the {\em by}-phrases, as is done e.g. in the UD versions of the par-Tut parallel treebank \cite{partut:2014} (for English, French and Italian). We detail in section \ref{sec:evaluation} how we can obtain this labeling for the other French UD treebanks.
. But both prepositions can also introduce adjuncts, and the current French version of UD scheme uses the same label \func{obl} in both cases, leading to an ambiguity concerning the argumental status of the PP. In the following, we use a more specific \func{obl:agent} label for the {\em by}-phrases, as is done e.g. in the UD versions of the par-TUT parallel treebank \cite{partut:2014} (for English, French and Italian). We detail in section~\ref{sec:evaluation} how we can obtain this labeling for the other French UD treebanks.
%{\color{red}{ TODO here: quantitatively evaluate the agent / adjunct proportion}}
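The ambiguity described above can be illustrated with a minimal Python sketch that only flags candidate {\em by}-phrases, without deciding their argumental status. It is not the procedure of section~\ref{sec:evaluation}: the token dictionaries and the function name `agent_candidates` are hypothetical, and passive verbs are assumed to be recognizable through an `aux:pass` or `nsubj:pass` dependent.

```python
# Hypothetical token representation: one dict per token with the CoNLL-U
# fields id, form, head and deprel. These names are assumptions of this sketch.
def agent_candidates(tokens):
    """Return ids of `obl` dependents that MAY be agents of a passive verb."""
    # Assumption: a verb is passive if it governs an aux:pass or nsubj:pass dependent.
    passive_heads = {t["head"] for t in tokens
                     if t["deprel"] in ("aux:pass", "nsubj:pass")}
    candidates = []
    for tok in tokens:
        if tok["deprel"] != "obl" or tok["head"] not in passive_heads:
            continue
        # Prepositions attached to this nominal through a `case` edge.
        preps = {c["form"].lower() for c in tokens
                 if c["head"] == tok["id"] and c["deprel"] == "case"}
        # `par` and `de` can both introduce agents, but also adjuncts.
        if preps & {"par", "de"}:
            candidates.append(tok["id"])
    return candidates


# "Le musée est visité par les touristes" (the museum is visited by the tourists)
sentence = [
    {"id": 1, "form": "Le", "head": 2, "deprel": "det"},
    {"id": 2, "form": "musée", "head": 4, "deprel": "nsubj:pass"},
    {"id": 3, "form": "est", "head": 4, "deprel": "aux:pass"},
    {"id": 4, "form": "visité", "head": 0, "deprel": "root"},
    {"id": 5, "form": "par", "head": 7, "deprel": "case"},
    {"id": 6, "form": "les", "head": 7, "deprel": "det"},
    {"id": 7, "form": "touristes", "head": 4, "deprel": "obl"},
]
print(agent_candidates(sentence))  # [7]
```

Since both prepositions also introduce adjuncts, the returned ids are candidates for an \func{obl:agent} label only; contracted forms such as {\em du} or {\em des} are not handled in this sketch.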
......@@ -298,9 +298,9 @@ For instance, let's consider first the so-called ``subject control verbs'' (e.g.
\begin{figure}[h]
%\centering
\includegraphics[scale=0.35]{dep2pict/voulant_a}
\includegraphics[scale=0.35]{dep2pict/voulant_b}
\includegraphics[scale=0.35]{dep2pict/voulant_c}
\includegraphics[scale=0.32]{dep2pict/voulant_a}
\includegraphics[scale=0.32]{dep2pict/voulant_b}
\includegraphics[scale=0.32]{dep2pict/voulant_c}
\vspace{-15pt}\caption{Subject-control verbs (necessarily active): their canonical subject is the final subject of the infinitive.}
\label{fig:subjectcontrolverbs}
\end{figure}
......@@ -308,7 +308,7 @@ For instance, let's consider first the so-called ``subject control verbs'' (e.g.
For ``object control verbs'', the controller (final subject of the infinitive) is their canonical object. This holds both for active (fig. \ref{fig:subjectcontrolverbs} (a)) or passive object control verbs (fig. \ref{fig:subjectcontrolverbs} (b)). For instance in \ref{fig:objectcontrolverbs}(b), {\em forcer (to force)} is passive, the controller (({\em ceux (those)}) is always its canonical object, but shows as its final subject.
For ``object control verbs'', the controller (final subject of the infinitive) is their canonical object. This holds for both active (fig. \ref{fig:objectcontrolverbs} (a)) and passive object control verbs (fig. \ref{fig:objectcontrolverbs} (b)). For instance in fig. \ref{fig:objectcontrolverbs}(b), {\em forcer (to force)} is passive: the controller ({\em ceux (those)}) is still its canonical object, but surfaces as its final subject.
% \begin{figure}
% {\color{red}{(trees to be drawn)}}\\
......
......@@ -3,7 +3,7 @@
\iffalse
\blue{Djamé: there is a figure at the top of page 3 with, it seems to me, neither a ref nor a
caption. Likewise, figure 1 is referenced at the very beginning, before
example 1, but does not match the example.
example 1, but does not match the example.
See you tomorrow, I am on it all day (modulo talks).}
\fi
......@@ -27,13 +27,21 @@ In our implementation for French, we cope with all the above phenomena except nu
%moved up to the intro: Yet, most UD 2.0 treebanks, including French, do not contain any enhanced dependencies\footnote{\red{TODO here? list the languages containing substantial amount of enhanced dependencies}}. In this work we describe a rule-based approach to provide enhanced dependencies for French. We cope for now with all the above phenomena except empty nodes in case of ellipsis.
%\red{marie: to be decided where to fit this in:}
%\paragraph{Status of enhanced dependencies}
Note that while enhanced dependencies (as were Stanford dependencies) are motivated by downstream semantically-oriented applications, in their current stage they remain syntactic in nature in their current stage. This results in keeping syntactic dependents known for classic cases of syntax/semantics mismatch. So for instance, subjects of raising verbs are not removed from the enhanced UD graph, although they are not a semantic argument of the raising verb, as shown in Figure \ref{fig:montee}.
Note that while enhanced dependencies (as were Stanford dependencies) are motivated by downstream semantically-oriented applications, they remain syntactic in nature in their current stage. This results in keeping syntactic dependents known for classic cases of syntax/semantics mismatch. So for instance, subjects of raising verbs are not removed from the enhanced UD graph, although they are not a semantic argument of the raising verb, as shown in Figure \ref{fig:montee}.
%We follow this logic in our treatment of raising verbs, and in the proposed extensions, in particular for tough movement (section \ref{sec:nonfinite}).
\exg. Ils semblent vouloir partir.\\
They seem to-want to-leave.\\
% \exg. Ils semblent vouloir partir.\\
% They seem to-want to-leave.\\
% \label{fig:montee}
\begin{figure}[h]
\centering
\includegraphics[scale=0.30]{dep2pict/raising}
\vspace{-20pt}\caption{\emph{Raising verb}}
\label{fig:montee}
\end{figure}
Following the work of \draftremove{ Candito \emph{et al.}
\shortcite{candito:deepsequoia:2014} and Perrier \emph{et al.}
\shortcite{perrier:2014:taln} - guys, I need to talk to you about
......@@ -43,4 +51,4 @@ detail in the next two sections: the first one is to extend the cases
for which arguments are added to infinitive verbs and more generally
to non finite verbs. The second one concerns the neutralisation of
syntactic alternations.
\section{Producing enhanced graphs for French UD treebanks}
\label{sec:evaluation}
We have experimented with the proposed enhanced scheme on two French corpora of the UD project: \udf and \uds.
\udf has been in the \ud project since version 1.0 (January 2015); data are taken from the Google dataset (see~\cite{mcdonald2013universal}) where annotations were verified by one annotator.
\udf has been in the \ud project since version 1.0 (January 2015); data are taken from the Google dataset~\cite{mcdonald2013universal} where annotations were verified by one annotator.
It was later converted into a \ud version which has not been manually corrected systematically.
Nevertheless, the data were corrected and enriched in later versions.
\uds has been part of the \ud project since version 2.0 (March 2017).
......@@ -27,13 +27,13 @@ For evaluating the rule-based systems, we produced a reference evaluation corpus
%The \draftreplace{gold annotation was done}{reference evaluation data was obtained} in three steps:
(1) application of the two rule-based systems on the gold UD trees,
(2) manual adjudication of the two outputs and
(3) systematic check of infinitive verbs, past or present participles and coordination.
(3) systematic check of infinitive verbs, past or present participles and coordinations.
In the reference\draftremove{ EVAL} data, new edges represent 5.74\% of the total number of edges in UD-100 and 5.00\% in SEQ-100.
If we consider arguments of verbs only, new edges represent 16.46\% in UD-100 and 18.74\% in SEQ-100.
In the reference\draftremove{ EVAL} data, new edges represent 5.74\% of the total number of edges in UD-EVAL and 5.00\% in SEQ-EVAL.
If we consider arguments of verbs only, new edges represent 16.46\% in UD-EVAL and 18.74\% in SEQ-EVAL.
The number of edges with a \draftreplace{diathesis}{syntactic} alternation\draftadd{ (namely with a canonical function different from the final grammatical function)} are 1.95\% of the total number of edges in UD-100 and 2.61\% in SEQ-100.
Again, if we consider arguments of verbs only, \draftreplace{new}{these} edges represent 10.00\% in UD-100 and 14.29\% in SEQ-100.
Edges with a \draftreplace{diathesis}{syntactic} alternation\draftadd{ (namely with a canonical function different from the final grammatical function)} account for 1.95\% of the total number of edges in UD-EVAL and 2.61\% in SEQ-EVAL.
Again, if we consider arguments of verbs only, \draftreplace{new}{these} edges represent 10.00\% in UD-EVAL and 14.29\% in SEQ-EVAL.
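Proportions of this kind can be recomputed from a CoNLL-U file by comparing the basic tree (HEAD and DEPREL columns) with the enhanced graph (DEPS column). The following Python sketch illustrates the counting only: the function name `enhanced_edge_stats` is hypothetical, and the choice of the enhanced edge count as denominator is an assumption, since the exact denominator used for the figures above is not spelled out here.

```python
def enhanced_edge_stats(conllu: str):
    """Count enhanced edges that are absent from the basic tree.

    Returns (number of new edges, total number of enhanced edges).
    """
    basic, enhanced = set(), set()
    for sent_id, block in enumerate(conllu.strip().split("\n\n")):
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip sentence-level comments
            cols = line.split("\t")
            if len(cols) != 10 or not cols[0].isdigit():
                continue  # skip multiword tokens (1-2) and empty nodes (1.1)
            tok, head, deprel, deps = cols[0], cols[6], cols[7], cols[8]
            basic.add((sent_id, head, tok, deprel))
            if deps != "_":
                # DEPS is a |-separated list of head:deprel pairs.
                for item in deps.split("|"):
                    h, rel = item.split(":", 1)
                    enhanced.add((sent_id, h, tok, rel))
    return len(enhanced - basic), len(enhanced)


# "Ils semblent vouloir partir", with the two added subject edges.
rows = [
    ["1", "Ils", "il", "PRON", "_", "_", "2", "nsubj", "2:nsubj|3:nsubj|4:nsubj", "_"],
    ["2", "semblent", "sembler", "VERB", "_", "_", "0", "root", "0:root", "_"],
    ["3", "vouloir", "vouloir", "VERB", "_", "_", "2", "xcomp", "2:xcomp", "_"],
    ["4", "partir", "partir", "VERB", "_", "_", "3", "xcomp", "3:xcomp", "_"],
]
sample = "\n".join("\t".join(r) for r in rows)
new_edges, total_edges = enhanced_edge_stats(sample)
print(new_edges, total_edges)  # 2 6
```

Restricting the count to arguments of verbs would additionally require filtering on the governor's part of speech and on the relation label, which this sketch leaves out.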
\subsection{Results}
% Given the small performance differences between them, However, for space reasons we will only describe the system YYY.
......@@ -65,7 +65,7 @@ Moreover, even if manual annotations are required, we observe that they concern
\centering
\begin{tabular}{|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{SEQ-100} & \multicolumn{2}{|c|}{UD-100} \\ \hline
\multicolumn{2}{|c|}{SEQ-EVAL} & \multicolumn{2}{|c|}{UD-EVAL} \\ \hline
BASE & MAN & BASE & MAN \\ \hline
0.900 & 0.912 & 0.919 & 0.926 \\ \hline
\end{tabular}
......
......@@ -2,7 +2,7 @@
\label{sec:nonfinite}
The aim of enhancing UD dependencies is to facilitate the computation of predicate-argument relations at the semantic level.
In this perspective, \draftreplace{there is no reason to limit the recovering of arguments to raised and controlled subjects.
%\draftnote{a bit abrupt, cf. there is a reason for the non-automatable case}
%\draftnote{a bit abrupt, cf. there is a reason for the non-automatable case}
We consider here other cases not handled in Enhanced UD in which it is possible to recover subjects and also objects of non-finite verbs, when they are present in the sentence. There are two possible situations: the syntax is sufficient to retrieve the relevant arguments and we can use automated procedures, or semantics and world knowledge are required and the task must be done manually. For each situation, we present the most common cases found in corpus.}{we propose to go beyond making explicit the subjects of control and raising verbs. We detail below other cases of obligatory syntactic control, and cases which are not as systematic but which prove feasible with rather high accuracy using heuristics.}
%\subsection{Cases determined by the syntax}
......@@ -46,7 +46,7 @@ In both cases, the subject of the infinitive is the nominal argument.
\paragraph{``Control adjectives''}
\draftreplace{In the same way that we have control verbs, we have control adjectives.
They take an infinitive as complement and they control the subject of the infinitive,
which}{Control adjectives take an infinitive complement, whose understood subject}
which}{Control adjectives take an infinitive complement, whose understood subject}
is the noun to which the adjective applies, as shown in Fig.~\ref{fig:control-adj}.
%Examples of Fig.~\ref{fig:control-adj} illustrate this phenomenon.
......@@ -157,7 +157,7 @@ as the examples of Fig.~\ref{fig:adj-inf} show\footnote{Note that the correspond
%--------------------------------------------------------------------------------------------
\subsection{Cases requiring semantic or world knowledge}
\paragraph{Dislocated participle clauses:}
\paragraph{Dislocated participle clauses:}
%Another situation where subjects of participles can be found automatically is the ``dislocation'' of participles at the beginning or at the end of the sentence.
%The subject of a dislocated participle is always the subject of the main sentence (Fig.~\ref{fig:dislocation}).
A participle clause modifying a noun can appear ``dislocated'' at the beginning or end of the sentence. In that case, its subject is most often the subject of the main verb, although exceptions can be built\footnote{As in {\em Exténués, on les a envoyés dormir. (Exhausted, we them have sent to-sleep, ``Exhausted, they were sent to bed'')}.}.
......@@ -185,14 +185,14 @@ it takes {a lot} of work to finish on time\\
\draftreplace{By systematically choosing the subject of the main verb as the subject
of the infinitive, we produce certain errors and it is interesting to
of the infinitive, we produce some errors and it is interesting to
measure their relative importance in corpus. In }{We performed an in-depth study of these cases, using} the deep Sequoia
corpus %annotated with deep and surface dependencies
\draftreplace{\footnote{\url{http://talc2.loria.fr/deep-sequoia/deep-sequoia-1.1/}}}{\cite{candito:deepsequoia:2014}},
in which
in which
all subjects of infinitive verbs present in the sentence are marked.
\draftreplace{There are 143 infinitive heads of adverbial clauses. If we study the
143 cases }{Breaking down the 143 infinitive heads of adverbial clauses }
143 cases }{Breaking down the 143 infinitive heads of adverbial clauses }
according to the voice of the main verb, we obtain the following results:
\begin{itemize}[noitemsep]
\item\emph{main verb in the active voice:} there are 114 cases and among them, the subject of the infinitive is the subject of the main verb in 95 cases; in the 16 remaining cases, the subject of the infinitive is absent from the sentence;
......@@ -201,7 +201,7 @@ according to the voice of the main verb, we obtain the following results:
\end{itemize}
A heuristic that triggers the sharing for active main verbs only will obtain a $90\%$ recall and $83\%$ precision only.
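As a check on the precision figure, and assuming the heuristic fires on exactly the 114 cases with an active main verb, of which 95 share their subject as counted above:
\[
P = \frac{95}{114} \approx 0.833 .
\]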
%By systematically sharing the subject of the infinitive with the subject of the main clause, we produce 26\% of errors, which is not negligible.
%The conclusion is that when the main verb is not in active voice,
%The conclusion is that when the main verb is not in active voice,
There is a similar construction where a present participle introduced with a preposition (\textit{en} in French and \textit{by} in English) plays the role of a modifier for a main verb. The subject of the participle is \draftadd{generally} the subject of the main verb, but \draftreplace{there are some exceptions. The following example illustrates an exception.}{again, this does not hold if the main verb is in passive voice (or is a modal introducing a passive, as shown in ex.~\ref{ex:part-subj3})}.
......@@ -210,8 +210,8 @@ This drug should be taken by eating\\
``This drug should be taken while eating''
\label{ex:part-subj3}
%The subject of \textit{mangeant} is not expressed in the sentence.
In Sequoia, there are 39 such constructions\draftremove{ with an adverbial clause introduced with the preposition \emph{en} and headed by a present participle}.
%The subject of \textit{mangeant} is not expressed in the sentence.
In Sequoia, there are 39 such constructions\draftremove{ with an adverbial clause introduced with the preposition \emph{en} and headed by a present participle}.
In all 30 cases in which the main verb is in active voice, the subject of the main verb is understood as the subject of the participle. In 8 of the 9 cases in which the main verb is passive, the subject of the participle is not present in the sentence.
%marie: I rephrased, stressing that all active-verb cases trigger the sharing. Among them, there are 8 cases \draftreplace{in which the subject of the participle is not the subject of the main clause}{for which the subject of the participle is not present in the sentence}. For 6 of them, the main verb is in the passive voice (see ex. \ref{ex:part-subj3}). In the 31 cases, for which the subject of the infinitive is the subject of the main verb, all verbs are in the active voice except one case.
Therefore, an automatic procedure taking into account the voice of the main verb should produce only a very small number of errors.
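The voice-based procedure suggested above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implemented system: the function name `share_main_subject` and the voice labels are hypothetical, and only the counts come from the text.

```python
def share_main_subject(main_verb_voice: str) -> bool:
    """Hypothetical rule: share the main verb's subject only in active voice."""
    return main_verb_voice == "active"


# Counts reported above for the 39 `en` + present participle constructions.
active_total, active_shared = 30, 30
passive_total, passive_subject_absent = 9, 8

# Active cases: the rule shares the subject, which is correct in all 30 cases.
correct_active = active_shared if share_main_subject("active") else 0
# Passive cases: the rule does not share, which is correct at least for the
# 8 cases where the participle's subject is absent from the sentence.
correct_passive_min = 0 if share_main_subject("passive") else passive_subject_absent

total = active_total + passive_total
min_accuracy = (correct_active + correct_passive_min) / total
print(f"{min_accuracy:.3f}")  # 0.974
```

The value is a lower bound: the status of the ninth passive case is not given in the counts above, so the rule makes at most one error on these 39 constructions.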
......
......@@ -63,7 +63,7 @@ multilingual parsing models \cite{ammar2016:many:tacl}.
gapping) are allowed --} its {\em collapsed}
representation\footnote{i.e. the one showing maximum differences to surface syntax.} has
only very recently started to be extended, and implemented to the
\ud scheme family \cite{schuster:2016:lrec}. The current version of universal dependencies guidelines (v2.0) includes a ``enhanced dependencies'' section\footnote{\scriptsize{\url{http://universaldependencies.org/u/overview/enhanced-syntax.html}}}, leaving the possibility for UD treebanks 2.0 to include all or only some phenomena (listed in section \ref{sec:enhanced}, that make explicit additional predicate-argument dependencies. Yet, in practice, most UD 2.0 treebanks contain either very few or no enhanced dependencies at all\footnote{Notable exceptions are the treebanks for Russian and Finnish.}. Further, the enhanced dependencies' description is currently still very preliminary.
\ud scheme family \cite{schuster:2016:lrec}. The current version of universal dependencies guidelines (v2.0) includes an ``enhanced dependencies'' section\footnote{\scriptsize{\url{http://universaldependencies.org/u/overview/enhanced-syntax.html}}}, leaving the possibility for UD treebanks 2.0 to include all or only some phenomena (listed in section \ref{sec:enhanced}), that make explicit additional predicate-argument dependencies. Yet, in practice, most UD 2.0 treebanks contain either very few or no enhanced dependencies at all\footnote{Notable exceptions are the treebanks for Russian and Finnish.}. Further, the enhanced dependencies' description is currently still very preliminary.
In this paper, we build on the work of
......@@ -83,7 +83,7 @@ We believe that when such a neutralization is feasible using basic basic morpho-
predicates' canonical subcategorization frames. These new canonical
realization are expressed through additional dependency edges and are purely
syntactic}{make explicit canonical grammatical functions that are still syntactic in nature}, unlike what can be found for example in the
techno-grammatical layer of the Prague Dependency bank
tecto-grammatical layer of the Prague Dependency Treebank
\cite{hajic2006prague}\draftadd{, and that we prove obtainable using morpho-syntactic and lexical clues only, without resorting to semantic disambiguation}. \draftremove{{\color{green}{mc: indeed, the following sentence is not necessarily useful:}}}\draftremove{This syntax-centered view enables many
possible extensions towards the building of syntax to semantic
interfaces}.\draftnote{meh. I tried to rework Guy's sentence
......@@ -115,7 +115,7 @@ available to foster further work
In the following, we first briefly introduce the current Enhanced UD
scheme, we detail extensions concerning arguments of non finite verbs in section \ref{sec:nonfinite}
and syntactic alternations fro French in section \ref{sec:alternations}. We present and evaluate a system to obtain
and syntactic alternations for French in section \ref{sec:alternations}. We present and evaluate a system to obtain
enhanced graphs for French in section \ref{sec:evaluation}. We then discuss related work and conclude.
%% In this paper, we first briefly introduce the current Enhanced UD
%% scheme and then expose how it can be extended toward a more abstract and
......