<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="keywords" content="remark,remarkjs,markdown,slideshow,presentation" />
<meta name="description" content="A simple, in-browser, markdown-driven slideshow tool." />
<title>DiscourseParsing</title>
<style>
/* modified to point to our local separate files */
@import url("common/fonts.css");
@import url("common/style.css");
</style>
</head>
<body>
<textarea id="source">
class: center, middle
background-image:url(images/data-background-light.jpg)
# Discourse parsing
<br>
### Discourse analysis - Discourse corpora - Discourse parsers
https://gitlab.com/cbraud/coursediscourse
<br><br>
#### Nancy, 16 January 2020
<br><br>
.pull-left[<img src="images/logo-ul.png" width="50%"/>]
.pull-right[.bold[Chloé Braud, [chloe.braud@irit.fr](mailto:chloe.braud@irit.fr),
CNRS-IRIT]]
<br><br>
---
class: middle
# Discourse analysis
*Whenever we read something closely, with even a bit of sensitivity, text
structure leaps off the page at us. We begin to see elaborations, explanations,
parallelisms, contrasts, temporal sequencing, and so on. These relations bind
contiguous segments of text into a global structure for the text as a whole.*
(Hobbs, 1985)
---
# Discourse processing
### What is discourse?
* Document-level analysis
* Coherence and cohesion
* Discourse analysis
* Theoretical frameworks
### Discourse parsing
* Discourse corpora
* Discourse parsing and discourse chunking
* Applications
* Current challenges
### Practical session
* Explicit vs implicit discourse relations
---
.left-column[
## What is discourse?
### - Document-level
]
.right-column[
### Multi-sentence linguistic phenomena
* Topics (topic segmentation),
* Temporal links,
* Entities and reference,
* Rhetorical/discourse relations

Discourse structure is about:
* revealing text coherence,
* interpreting documents (i.e. making inferences about their content).

There are links between the different kinds of text organization, e.g.:
* discourse/temporal constraints
  * e.g. the effect often follows the cause
* discourse/topic
  * e.g. some relations require keeping the same topic
* discourse/coreference
  * e.g. some relations block a potential referent
]
---
.left-column[
## What is discourse?
### - Document-level
]
.right-column[
### Multi-sentence linguistic phenomena
* Topics (topic segmentation),
* Temporal links,
* Entities and reference,
* .alert[Rhetorical/discourse relations]

Discourse structure is about:
* revealing text coherence,
* interpreting documents (i.e. making inferences about their content).

There are links between the different kinds of text organization, e.g.:
* discourse/temporal constraints
  * e.g. the effect often follows the cause
* discourse/topic
  * e.g. some relations require keeping the same topic
* discourse/coreference
  * e.g. some relations block a potential referent
]
---
.left-column[
## What is discourse?
### - Document-level
### - Coherence
]
.right-column[
## Coherence and cohesion
Document: not a random sequence of sentences
* A text is **cohesive** if its elements are linked together (non-structural
textual relations)
* A text is **coherent** if it makes sense (structural relations between segments).

$\rightarrow$ Document = coherent, structured group of sentences
Simple examples:
.small[
* *Paul fell. Marie helped him get up.* (Narration)
* *Paul fell. Marie pushed him.* (Cause)
* *Paul fell. He likes spinach.* (??)
* *Paul went to Istanbul. He likes travelling.* (Explanation)
* *Paul went to Istanbul. He likes spinach.* (? Explanation)
]
People tend to "force" a meaningful relation between sentences!
]
---
.left-column[
## What is discourse?
### - Document-level
### - Coherence
]
.right-column[
#### Cohesive but not coherent text:
i.e.: Each sentence is notionally linked to the one that precedes it, using both
lexical and grammatical means, but the text is ultimately senseless
.small[*I am a teacher. The teacher was late for class. Class rhymes with grass. The
grass is always greener on the other side of the fence. But it wasn't.* (Teacher
resource site)]
#### Automatically generated summary: not cohesive, not coherent
i.e.: improper sentence ordering, pronoun without antecedent
.small[*It’s like going to disney world for car buyers. I have to say that Carmax rocks.
We bought .alert[it] at Carmax, and I continue to have nothing bad to say about
that company. After our last big car milestone, we’ve had an odyssey with cars.*
[Mithun and Kosseim, 2011]]
#### The task of modeling coherence / cohesion:
* "discern an original text from a permuted ordering of its sentences"
(a minimal sketch of this setup follows below)
* "locate the original position of a sentence previously removed"
* "compare the rankings, given by the model, against human judgments"
* goal: e.g., coherence scoring (automatically evaluating student essays,
L2 learners' productions, ...), or improving text generation or summarization
]
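A minimal sketch of the first setup above (discriminating an original text from a
permuted ordering of its sentences); `score_fn` is a hypothetical placeholder for any
coherence model that scores an ordered list of sentences.

```python
# Sketch of the permutation-based coherence evaluation: a coherence model is
# any function scoring an ordered list of sentences; `score_fn` is a
# hypothetical placeholder for such a model.
import random

def discrimination_accuracy(documents, score_fn, n_permutations=20, seed=0):
    """Fraction of (original, permuted) pairs where the original scores higher."""
    rng = random.Random(seed)
    wins, total = 0, 0
    for sentences in documents:          # each document = list of sentences
        original_score = score_fn(sentences)
        for _ in range(n_permutations):
            shuffled = sentences[:]
            rng.shuffle(shuffled)
            if shuffled == sentences:    # skip permutations identical to the original
                continue
            wins += original_score > score_fn(shuffled)
            total += 1
    return wins / total if total else 0.0
```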
---
background-image: url(images/conn-lex.jpg)
background-size: 380px
background-position: 5% 90%
.left-column[
## What is discourse?
### - Document-level
### - Coherence
### - Discourse analysis
]
.right-column[
### Discourse Relations
The existence of discourse relations is hinted at by discourse connectives, such
as *however*, *moreover*, *meanwhile*, *if..then*...

These connectives:
* contribute to cohesion and coherence
* explicitly specify the relation between adjacent units of text:
  * *however* signals a contrastive relation
  * *moreover* signals that the subsequent text elaborates or strengthens
the point that was made immediately beforehand,
  * *meanwhile* indicates that two events are contemporaneous
  * *if...then* sets up a conditional relationship.
**Connective lexicons:** [connective-lex](http://connective-lex.info/)
]
---
.left-column[
## What is discourse?
### - Document-level
### - Coherence
### - Discourse analysis
]
.right-column[
### Discourse Relations link:
* the semantic contents of two units (1)
* or the speech act expressed and the semantic content (2)
* same phenomena within sentences (3)
Examples:
.small[
* This cute child turns out to be a blessing and a curse.
*She gives the Artist a sense of purpose, but also alerts him to the serious
inadequacy of his vagrant life.* **(Cause-reason)**
]
.small[
* Mrs. Yeargin is lying. *They found students (..) who said she gave them
similar help.* **(Pragmatic Cause-justification)**
]
.small[
* Typically, money-fund yields beat comparable short-term investments
*because portfolio managers can vary maturities and go after highest rates.*
**(Cause-reason)**
]
]
---
.left-column[
## What is discourse?
### - Document-level
### - Coherence
### - Discourse analysis
]
.right-column[
### Partial consensus:
* Analysis unit: a document
* Elementary Discourse Unit (EDU): mostly clauses, at most a sentence
* Discourse relations: semantico-pragmatic, binary, inter- and intra-sentential
* Explicit: with a discourse connective
* Implicit: without a discourse connective
Examples:
.small[
* The tawny owl is a nocturnal bird of prey, .alert[**but**]
it can live in the daytime.
]
.small[
* The towers collapsed less than two hours later .alert[**(Result)**] dragging
down with them the building of the Marriott World Trade Center.
.alert[**(Sequence)**] The tower 7 of the WTC collapsed in the afternoon
.alert[**because**] of damages caused by the fall of Twin Towers.
]
]
---
.left-column[
## What is discourse?
### - Document-level
### - Coherence
### - Discourse analysis
### - Theoretical frameworks
]
.right-column[
### Frameworks/annotation schemes: 2 views
#### Hierarchical discourse structure (RST, SDRT, DLTAG, GraphBank...)
* Structure: trees/graphs covering the documents
* Try to give an interpretation to the document
* Annotation is hard!
#### Local coherence (PDTB)
* "theory neutral": lexically grounded
* Flat structure, no full covering
* Higher inter-annotator agreement (larger corpora)
]
---
.left-column[
## What is discourse?
### - Document-level
### - Coherence
### - Discourse analysis
### - Theoretical frameworks
]
.right-column[
### Different frameworks/annotation schemes
#### Rhetorical Structure Theory DT [Carlson et al., 2001]
* Relations: definitions based on the author's intentions
* 78 relations, 16 classes
* One relation per pair of units
* Structure: trees covering the documents
#### Penn Discourse TreeBank (PDTB) [Prasad et al., 2008]
* Annotation based on connectives and adjacency
* Hierarchy: 4 classes, 16 types, 23 subtypes
* Possibly multiple relations
* Flat/no structure: no relation between some units
#### SDRT: Annodis [Afantenos et al. 2012]
* Relation definitions based on semantics
* 18 relations
* One relation per pair of units, embedded units
* Structure: graphs covering the documents
]
---
.left-column[
## What is discourse?
### - Document-level
### - Coherence
### - Discourse analysis
### - Theoretical frameworks
]
.right-column[
#### RST
<img src="images/rst3.png" width="75%"/>
.west[
#### SDRT
<img src="images/sdrt-salmon.png" width="95%"/>
]
.east[
#### PDTB
<img src="images/pdtb.jpg" width="60%"/>
]
]
---
.left-column[
## What is discourse?
### - Document-level
### - Coherence
### - Discourse analysis
### - Theoretical frameworks
]
.right-column[
*Strong generative capacity of RST, SDRT and discourse dependency DAGs*,
Danlos, Constraints in Discourse, 2008
<img src="images/comparison.png" width="65%"/>
]
---
# Discourse parsing
<img src="images/rst42-4.png" width="20%"/>
<img src="images/rst42-3.png" width="70%"/>
<img src="images/rst42-2.png" width="100%"/>
<img src="images/rst42-1.png" width="100%"/>
---
.left-column[
## Discourse parsing
### - Corpora
##### RST DT
]
.right-column[
### The RST Discourse TreeBank
[RST website](https://www.sfu.ca/rst/)
* Annotated on top of the PTB
* 385 documents
* RST analysis goal: recovering the author’s "intentions"
* Typically one “more important” segment (nucleus vs satellite)
* Most used for discourse parsing because:
* **Trees!**: similar to syntactic parsing
* full coverage of documents: all parts are connected
Honestly: an old, weird corpus...
* only 1 relation per pair of segments
* very strange relations: *attribution*, *same-unit*...
* very strange segmentation: see below
* the definition of the relations is very hard to understand:
[take a look](https://www.sfu.ca/rst/01intro/definitions.html)
.small[*|Mr. Volk, 55 years old, succeeds Duncan Dwight,| |who retired in September.|*]
.small[*|The Tass news agency said the 1990 budget anticipates income of 429.9 billion rubles| |($US693.4 billion)| *]
]
---
.left-column[
## Discourse parsing
### - Corpora
##### RST DT
]
.right-column[
### Relation set (classes and relations)
* **Attribution**: attribution, attribution-negative
* **Background**: background, circumstance
* **Cause**: cause, result, consequence
* **Comparison**: comparison, preference, analogy, proportion
* **Condition**: condition, hypothetical, contingency, otherwise
* **Contrast**: contrast, concession, antithesis
* **Elaboration**: elaboration-additional, elaboration-general-specific, elaboration-part-whole,
elaboration-process-step, elaboration-object-attribute, elaboration-set-member, example, definition
* **Enablement**: purpose, enablement
* **Evaluation**: evaluation, interpretation, conclusion, comment
* **Explanation**: evidence, explanation-argumentative, reason
* **Joint**: list, disjunction
* **Manner-Means**: manner, means
* **Topic-Comment**: problem-solution, question-answer, statement-response, topic-comment,
comment-topic, rhetorical-question
* **Summary**: summary, restatement
* **Temporal**: temporal-before, temporal-after, temporal-same-time, sequence, inverted-sequence
* **Topic Change**: topic-shift, topic-drift
]
---
.left-column[
## Discourse parsing
### - Corpora
##### PDTB
]
.right-column[
### The Penn Discourse Treebank
[PDTB Website](https://www.seas.upenn.edu/~pdtb/)
* Wall Street Journal Articles
* Annotated on top of the Penn Treebank
* 2,259 documents
#### Annotation
* Explicit relations: 18,459
  * Connectives: closed list of 100
  * Arguments = minimal text spans
* Implicit relations: 16,224
* Alternative lexicalizations: 624
* Entity Relations: 5,210
* No Relation: 254
* **40,600** annotations
]
---
.left-column[
## Discourse parsing
### - Corpora
##### PDTB
]
.right-column[
### Explicit discourse relations
* Annotation: the connective, the relation(s) it triggers (up to 2),
and the arguments are annotated
* Different types of connectives (and positions):
* Subordinating conjunctions (e.g., because, when, since, although)
.small[
|Use of dispersants was approved|1 **when** |a test on the third day showed some positive
results|2, officials said. (CONTINGENCY:Cause:reason)
**Although** |the purchasing managers’ index continues to indicate a slowing economy,|2
|it isn’t signaling an imminent recession|1, said Robert Bretz (COMPARISON:Concession:expectation)
]
* Coordinating conjunctions (e.g., and, or, nor):
.small[
The theory is |that Seymour is the chief designer of the Cray-3,| **and**
|without him it could not be completed.|2
(EXPANSION.Conjunction and CONTINGENCY.Cause.result)
]
* Adverbials (ADVP and PP) (e.g., however, otherwise, then, as a result, for example)
.small[
A Chemical spokeswoman said |the second-quarter charge was "not material"|1 |and
that no personnel changes were made|2 **as a result**. (CONTINGENCY:Cause:result)
]
]
---
.left-column[
## Discourse parsing
### - Corpora
##### PDTB
]
.right-column[
### Implicit discourse relations
.small[
|Mrs Yeargin is lying.|1 **Implicit = because** |They found students in an advanced class
a year earlier who said she gave them similar help.|2 (CONTINGENCY:Pragmatic
Cause:justification)
]
* Annotated between adjacent segments (only sentences in PDTB 2)
* Up to 4 relations annotated per pair of segments
* Rule: a connective can be inserted
* and the connective is annotated
* Introduce a bias?
* Adding a connective may hide a relation
* Some papers: try to guess this connective and then the relation
* Require this specific annotation to be extended to other languages, genres...
]
---
.left-column[
## Discourse parsing
### - Corpora
##### PDTB
]
.right-column[
### Alternative lexicalizations
* "cases where a discourse relation is inferred between adjacent sentences"
* "but where providing an Implicit connective leads to redundancy in the
expression of the relation"
* "the relation is alternatively lexicalized by some “non-connective expression”
.small[
And she further stunned her listeners by revealing her secret garden design
method: |Commissioning a friend to spend “five or six thousand dollars . . .
on books that I ultimately cut up.”|1 **AltLex After that**,
|the layout had been easy.|2
The Bank of England, on the other hand, had gold reserves that averaged about
30% of its outstanding currency (...)
**AltLex The most likely reason for this disparity** is that (...)
]
]
---
.left-column[
## Discourse parsing
### - Corpora
##### PDTB
]
.right-column[
### Relation set
<img src="images/pdtb-rel.png" width="100%"/>
]
---
.left-column[
## Discourse parsing
### - Corpora
]
.right-column[
* RST: many new projects for other languages, often simplifying the
relation set
* PDTB: many languages covered (but rarely the full annotation, e.g., French = only
connectives)
<img src="images/corpora.png" width="100%"/>
]
---
background-image: url(images/steps-nlp.jpg)
background-size: 200px
background-repeat: no-repeat
background-position: center
background-size: 100%
class:inverse
## Discourse parsing
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
]
.right-column[
### Natural Language Processing:
* many applications just focus on sentence processing (e.g., see the problems
previously shown for summarization)
* they often try to at least take co-reference into account
* discourse information is "needed" (if needed) at the end of the pipeline
* this also means that discourse processing needs all the information from the
previous steps (you can easily imagine the problem with error propagation ...)
* this is one thing that makes discourse parsing hard

Next step: Pragmatic Analysis
* *In this step, the text is interpreted with respect to what it actually means;
we have to handle the aspects of language that require real-world knowledge.*
* reference to objects in the world,
* but also context, implicature, etc.
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
]
.right-column[
### Discourse parsing: first step, segmenting
* Segment a document into EDUs
* mostly clauses and sentences, but a bit more fine-grained in the RST DT
* see the large set of rules in the [tagging manual](https://www.isi.edu/~marcu/discourse/tagging-ref-manual.pdf), e.g.:
.small[
* Includes both speech acts and other cognitive acts:
|The company says| |it will shut down its plant.|
* But if the complement is a to-infinitival, do not segment:
|The company wants to shut down its plant.|
* But segment an infinitival clause marking a purpose relation (but not all of them,
that would be too easy...):
|A grand jury has been investigating
whether officials (...) conspired **to** cover up their accounting| |**to**
evade federal income taxes.|
]
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
]
.right-column[
### Discourse segmenters
Most existing systems use lexical features, POS tags and syntactic information + gold
sentence segmentation (not such an easy task!)
* RST DT: [Xuan Bach et al.]: F1 91.0% (automatic parse) / 93.7% (gold parse)
* English instructional corpus: [Joty et al, 2015] F1 80.9%
### ToNy, winner of the last shared task :)
See the results [here](https://sites.google.com/view/disrpt2019/shared-task?authuser=0),
and the paper [here](https://hal.archives-ouvertes.fr/hal-02374091/file/21_Paper.pdf)
[Muller et al, 2019]
* Using contextual embeddings alone yields results close to the state of the art
* ELMo better than BERT on English, but not multilingual
* Results with multilingual BERT, averaged over the languages (see the sketch below):
  * F1 90.11% if sentence boundaries are given
  * F1 86.38% otherwise
* Problem with cross-domain learning:
  * Training on GUM and testing on RST-DT: drop from 96% to 66%
  * Training on RST-DT and testing on GUM: from 93% to 73%
* Note: the [GUM corpus](http://corpling.uis.georgetown.edu/gum/) is composed of
documents from several domains.
]
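A minimal sketch of segmentation cast as per-token boundary tagging; `embed_tokens`
is a hypothetical stand-in for any contextual embedder (ELMo, multilingual BERT, ...),
and the linear classifier is only an illustration, not the actual ToNy architecture.

```python
# Sketch: EDU segmentation as per-token boundary tagging
# (B = token begins a new EDU, I = token continues the current EDU).
# `embed_tokens` is a hypothetical placeholder for a contextual embedder
# returning one vector per token.
from sklearn.linear_model import LogisticRegression

def train_segmenter(documents, embed_tokens):
    """documents: list of (tokens, labels) pairs with labels in {"B", "I"}."""
    X, y = [], []
    for tokens, labels in documents:
        X.extend(embed_tokens(tokens))   # one contextual vector per token
        y.extend(labels)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf

def segment(tokens, embed_tokens, clf):
    """Split a token sequence into EDUs from predicted B/I tags."""
    edus, current = [], []
    for tok, tag in zip(tokens, clf.predict(embed_tokens(tokens))):
        if tag == "B" and current:
            edus.append(current)
            current = []
        current.append(tok)
    if current:
        edus.append(current)
    return edus
```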
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
]
.right-column[
### Discourse parsing: second step, building the tree
* Attachment: which EDUs are linked together
* Labeling: with which relation / sense
* Recursive process: the pair of discourse units has to be linked to another
segment, and so on, until full coverage
* + RST bonus: label each segment as nucleus or satellite
Parsers are inspired by syntactic parsing:
* Transition-based (shift-reduce) or CKY parsing (constituency or dependency),
see the sketch below
* Main problems:
  * Efficiency: trees are often far deeper than in syntax
  * Representation: we need to encode spans of text instead of just words
  * Relations are semantic, harder to identify than syntactic ones
  * Lack of data: corpora are small, 385 documents in the RST DT, meaning
385 trees / instances for our system
]
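A minimal sketch of the shift-reduce scheme, ignoring nuclearity and span
representations; `choose_action` and `choose_relation` are hypothetical placeholders
for trained scoring models over the current stack/queue configuration.

```python
# Sketch of transition-based (shift-reduce) discourse parsing: EDUs are
# consumed left to right; SHIFT pushes the next EDU, REDUCE merges the two
# top spans under a relation. The two classifiers are left abstract.

def shift_reduce_parse(edus, choose_action, choose_relation):
    """Return a binary discourse tree as nested (relation, left, right) tuples."""
    stack, queue = [], list(edus)
    while queue or len(stack) > 1:
        if len(stack) < 2:
            action = "SHIFT"              # forced: nothing to reduce yet
        elif not queue:
            action = "REDUCE"             # forced: nothing left to shift
        else:
            action = choose_action(stack, queue)
        if action == "SHIFT":
            stack.append(queue.pop(0))    # new leaf = next EDU
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append((choose_relation(left, right), left, right))
    return stack[0]
```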
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
]
.right-column[
### Representing discourse units and their combination [Ji and Eisenstein, 2014](https://www.aclweb.org/anthology/P14-1002.pdf)
* Idea: jointly learn the task and the word representations (as low-dimensional
vectors)
* Test 3 options for transforming the original features, taking into account
relationships between adjacent EDUs
### Overcoming the lack of data by splitting the task [Wang et al, 2017](https://www.aclweb.org/anthology/P17-2029.pdf)
* Idea: not enough data for structure + nuclearity + relation.
* First: build a parser that identifies the naked structure + nuclearity
* Then: relations, 3 classifiers (within/across sentence, across paragraphs)
[Morey et al, 2017](https://hal.archives-ouvertes.fr/hal-01650251/document): evaluation problem, scores in [Ji and Eisenstein, 2014] are
not computed using the right evaluation metrics, F1=57.8% (and not 61.6%)
<img src="images/rst-parsing.png" width="40%"/>
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
]
.right-column[
### What about other languages? [Braud et al., 2017](https://arxiv.org/pdf/1701.02946.pdf)
* Cross-lingual experiments:
* Train only on data for other languages
* Train on data for other languages but optimize the hyper-parameters on data
for the target language
* Transfer is very hard!
* Monolingual experiments: large drop in performance for languages other than
English, i.e., smaller corpora
<img src="images/rst-cross.png" width="100%"/>
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
### Discourse chunking or shallow discourse parsing
* Identifying discourse connectives
* Identifying connective arguments:
  * position
  * boundaries
* Identifying the sense of the discourse relation (label), see the sketch below
<img src="images/pdtb-pipeline2.png" width="80%"/>
]
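In the sketch below, the three components are hypothetical placeholders for trained
models; only the ordering of the steps is meant to be illustrative.

```python
# Sketch of a shallow discourse parsing (discourse chunking) pipeline.

def shallow_parse(document, find_connectives, find_arguments, classify_sense):
    """Return a list of (connective, arg1, arg2, sense) tuples for one document."""
    relations = []
    for connective in find_connectives(document):          # 1. usage disambiguation
        arg1, arg2 = find_arguments(document, connective)  # 2. argument spans
        sense = classify_sense(connective, arg1, arg2)     # 3. relation sense
        relations.append((connective, arg1, arg2, sense))
    return relations
```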
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
### Interesting subtasks
#### Explicit discourse relations: Connective ambiguity
* Usage ambiguity:
  * discourse vs non-discourse reading: whether or not a given token
is serving as a discourse connective in its context
  * e.g. in 1. *since* has no discourse reading
* Sense ambiguity:
  * which discourse relation(s) a given token is signalling
  * e.g. in 2. *since* signals a temporal relation, while in 3. *since* signals
a cause
.small[
1. She has been up **since** 5am.
2. There have been over 100 mergers **since** the most recent wave of friendly takeovers ended.
3. It was a far safer deal **since** the company has a healthier cash flow
]
- [Pitler and Nenkova, 2009]: syntactic features are useful for both tasks
- [Webber et al. 2019]: more on connective ambiguity
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
### Interesting subtasks
#### Identifying discourse connectives (usage)
* Closed-list of known discourse connectives:
* *because, in the meanwhile, but, if..then, on the one hand..on the other hand...*
* Disambiguation problem:
* *Paul likes dogs **and** cats* (no discourse reading)
* *Paul feeds the cat **and** pets the dog.* (discourse reading)
* Binary classification (see the sketch below)
* Syntactic and lexical features: connective, its POS-tag and
immediate context, syntactic sisters and path to root
* **High performance: around 95% in F1**
]
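A minimal sketch of usage disambiguation with the kind of features listed above; the
tuples in `candidates` (token, POS, context words, syntactic path, label) are assumed
to be precomputed from tagged and parsed text.

```python
# Sketch of connective usage disambiguation as binary classification
# (discourse vs non-discourse reading) with simple lexical/syntactic features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def to_features(token, pos, prev_word, next_word, path_to_root):
    return {
        "conn": token.lower(),        # the candidate connective itself
        "pos": pos,                   # its POS tag
        "prev": prev_word.lower(),    # immediate left context
        "next": next_word.lower(),    # immediate right context
        "path": path_to_root,         # syntactic path, e.g. "IN^SBAR^VP^S"
    }

def train_usage_classifier(candidates):
    """candidates: (token, pos, prev_word, next_word, path_to_root, label) tuples."""
    X = [to_features(*c[:-1]) for c in candidates]
    y = [c[-1] for c in candidates]   # 1 = discourse reading, 0 = not
    clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(X, y)
    return clf
```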
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
### Interesting subtasks
#### Identifying connective arguments
* Relative position of Arg1 and Arg2:
  * Classification: same/previous(/following) sentence
  * Representation: position of the connective, context
  * High performance: 97.94% in F1
* Exact span:
  * Select the syntactic tree nodes included in the argument
  * **Moderate performance: 53.85-86.24% in F1**
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
### Interesting subtasks
#### Explicit sense classifier
* Multiclass classification
* Features: connective, connective POS tag, previous and following POS tag and
word
* **High performance: 86.77% in F1**
But be careful, that is the general picture:
* performance drops when tokenization or POS tagging are not manual / gold
* performance drops for other languages or domains
* good performance for highly frequent connectives, or for a few very unambiguous
connectives (see the baseline sketch below)
]
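The caveats above largely reflect connective ambiguity: a minimal
most-frequent-sense-per-connective baseline (not the feature-based classifier itself)
can be sketched as follows, assuming training data given as (connective, sense) pairs.

```python
# Sketch of a most-frequent-sense-per-connective baseline for explicit
# relations: many connectives are nearly unambiguous, which explains a large
# part of the high F1 reported above. Real systems add POS and context features.
from collections import Counter, defaultdict

def train_majority_baseline(pairs):
    """pairs: iterable of (connective, sense)."""
    counts = defaultdict(Counter)
    for connective, sense in pairs:
        counts[connective.lower()][sense] += 1
    # map each connective to its most frequent sense in the training data
    return {conn: c.most_common(1)[0][0] for conn, c in counts.items()}

def predict(baseline, connective, default="EXPANSION.Conjunction"):
    # `default` is an arbitrary fallback for unseen connectives
    return baseline.get(connective.lower(), default)
```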
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
See [Johannsen and Sogaard, 2013]
<img src="images/conn-scores-sogaard.png" width="65%"/>
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
### Interesting subtasks
#### Implicit sense classifier
* Multiclass classification
* Features: word pairs, modality, semantic classes, polarity...
* **Low performance: 42-57.1% in F1** (level 1-2 relations)
* a very hard task (let's try it during the practical!), not yet solved;
a word-pair baseline is sketched below
* but crucial: about 50% of the relations are implicit
Many strategies tested:
* Semi-supervision / domain adaptation:
  * using explicit examples, automatically annotated data
* Distant supervision:
  * building a word representation tailored to the task
  * multi-task learning using data from other discourse corpora, or data for
other tasks (temporality, co-reference, speech acts, ...)
* Various algorithms:
  * including all variations of neural networks
]
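A minimal sketch of the classic word-pair baseline for implicit senses; instances are
assumed to be (arg1 tokens, arg2 tokens, sense) triples, and real systems add
modality, polarity, semantic classes or neural encoders on top.

```python
# Sketch of the word-pair baseline for implicit sense classification:
# features are the cross-product of words from the two arguments.
from itertools import product
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def word_pair_features(arg1_tokens, arg2_tokens):
    return {f"{w1.lower()}|{w2.lower()}": 1
            for w1, w2 in product(arg1_tokens, arg2_tokens)}

def train_implicit_classifier(instances):
    """instances: list of (arg1_tokens, arg2_tokens, sense)."""
    X = [word_pair_features(a1, a2) for a1, a2, _ in instances]
    y = [sense for _, _, sense in instances]
    clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(X, y)
    return clf
```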
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
### Explicit vs implicit relations
**Results of the last shared task**: compare F-measure for Explicit vs Implicit
<img src="images/results_explimpl.png" width="110%"/>
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
]
.right-column[
### Full pipeline on the PDTB (except AltLex, EntRel, Norel, Attribution)
**Results of the last shared task**: max 27.7 in F-measure!!
<img src="images/results_conll_2016.png" width="100%"/>
<img src="images/results_conll_2016_chinese.png" width="100%"/>
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
### - Applications
]
.right-column[
### Discourse structure: useful for several tasks and applications
* Temporal ordering
* Co-reference [Cristea et al., 1999]
* Automatic summarization [Sporleder and Lapata, 2005]
* Question Answering [Verberne, 2007]
* Sentiment analysis [Bhatia et al., 2015]
* Essay scoring [Higgins et al., 2004; Mesgar and Strube, 2018]
* Summary coherence rating [Nguyen and Joty, 2017], coherence modeling
[Li and Jurafsky, 2017; Mesgar and Strube, 2018]
* Readability assessment
* Machine translation [Meyer and Webber, 2013; Born et al., 2017 ]
.small[
*Paul fell, Mary pushed him.* Explanation $\rightarrow$ pushed < fell
*The champions league has become a source of income for clubs **since** it
started in 1992.* Temporal
*La ligue des champions est devenue une source de revenus pour les clubs
**car** il a commencé en 1992.* Causal
]
]
---
.left-column[
## Discourse parsing
### - Corpora
### - Discourse processing
### - RST parsing
### - PDTB parsing
### - Applications
### - Current challenges
]
.right-column[
### Room for improvement everywhere, but more importantly:
* Evaluation methods:
  * stop evaluating only on the RST DT or the PDTB (i.e. English + Wall Street Journal)
  * be careful when evaluating, see [Morey et al. 2017]
  * use downstream applications: is all this work really useful? Do we really
need (full) discourse parsers? We need to see that it helps for other tasks
* Learning from a limited amount of data (especially with neural methods):
  * we can't expect unlimited amounts of annotated data, especially for discourse
* Adapting to new domains or languages:
  * even languages and domains without any annotated data [Braud et al. 2017]
* Conversations / dialogues: only a few corpora + annotation is very hard
  * automatically summarizing meetings
  * improving chatbots
  * detecting schizophrenia using patient-doctor conversations ;)
  * useful for social media analysis: Twitter / forums are similar to dialogues
]
---
count: false
## Sources and reading list
.small[
* Course on discourse parsing at ESSLLI 2019: https://github.com/TScheffler/2019ESSLLI-discparsing
* J. Eisenstein's NLP course: https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes-10-15-2018.pdf
* Old slides: http://www.dfki.de/~horacek/09-Discourse-Parsing.pdf
* CoNLL shared task 2015: http://www.cs.brandeis.edu/~clp/conll15st/
* CoNLL shared task 2016: http://www.cs.brandeis.edu/~clp/conll16st/
* C. Potts' course on discourse: http://compprag.christopherpotts.net/pdtb.html
* ICDM 18 tutorial: https://drive.google.com/file/d/1XmaN6tXxnVasw8Sp0cr0FXbG7KHsEJlM/view
and https://drive.google.com/file/d/1QcbkKGZI8BAZh3v0_36BKDHw-rpOLjVC/view
* Discourse corpora:
* RST tagging manual: https://www.isi.edu/~marcu/discourse/tagging-ref-manual.pdf
* Discourse processing:
* Disambiguating explicit discourse connectives without oracles, Johannsen and Sogaard, IJCNLP, 2013
https://www.aclweb.org/anthology/I13-1134.pdf
* Applications:
* Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation,
Läubli et al., EMNLP, 2018, https://www.aclweb.org/anthology/D18-1512.pdf
* Modeling local coherence: An entity-based approach, Barzilay and Lapata, ACL 2005,
https://people.csail.mit.edu/regina/my_papers/coherence.pdf
* Automatically Evaluating Text Coherence Using Discourse Relations, Lin et al., ACL 2011
* A Neural Local Coherence Model, Nguyen and Joty, ACL 2017,
https://www.aclweb.org/anthology/P17-1121.pdf
* More references on coherence modeling in ICDM 18 slides
* Steps in NLP (picture): https://data-flair.training/blogs/ai-natural-language-processing/
]
<section data-background-iframe="https://nbviewer.jupyter.org/urls/mastertal.gitlab.io/UE803/notebooks/Exercise_sheet_7.ipynb" data-background-interactive>
</section>
</textarea>
<script src="common/remark-latest.min.js"></script>
<script>
var hljs = remark.highlighter.engine;
</script>
<script src="common/remark.language.js"></script>
<script src="common/mermaid/mermaid.min.js"></script>
<script src="common/katex/katex.min.js"></script>
<script src="common/katex/contrib/auto-render.min.js"></script>
<script src="common/terminal.language.js" type="text/javascript"></script>
<link rel="stylesheet" href="common/mermaid/mermaid.css">
<link rel="stylesheet" href="common/katex/katex.min.css">
<script>
var options = {
highlightStyle: 'monokai',
highlightLanguage: 'remark',
highlightLines: true,
// Set the slideshow display ratio
// Default: '4:3'
// Alternatives: '16:9', ...
ratio: '16:9',
};
var renderMath = function() {
//renderMathInElement(document.body);
// or if you want to use $...$ for math,
renderMathInElement(document.body, {delimiters: [ // mind the order of delimiters(!?)
{left: "$$", right: "$$", display: true},
{left: "$", right: "$", display: false},
{left: "\\[", right: "\\]", display: true},
{left: "\\(", right: "\\)", display: false},
]});
}
var slideshow = remark.create(options, renderMath) ;
// don't let mermaid automatically load on start
mermaid.initialize({
startOnLoad: false,
cloneCssStyles: false
});
function initMermaidInSlide(slide) {
var slideIndex = slide.getSlideIndex();
// caution: no API to get the DOM element of current slide in remark,
// this might break in the future
var currentSlideElement = document.querySelectorAll(".remark-slides-area .remark-slide")[slideIndex];
var currentSlideMermaids = currentSlideElement.querySelectorAll(".mermaid");
if (currentSlideMermaids.length !== 0) {
mermaid.init(undefined, currentSlideMermaids);
}
}
// first starting slide won't trigger the slide event, manually
// init mermaid
initMermaidInSlide(slideshow.getSlides()[slideshow.getCurrentSlideIndex()]);
// on each slide event, trigger init mermaid
slideshow.on('afterShowSlide', initMermaidInSlide);
// extract the embedded styling from ansi spans
var highlighted = document.querySelectorAll("code.terminal span.hljs-ansi");
Array.prototype.forEach.call(highlighted, function(next) {
next.insertAdjacentHTML("beforebegin", next.textContent);
next.parentNode.removeChild(next);
});
</script>
</body>
</html>
#! /bin/bash
# contact: yannick.parmentier@loria.fr
# date: 2019/07/08
name=$1

if [ $# -lt 1 ]; then
    echo "Usage:"
    echo "  $0 <file_name>"
    exit 1
else
    cp .slides-template.html "$name"
    echo "Slides file created."
fi
exit 0