Commit 842ab50e authored by Bruno Guillaume

update “install page”; new “grew grep” output

parent 74707ab9
......@@ -54,7 +54,7 @@ For instance, with the two following 1-line files:
* `ADJ_NOUN.pat` containing: `pattern { A[upos=ADJ]; N[upos=NOUN]; N -[amod]-> A; A << N }`
* `NOUN_ADJ.pat` containing: `pattern { A[upos=ADJ]; N[upos=NOUN]; N -[amod]-> A; N << A }`
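The two patterns differ only in their order constraint: `A << N` (the adjective comes somewhere before the noun) versus `N << A`. To make concrete what each one counts, here is a minimal Python sketch. It is not Grew's matcher. It works on a hypothetical sentence representation: a list of `(form, upos)` tokens indexed by position, plus a list of `(head, label, dependent)` edges.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical toy representation (not Grew's data model):
# tokens are (form, upos) pairs indexed by position,
# edges are (head, label, dependent) triples over those positions.
Token = Tuple[str, str]
Edge = Tuple[int, str, int]


def count_amod_orders(tokens: List[Token], edges: List[Edge]) -> Counter:
    """Count the matches of ADJ_NOUN.pat and NOUN_ADJ.pat in one sentence."""
    counts: Counter = Counter()
    for head, label, dep in edges:
        # N -[amod]-> A, with N[upos=NOUN] and A[upos=ADJ]
        if label != "amod" or tokens[head][1] != "NOUN" or tokens[dep][1] != "ADJ":
            continue
        # A << N: the adjective precedes the noun (not necessarily immediately)
        counts["ADJ_NOUN" if dep < head else "NOUN_ADJ"] += 1
    return counts


# "a red car": the adjective comes before the noun
en = count_amod_orders([("a", "DET"), ("red", "ADJ"), ("car", "NOUN")],
                       [(2, "det", 0), (2, "amod", 1)])
# "une voiture rouge": the adjective comes after the noun
fr = count_amod_orders([("une", "DET"), ("voiture", "NOUN"), ("rouge", "ADJ")],
                       [(1, "det", 0), (1, "amod", 2)])
print(dict(en + fr))  # {'ADJ_NOUN': 1, 'NOUN_ADJ': 1}
```

Summing such counters over all sentences of a corpus gives one count per pattern for that corpus.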
The command below computes statistics on the number of occurrences of each pattern in each corpus:
```
grew_daemon grep --patterns "ADJ_NOUN.pat NOUN_ADJ.pat" en_fr_zh.json
......
......@@ -6,19 +6,38 @@ title = "installation"
# Grew installation
**Grew** is implemented in the **[Ocaml](http://ocaml.org)** language.
It can be installed on Linux or Mac OS&nbsp;X (installation on Windows should be possible, but this is untested).
A Python binding is also available.
You will need to install:
1. `opam` which is the standard package manager for Ocaml
1. `ocaml` which can be installed by `opam`
1. `grew` which is available as an `opam` package
If you just need to upgrade your installation, please consult the [Upgrade page](../upgrade).
:warning: If you run into trouble using the instructions of this page, feel free to [open an issue on GitLab](https://gitlab.inria.fr/grew/grew_doc/issues) or to [contact the developer](mailto:Bruno.Guillaume@inria.fr?subject=Install%20of%20Grew).
## Step 1: Install opam
**Grew** requires **opam** version **2.0.0** or higher.
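If you want to check this requirement from a script, the Python sketch below runs `opam --version` (which prints a bare version number such as `2.0.6`) and compares the result with 2.0.0. The function names are illustrative only; they are not part of any Grew or opam tool.

```python
import re
import subprocess
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from a string such as '2.0.6' or '2.1'."""
    match = re.match(r"\s*(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    # A missing patch number (e.g. '2.1') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def opam_is_recent_enough(minimum: str = "2.0.0") -> bool:
    """Run `opam --version` and compare it with the minimum required version."""
    try:
        result = subprocess.run(
            ["opam", "--version"], capture_output=True, text=True, check=True
        )
        return parse_version(result.stdout) >= parse_version(minimum)
    except (OSError, subprocess.CalledProcessError, ValueError):
        # opam missing from the PATH, failing, or printing an unexpected version
        return False


print(opam_is_recent_enough())
```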
### Linux
On Debian, **opam** version 2 can be installed from the default packages:
```bash
apt install wget m4 unzip librsvg2-bin curl bubblewrap
apt-get install opam
```
On Ubuntu, **opam** version 2 is not available from the default packages.
See the Addendum at the end of this page or consult the [**opam** installation page](https://opam.ocaml.org/doc/Install.html) to install it.
The following command installs a few other required packages:
```bash
apt-get install wget m4 unzip librsvg2-bin curl bubblewrap
```
### Mac OS&nbsp;X
......@@ -28,37 +47,19 @@ apt install wget m4 unzip librsvg2-bin curl bubblewrap
:warning: **[Brew](https://brew.sh/)** is an alternative only if you do not plan to use the GUI (the package `webkit-gtk` required by the GUI is not available through **Brew**).
* `sudo port install aspcud`
**MacPorts** provides **opam** version 2 by default:
* `sudo port install opam`
## Step 2: Setup opam
Run:
* `opam init` and follow the instructions (answer `y` to the questions).
* `opam switch create 4.09.0 4.09.0` to install Ocaml 4.09.0. Note that downloading and building the `ocaml` compiler takes some time.
* Check that `ocaml` is installed with `ocamlc -v`.
NB: some users have reported that `opam init --disable-sandboxing` avoids errors that a plain `opam init` may raise.
## Step 3: Install the Grew software
Run the commands:
```bash
opam remote add grew "http://opam.grew.fr"
......@@ -71,7 +72,7 @@ To verify your installation:
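* In case of trouble, make sure that your PATH contains the `bin` directory of the current opam switch (printed by `opam var bin`; for instance `~/.opam/4.09.0/bin` with the switch created above) and try again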
* If trouble persists, please [file an issue](https://gitlab.inria.fr/grew/grew_doc/issues)
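When the `grew` command is not found, the cause is usually the PATH. The Python sketch below is one way to diagnose it. It assumes opam 2, where `opam var bin` prints the `bin` directory of the current switch; the helper names are hypothetical. If the directory turns out to be missing from the PATH, running `eval $(opam env)` in the shell is the usual fix.

```python
import os
import shutil
import subprocess


def dir_on_path(directory, path_value=None):
    """True if `directory` is one of the entries of a PATH-like string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(os.path.expanduser(directory))
    return any(
        os.path.normpath(os.path.expanduser(entry)) == wanted
        for entry in path_value.split(os.pathsep)
        if entry
    )


def diagnose_grew_install():
    """Print the opam bin directory, whether it is on PATH, and where `grew` is found."""
    try:
        opam_bin = subprocess.run(
            ["opam", "var", "bin"], capture_output=True, text=True, check=True
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        print("opam not found or not initialised")
        return
    print("opam bin directory:", opam_bin)
    print("on PATH:", dir_on_path(opam_bin))
    # shutil.which returns None when no `grew` executable is reachable through PATH
    print("grew executable:", shutil.which("grew"))


diagnose_grew_install()
```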
## Step 4: The Python library
With Python 3, use the following command:
`pip install grew`
......@@ -84,3 +85,16 @@ Note: depending on your local installation, you may have to use `pip3` or `pip3.
* A Gtk user interface is available, see [here](../install_gtk).
* A docker file with the Python library ready to be used is available [here](../docker).
---
# Addendum
Installation of `opam` version 2 on Ubuntu:
You should be able to install version **2.0.6** with the following commands:
* `wget -q https://github.com/ocaml/opam/releases/download/2.0.6/opam-2.0.6-x86_64-linux`
* `sudo mv opam-2.0.6-x86_64-linux /usr/local/bin/opam`
* `sudo chmod a+x /usr/local/bin/opam`
For more information, please consult [**opam** installation page](https://opam.ocaml.org/doc/Install.html).
......@@ -13,7 +13,7 @@ A Pattern is defined through 3 different parts that are all optional.
* at most one positive clause introduced by the keyword `pattern`, which describes a positive pattern that must be found in the graph.
* any number of negative clauses introduced by the keyword `without`; each clause filters out a subpart of the matchings previously selected.
* at most one global clause introduced by the keyword `global` which filters out a subpart of graphs.
The global matching process is:
......@@ -62,7 +62,7 @@ All edge clauses below require the existence of an edge between the node selecte
An edge may also be named with an identifier for later use (in commands, for instance):
* `e: N -> M`
Note that edges may refer to undeclared nodes; such nodes are then implicitly declared without any constraint.
For instance, the two patterns below are equivalent:
......@@ -105,6 +105,8 @@ pattern { N1 -[ARG1]-> N; N2 -[ARG1]-> N; N3 -[ARG1]-> N; }
This pattern is found 120 times in the Little Prince corpus ([Grew-match](http://match.grew.fr/?corpus=Little_Prince&custom=5d4d6c143cfa6)) but there are only 20 different occurrences: each one is reported 6 times, once for each permutation of `N1`, `N2` and `N3`.
To avoid this, a constraint `id(N1) < id(N2)` can be used.
It imposes an ordering on some internal representation of the nodes and so avoids these permutations.
**NB**: if a constraint `id(N1) < id(N2)` is used with two non-equivalent nodes, the result is unspecified.
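The factor 6 (120 matchings for 20 occurrences) can be reproduced on a toy graph. The brute-force Python sketch below is not Grew's algorithm. It assumes, as the factor 6 = 3! suggests, that distinct pattern nodes are mapped to distinct graph nodes; the graph itself is made up. To remove all six permutations of `N1`, `N2`, `N3`, it chains two constraints, `id(N1) < id(N2)` and `id(N2) < id(N3)`, using integer node ids as a stand-in for Grew's internal ordering.

```python
from itertools import permutations

# Hypothetical toy graph: node ids and labelled edges (source, label, target).
# Node 0 has three incoming ARG1 edges, from nodes 1, 2 and 3.
NODES = [0, 1, 2, 3, 4]
EDGES = {(1, "ARG1", 0), (2, "ARG1", 0), (3, "ARG1", 0), (4, "mod", 1)}


def match_star(nodes, edges, ordered=False):
    """Brute-force matchings of: N1 -[ARG1]-> N; N2 -[ARG1]-> N; N3 -[ARG1]-> N.

    permutations() maps the four pattern nodes to four distinct graph nodes.
    With ordered=True, id(N1) < id(N2) and id(N2) < id(N3) are also required.
    """
    results = []
    for n, n1, n2, n3 in permutations(nodes, 4):
        if all((src, "ARG1", n) in edges for src in (n1, n2, n3)):
            if ordered and not (n1 < n2 < n3):
                continue
            results.append({"N": n, "N1": n1, "N2": n2, "N3": n3})
    return results


print(len(match_star(NODES, EDGES)))                # 6: one per permutation of N1, N2, N3
print(len(match_star(NODES, EDGES, ordered=True)))  # 1
```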
The pattern below returns the 20 expected occurrences ([Grew-match](http://match.grew.fr/?corpus=Little_Prince&custom=5d4d6bb86ce49))
......@@ -128,11 +130,11 @@ We plan to add more constraints in the near future. Please drop us a [feature re
We describe below 4 of the constraints available in version 1.2.
For each one, its negation is available by replacing the `is_` prefix with the `is_not_` prefix.
* `is_cyclic`: the graph satisfies this constraint if and only if it contains a cycle.
A cycle is a list of nodes `N1`, `N2`, …, `N(k-1)`, `Nk` such that there are edges `N1 -> N2`, `N2 -> N3`, …, `N(k-1) -> Nk`, `Nk -> N1`.
In graph theory, a directed graph without any cycle is called a Directed Acyclic Graph (DAG).
* `is_forest`: the graph satisfies this constraint if and only if it is acyclic and no two edges have the same target.
In other words, a graph is a forest if and only if it is acyclic and each node has at most one incoming edge.
* `is_tree`: a graph is a tree if and only if it is a forest with exactly one root.
......
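The definitions of `is_cyclic`, `is_forest` and `is_tree` can be stated precisely in a few lines of code. The Python sketch below illustrates them and is not Grew's implementation. It uses a hypothetical representation of a graph as a list of node identifiers plus `(source, target)` pairs (edge labels play no role here), detects cycles with Kahn's topological sort, and takes a root to be a node without any incoming edge. The `is_not_` variants are simply the negations.

```python
from collections import defaultdict, deque


def is_cyclic(nodes, edges):
    """True iff the directed graph contains a cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, tgt in edges:
        successors[src].append(tgt)
        indegree[tgt] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for succ in successors[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                queue.append(succ)
    # Nodes never removed lie on a cycle or downstream of one.
    return removed < len(nodes)


def is_forest(nodes, edges):
    """Acyclic, and every node has at most one incoming edge."""
    targets = [tgt for _, tgt in edges]
    return not is_cyclic(nodes, edges) and len(targets) == len(set(targets))


def is_tree(nodes, edges):
    """A forest with exactly one root (a node without incoming edge)."""
    targets = {tgt for _, tgt in edges}
    roots = [n for n in nodes if n not in targets]
    return is_forest(nodes, edges) and len(roots) == 1


nodes = ["A", "B", "C"]
print(is_tree(nodes, [("A", "B"), ("A", "C")]))    # True
print(is_forest(nodes, [("A", "B")]))              # True (two roots: A and C)
print(is_tree(nodes, [("A", "B")]))                # False
print(is_cyclic(nodes, [("A", "B"), ("B", "A")]))  # True
print(is_forest(nodes, [("A", "C"), ("B", "C")]))  # False: C has two incoming edges
```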
......@@ -74,7 +74,8 @@ where:
The output is given in JSON format.
:warning: The output of the `grep` mode has changed in version 1.3.1 (January 2020).
The new version describes both node matching and edge matching.
## Example
......@@ -86,8 +87,8 @@ With the following files:
```
pattern {
V [cat=V];
e1: V -[a_obj]-> A;
e2: V -[de_obj]-> DE;
}
```
......@@ -101,33 +102,70 @@ produces the following JSON output:
[
  {
    "sent_id": "Europar.550_00496",
    "matching": {
      "nodes": { "V": "16", "DE": "19", "A": "14" },
      "edges": {
        "e2": { "source": "16", "label": "de_obj", "target": "19" },
        "e1": { "source": "16", "label": "a_obj", "target": "14" }
      }
    }
  },
  {
    "sent_id": "emea-fr-test_00478",
    "matching": {
      "nodes": { "V": "33", "DE": "32", "A": "35" },
      "edges": {
        "e2": { "source": "33", "label": "de_obj", "target": "32" },
        "e1": { "source": "33", "label": "a_obj", "target": "35" }
      }
    }
  },
  {
    "sent_id": "emea-fr-test_00438",
    "matching": {
      "nodes": { "V": "20", "DE": "21", "A": "22" },
      "edges": {
        "e2": { "source": "20", "label": "de_obj", "target": "21" },
        "e1": { "source": "20", "label": "a_obj", "target": "22" }
      }
    }
  },
  {
    "sent_id": "annodis.er_00441",
    "matching": {
      "nodes": { "V": "16", "DE": "20", "A": "18" },
      "edges": {
        "e2": { "source": "16", "label": "de_obj", "target": "20" },
        "e1": { "source": "16", "label": "a_obj", "target": "18" }
      }
    }
  },
  {
    "sent_id": "annodis.er_00240",
    "matching": {
      "nodes": { "V": "12", "DE": "13", "A": "11" },
      "edges": {
        "e2": { "source": "12", "label": "de_obj", "target": "13" },
        "e1": { "source": "12", "label": "a_obj", "target": "11" }
      }
    }
  },
  {
    "sent_id": "annodis.er_00040",
    "matching": {
      "nodes": { "V": "42", "DE": "50", "A": "47" },
      "edges": {
        "e2": { "source": "42", "label": "de_obj", "target": "50" },
        "e1": { "source": "42", "label": "a_obj", "target": "47" }
      }
    }
  }
]
```
This means that the pattern described in the file `subcat.pat` was found 6 times in the corpus; each item gives the sentence identifier together with the nodes (by position) and the edges matched by the pattern.
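To post-process this output, a few lines of Python suffice. The sketch below relies only on the JSON shape shown above (a list of items with `sent_id` and `matching`); the function names are hypothetical. It also accepts the flat `matching` objects produced before version 1.3.1, which carry no edge information.

```python
import json
from collections import Counter


def matched_nodes(item):
    """Node matching of one grep item, in the 1.3.1 format or the older flat one."""
    matching = item["matching"]
    # Since 1.3.1, "matching" holds {"nodes": {...}, "edges": {...}}; before, it
    # mapped node names directly to id strings, so only the new format has a dict here.
    if isinstance(matching.get("nodes"), dict):
        return matching["nodes"]
    return matching


def summarize_grep(output):
    """Occurrences per sentence, and (sent_id, source, label, target) tuples per edge name."""
    items = json.loads(output)
    per_sentence = Counter(item["sent_id"] for item in items)
    edges = {}
    for item in items:
        edge_map = item["matching"].get("edges")
        if not isinstance(edge_map, dict):
            continue  # older output: no edge information
        for name, edge in edge_map.items():
            edges.setdefault(name, []).append(
                (item["sent_id"], edge["source"], edge["label"], edge["target"])
            )
    return per_sentence, edges


sample = """[{"sent_id": "Europar.550_00496",
  "matching": {"nodes": {"V": "16", "DE": "19", "A": "14"},
               "edges": {"e2": {"source": "16", "label": "de_obj", "target": "19"},
                         "e1": {"source": "16", "label": "a_obj", "target": "14"}}}}]"""
per_sentence, edges = summarize_grep(sample)
print(per_sentence["Europar.550_00496"])  # 1
print(edges["e1"])  # [('Europar.550_00496', '16', 'a_obj', '14')]
print(matched_nodes(json.loads(sample)[0])["V"])  # 16
```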
Note that two other options exist:
* `-html`: adds an `html` field to each JSON item, containing the sentence in which the words impacted by the pattern are wrapped in an HTML span with class `highlight`
* `-dep_dir <directory>`: writes, in the folder `directory`, a new file with the representation of the sentence where the matched part is highlighted (as in the [Grew-match](http://match.grew.fr) tool), and adds a field with this filename to each JSON item; the files are in `dep` format (usable with [Dep2pict](http://dep2pict.loria.fr)).