+++
date = "2018-06-19T16:42:21+02:00"
title = "install_match"
Categories = ["Development","GoLang"]
Tags = ["Development","golang"]
Description = ""
menu = "main"

+++

# Local installation of Grew-match

**Grew-match** is available [online](http://match.grew.fr) on a set of corpora (mainly from the UD project).
If you want to use **Grew-match** on your own corpus, you have to install it locally, following the instructions on this page.

## STEP 0: Run a web server

A web server is required. You can install [Apache](https://www.apache.org) or one of the easy-to-install distributions such as [LAMP on Linux](https://en.wikipedia.org/wiki/LAMP_%28software_bundle%29) or [MAMP on macOS](https://www.mamp.info).

In the following we will call `DOCUMENT_ROOT` the main folder accessible from your website:

 * with apache, it is defined in the `httpd.conf` file
 * with LAMP, it should be `/opt/lampp/htdocs/`
 * with MAMP, it should be `/Applications/MAMP/htdocs`

If in doubt, refer to the documentation of the corresponding web server.

We use the port number `8888` below. Change it if this port is already in use on your machine.
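One way to find out whether the port is already taken is to try to connect to it. The sketch below assumes `bash` (it relies on the bash-specific `/dev/tcp` pseudo-device); the function name `port_in_use` is made up for this example.

```shell
# Hypothetical helper: succeeds if something accepts TCP connections
# on the given local port. Requires bash (uses /dev/tcp).
port_in_use() {
    # The connection is attempted in a subshell, so the file descriptor
    # is closed again as soon as the check is done
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

PORT=8888
if port_in_use "$PORT"; then
    echo "port $PORT is already in use, pick another one"
else
    echo "port $PORT looks free"
fi
```

If the port is taken, choose another number and use it consistently in the steps below.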

## STEP 1: Install the webpage

### Download
The code for the webpage is available through [`gitlab.inria.fr`](https://gitlab.inria.fr) with:

```
git clone https://gitlab.inria.fr/grew/grew_match.git
```

### Configuration
Move to the main folder of the project:

```
cd grew_match
```

Edit the file `corpora/groups.json` to describe the set of available corpora.
For instance, with the 3 corpora defined in the example of STEP 2 below, the configuration file looks like:

```json
{ "groups": [
    { "id": "local",
      "name": "Local corpora",
      "corpora": [
        { "id": "my_corpora" },
        { "folder": "Older versions",
          "corpora": [
            { "id": "my_corpora@2.0" },
            { "id": "my_corpora@1.0" }
          ]
        }
      ]
    }
  ]
}
```

In this JSON file, `groups` defines the items in the top navbar and `corpora` defines the list of corpora in the left bar, optionally organised in folders (nested folders are not handled).
You can look at the [configuration file](https://gitlab.inria.fr/grew/grew_match/blob/master/corpora_for_website/groups.json) used on [Grew-match](http://match.grew.fr) for a larger example.

### Install

The project contains a file `install_template.sh` with the following content:

```shell
# decide where you want to store the webpage locally
DEST=DOCUMENT_ROOT/grew_match

# set the PORT number
PORT=8888

# build the DEST directory if needed
mkdir -p $DEST

# Copy the files in the right place
cp *.php *.xml *.html *.png $DEST
cp -r corpora css fonts icon js tables tuto $DEST

# build local folders for storing data
cd $DEST
mkdir -p data/shorten
chmod -R 777 data

# build other useful folders
mkdir -p _tables
mkdir -p _logs
mkdir -p _descs

# update parameters in the code
cat ajaxGrew.php | sed "s+@PORT@+${PORT}+" | sed "s+@DATADIR@+$DEST/data/+" > __tmp_file && mv -f __tmp_file ajaxGrew.php
cat export.php | sed "s+@PORT@+${PORT}+" | sed "s+@DATADIR@+$DEST/data/+" > __tmp_file && mv -f __tmp_file export.php
cat purge.php | sed "s+@DATADIR@+$DEST/data/+" > __tmp_file && mv -f __tmp_file purge.php
cat shorten.php | sed "s+@DATADIR@+$DEST/data/+" > __tmp_file && mv -f __tmp_file shorten.php
```

Copy it under the name `install.sh`:

```
cp install_template.sh install.sh
```

Edit the file `install.sh` and update the definitions of `DEST` (line 2) and `PORT` (line 5) if needed.

Run the install script:

```
./install.sh
```
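The last lines of the install script fill in the `@PORT@` and `@DATADIR@` placeholders of the PHP files with `sed`. The sketch below replays that substitution on a scratch file so that the effect can be inspected safely; the file `sample.php`, its content and the destination path are made up for this illustration.

```shell
# Work in a scratch directory so that no real file is touched
WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for one of the PHP files (hypothetical content),
# containing the two placeholders used by the install script
printf '%s\n' '$port = "@PORT@";' '$datadir = "@DATADIR@";' > sample.php

PORT=8888
DEST=/tmp/www/grew_match   # hypothetical destination folder

# Same substitutions as in the install script: "+" is used as the sed
# delimiter, so the slashes in $DEST do not need to be escaped
sed "s+@PORT@+${PORT}+" sample.php \
    | sed "s+@DATADIR@+$DEST/data/+" > __tmp_file && mv -f __tmp_file sample.php

# Show the result: both placeholders are now replaced
cat sample.php
```

Because `+` is the delimiter, a `DEST` path that itself contains a `+` character would break the substitution; in that case another delimiter has to be chosen.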

## STEP 2: Install the daemon

You have to start a local daemon which handles the requests on your corpora.

### Installation
Follow the general instructions for [Grew installation](../install) and then install the daemon with:

`opam install grew_daemon`

### Configuration
To configure your daemon, you have to describe the corpora you want to use in a JSON configuration file.
This file describes each corpus with an identifier, a directory and a list of files.
For instance, the file `my_corpora.json` below defines 3 corpora:

```json
{ "corpora": [
  { "id": "my_corpora",
    "directory": "/users/me/corpora/my_corpora",
    "files": [ "my_corpora_dev.conll", "my_corpora_test.conll", "my_corpora_train.conll" ]
  },
  { "id": "my_corpora@2.0",
    "directory": "/users/me/corpora/my_corpora/2.0",
    "files": [ "my_corpora_dev.conll", "my_corpora_test.conll", "my_corpora_train.conll" ]
  },
  { "id": "my_corpora@1.0",
    "directory": "/users/me/corpora/my_corpora/1.0",
    "files": [ "my_corpora_dev.conll", "my_corpora_test.conll", "my_corpora_train.conll" ]
  }
  ]
}
```

### Compile your corpora

In order to speed up the pattern search and to save memory when a large number of corpora are available, corpora are compiled with the command:

```
grew_daemon marshal my_corpora.json --webserver DOCUMENT_ROOT
```

A new file with the name of the corpus and the extension `.marshal` is created in the corpus directory.
Of course, you will have to compile again if one of your corpora is modified.
The compilation step will also build the relation tables and put them in a place where they can be found by the server.

You can clean the compiled files with:

```
grew_daemon clean my_corpora.json
```

### Run the daemon

The daemon is started with the command (update the port number if necessary):

```
grew_daemon run --port 8888 my_corpora.json
```

## STEP 3 and more

### Test
Make sure that the web server is running.
You should be able to request your corpora from [`http://localhost:8888/grew_match`](http://localhost:8888/grew_match).
Feel free to contact [us](mailto:Bruno.Guillaume@loria.fr) in case of trouble.

### Restart the daemon when one of the corpora is updated

1. Kill the running daemon (you can use the command `killall grew_daemon` if the daemon is running in the background)
2. Run the compile operation again: `grew_daemon marshal my_corpora.json`
3. Restart the daemon: `grew_daemon run --port 8888 my_corpora.json`
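
The three steps above can be wrapped in a small shell function. This is only a sketch: the function name `restart_grew_daemon` is made up, and starting the daemon in the background is an assumption about how you want to run it.

```shell
# Hypothetical helper: recompile the corpora and restart the daemon.
# Usage: restart_grew_daemon CONFIG_FILE [PORT]
restart_grew_daemon() {
    conf=$1
    port=${2:-8888}

    # 1. Kill the running daemon; ignore the error if none is running
    killall grew_daemon 2>/dev/null || true

    # 2. Run the compile operation again; stop here if it fails
    grew_daemon marshal "$conf" || return 1

    # 3. Restart the daemon (assumption: run it in the background)
    grew_daemon run --port "$port" "$conf" &
}
```

After loading the function in your shell, `restart_grew_daemon my_corpora.json` performs the three steps with the default port `8888`; pass the port number as a second argument if you use another one.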