# The plantnet dataset

This dataset is a subsample of the full dataset used to train the models of the Pl@ntNet application (https://plantnet.org/). It comprises 1081 species of plants,
representing a total of 306293 images, split into train, validation and test sets in proportions of 80/10/10% respectively. It was created by randomly sampling the full Pl@ntNet dataset at the genus level (i.e. the level directly above the species level in the taxonomy). This preserves two essential properties of the full Pl@ntNet dataset: (i) the long-tailed, imbalanced distribution of the classes and (ii) the strong ambiguity between some species of the same genus. The dataset is intended to facilitate research on these two fundamental problems occurring jointly (long-tailed distribution and class ambiguity).

## Installation

First clone the project. For the download you only need tqdm, matplotlib and requests. If you have conda, you can run:

```bash
conda env create -f plantnet_env.yml
conda activate plantnet_env
```

## Species names

The names of the classes can be found in class_names.csv.
Each name comprises two parts: the genus and the specific epithet.
For instance, for the class "Chaerophyllum_bulbosum", the genus is "Chaerophyllum" and the epithet is "bulbosum".
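
As an illustration of this naming convention, here is a minimal Python sketch that splits a class name into its two parts. The helper `split_class_name` is hypothetical (it is not part of this repository), and it assumes every class name contains a single underscore between genus and epithet, as in the example above.

```python
from typing import Tuple


def split_class_name(class_name: str) -> Tuple[str, str]:
    """Split a 'Genus_epithet' class name at its first underscore.

    Hypothetical helper, not part of this repository. Assumes the
    'Genus_epithet' form described above.
    """
    genus, _, epithet = class_name.partition("_")
    return genus, epithet


genus, epithet = split_class_name("Chaerophyllum_bulbosum")
print(genus)    # Chaerophyllum
print(epithet)  # bulbosum
```

Grouping classes by the returned genus is one way to look at the ambiguity between species of the same genus.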

## Downloading the dataset

To download the dataset, run:
```bash
python dl_plantnet.py --root=your_path --max_workers=4
```

where your_path is the path where you want to save the dataset and max_workers is the number of threads used to download the dataset.
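
To give an idea of what a `max_workers` setting typically controls, here is a minimal sketch of a thread pool that spreads independent download tasks over a fixed number of threads. It is only an illustration built on Python's standard `concurrent.futures` module, not the actual implementation of dl_plantnet.py. The names `run_in_threads` and `fake_fetch` are hypothetical, and the fetch function is a stand-in so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_in_threads(fetch: Callable[[str], str],
                   items: Iterable[str],
                   max_workers: int = 4) -> List[str]:
    """Apply `fetch` to every item using at most `max_workers` threads.

    Hypothetical helper for illustration only; results are returned
    in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, items))


def fake_fetch(name: str) -> str:
    # Stand-in for a real download (e.g. an HTTP request writing a file).
    return "downloaded " + name


results = run_in_threads(fake_fetch, ["a.jpg", "b.jpg", "c.jpg"], max_workers=2)
print(results)
```

With this kind of design, a larger `max_workers` lets more downloads run at the same time, which mostly helps when the tasks are limited by network latency rather than CPU.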

If for some reason your download is interrupted, you can run:
```bash
python resume_dl.py
```