diff --git a/docs/data_preprocessing.rst b/docs/data_preprocessing.rst index 565fcced1996449808b9083fe93b9633146632ec..9b4fd953c234c65c7b7d4ccf64a67a9e3f3e6a0e 100644 --- a/docs/data_preprocessing.rst +++ b/docs/data_preprocessing.rst @@ -227,11 +227,11 @@ as well as the version of ``dnadna`` used. Command line ============ -Once the preprocessing configuration file has been filled and the required input files are created, the command to start the preprocessing is simply: +Once the preprocessing configuration file has been filled and the required input files are created, the command to start the preprocessing is simply: .. code-block:: bash - dnadna preprocess preprocess_config_file.yml + dnadna preprocess <model_name>_preprocessing_config.yml More details can be found in the :ref:`introduction:Quickstart Tutorial`.
diff --git a/docs/datasets.rst b/docs/datasets.rst index 8669bec4d4f94960100496c88c85d31d063fac4d..a6f6eba675557a8fbe340f98aff80d0c0b3be281 100644 --- a/docs/datasets.rst +++ b/docs/datasets.rst @@ -205,6 +205,33 @@ DNADNA format can be changed to suit your wishes, e.g. you could change to:: filename = f"{dataset_name}/scen_{scenario}_arbitrary_text/rep_{replicate}/{scenario}_{replicate}.npz" + +In that case, update ``filename_format`` in the +:ref:`dataset config file <dnadna-dataset-simulation-config>`: + +.. code-block:: yaml + + data_source: + # string template for per-replicate simulation files in Python + # string template format; the following template variables may be + # used: 'name', the same as the name property used in this config + # file; 'scenario', the scenario number, and 'replicate', the + # replicate number of the scenario (if there are multiple + # replicates); path separators may also be used in the template to + # form a directory structure + filename_format: "{dataset_name}/scen_{scenario}_arbitrary_text/rep_{replicate}/{scenario}_{replicate}.npz" + + +before running: + +.. code-block:: bash + + $ dnadna init --dataset-config={dataset_name}/{dataset_name}_dataset_config.yml + +where ``{dataset_name}/{dataset_name}_dataset_config.yml`` is the name you +picked for the config file. + + You can check our `notebook <https://gitlab.com/mlgenetics/dnadna/-/tree/master/examples/example_simulate_msprime_save_dnadna_npz.ipynb>`_ for an illustration of a simple constant demographic scenario in ``msprime``
diff --git a/docs/overview.rst b/docs/overview.rst index fb472023248f58383400e723d8d6ea2a6ff8ddb8..78ee14f1ec51d31cea9671b53dc2fd75ab57be00 100644 --- a/docs/overview.rst +++ b/docs/overview.rst @@ -140,20 +140,126 @@ would output config files to ``/mnt/nfs/username/models/my_model/``. Preprocessing ============= -TODO +The preprocessing step performs the following: +* validating input files and filtering out scenarios that do not meet the minimal requirements (defined by the user) +* splitting the dataset into training/validation/test sets (the latter is optional) +* applying transformations to target parameter(s) if requested by the user (e.g. a log transformation) +* standardizing target parameter(s) for regression tasks (the mean and standard deviation used in standardization are computed from the training set only). + +Preprocessing is necessary before performing the first training run and should +be re-run if and only if one of the following is true: + +* the dataset changed, + +* the task changed (e.g. predicting other parameters, or the same parameters but with different transformations), + +* the required input dimensions changed (e.g. to match the dimensions expected by some networks). + +At this stage we expect the user to open ``my_model_preprocessing_config.yml`` +and edit the properties to match the task/network needs in terms of the minimal +number of SNPs and individuals required for a dataset to be valid, the names of the +evolutionary parameters to be targeted, split proportions, etc.
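The scenario-level split and the train-only standardization described above can be sketched in a few lines of Python. This is a simplified illustration only, not dnadna's actual implementation; the helper name and the scenario-to-value mapping are made up for the example:

```python
import math
import random

def split_and_standardize(params, train_frac=0.8, seed=0):
    """Illustrate a scenario-level split plus train-only standardization.

    ``params`` maps a scenario id to the raw value of one target
    parameter.  All replicates of a scenario inherit its assignment, and
    the mean/standard deviation used for standardization are computed
    from the training scenarios only.
    """
    scenarios = sorted(params)
    rng = random.Random(seed)
    rng.shuffle(scenarios)
    n_train = int(train_frac * len(scenarios))
    train = set(scenarios[:n_train])
    # Statistics from the training set only, so the validation set
    # cannot leak into the standardization.
    mean = sum(params[s] for s in train) / len(train)
    std = math.sqrt(sum((params[s] - mean) ** 2 for s in train) / len(train))
    return {
        s: ((params[s] - mean) / std, "training" if s in train else "validation")
        for s in params
    }
```

By construction, the standardized training values have mean zero, while validation values are scaled with the training statistics and generally do not.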
More details +are provided in the :doc:`dedicated preprocessing page <data_preprocessing>`. + +Once the preprocessing configuration file has been filled and the required input +files are created, run preprocessing with: + +.. code-block:: bash + + $ dnadna preprocess my_model_preprocessing_config.yml + + +which outputs ``my_model/my_model_training_config.yml``, +``my_model/my_model_preprocessed_params.csv`` and +``my_model/my_model_preprocessing.log``. + +The last of these is simply a log file. ``my_model_preprocessed_params.csv`` is a +parameter table similar to ``my_model_params.csv`` but with log-transformed (if +required) and standardized target parameters, and with an additional column +indicating the assignment of each scenario to the training, validation or test set. +Note that all replicates of a scenario are assigned to the same set. +``my_model/my_model_training_config.yml`` will be described in the next section. + +More details can be found on the dedicated :doc:`preprocessing page +<data_preprocessing>`. .. _overview-training: Training ======== -TODO +We can now proceed to training. It consists of optimizing the parameters of a +statistical model (here the weights of a network) based on a training dataset +and optimization hyperparameters, and evaluating the performance on a validation +set. + +First, edit ``my_model/my_model_training_config.yml`` to define, in +particular, which network should be trained, its hyperparameters and loss +function, the optimization hyperparameters, transformations for data +augmentation, etc. More details can be found on the dedicated :doc:`training page +<training>`. + +Then run: + +.. code-block:: bash + + $ dnadna train my_model/my_model_training_config.yml + +which creates a subdirectory ``run_{run_id}/`` containing the optimized network +``my_model_run_{run_id}_best_net.pth`` as well as checkpoints during training, a +log file and loss values stored in a tensorboard directory.
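The ``run_{run_id}`` directory name above is derived from an auto-incrementing run id. A minimal sketch of such a naming scheme (a hypothetical helper for illustration, not dnadna's code):

```python
import re

def next_run_id(existing_runs):
    """Return the next default run directory name: run_000, run_001, ...

    ``existing_runs`` is an iterable of existing run directory names;
    entries that do not look like run directories are ignored.
    """
    ids = [int(m.group(1)) for name in existing_runs
           if (m := re.fullmatch(r"run_(\d+)", name))]
    return f"run_{max(ids) + 1 if ids else 0:03d}"
```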
+ +``dnadna train`` takes additional arguments such as: + +* ``--plugin PLUGIN`` to pass plugin files that define custom networks, + optimizers or transformations that we would like to use for training + even though they are not part of the original dnadna code. See the dedicated + :doc:`plugin page <extending>`. +* ``-r RUN_ID`` or ``--run-id RUN_ID`` to specify a run identifier different from the one created by default (the default starts at ``run_000`` and increments to ``run_001``, ``run_002``, etc.). RUN_ID can also be specified in the config file. + +* ``--overwrite`` to overwrite the previous run (otherwise a new run directory is created). + + +More details can be found on the dedicated :doc:`training page <training>`. .. _overview-prediction: Prediction ========== -TODO +Once trained, a network can be applied to a dataset in :doc:`DNADNA dataset format <datasets>` to classify/predict its evolutionary parameters. The following command is used: + +.. code-block:: bash + + $ dnadna predict run_{run_id}/my_model_run_{run_id}_best_net.pth realdata/dataset.npz + + + +This will use the best net, but you can use any net name, such as ``run_{run_id}/my_model_run_{run_id}_last_epoch_net.pth``. + +This outputs the predictions in CSV format, which is printed to standard output +by default while the process runs. You can pipe this to a file using +standard shell redirection operators like ``dnadna predict {args} > +predictions.csv``, or you can specify a file to output to using the +``--output`` option. + + +You can also apply ``dnadna predict`` to multiple npz files as follows: + +.. code-block:: bash + + $ dnadna predict run_{run_id}/my_model_run_{run_id}_best_net.pth {extra_dir_name}/scenario*/*.npz + +where ``{extra_dir_name}`` is a directory (that you created) containing +independent simulations which will serve as a test set for all networks or as +an illustration of predictive performance under specific conditions.
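The stored ``.pth`` file carries the information needed to map raw network outputs back to the original parameter scale (the "unstandardize"/"untransform" step, for parameters that were standardized and possibly log-transformed at preprocessing time). A minimal sketch of that inverse mapping, with a made-up helper name and under the assumption of a log transform followed by standardization:

```python
import math

def untransform(prediction, mean, std, log_transformed=True):
    """Map a network output back to the original parameter scale.

    Reverses the preprocessing steps in order: first undo the
    standardization (using the training-set ``mean`` and ``std`` saved
    at preprocessing time), then undo the log transform if one was
    applied to this parameter.
    """
    value = prediction * std + mean
    return math.exp(value) if log_transformed else value
```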
+ + +Importantly, if you want to ensure that target examples comply with the +preprocessing constraints (such as the minimal number of SNPs and individuals), +use ``--preprocess``. In that case, a warning will be displayed for each rejected scenario, with the reason for rejection (such as the minimal number of SNPs). + + +More details can be found on the dedicated :doc:`prediction page <prediction>`.
diff --git a/docs/prediction.rst b/docs/prediction.rst index 19d5c9a894d067a1d8bad8657919f96cbc139719..88d8a4962a1b054335e598a234c9f9f220af2cec 100644 --- a/docs/prediction.rst +++ b/docs/prediction.rst @@ -2,3 +2,83 @@ Prediction ########## + + +Once trained, a network can be applied (through a simple forward pass) to other +datasets, such as: + +* a test set, after hyperparameter optimization has been done for all networks. It enables a fair comparison of multiple networks and a check of whether they overfitted the validation set, + +* specific examples, to evaluate predictive performance on specific scenarios or the robustness under specific conditions (such as new data under selection while selection was absent from the training set), + +* real datasets, to reconstruct the past evolutionary history of real populations. + + +The required arguments for ``dnadna predict`` are: + +* MODEL: most commonly a path to a .pth file, such as + ``run_{runid}/my_model_run_{runid}_best_net.pth``, that contains the + trained network we wish to use and additional information (such as data + transformations that should be applied beforehand and information to unstandardize + and/or "untransform" the predicted parameters). Alternatively, the final + config file of a run ``run_{runid}/my_model_run_{runid}_final_config.yml`` + can be passed (in which case the best network of the given run is used by + default). + +* INPUT: path to one or more npz files, or to a :ref:`dataset config file <dnadna-dataset-simulation-config>` (describing a whole dataset). + + +A typical usage will thus be: + +.. code-block:: bash + + $ dnadna predict run_{run_id}/my_model_run_{run_id}_best_net.pth realdata/sample.npz + +to classify/predict evolutionary parameters for a single data sample +``realdata/sample.npz`` in :doc:`DNADNA dataset format <datasets>`. + +This will use the best net, but you can use any net name, such as ``run_{run_id}/my_model_run_{run_id}_last_epoch_net.pth``. + +This outputs the predictions in CSV format, which is printed to standard output +by default while the process runs. You can pipe this to a file using +standard shell redirection operators like ``dnadna predict {args} > +predictions.csv``, or you can specify a file to output to using the +``--output`` option. + + +You can also apply ``dnadna predict`` to multiple npz files as follows: + +.. code-block:: bash + + $ dnadna predict run_{run_id}/my_model_run_{run_id}_best_net.pth {extra_dir_name}/scenario*/*.npz + +where ``{extra_dir_name}`` is a directory (that you created) containing +independent simulations which will serve as a test set for all networks or as +an illustration of predictive performance under specific conditions. + + +The previous command is equivalent to: + +.. code-block:: bash + + $ dnadna predict run_{run_id}/my_model_run_{run_id}_final_config.yml {extra_dir_name}/scenario*/*.npz + +where the training config file is passed rather than the ``.pth`` of the best +network, but you could alternatively add the option ``--checkpoint last_epoch`` +to use the network at the final stage of training rather than the best one. + + +Importantly, if you want to ensure that target examples comply with the +preprocessing constraints (such as the minimal number of SNPs and individuals), +use ``--preprocess``. In that case, a warning will be displayed for each rejected scenario, with the reason for rejection (such as the minimal number of SNPs). + +In the current version, the same data transformations are applied to the +training/validation/test sets and to extra simulations or real data on which +the prediction is made.
These are the same data transformations that are +defined in the training config file for the training run that produced the +model. + +Finally, you can fine-tune resource usage with the options ``--gpus GPUS`` and +``--loader-num-workers LOADER_NUM_WORKERS`` to indicate the specific GPUs and +the number of CPU workers to use. You can display a progress bar with the option +``--progress-bar``.
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt index 3b6a3472345b432e63ba35bd27af8610c3434c7c..46f6f9aa86b23e7ae470725fb56df2e0124f9a60 100644 --- a/docs/spelling_wordlist.txt +++ b/docs/spelling_wordlist.txt @@ -98,7 +98,9 @@ normalizations npz nSl numpydoc +optimizers overfit +overfitted overfitting overline parallelization @@ -150,6 +152,7 @@ uncategorized unnormalized unregister unstandardize +untransform utils validator validators
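Returning to the ``filename_format`` template shown in the datasets section above: it uses Python's ``str.format`` placeholders, so its expansion for a given scenario and replicate can be checked with a short snippet (the helper below is hypothetical, for illustration only):

```python
# The template matches the filename_format example from the dataset
# config excerpt in docs/datasets.rst.
filename_format = (
    "{dataset_name}/scen_{scenario}_arbitrary_text/"
    "rep_{replicate}/{scenario}_{replicate}.npz"
)

def replicate_path(dataset_name, scenario, replicate):
    """Return the on-disk path of one simulated replicate."""
    return filename_format.format(
        dataset_name=dataset_name, scenario=scenario, replicate=replicate
    )

print(replicate_path("my_dataset", 3, 0))
```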