Commit 06d5ef10 authored by Mathieu Giraud

doc/workflow.md: details

parent ad8c3635
Pipeline #217454 passed with stages in 2 minutes and 31 seconds
@@ -38,31 +38,34 @@ Work is underway to release this version for production.
- Requirements ([more documentation](vidjil-algo.md#installation)): on a recent Ubuntu system, `sudo apt-get install zlib1g-dev`
- Download and extract <http://www.vidjil.org/releases/vidjil-algo-latest.tar.gz> or <http://www.vidjil.org/releases/vidjil-algo-alpha.tar.gz>
- Inside `vidjil-algo` directory, build it with `make` (it boths compile vijdil-algo and fetches germlines genes repertoires created from IMGT and NCBI)
- Inside `vidjil-algo-...` directory, build it with `make`
(it both compiles `vidjil-algo` and fetches the `germline/` directory, with germline gene repertoires created from IMGT and NCBI)
**Install `flash2`**
- Download and extract <https://github.com/dstreett/FLASH2/archive/master.zip>
- Inside `flash2` directory, build it with `make`
- Inside `FLASH2-master` directory, build it with `make`
You may copy `vidjil-algo` and `flash2` binaries to folders avaialble from your `$PATH`.
You may copy `vidjil-algo` and `flash2` binaries to folders available from your `$PATH`.
### Usage
flash2 outputs several files: merged reads, unmerged reads from R1 file, unmerged reads from R2, and histogram.
`flash2` outputs several files: merged reads, unmerged reads from the R1 file, unmerged reads from the R2 file, and a histogram.
You can concatenate merged reads and one of the unmerged files
to keep the same number of reads as in the initial fastq file
(as the [pre-processing](user.md#pre-processing) on the Vidjil server).
The following command line thus keeps `out.notCombined_1`, from R1,
supposing that R1 reads are "more centered" on the V(D)J junction than R2 reads.
Starting from `R1.fastq` and `R2.fastq`:
Starting from `R1.fastq` and `R2.fastq` (`flash2` only works with `.fastq` files):
- Merge: `flash2 R1.fastq R2.fastq -m 300 -t 4 -z` (`-t 4` : run on 4 threads)
- Concatenate the files you want to keep, for example `cat out.extendedFrags.fastq.gz out.notCombined_1.fastq.gz > merged-reads.fastq.gz` (with `-z`, the `flash2` outputs are gzip-compressed)
- Filter: `vidjil-algo --filter-reads --gz -g germline/homo-sapiens.g merged-reads.fastq.gz`
The resulting `merged-reads.filtered.fa.gz` file can be uploaded on any Vidjil server,
or re-analyzed with vidjil-algo or with other software.
(`germline/` is the path to the gene repertoires directory, in `vidjil-algo-.../`)
The resulting `merged-reads.detected.vdj.fa.gz` file can be uploaded on any Vidjil server,
or re-analyzed with `vidjil-algo` or with other software.
Once the filtering has begun, interrupting `vidjil-algo` with `Ctrl-C` (`SIGINT`) gracefully stops execution while still producing (partial) files.
It can be used to check how the filtering works before a full run.
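
The merge, concatenate and filter steps above could be chained in a small script. This is only a sketch: the `run`, `concat` and `workflow` helpers and the `DRY_RUN` and `GERMLINE` variables are hypothetical names introduced here, and the script assumes that `flash2 -z` writes both `out.extendedFrags.fastq.gz` and `out.notCombined_1.fastq.gz`. The flags are the ones given in the steps above. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the merge / concatenate / filter workflow.
# Hypothetical helpers: run, concat, workflow; hypothetical variables: DRY_RUN, GERMLINE.
set -eu

DRY_RUN="${DRY_RUN:-1}"                          # 1: print commands only, 0: execute them
GERMLINE="${GERMLINE:-germline/homo-sapiens.g}"  # path inside the vidjil-algo-... directory

run() {
    # Print the command, and execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

concat() {
    # Concatenate two gzip files into a third one (gzip streams can be concatenated).
    cat "$1" "$2" > "$3"
}

workflow() {
    r1="$1"
    r2="$2"
    # Merge paired-end reads, on 4 threads, with gzip-compressed output.
    run flash2 "$r1" "$r2" -m 300 -t 4 -z
    # Keep merged reads plus unmerged R1 reads (assumed output names, see above).
    run concat out.extendedFrags.fastq.gz out.notCombined_1.fastq.gz merged-reads.fastq.gz
    # Filter the reads with vidjil-algo.
    run vidjil-algo --filter-reads --gz -g "$GERMLINE" merged-reads.fastq.gz
}

workflow "${1:-R1.fastq}" "${2:-R2.fastq}"
```

Running the script as is prints the three commands. Setting `DRY_RUN=0` executes them, provided `flash2` and `vidjil-algo` are available from your `$PATH`.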