Commit a5694371 authored by Mikaël Salson

algo/tests: Test the --consensus-on-random-sample option

The dataset was generated so that a minority of sequences carry a TTT
insertion. The insertion makes these reads longer, which would make them the
sequence of choice with the default ReadScore (based on length and quality).

By contrast, with a random sampler the consensus should not be affected by
this insertion, as it occurs in only a minority of sequences.

Here is the command that generated the dataset:
for i in $(seq 1 3500); do
  if [ $((RANDOM%3)) -ne 0 ]; then
    echo ">seq$i";
    echo ctacctactactgtgccttgtgggaggtgatagtagtgattggatcaag;
  else
    echo ">seq$i";
    echo cTTTtacctactactgtgccttgtgggaggtgatagtagtgattggatcaag;
  fi
done > test-random-consensus.fa
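
A quick sanity check of the generated file could look like the sketch below.
The helper name count_lengths is hypothetical and is not part of the Vidjil
test suite; it assumes each sequence sits on a single line, as in the loop
above. Roughly two thirds of the reads should be 49 bp long and one third
52 bp, with exact counts depending on $RANDOM.

```shell
# Hypothetical helper: print "<length> <count>" for every sequence length
# found in a FASTA file (header lines starting with ">" are skipped).
# Assumes one line per sequence, as produced by the generation loop.
count_lengths() {
  awk '!/^>/ { n[length($0)]++ } END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Usage, assuming the generated file is in the current directory:
#   count_lengths test-random-consensus.fa
```
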
parent 40e75d6e
Pipeline #66474 passed with stages
in 40 minutes and 15 seconds
!LAUNCH: $VIDJIL_DIR/$EXEC $VIDJIL_DEFAULT_OPTIONS -w 20 -g $VIDJIL_DIR/germline/homo-sapiens.g:TRG $VIDJIL_DATA/test-random-consensus.fa.gz > consensus-default.log
!LAUNCH: $VIDJIL_DIR/$EXEC $VIDJIL_DEFAULT_OPTIONS -w 20 -g $VIDJIL_DIR/germline/homo-sapiens.g:TRG --consensus-on-random-sample $VIDJIL_DATA/test-random-consensus.fa.gz > consensus-random.log
!LAUNCH: diff consensus-default.log consensus-random.log
$ Output should differ: default has a consensus of 52 bp (with the spurious insertion)
# Appears twice: in the header of the consensus sequence and in the similarity matrix
2:^< .* 52 bp
1:^< CTTTT
$ With random read sample the consensus should not have the spurious insertion (49 bp)
2:^> .* 49 bp