Commit 58ddcbbd authored by Mathieu Giraud

Merge branch 'doc/4150-validate-most-relative-links' into 'dev'

doc: validate (most) relative links and fix some broken links

Closes #4150

See merge request !703
parents 8b6d7c24 b1ef63a8
Pipeline #150302 passed with stages
in 7 minutes and 41 seconds
......@@ -6,7 +6,7 @@ build_doc:
- pip3 install mkdocs-gitlab-plugin requests
script:
- make -C doc html
-- find site -name "*.html" -type f | xargs python tools/validate-links.py
+- cd tools ; python validate-links.py
artifacts:
paths:
- site/
......
......@@ -13,14 +13,19 @@ enables the deep sequencing of a lymphoid population with dedicated [Rep-Seq](ht
methods and software.
### Life scientist
-- Tutorial "Mastering the Vidjil web application": [english](./tutorial/mastering-vidjil.html) ([pdf](./tutorial/mastering-vidjil.pdf)), [français](./tutorial/mastering-vidjil-fr.html) ([pdf](./tutorial/mastering-vidjil-fr.pdf)) 🔗. Start by this tutorial to have an overview of Vidjil.
+- Tutorial "Mastering the Vidjil web application":
+[english](http://www.vidjil.org/doc/tutorial/mastering-vidjil.html)
+([pdf](http://www.vidjil.org/doc/tutorial/mastering-vidjil.pdf)),
+[français](http://www.vidjil.org/doc/tutorial/mastering-vidjil-fr.html)
+([pdf](http://www.vidjil.org/doc/tutorial/mastering-vidjil-fr.pdf)) 🔗.
+Start by this tutorial to have an overview of Vidjil.
- Web platform [user manual](user.md). This is the main user manual of the Vidjil platform.
- [Libraries and recombinations](locus.md), documentation on library preparation and sequencing as well on detected immune recombinations
- [Demo access](http://app.vidjil.org/) 🔗 to the patient, experiment and sample public test server
### Bioinformatician
- [Vidjil-algo documentation](vidjil-algo.md), usage from the command-line
-- [fuse.py](tools.py), converting and merging immune repertoire data
+- [fuse.py](tools.md), converting and merging immune repertoire data
- Specification of the [.vidjil format](vidjil-format.md) to encode immune repertoires with clones with V(D)J recombinations
- Specification of the [warnings](warnings.md), list of default [tags](tags.org)
- Specification of the [.should-vdj.fa tests](should-vdj.md) for encoding and testing curated V(D)J designations
......@@ -31,7 +36,7 @@ methods and software.
### Quality, open data, roadmap, credits
- [Software and developement quality](quality.md), including software engineering methods and human and team processes
-<!-- - [Roadmap](roadmap.md) -->
+- Bioinformatics, technical, and administrative [Roadmap](roadmap.md)
- [Public datasets](http://www.vidjil.org/data/) 🔗 supporting Vidjil publications
- [Credits, references](credits.md)
......
......@@ -11,7 +11,7 @@ of any software doing immune repertoire sequencing (RepSeq) analysis.
## Contributing to the tests
-Users and developers of RepSeq software are encouraged to [send us](contact@vidjil.org)
+Users and developers of RepSeq software are encouraged to [send us](mailto:contact@vidjil.org)
their manually curated sequences, ideally in the format described below, or by
directly proposing pull requests on Gitlab with new tests in the [`algo/tests/should-vdj`](https://gitlab.inria.fr/vidjil/vidjil/tree/master/algo/tests/should-vdj-tests) directory.
We can also help to encode sequences in this format.
......
......@@ -30,7 +30,7 @@ The `mrd.vidjil` file can then be fed to the web client.
The AIRR community has published [a standard representation](http://docs.airr-community.org/en/latest/datarep/overview.html#format-specification) to describe results of immune receptor repertoire analysis.
Used by an increasing number of software, this `.tsv` format allows to easily transfer immune repertoire data between pipelines.
-The [AIRR output of vidjil-algo](./vidjil-algo/#airr-tsv-output) enables to feed vidjil-algo output to other software.
+The [AIRR output of vidjil-algo](vidjil-algo/#airr-tsv-output) enables to feed vidjil-algo output to other software.
Conversely, `fuse.py` is able to take one or several AIRR `.tsv` file(s) to get a `.vidjil` file that can be opened by the Vidjil web application:
``` bash
......
......@@ -85,7 +85,7 @@ to learn the essential features of Vidjil.
- *patient/run/set information.*
- *locus.* Germline(s) used for analyzing the data. In case of multi-locus
-data, you can select what locus should be displayed (see [locus.html](./locus.html))
+data, you can select what locus should be displayed (see [Libraries and recombinations](locus.md))
- *analysis.* Name (without extension) of the loaded file.
- *sample.* Name of the current sample.
......@@ -334,7 +334,7 @@ The processing can take a few seconds to a few hours, depending on the
software lauched, the options set in the config, the size of the sample and the server load.
The base human configurations with **vidjil-algo** are « TRG », « IGH », « multi » (`-g germline`), « multi+inc » (`-g germline -i`), « multi+inc+xxx » (`-g germline -i -2`, default advised configuration).
-See [locus.html](./locus.html) for information on these configurations.
+See [Libraries and recombinations](locus.md) for information on these configurations.
There are also configuration for other species and for other RepSeq algorithms, such as « MiXCR ».
The server mainteners can add new configurations tailored to specific needs, contact us if you have other needs.
......
......@@ -274,7 +274,7 @@ The `germline/*.g` presets configure the analyzed recombinations.
The following presets are provided:
- `germline/homo-sapiens.g`: Homo sapiens, TR (`TRA`, `TRB`, `TRG`, `TRD`) and Ig (`IGH`, `IGK`, `IGL`) locus,
-including incomplete/unusal recombinations (`TRA+D`, `TRB+`, `TRD+`, `IGH+`, `IGK+`, see [locus](locus)).
+including incomplete/unusal recombinations (`TRA+D`, `TRB+`, `TRD+`, `IGH+`, `IGK+`, see <locus.md>.
- `germline/homo-sapiens-isotypes.g`: Homo sapiens heavy chain locus, looking for sequences with, on one side, IGHJ (or even IGHV) genes,
and, on the other side, an IGH constant chain.
- `germline/homo-sapiens-cd.g`: Homo sapiens, common CD genes (experimental, does not check for recombinations)
......@@ -418,7 +418,7 @@ for the IGH locus. However, they
are not at the core of the Vidjil clone clustering method (which
relies only on the 'window', see above).
To check the quality of these designations, the automated test suite include
-sequences with manually curated V(D)J designations (see [should-vdj.md](should-vdj)).
+[sequences with manually curated V(D)J designations](should-vdj.md).
If you want to analyze more clones, you should use `--max-designations 200` or
`--max-designations 500`. It is not recommended to use larger values: outputting more
......@@ -699,17 +699,17 @@ For example `-uu -X 1000` splits the not detected reads from the 1000 first read
Since version 2018.10, vidjil-algo supports the [AIRR format](http://docs.airr-community.org/en/latest/datarep/rearrangements.html#fields).
We export all required fields, some optional fields, as also some custom fields (+).
-We also propose in [fuse.py](/tools) a way to convert AIRR format to the `.vidjil` format.
+We also propose in [fuse.py](tools.md) a way to convert AIRR format to the `.vidjil` format.
Note that Vidjil-algo is designed to efficiently gather reads from large datasets into clones.
By default (`-c clones`), we thus report in the AIRR format *clones*.
-See also [What is a clone ?](/vidjil-format/#what-is-a-clone).
+See also [What is a clone ?](vidjil-format/#what-is-a-clone).
Using `-c designations` trigger a separate analysis for each read, but this is usually not advised for large datasets.
| Name | Type | AIRR 1.2 Description <br /> *vidjil-algo implementation* |
| ----- | ---- | ------------------------------------------------------- |
-| locus | string | Gene locus (chain type). For example, `IGH`, `IGK`, `IGL`, `TRA`, `TRB`, `TRD`, or `TRG`.<br />*Vidjil-algo outputs all these loci. Moreover, the incomplete recombinations analyzed by vidjil-algo are reported as `IGH+`, `IGK+`, `TRA+D`, `TRB+`, `TRD+`, and `xxx` for unexpected recombinations. See [locus](locus).*
+| locus | string | Gene locus (chain type). For example, `IGH`, `IGK`, `IGL`, `TRA`, `TRB`, `TRD`, or `TRG`.<br />*Vidjil-algo outputs all these loci. Moreover, the incomplete recombinations analyzed by vidjil-algo are reported as `IGH+`, `IGK+`, `TRA+D`, `TRB+`, `TRD+`, and `xxx` for unexpected recombinations. See <locus.md>.*
| duplicate_count | number | Number of reads contributing to the (UMI) consensus for this sequence. For example, the sum of the number of reads for all UMIs that contribute to the query sequence. <br />*Number of reads gathered in the clone.*
| sequence_id | string | Unique query sequence identifier within the file. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment. <br />*This identifier is the (50 bp by default) window extacted around the junction.* |
| clone_id | string | Clonal cluster assignment for the query sequence. <br />*This identifier is again the (50 bp by default) window extacted around the junction.*
......
......@@ -474,7 +474,7 @@ against them (not implemented now).
## Tagging some clones: `tags` list \[optional\]
The `tags` list describe the custom tag names as well as tags that should be hidden by default.
-The default tag names are defined in [../browser/js/vidjil-style.js](../browser/js/vidjil-style.js).
+The default tag names are defined in [../browser/js/vidjil-style.js](http://gitlab.vidjil.org/-/blob/master/browser/js/vidjil-style.js).
``` javascript
"key" : "value" // "key" is the tag id from 0 to 7 and "value" is the custom tag name attributed
......
......@@ -22,11 +22,11 @@ nav:
- Specification of the .vidjil format: vidjil-format.md
- Specification of the warnings: warnings.md
- Specification of the .should-vdj.fa tests: should-vdj.md
-- Further developer documentation: /#further-developer-documentation
+- Further developer documentation: http://www.vidjil.org/doc/#further-developer-documentation
- Server administrator:
- Server administration (web): admin.md
- Server installation and maintenance (docker): server.md
-- Further developer documentation: /#further-developer-documentation
+- Further developer documentation: http://www.vidjil.org/doc/#further-developer-documentation
- Quality, roadmap, credits:
- Software and development quality: quality.md
- Public datasets supporting Vidjil publications 🔗: http://www.vidjil.org/data
......
......@@ -3,6 +3,7 @@
 import requests
 import glob
 import sys
+import os.path
 try:
     from urllib.parse import *
 except:
......@@ -10,7 +11,11 @@ except:
 import re
 from collections import defaultdict
-DEFAULT_FILES = glob.glob('../site/*/*.html')
+ALL_FILES = glob.glob('../site/**/*.html', recursive=True)
+IGNORED_FILES = glob.glob('../site/dev-*/*.html') + ['../site/404.html', '../site/tips.html']
+DEFAULT_FILES = set(ALL_FILES) - set(IGNORED_FILES)
+BASE_PATH = '../site/'
 REGEX_HREF = re.compile('href="(.*?)"')
 REGEX_ID = re.compile('id="(.*?)"')
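Review note on the file selection in this hunk: the recursive glob combined with a set difference can be tried in isolation with the sketch below. It is only an illustration under stated assumptions, not the script's code: `select_html_files` and its parameters are hypothetical names, and the site layout is created in a temporary directory purely for the demo.

```python
import glob
import os
import tempfile


def select_html_files(site_dir, ignored_patterns=(), ignored_files=()):
    """Hypothetical helper: every *.html under site_dir, minus the ignored ones."""
    # With recursive=True, '**' matches zero or more directory levels,
    # so top-level pages such as site/index.html are selected as well.
    all_files = set(glob.glob(os.path.join(site_dir, '**', '*.html'), recursive=True))
    ignored = set()
    for pattern in ignored_patterns:
        ignored.update(glob.glob(os.path.join(site_dir, pattern)))
    ignored.update(os.path.join(site_dir, name) for name in ignored_files)
    # Sorting gives a stable processing order (a bare set has none).
    return sorted(all_files - ignored)


# Demo on a throwaway layout (assumed file names, for illustration only).
with tempfile.TemporaryDirectory() as site:
    for rel in ('index.html', '404.html', 'user/index.html', 'dev-client/index.html'):
        path = os.path.join(site, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w', encoding='utf-8') as fh:
            fh.write('<html></html>')
    selected = [os.path.relpath(p, site).replace(os.sep, '/')
                for p in select_html_files(site, ['dev-*/*.html'], ['404.html'])]

print(selected)
```

Sorting the result is a choice made for this sketch; the diff keeps `DEFAULT_FILES` as a plain set, which is enough when the order in which files are checked does not matter.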
......@@ -25,17 +30,26 @@ USER_AGENT = {'User-Agent': 'Mozilla/5.0'}
 stats = defaultdict(int)
 failed = []
+not_checked = []
-def check_url(url, ids=[]):
+def check_url(url, ids=[], dirname=''):
     # Internal links
     if url.startswith('#'):
         return (not url[1:]) or (url[1:] in ids)
-    # Relative links: TODO
-    if not url.startswith('http'):
+    # Mailto
+    if url.startswith('mailto'):
         return None
+    # Relative links
+    if not url.startswith('http'):
+        # Anchors in relative links - TODO
+        if '#' in url:
+            return None
+        ff = os.path.join(BASE_PATH if url.startswith('/') else dirname, url)
+        return os.path.exists(ff)
     # External http(s) links
     try:
         req = requests.get(url, headers = USER_AGENT)
......@@ -46,16 +60,21 @@ def check_url(url, ids=[]):
 def check_file(f):
     print('<-- ', f)
+    dirname = os.path.dirname(f)
     content = ''.join(open(f).readlines())
     ids = REGEX_ID.findall(content)
     for url in REGEX_HREF.findall(content):
-        ok = check_url(url, ids)
+        ok = check_url(url, ids, dirname)
         print(STATUS[ok] + ' ' + url)
         globals()['stats'][ok] += 1
+        msg = "%s: %s" % (f.replace(BASE_PATH,''), url)
         if ok == False:
-            failed.append(url)
+            failed.append(msg)
+        if ok == None:
+            not_checked.append(msg)
     print()
......@@ -64,6 +83,11 @@ def print_stats():
     for k, v in STATUS.items():
         print(' %s : %3d' % (v, globals()['stats'][k]))
+    if globals()['stats'][None]:
+        print('==== Not checked')
+        for f in not_checked:
+            print(' ' + f)
     if globals()['stats'][False]:
         print('==== Failed')
         for f in failed:
......
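Review note on the relative-link check: the diff leaves anchors in relative links as a TODO. One possible way to cover them is sketched below, as a hedged illustration rather than the project's implementation. `check_relative_url` is a hypothetical name, and the sketch assumes that directory-style links such as `vidjil-algo/` are served from an `index.html` inside that directory. It also strips the leading `/` from site-absolute links before joining, because `os.path.join` discards all earlier components when a later component is an absolute path.

```python
import os.path
import re
import tempfile

REGEX_ID = re.compile('id="(.*?)"')


def check_relative_url(url, dirname, base_path):
    """Hypothetical check: True if the target file (and anchor, if any) exists."""
    path, _, anchor = url.partition('#')
    if path.startswith('/'):
        # os.path.join(base, '/x') returns '/x', so drop the leading slash first.
        target = os.path.join(base_path, path.lstrip('/'))
    else:
        target = os.path.join(dirname, path)
    # Assumption: a link to a directory is served from its index.html.
    if os.path.isdir(target):
        target = os.path.join(target, 'index.html')
    if not os.path.isfile(target):
        return False
    if not anchor:
        return True
    # The anchor must match an id="..." attribute in the target page.
    with open(target, encoding='utf-8') as fh:
        return anchor in REGEX_ID.findall(fh.read())


# Demo on a throwaway site with a single page (illustrative content only).
with tempfile.TemporaryDirectory() as site:
    os.makedirs(os.path.join(site, 'vidjil-algo'))
    with open(os.path.join(site, 'vidjil-algo', 'index.html'), 'w', encoding='utf-8') as fh:
        fh.write('<h2 id="airr-tsv-output">AIRR</h2>')
    results = [
        check_relative_url('vidjil-algo/#airr-tsv-output', site, site),
        check_relative_url('/vidjil-algo/index.html', site, site),
        check_relative_url('vidjil-algo/#missing', site, site),
        check_relative_url('tools.py', site, site),
    ]

print(results)  # [True, True, False, False]
```

The last case mirrors the kind of broken link fixed in this merge request (`tools.py` instead of `tools.md`): the target file does not exist, so the check fails.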