Vulnerability database generator
The Vulnerability database generator produces vulnerable and fixed synthetic samples expressing web vulnerability flaws.
This repository is the official implementation of the approach described in:
Comparing the Detection of XSS Vulnerabilities in Node.js and a Multi-tier JavaScript-based Language via Deep Learning, Héloïse Maurel, Santiago Vidal and Tamara Rezk, In Proceedings of ICISSP 2022. PDF: https://hal.inria.fr/hal-03273564
February 2022 - The paper was accepted to ICISSP 2022.
Citation
Comparing the Detection of XSS Vulnerabilities in Node.js and a Multi-tier JavaScript-based Language via Deep Learning, Héloïse Maurel, Santiago Vidal and Tamara Rezk, In Proceedings of ICISSP 2022
- Online at hal.inria: PDF (https://hal.inria.fr/hal-03273564)
- Citation in BibTeX format:
@inproceedings{DBLP:conf/icissp/MaurelVR22,
  author    = {H{\'{e}}lo{\'{\i}}se Maurel and Santiago A. Vidal and Tamara Rezk},
  editor    = {Paolo Mori and Gabriele Lenzini and Steven Furnell},
  title     = {Comparing the Detection of {XSS} Vulnerabilities in Node.js and a Multi-tier JavaScript-based Language via Deep Learning},
  booktitle = {Proceedings of the 8th International Conference on Information Systems Security and Privacy, {ICISSP} 2022, Online Streaming, February 9-11, 2022},
  pages     = {189--201},
  publisher = {{SCITEPRESS}},
  year      = {2022},
  url       = {https://doi.org/10.5220/0010980800003120},
  doi       = {10.5220/0010980800003120},
  timestamp = {Wed, 16 Mar 2022 11:05:48 +0100},
  biburl    = {https://dblp.org/rec/conf/icissp/MaurelVR22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Overview
One of the essential steps in applying any supervised deep learning algorithm is to design a reliable and comprehensive dataset. In our case, server-side code cannot be obtained by browsing the web, and it is difficult to reliably and automatically classify the server-side code found in public repositories as XSS-safe or XSS-unsafe. Thus, we explore the use of a synthetic generator of vulnerabilities for XSS flaws.
Hop.js - a multi-tier JavaScript-based Language
- This generator can build samples for Hop.js version 3.5.0 (January 2022)
- For more information about this language, see the official website hop.inria.fr
Prerequisites
- Linux (developed on Ubuntu and Fedora)
- Python >= 3.3.3 (developed on 3.9)
A Python installation is needed to run the generator.
Supported languages
This project currently supports PHP, Node.js (with HTML and JavaScript), and Hop.js as target languages.
Quickstart
Step 0: Cloning this repository
git clone https://gitlab.inria.fr/deep-learning-applied-on-web-and-iot-security/statically-identifying-xss-using-deep-learning/concatenation-detector
cd vulnerability-generator-database/
Step 1: Creating a new database from PHP or Node.js generator
To have a dataset to train a neural network on, you can use this extended generator.
All generated databases are built in a root folder called classified-dbs.
Generate Hop.js database
These commands generate XSS-vulnerable and non-vulnerable Hop.js sample files in a directory called HOPJS-Database_MM-DD-YYYY_HHhMMmSS inside the classified-dbs root folder.
python GeneratorLauncher.py --flaw=XSS --language=hopjs
python GeneratorLauncher.py -f=XSS --l=hopjs
Generate PHP database
These commands generate XSS-vulnerable and non-vulnerable PHP sample files in a directory called PHP-Database_MM-DD-YYYY_HHhMMmSS inside the classified-dbs root folder.
python GeneratorLauncher.py --flaw=XSS --language=php
python GeneratorLauncher.py -f=XSS --l=php
To construct the initial distribution:
python GeneratorLauncher.py -l php -f XSS
To construct the mismatching distribution with only rules 3 and 4:
python GeneratorLauncher.py -l php -n True -f XSS
To construct the mismatching distribution with only rules 0, 1, 2, and 5:
python GeneratorLauncher.py -l php -m True -f XSS
Generate Node.js database
These commands generate XSS-vulnerable and non-vulnerable Node.js sample files in a directory called NODEJS-Database_MM-DD-YYYY_HHhMMmSS inside the classified-dbs root folder.
python GeneratorLauncher.py --flaw=XSS --language=nodejs
python GeneratorLauncher.py -f=XSS --l=nodejs
To construct the initial distribution associated with PHP:
python GeneratorLauncher.py -l nodejs -v 1 -f XSS
To construct the mismatching distribution with only rules 3 and 4, related to PHP:
python GeneratorLauncher.py -l nodejs -v 1 -n True -f XSS
To construct the mismatching distribution with only rules 0, 1, 2, and 5, related to PHP:
python GeneratorLauncher.py -l nodejs -v 1 -m True -f XSS
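Because every run creates a new timestamped directory, a training script usually needs to find the most recent database for a given language. The following is a minimal sketch of one way to do that, not part of the generator itself. It assumes directory names follow the LANGUAGE-Database_MM-DD-YYYY_HHhMMmSS pattern quoted above; the helper names parse_db_name and latest_database are hypothetical.

```python
import os
import re
from datetime import datetime

# Assumed naming pattern, taken from the directory names quoted in this README,
# e.g. "PHP-Database_02-14-2022_10h05m33".
DB_NAME = re.compile(
    r"^(?P<language>[A-Z]+)-Database_"
    r"(?P<stamp>\d{2}-\d{2}-\d{4}_\d{2}h\d{2}m\d{2})$"
)


def parse_db_name(name):
    """Return (language, datetime) for a database directory name, else None."""
    match = DB_NAME.match(name)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group("stamp"), "%m-%d-%Y_%Hh%Mm%S")
    except ValueError:
        # Digits in the right places but not a valid date or time.
        return None
    return match.group("language"), stamp


def latest_database(root, language):
    """Return the path of the newest database directory for `language`, else None."""
    candidates = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        parsed = parse_db_name(name)
        if parsed and parsed[0] == language.upper() and os.path.isdir(path):
            candidates.append((parsed[1], path))
    return max(candidates)[1] if candidates else None
```

For instance, latest_database("classified-dbs", "php") would return the newest PHP-Database_* directory under classified-dbs, or None if no such directory exists.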
Generator usage Examples
Note: If you don't specify the --language or -l flag, the database will be generated in PHP by default.
For Hop.js generator
python GeneratorLauncher.py -l hopjs -f XSS
python GeneratorLauncher.py -l hopjs -c 79
For Node.js generator
python GeneratorLauncher.py -l nodejs -f XSS
python GeneratorLauncher.py -l nodejs -c 79
For PHP generator
- Show command-line flags available
python GeneratorLauncher.py -h
- Generate specific types of flaws
python GeneratorLauncher.py -l php -f XSS,Injection
python GeneratorLauncher.py -l php --flaw=XSS,IDOR
- Generate specific CWEs
python GeneratorLauncher.py -l php -c 79
python GeneratorLauncher.py -l php --cwe=78,89,90,91
Available Generation
Note: We fixed 95 XSS classification errors by correcting, adding, and combining predicate attributes describing each sanitization template according to the OWASP rules recommendations to sanitize the HTML templates safely. We also extended the PHP generator with 25 sink templates, 16 XSS inputs, and 58 different proper/improper sanitizations.
PHP - Available Generation
CWEs (-c or --cwe option)
- 78: OS Command Injection
- 79: XSS
- 89: SQL Injection
- 90: LDAP Injection
- 91: XPath Injection
- 95: Code Injection
- 98: File Injection
- 209: Information Exposure Through an Error Message
- 311: Missing Encryption of Sensitive Data
- 327: Use of a Broken or Risky Cryptographic Algorithm
- 601: URL Redirection to Untrusted Site
- 862: Insecure Direct Object References
OWASP (-f or --flaw option)
- XSS: Cross-site Scripting
- IDOR: Insecure Direct Object Reference
- Injection: Injection (SQL, LDAP, XPATH, OS Command, Code)
- URF: URL Redirects and Forwards
- SM: Security Misconfiguration
- SDE: Sensitive Data Exposure
NODEJS - Available Generation
Note: Only the generation of XSS-vulnerable and non-vulnerable Node.js sample files is available. The other CWE and OWASP generations available for PHP are not implemented yet.
CWEs (-c or --cwe option)
- 79: XSS
OWASP (-f or --flaw option)
- XSS: Cross-site Scripting
Hop.js - Available Generation
Note: Only the generation of XSS-vulnerable and non-vulnerable Hop.js sample files is available. The other CWE and OWASP generations available for PHP are not published yet.
CWEs (-c or --cwe option)
- 79: XSS
OWASP (-f or --flaw option)
- XSS: Cross-site Scripting
Databases structure folders
The safe and unsafe folders contain the safe files and the unsafe files, respectively.
Each generated file has a unique name that reflects the program's content, so you can easily locate the files you are looking for. Each file name follows this format:
CWE_XX_[(Input1)(Input2)…]_[(Sanitize1)(Sanitize2)…]_[(Construction1)(Construction2)…]
+-- classified-dbs
| +-- PHP-Database_MM-DD-YYYY_HHhMMmSS
| | +-- XSS
| | | +-- safe
| | | | +-- CWE_XX_[(Input1)(Input2)…]_[(Sanitize1)(Sanitize2)…]_[(Construction1)(Construction2)…]
| | | | +-- CWE_XX_[(Input1)(Input2)…]_[(Sanitize1)(Sanitize2)…]_[(Construction1)(Construction2)…]
| | | +-- unsafe
| | | | +-- CWE_XX_[(Input1)(Input2)…]_[(Sanitize1)(Sanitize2)…]_[(Construction1)(Construction2)…]
| +-- NODEJS-Database_MM-DD-YYYY_HHhMMmSS
| | +-- XSS
| | | +-- safe
| | | +-- unsafe
| +-- HOPJS-Database_MM-DD-YYYY_HHhMMmSS
| | +-- XSS
| | | +-- safe
| | | +-- unsafe
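Since the file name and its parent folder together encode the label and the templates used, they can be turned into structured metadata when building a training set. The sketch below is one possible way to do this, assuming names follow the CWE_XX_[...]_[...]_[...] format described above and that template names contain no brackets or parentheses; parse_sample_name is a hypothetical helper, not a function of the generator.

```python
import os
import re

# Assumed layout: CWE_<id>_[(input)...]_[(sanitize)...]_[(construction)...]<ext>
SAMPLE_NAME = re.compile(
    r"^CWE_(?P<cwe>\d+)"
    r"_\[(?P<inputs>.*?)\]"
    r"_\[(?P<sanitizers>.*?)\]"
    r"_\[(?P<constructions>.*?)\]"
)
# One "(name)" item inside a bracketed group.
GROUP_ITEM = re.compile(r"\(([^()]*)\)")


def parse_sample_name(path):
    """Split a sample path into its CWE id, template names, and folder label."""
    name = os.path.basename(path)
    match = SAMPLE_NAME.match(name)
    if match is None:
        raise ValueError("unexpected sample name: " + name)
    return {
        "cwe": int(match.group("cwe")),
        "inputs": GROUP_ITEM.findall(match.group("inputs")),
        "sanitizers": GROUP_ITEM.findall(match.group("sanitizers")),
        "constructions": GROUP_ITEM.findall(match.group("constructions")),
        # The parent folder ("safe" or "unsafe") carries the label.
        "label": os.path.basename(os.path.dirname(path)),
    }
```

Applied to a path such as XSS/safe/CWE_79_[...]_[...]_[...].php, the returned dictionary carries the label "safe" together with the three template lists, which makes it straightforward to filter samples by sanitization or HTML context.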
Complexity generation
Note : The complexity generation is only available for PHP.
Overview
To generate all the samples, the generator uses three kinds of XML files:
- input_lang.xml: list of input template sources to collect the user data
- sanitize_lang.xml: list of sanitization template codes to clean the input from malicious users
- construction_lang.xml: list of construction template requests. For XSS, it's the list of HTML contexts
The generator uses the output.xml file to construct the files with the samples from the three XML files.
A basic output.xml file:
<?xml version="1.0"?>
<program>
<input/>
<sanitize/>
<construction/>
</program>
You can surround each template with a decorator. It can be used for sanitize, input and construction. For example, if you want to insert the sanitization in a class, you can do it by modifying the output.xml file to :
<?xml version="1.0"?>
<program>
<input/>
<complexity type="class">
<sanitize/>
</complexity>
<construction/>
</program>
You can also nest decorators recursively. For example, the following output.xml places the sanitization in a class that is written to a separate file:
<?xml version="1.0"?>
<program>
<input/>
<complexity type="file">
<complexity type="class">
<sanitize/>
</complexity>
</complexity>
<construction/>
</program>
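To check what a nested output.xml actually asks for, it can help to list the decorators that surround each placeholder. The following is a small illustrative sketch using Python's standard xml.etree.ElementTree; it only inspects the file and makes no claim about how the generator itself processes it. The function name wrappers_by_placeholder is hypothetical.

```python
import xml.etree.ElementTree as ET

PLACEHOLDERS = ("input", "sanitize", "construction")


def wrappers_by_placeholder(xml_text):
    """Map each placeholder tag to its enclosing complexity types, outermost first."""
    root = ET.fromstring(xml_text)
    result = {}

    def walk(node, stack):
        for child in node:
            if child.tag == "complexity":
                label = child.get("type", "?")
                if child.get("kind"):
                    # e.g. type="loop" kind="for" becomes "loop:for"
                    label += ":" + child.get("kind")
                walk(child, stack + [label])
            elif child.tag in PLACEHOLDERS:
                result[child.tag] = list(stack)

    walk(root, [])
    return result


NESTED = """<?xml version="1.0"?>
<program>
<input/>
<complexity type="file">
<complexity type="class">
<sanitize/>
</complexity>
</complexity>
<construction/>
</program>"""

print(wrappers_by_placeholder(NESTED))
# {'input': [], 'sanitize': ['file', 'class'], 'construction': []}
```

The output shows that only the sanitization is wrapped, first by the file decorator and then by the class decorator, which matches the nesting order in the XML.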
All the complexity generator features available
Note: The complexity generation list below is not available yet for the Node.js language.
<complexity type="class"> </complexity>
<complexity type="loop" kind="for"> </complexity>
<complexity type="loop" kind="while"> </complexity>
<complexity type="if"> </complexity>
<complexity type="file"> </complexity>
<complexity type="function"> </complexity>
Manifest.xml
Lists all the files generated by the generator, describing each generated sample with:
- <input>file : /tmp/tainted.txt</input>: meta-data with the user input used
- <file path="CWE_98/unsafe/CWE_98_[(backticks)]_[(func_preg_match) (no_filtering)]_[(include_file_name)(concatenation_simple_quote)].php" language="PHP">: sample path and its language
- <flaw line="62" name="XSS"/>: the line of the sink and its vulnerability type
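The manifest can be read programmatically to recover, for each sample, its path, language, and sink location. The sketch below is a hedged illustration only: the README documents the input, file, and flaw elements, but the wrapping elements (container, testcase, meta-data) and the nesting of flaw inside file are assumptions made for this example and may differ from the real Manifest.xml. The function name read_manifest is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest fragment. Only <input>, <file> and <flaw> come from
# this README; <container>, <testcase>, <meta-data> and the nesting are assumed.
SAMPLE_MANIFEST = """<container>
  <testcase>
    <meta-data>
      <input>file : /tmp/tainted.txt</input>
    </meta-data>
    <file path="CWE_98/unsafe/CWE_98_[(backticks)]_[(func_preg_match) (no_filtering)]_[(include_file_name)(concatenation_simple_quote)].php" language="PHP">
      <flaw line="62" name="XSS"/>
    </file>
  </testcase>
</container>"""


def read_manifest(xml_text):
    """Return one record per <file> element with its path, language and flaws."""
    root = ET.fromstring(xml_text)
    samples = []
    for file_node in root.iter("file"):
        # Assumes every <flaw> has a numeric "line" attribute.
        flaws = [
            (int(flaw.get("line")), flaw.get("name"))
            for flaw in file_node.iter("flaw")
        ]
        samples.append({
            "path": file_node.get("path"),
            "language": file_node.get("language"),
            "flaws": flaws,
        })
    return samples


for sample in read_manifest(SAMPLE_MANIFEST):
    print(sample["language"], sample["flaws"], sample["path"])
```

Because the lookup uses iter() rather than fixed paths, the sketch tolerates different wrapper elements, but it would need adjusting if flaw elements turn out to be siblings of file rather than children.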
Acknowledgment
We thank the SAMATE project at NIST, Bertrand Stivalet, Aurelien Delaitre, Guillaume Pighi, Jonathan Retterer, Xavier Marchal, and all the contributors who provided the PHP Vulnerability Test Suite, which is the foundation of this study.