
Vulnerability database generator

The Vulnerability database generator produces synthetic vulnerable and fixed code samples that exhibit web vulnerability flaws.

This repository is the official implementation of the approach described in:

Comparing the Detection of XSS Vulnerabilities in Node.js and a Multi-tier JavaScript-based Language via Deep Learning, Héloïse Maurel, Santiago Vidal and Tamara Rezk, In Proceedings of ICISSP 2022. PDF: https://hal.inria.fr/hal-03273564

February 2022: The paper was accepted to ICISSP 2022.

Citation

Comparing the Detection of XSS Vulnerabilities in Node.js and a Multi-tier JavaScript-based Language via Deep Learning, Héloïse Maurel, Santiago Vidal and Tamara Rezk, In Proceedings of ICISSP 2022

  • Online at HAL Inria: PDF (https://hal.inria.fr/hal-03273564)

  • Citation in BibTeX format:

      	@inproceedings{DBLP:conf/icissp/MaurelVR22,
      		  author    = {H{\'{e}}lo{\'{\i}}se Maurel and
      		               Santiago A. Vidal and
      		               Tamara Rezk},
      		  editor    = {Paolo Mori and
      		               Gabriele Lenzini and
      		               Steven Furnell},
      		  title     = {Comparing the Detection of {XSS} Vulnerabilities in Node.js and a
      		               Multi-tier JavaScript-based Language via Deep Learning},
      		  booktitle = {Proceedings of the 8th International Conference on Information Systems
      		               Security and Privacy, {ICISSP} 2022, Online Streaming, February 9-11,
      		               2022},
      		  pages     = {189--201},
      		  publisher = {{SCITEPRESS}},
      		  year      = {2022},
      		  url       = {https://doi.org/10.5220/0010980800003120},
      		  doi       = {10.5220/0010980800003120},
      		  timestamp = {Wed, 16 Mar 2022 11:05:48 +0100},
      		  biburl    = {https://dblp.org/rec/conf/icissp/MaurelVR22.bib},
      		  bibsource = {dblp computer science bibliography, https://dblp.org}
      		}

Overview

One of the essential steps in applying any supervised deep learning algorithm is designing a reliable and comprehensive dataset. In our case, server-side code cannot be obtained by browsing the web, and it is difficult to reliably and automatically classify server-side code from public repositories as XSS-safe or XSS-unsafe. Thus, we explore the use of a synthetic vulnerability generator for XSS flaws.

Hop.js - a multi-tier JavaScript-based Language

  • This generator can build samples for Hop.js version 3.5.0 (January 2022)
  • For more information about this language, see the official website hop.inria.fr

Prerequisites

  • Linux (developed on Ubuntu and Fedora)
  • Python >= 3.3.3 (developed on 3.9)

A Python installation is needed to run the generator.

Supported languages

This project currently supports PHP, Node.js (with HTML and JavaScript), and Hop.js as sample languages.

Quickstart

Step 0: Cloning this repository

git clone https://gitlab.inria.fr/deep-learning-applied-on-web-and-iot-security/statically-identifying-xss-using-deep-learning/concatenation-detector vulnerability-generator-database
cd vulnerability-generator-database/

Step 1: Creating a new database with the PHP, Node.js, or Hop.js generator

To obtain a dataset on which to train a neural network, use this extended generator.

All generated databases are built inside a root folder called classified-dbs.

Generate Hop.js database

These commands generate XSS-vulnerable and non-vulnerable Hop.js sample files in a directory called HOPJS-Database_MM-DD-YYYY_HHhMMmSS inside the classified-dbs root folder.

python GeneratorLauncher.py --flaw=XSS --language=hopjs
python GeneratorLauncher.py -f XSS -l hopjs
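Because the MM-DD-YYYY_HHhMMmSS suffix does not sort chronologically as plain text (runs from different years interleave, for example), picking the most recent run requires parsing the timestamp. The sketch below assumes only the directory naming described above; `database_timestamp` and `latest_database` are hypothetical helpers, not part of the generator.

```python
from datetime import datetime
from pathlib import Path

# Suffix format of generated directories, e.g. HOPJS-Database_02-09-2022_14h30m05
# (the example timestamp is made up); "h" and "m" are literal characters.
TIMESTAMP_FORMAT = "%m-%d-%Y_%Hh%Mm%S"


def database_timestamp(dirname):
    """Parse the timestamp suffix of a generated database directory name."""
    _prefix, sep, stamp = dirname.partition("-Database_")
    if not sep:
        raise ValueError("not a generated database directory: " + dirname)
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def latest_database(root="classified-dbs", prefix="HOPJS"):
    """Return the most recent <prefix>-Database_* directory under root, or None."""
    candidates = [p for p in Path(root).glob(prefix + "-Database_*") if p.is_dir()]
    if not candidates:
        return None
    # Compare parsed datetimes, not names, so ordering is chronological.
    return max(candidates, key=lambda p: database_timestamp(p.name))
```

For example, `latest_database(prefix="NODEJS")` or `latest_database(prefix="PHP")` selects the newest Node.js or PHP run.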

Generate PHP database

These commands generate XSS-vulnerable and non-vulnerable PHP sample files in a directory called PHP-Database_MM-DD-YYYY_HHhMMmSS inside the classified-dbs root folder.

python GeneratorLauncher.py --flaw=XSS --language=php
python GeneratorLauncher.py -f XSS -l php

To construct the initial distribution:

python GeneratorLauncher.py -l php -f XSS

To construct the mismatching distribution with only rules 3 and 4:

python GeneratorLauncher.py -l php -n True -f XSS

To construct the mismatching distribution with only rules 0, 1, 2, and 5:

python GeneratorLauncher.py -l php -m True -f XSS

Generate Node.js database

These commands generate XSS-vulnerable and non-vulnerable Node.js sample files in a directory called NODEJS-Database_MM-DD-YYYY_HHhMMmSS inside the classified-dbs root folder.

python GeneratorLauncher.py --flaw=XSS --language=nodejs
python GeneratorLauncher.py -f XSS -l nodejs

To construct the initial distribution associated with PHP:

python GeneratorLauncher.py -l nodejs -v 1 -f XSS

To construct the mismatching distribution with only rules 3 and 4, associated with PHP:

python GeneratorLauncher.py -l nodejs -v 1 -n True -f XSS

To construct the mismatching distribution with only rules 0, 1, 2, and 5, associated with PHP:

python GeneratorLauncher.py -l nodejs -v 1 -m True -f XSS
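The three distribution commands above can be scripted so they run in one go. This is a sketch assuming GeneratorLauncher.py sits in the current directory; `build_command` and `run_distributions` are hypothetical names, and the flags are copied verbatim from the examples above with no further assumption about what -n, -m and -v do internally.

```python
import shlex
import subprocess
import sys

# Extra flags per distribution, copied from the README examples.
DISTRIBUTION_FLAGS = {
    "initial": [],
    "mismatch_rules_3_4": ["-n", "True"],
    "mismatch_rules_0_1_2_5": ["-m", "True"],
}


def build_command(language, distribution, flaw="XSS"):
    """Return the argv list for one GeneratorLauncher.py run."""
    cmd = [sys.executable, "GeneratorLauncher.py", "-l", language]
    if language == "nodejs":
        # The Node.js examples pass -v 1 for the distributions associated with PHP.
        cmd += ["-v", "1"]
    cmd += DISTRIBUTION_FLAGS[distribution]
    cmd += ["-f", flaw]
    return cmd


def run_distributions(language, dry_run=True):
    """Print (dry run) or execute the three distribution commands in order."""
    for name in DISTRIBUTION_FLAGS:
        cmd = build_command(language, name)
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in cmd))
        else:
            subprocess.run(cmd, check=True)
```

`run_distributions("nodejs")` only prints the three commands; pass `dry_run=False` to actually launch them, stopping at the first failing run because of `check=True`.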

Generator usage examples

Note: If you don't specify the --language or -l flag, the database is generated in PHP by default.

For Hop.js generator

python GeneratorLauncher.py -l hopjs -f XSS
python GeneratorLauncher.py -l hopjs -c 79

For Node.js generator

python GeneratorLauncher.py -l nodejs -f XSS
python GeneratorLauncher.py -l nodejs -c 79

For PHP generator

  • Show the available command-line flags
python GeneratorLauncher.py -h
  • Generate specific types of flaws
python GeneratorLauncher.py -l php -f XSS,Injection
python GeneratorLauncher.py -l php --flaw=XSS,IDOR
  • Generate specific CWEs
python GeneratorLauncher.py -l php -c 79
python GeneratorLauncher.py -l php --cwe=78,89,90,91

Available Generation

Note: We fixed 95 XSS classification errors by correcting, adding, and combining the predicate attributes that describe each sanitization template, following the OWASP recommendations for safely sanitizing HTML templates. We also extended the PHP generator with 25 sink templates, 16 XSS inputs, and 58 different proper and improper sanitizations.

PHP - Available Generation

CWEs (-c or --cwe option)

  • 78 : OS Command Injection
  • 79 : XSS
  • 89 : SQL Injection
  • 90 : LDAP Injection
  • 91 : XPath Injection
  • 95 : Code Injection
  • 98 : File Injection
  • 209 : Information Exposure Through an Error Message
  • 311 : Missing Encryption of Sensitive Data
  • 327 : Use of a Broken or Risky Cryptographic Algorithm
  • 601 : URL Redirection to Untrusted Site
  • 862 : Insecure Direct Object References

OWASP (-f or --flaw option)

  • XSS : Cross-site Scripting
  • IDOR : Insecure Direct Object Reference
  • Injection : Injection (SQL, LDAP, XPath, OS Command, Code)
  • URF : URL Redirects and Forwards
  • SM : Security Misconfiguration
  • SDE : Sensitive Data Exposure

NODEJS - Available Generation

Note: Only XSS generation (vulnerable and non-vulnerable Node.js sample files) is available. The other CWE and OWASP categories offered for PHP are not implemented yet.

CWEs (-c or --cwe option)

  • 79 : XSS

OWASP (-f or --flaw option)

  • XSS : Cross-site Scripting

Hop.js - Available Generation

Note: Only XSS generation (vulnerable and non-vulnerable Hop.js sample files) is available. The other CWE and OWASP categories offered for PHP are not published yet.

CWEs (-c or --cwe option)

  • 79 : XSS

OWASP (-f or --flaw option)

  • XSS : Cross-site Scripting
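Since each language supports a different set of CWEs, a `--cwe` value can be checked against the lists above before launching a long generation. This is a hypothetical pre-flight helper, not part of the generator; how GeneratorLauncher.py itself reacts to an unsupported CWE is not documented here.

```python
# CWE identifiers this README lists for each language (-c / --cwe option).
SUPPORTED_CWES = {
    "php": {78, 79, 89, 90, 91, 95, 98, 209, 311, 327, 601, 862},
    "nodejs": {79},
    "hopjs": {79},
}


def parse_cwe_option(value):
    """Turn a --cwe value such as "78,89,90,91" into a list of integers."""
    return [int(part) for part in value.split(",") if part.strip()]


def unsupported_cwes(cwe_option, language="php"):
    """Return requested CWEs not listed for `language` (PHP is the CLI default)."""
    supported = SUPPORTED_CWES[language]
    return [cwe for cwe in parse_cwe_option(cwe_option) if cwe not in supported]
```

For instance, `unsupported_cwes("79,89", language="nodejs")` flags 89, because only CWE 79 is available for Node.js.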

Databases structure folders

The safe and unsafe folders contain safe and unsafe files, respectively. Each generated file receives a unique name that reflects the program's content, so you can easily find the files you are looking for. Each file name follows this format: CWE_XX_[(Input1)(Input2)…]_[(Sanitize1)(Sanitize2)…]_[(Construction1)(Construction2)…]

+-- classified-dbs
|   +-- PHP-Database_MM-DD-YYYY_HHhMMmSS
|   |   +-- XSS
|   |       +-- safe
|   |       |   +-- CWE_XX_[(Input1)(Input2)…]_[(Sanitize1)(Sanitize2)…]_[(Construction1)(Construction2)…]
|   |       |   +-- CWE_XX_[(Input1)(Input2)…]_[(Sanitize1)(Sanitize2)…]_[(Construction1)(Construction2)…]
|   |       +-- unsafe
|   |           +-- CWE_XX_[(Input1)(Input2)…]_[(Sanitize1)(Sanitize2)…]_[(Construction1)(Construction2)…]
|   +-- NODEJS-Database_MM-DD-YYYY_HHhMMmSS
|   |   +-- XSS
|   |       +-- safe
|   |       +-- unsafe
|   +-- HOPJS-Database_MM-DD-YYYY_HHhMMmSS
|       +-- XSS
|           +-- safe
|           +-- unsafe
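For training, the safe and unsafe folders map naturally to binary labels, and each file name encodes the templates it was built from. A minimal sketch, assuming the layout above (database directory, then a flaw folder such as XSS, then safe or unsafe) and the CWE_XX_[...]_[...]_[...] naming; `parse_sample_name` and `labeled_samples` are hypothetical helpers. The test name is the one from the Manifest.xml section of this README.

```python
import re
from pathlib import Path

# CWE_XX_[(Input...)]_[(Sanitize...)]_[(Construction...)] plus an optional extension.
NAME_RE = re.compile(r"^CWE_(\d+)_\[(.*?)\]_\[(.*?)\]_\[(.*?)\](\.\w+)?$")
PART_RE = re.compile(r"\(([^)]*)\)")


def parse_sample_name(filename):
    """Split a generated file name into its CWE number and template names."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError("unexpected sample name: " + filename)
    cwe, inputs, sanitizers, constructions, _ext = match.groups()
    return {
        "cwe": int(cwe),
        "inputs": PART_RE.findall(inputs),
        "sanitizers": PART_RE.findall(sanitizers),
        "constructions": PART_RE.findall(constructions),
    }


def labeled_samples(database_dir, flaw="XSS"):
    """Yield (path, label) pairs: label 0 for safe samples, 1 for unsafe ones."""
    for folder, label in (("safe", 0), ("unsafe", 1)):
        for path in sorted((Path(database_dir) / flaw / folder).glob("CWE_*")):
            if path.is_file():
                yield path, label
```

Keeping the parsed template names next to each label makes it possible, for example, to split train and test sets by sanitizer rather than at random.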

Complexity generation

Note: Complexity generation is only available for PHP.

Overview

To generate all the samples, the generator uses three kinds of XML files:

  • input_lang.xml: list of input template sources that collect the user data
  • sanitize_lang.xml: list of sanitization template codes that clean the input of malicious content
  • construction_lang.xml: list of construction template requests. For XSS, it is the list of HTML contexts

The generator uses output.xml to assemble each sample file from the templates in these three XML files.
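Conceptually, the sample space is the Cartesian product of the three template lists. The toy sketch below illustrates only that idea: the template names and the `SAFE_IN` table are made up, and the real generator decides safety from the predicate attributes of each sanitization template rather than from a hard-coded table like this one.

```python
from itertools import product

# Illustrative templates only; these are not the generator's XML contents.
INPUTS = ["GET", "POST"]
# Toy predicate: the HTML contexts in which each sanitizer is considered safe.
SAFE_IN = {
    "no_sanitizing": set(),
    "htmlspecialchars": {"html_body"},
}
CONSTRUCTIONS = ["html_body", "script_block"]


def generate(cwe=79):
    """Return (file-name stem, is_safe) for every input x sanitizer x construction."""
    samples = []
    for inp, san, con in product(INPUTS, SAFE_IN, CONSTRUCTIONS):
        name = "CWE_{}_[({})]_[({})]_[({})]".format(cwe, inp, san, con)
        samples.append((name, con in SAFE_IN[san]))
    return samples
```

With 2 inputs, 2 sanitizers and 2 constructions this yields 8 samples, of which only the 2 pairing `htmlspecialchars` with `html_body` are marked safe.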

A basic output.xml file:

<?xml version="1.0"?>
<program>
    <input/>
    <sanitize/>
    <construction/>
</program>

You can wrap any template (input, sanitize, or construction) in a complexity decorator. For example, to insert the sanitization into a class, modify output.xml as follows:

<?xml version="1.0"?>
<program>
    <input/>
    <complexity type="class">
        <sanitize/>
    </complexity>
    <construction/>
</program>

Decorators can also be nested. For example, the following output.xml puts the sanitization in a class that is written to a separate file:

<?xml version="1.0"?>
<program>
    <input/>
    <complexity type="file">
        <complexity type="class">
            <sanitize/>
        </complexity>
    </complexity>
    <construction/>
</program>
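Because decorators nest, it can help to see which complexities wrap each template slot. The sketch below reads an output.xml with Python's standard xml.etree.ElementTree and lists the complexity types from outermost to innermost; `decorator_chains` is a hypothetical helper, not part of the generator.

```python
import xml.etree.ElementTree as ET


def decorator_chains(xml_text):
    """Map each template slot to the complexity types wrapping it, outermost first."""
    chains = {}

    def walk(element, stack):
        for child in element:
            if child.tag == "complexity":
                # Descend into the decorator, remembering its type.
                walk(child, stack + [child.get("type")])
            else:
                # <input/>, <sanitize/> or <construction/>: record the wrappers seen so far.
                chains[child.tag] = stack

    walk(ET.fromstring(xml_text), [])
    return chains
```

For the nested example above, the result is `{"input": [], "sanitize": ["file", "class"], "construction": []}`.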

All available complexity generator features

Note: The complexity features listed below are not yet available for Node.js.

<complexity type="class"> </complexity>
<complexity type="loop" kind="for"> </complexity>
<complexity type="loop" kind="while"> </complexity>
<complexity type="if"> </complexity>
<complexity type="file"> </complexity>
<complexity type="function"> </complexity>
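Before running the generator, an output.xml can be checked against the decorator list above. This sketch assumes that the listed types are the only ones accepted and that a loop always needs kind="for" or kind="while", as in the examples; `check_complexities` is hypothetical and does not reflect how the generator itself validates its input.

```python
import xml.etree.ElementTree as ET

# Complexity types listed above; loops additionally carry a kind attribute.
ALLOWED_TYPES = {"class", "loop", "if", "file", "function"}
LOOP_KINDS = {"for", "while"}


def check_complexities(xml_text):
    """Return a list of problems found in the <complexity> elements of output.xml."""
    problems = []
    for node in ET.fromstring(xml_text).iter("complexity"):
        ctype = node.get("type")
        if ctype not in ALLOWED_TYPES:
            problems.append("unknown complexity type: {!r}".format(ctype))
        elif ctype == "loop" and node.get("kind") not in LOOP_KINDS:
            problems.append("loop needs kind='for' or kind='while'")
    return problems
```

An empty list means every decorator uses one of the types shown above.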

Manifest.xml

Manifest.xml lists all the files produced by the generator, describing each generated sample with:

  • metadata about the user input used: <input>file : /tmp/tainted.txt</input>
  • the sample path and its language: <file path="CWE_98/unsafe/CWE_98_[(backticks)]_[(func_preg_match) (no_filtering)]_[(include_file_name)(concatenation_simple_quote)].php" language="PHP">
  • the line of the sink and its vulnerability type: <flaw line="62" name="XSS"/>
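The manifest can be loaded with Python's standard xml.etree.ElementTree, for example to recover the sink line of each vulnerable sample. This sketch assumes that <input> and <flaw> are children of <file> and that the root element's name does not matter; the README shows these elements but not their full nesting. `read_manifest` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def read_manifest(xml_text):
    """Collect one record per <file> entry; missing <input>/<flaw> give None."""
    records = []
    for file_el in ET.fromstring(xml_text).iter("file"):
        input_el = file_el.find("input")
        flaw_el = file_el.find("flaw")
        text = input_el.text if input_el is not None else None
        records.append({
            "path": file_el.get("path"),
            "language": file_el.get("language"),
            "input": text.strip() if text else None,
            "flaw_line": int(flaw_el.get("line")) if flaw_el is not None else None,
            "flaw_name": flaw_el.get("name") if flaw_el is not None else None,
        })
    return records
```

To read a file from disk, pass its contents, e.g. `read_manifest(open(manifest_path).read())`, where `manifest_path` points to the generated manifest.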

Acknowledgment

We thank the SAMATE project at NIST, Bertrand Stivalet, Aurelien Delaitre, Guillaume Pighi, Jonathan Retterer, Xavier Marchal, and all the contributors who provided the PHP Vulnerability Test Suite, which is the foundation of this study.