Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

This repository contains the Python implementation of the speech enhancement method proposed in the paper referenced below. We provide:

  • the Keras implementation for training the supervised speech model, which is based on a variational autoencoder;
  • the Keras models trained on the TIMIT database;
  • the implementation of the proposed Monte Carlo expectation-maximization algorithm for performing speech enhancement.

Reference

Title: Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

Authors: Simon Leglaive, Laurent Girin, Radu Horaud

Conference: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Article: here

Bibtex: here

Demos

Audio examples are available here.

Repository Content

Root directory

  • VAE.py - Contains classes related to variational autoencoders, with methods for training, encoding, decoding, etc.
  • training_main_file.py - Main script for training the variational autoencoder.
  • data_tools.py - Contains functions for computing the training data.
  • MCEM_algo.py - Monte Carlo expectation-maximization algorithm.
  • speech_enhancement_main_file.py - Main script for enhancing a noisy speech signal.
  • utils.py - Functions used in the MCEM algorithm for solving algebraic Riccati equations and efficiently computing certain matrix operations (multiplication, determinant, trace, and inverse).
  • test_dataset_info.csv - CSV file describing how the 168 noisy mixtures used in the evaluation can be created from the TIMIT and DEMAND databases. The multichannel speech signals were created by simply delaying one channel with respect to the other, according to a given direction of arrival and assuming free-field propagation (see the sketch after this list). The recording setup is illustrated in 'recording_setup.txt'.
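
The exact geometry and delays used for the evaluation are specified by 'test_dataset_info.csv' and 'recording_setup.txt'. Purely as an illustration, here is a minimal sketch of such a delay-based two-channel construction under a far-field, free-field assumption; the microphone spacing, angle convention, and function names are assumptions and are not taken from the repository.

```python
import numpy as np

def delay_channel(x, delay_s, fs):
    """Delay a mono signal by delay_s seconds (possibly a fractional number
    of samples) using a linear phase shift in the frequency domain."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # frequency axis in Hz
    spectrum = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(spectrum, n=n)

def make_two_channel_mixture(source, fs, doa_deg, mic_spacing=0.05, c=343.0):
    """Build a 2-channel signal from a mono source under a pure-delay,
    free-field model: channel 0 is the reference, channel 1 is a delayed copy.
    The far-field TDOA formula and the 5 cm spacing are assumptions."""
    tdoa = mic_spacing * np.cos(np.deg2rad(doa_deg)) / c  # time difference of arrival (s)
    return np.stack([source, delay_channel(source, tdoa, fs)], axis=0)

# Usage with a synthetic signal standing in for a TIMIT utterance
fs = 16000
t = np.arange(fs) / fs
source = np.sin(2 * np.pi * 440.0 * t)
mixture = make_two_channel_mixture(source, fs, doa_deg=60.0)
print(mixture.shape)  # (2, 16000)
```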

training_results

Each subfolder corresponds to a different choice for the dimension of the latent random vector involved in the variational autoencoder (8, 16, 32, 64 or 128).

  • saved_weights.h5 - Weights of the network after training.
  • parameters.txt - Network and training parameters in a text file (see training_main_file.py and data_tools.py).
  • parameters.pckl - Network and training parameters in a pickle file (see training_main_file.py and data_tools.py).
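
As a small example of reusing these artifacts, the snippet below only loads the pickled parameters and prints them; the subfolder name is hypothetical, and restoring the weights themselves additionally requires rebuilding the network with the classes in VAE.py.

```python
import pickle

# Hypothetical path: the actual subfolder name depends on the chosen latent dimension.
params_file = "training_results/latent_dim_64/parameters.pckl"

with open(params_file, "rb") as f:
    params = pickle.load(f)
print(params)  # network and training parameters, as written by training_main_file.py

# The weights in saved_weights.h5 can then be restored with Keras's
# model.load_weights(...) once the same architecture has been rebuilt
# from these parameters using the classes in VAE.py.
```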

Conda Requirements

Please refer to the two YAML files containing the conda environments used for training ('conda-environment-gpu-training.yml') and testing ('conda-environment-test.yml').
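
For example, the test environment can typically be created with 'conda env create -f conda-environment-test.yml' and then activated under the environment name defined inside the YAML file.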

License

See LICENSE.txt