Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization
This repository contains the Python implementation of the speech enhancement method proposed in the paper referenced below. We provide:
- the Keras implementation for training the supervised speech model, which is based on a variational autoencoder (an illustrative sketch is given after this list);
- the Keras models trained on the TIMIT database;
- the implementation of the proposed Monte Carlo expectation-maximization algorithm for performing speech enhancement.
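The speech model is a variational autoencoder over short-time power-spectrogram frames of clean speech: an encoder maps each frame to a Gaussian posterior over a latent vector, and a decoder outputs a variance for every frequency bin. The snippet below is only a minimal sketch of this structure using tf.keras; the layer sizes, activations and dimensions are assumptions made for the example and are not the settings used in this repository (see VAE.py, training_main_file.py and parameters.txt for those).

```python
# Minimal, illustrative VAE on power-spectrogram frames (tf.keras).
# All sizes and activations below are assumptions, not the repository's settings.
import tensorflow as tf
from tensorflow.keras import layers

F = 513  # assumed number of frequency bins (1024-point STFT)
L = 64   # assumed latent dimension (the trained models use 8 to 128)

# Encoder: power-spectrogram frame -> Gaussian posterior over the latent vector z
x_in = layers.Input(shape=(F,))
h_enc = layers.Dense(128, activation="tanh")(x_in)
z_mean = layers.Dense(L)(h_enc)
z_logvar = layers.Dense(L)(h_enc)

def sample_z(args):
    # Reparameterization trick: z = mean + std * eps, with eps ~ N(0, I)
    m, lv = args
    eps = tf.random.normal(tf.shape(m))
    return m + tf.exp(0.5 * lv) * eps

z = layers.Lambda(sample_z)([z_mean, z_logvar])

# Decoder: z -> log-variance of each frequency bin of the speech frame
h_dec = layers.Dense(128, activation="tanh")(z)
x_logvar = layers.Dense(F)(h_dec)

vae = tf.keras.Model(x_in, x_logvar)

# Loss: Itakura-Saito-type reconstruction term (up to additive constants)
# plus the KL divergence between the approximate posterior and the prior N(0, I).
recon = tf.reduce_sum(x_in / tf.exp(x_logvar) + x_logvar, axis=-1)
kl = -0.5 * tf.reduce_sum(1.0 + z_logvar - tf.square(z_mean) - tf.exp(z_logvar), axis=-1)
vae.add_loss(tf.reduce_mean(recon + kl))
vae.compile(optimizer="adam")
# vae.fit(power_spec_frames, epochs=..., batch_size=...)  # frames of shape (num_frames, F)
```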
Reference
Title: Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization
Authors: Simon Leglaive, Laurent Girin, Radu Horaud
Conference: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Article: here
Bibtex: here
Demos
Audio examples are available here.
Repository Content
Root directory
- VAE.py - Contains classes related to variational autoencoders, with methods for training, encoding, decoding, etc.
- training_main_file.py - Main script for training the variational autoencoder.
- data_tools.py - Contains functions for computing the training data.
- MCEM_algo.py - Monte Carlo expectation-maximization algorithm.
- speech_enhancement_main_file.py - Main script for enhancing a noisy speech signal.
- utils.py - Functions used in the MCEM algorithm for solving algebraic Riccati equations and for efficiently computing some matrix operations (multiplication, determinant, trace and inverse).
- test_dataset_info.csv - CSV file describing how the 168 noisy mixtures used in the evaluation can be created from the TIMIT and DEMAND databases. The multichannel speech signals were created by simply delaying one channel with respect to the other, according to a given direction of arrival and assuming free-field propagation (see the sketch after this list). The recording setup is illustrated in 'recording_setup.txt'.
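For illustration, such a two-channel signal can be built by applying a (possibly fractional) delay through a frequency-domain phase shift. The sampling rate, microphone spacing and direction of arrival below are placeholder values, not necessarily those of the actual test set; refer to 'test_dataset_info.csv' and the recording setup file for the values that were used.

```python
# Illustrative sketch (not the exact test-set generation script): build a
# two-channel speech signal by delaying one channel according to a direction
# of arrival, assuming free-field propagation.
import numpy as np

def delay_channel(x, delay_s, fs):
    """Apply a (possibly fractional) delay to x via a frequency-domain phase shift."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)                 # frequencies in Hz
    phase_ramp = np.exp(-2j * np.pi * freqs * delay_s)     # e^{-j 2 pi f tau}
    return np.fft.irfft(np.fft.rfft(x) * phase_ramp, n=n)

fs = 16000          # assumed sampling rate (TIMIT is sampled at 16 kHz)
mic_spacing = 0.05  # assumed microphone spacing in meters
doa_deg = 30.0      # assumed direction of arrival in degrees
c = 343.0           # speed of sound in m/s

speech = np.random.randn(fs)  # placeholder for a TIMIT utterance
tdoa = mic_spacing * np.sin(np.deg2rad(doa_deg)) / c  # time difference of arrival
stereo_speech = np.stack([speech, delay_channel(speech, tdoa, fs)], axis=0)
```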
training_results
Each subfolder corresponds to a different choice for the dimension of the latent random vector of the variational autoencoder (8, 16, 32, 64 or 128) and contains the following files (a loading sketch is given after the list):
- saved_weights.h5 - Weights of the network after training.
- parameters.txt - Network and training parameters in a text file (see training_main_file.py and data_tools.py).
- parameters.pckl - Network and training parameters in a pickle file (see training_main_file.py and data_tools.py).
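As an illustration, the parameters can be read back with pickle and the weights restored with Keras. The subfolder name and the builder function below are hypothetical; how the model is actually rebuilt from the parameters is defined in VAE.py and training_main_file.py.

```python
# Illustrative only: the subfolder name is hypothetical.
import pickle

with open('training_results/latent_dim_64/parameters.pckl', 'rb') as f:
    params = pickle.load(f)  # network and training parameters saved at training time

# Rebuild the Keras model from `params` (see VAE.py), then restore the trained weights:
# model = build_vae(params)  # hypothetical helper
# model.load_weights('training_results/latent_dim_64/saved_weights.h5')
```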
Conda Requirements
Please refer to the two YAML files containing the conda environments used for training ('conda-environment-gpu-training.yml') and testing ('conda-environment-test.yml').
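For instance, the test environment can be created with 'conda env create -f conda-environment-test.yml' and then activated with 'conda activate <environment-name>', where the environment name is the one defined in the YAML file.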
License
See LICENSE.txt