BERT model functionality

RENNER Joseph requested to merge bert into master

Adding BERT functionality: a WordPiece subword tokenizer (as well as other common subword tokenizers); pretraining, finetuning, and saving BERT models; and loading and using pretrained BERT models.
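
As a quick orientation, here is a minimal sketch of the raw transformers calls that this interface builds on for loading and using a pretrained BERT model; the `bert-base-uncased` checkpoint is only an example, not something fixed by this MR:

```python
# Minimal sketch: load a pretrained BERT model and tokenizer with the
# huggingface transformers library, then run a single forward pass.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("A short example sentence.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```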

  • Interface into the huggingface tokenizers library for training new tokenizers (see the tokenizer-training sketch after this list)
  • Interface into the BERT models in the transformers library. For each task class: initialization, saving and loading, training for the associated task, extracting task-specific outputs, task prediction, and unit and integration tests as needed (usage sketches for several task classes follow this list)
    • Base class for extracting features (aka hidden states) from any BERT task model.
    • BERT pretraining class
    • BERT masked language model class
    • BERT sequence classification class
    • BERT token classification class
    • Optional (for now):
      • BERT multiple choice class
      • BERT question answering class
      • BERT co-reference fine-tuning class
  • Add custom dataset and collator functionality for training (see the dataset/collator sketch after this list)
  • Generalize the interface to any Transformer model
  • Documentation
    • API docs
    • Use cases
    • Code snippet usage examples
    • Update readme
  • Passing all tests
  • Developer documentation
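
A minimal sketch of training a new WordPiece tokenizer with the huggingface tokenizers library, which is what the tokenizer interface above targets; the corpus path, vocabulary size, and output filename are placeholder values:

```python
# Sketch: train a WordPiece tokenizer from scratch with huggingface tokenizers.
from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordPieceTrainer

tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = WordPieceTrainer(
    vocab_size=30000,  # example value
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # hypothetical corpus file
tokenizer.save("wordpiece-tokenizer.json")
```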
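
For the feature-extraction base class, the underlying transformers mechanism is the `output_hidden_states` flag, which works the same way on any BERT task model; a sketch, with an illustrative checkpoint:

```python
# Sketch: extract hidden states from a BERT task model.
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

inputs = tokenizer("Feature extraction example.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Embedding layer plus one tensor per layer, each (batch, seq_len, hidden_size).
hidden_states = outputs.hidden_states
```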
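
For the pretraining and masked language model classes, a sketch of the standard transformers building blocks, `BertForMaskedLM` plus `DataCollatorForLanguageModeling`; the 15% masking probability mirrors the original BERT recipe and is an example setting here:

```python
# Sketch: masked language modeling with random token masking per batch.
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Randomly masks tokens and builds the matching labels for each batch.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

batch = collator([tokenizer("One toy training sentence for the sketch.")])
outputs = model(**batch)  # outputs.loss is the MLM loss on masked positions
```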
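
For the sequence classification class, a sketch of one supervised step on `BertForSequenceClassification`, covering initialization, training, prediction, and saving; `num_labels=2` and the label value are toy inputs:

```python
# Sketch: one fine-tuning step and prediction for sequence classification.
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("Classify this sentence.", return_tensors="pt")
outputs = model(**inputs, labels=torch.tensor([1]))
outputs.loss.backward()  # an optimizer step would follow in real fine-tuning

prediction = outputs.logits.argmax(dim=-1)  # task prediction
model.save_pretrained("bert-seq-clf")       # reload later with from_pretrained
```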
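
For the custom dataset and collator item, a sketch of one plausible shape: a torch `Dataset` over raw texts with a collate function that tokenizes and pads per batch. The class and function names here are hypothetical, not this MR's final API:

```python
# Sketch: dataset of raw strings; tokenization is deferred to the collator.
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer


class TextDataset(Dataset):  # hypothetical name
    def __init__(self, texts):
        self.texts = texts

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.texts[idx]


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


def collate(batch):
    # Tokenize and pad each batch to its longest sequence.
    return tokenizer(batch, padding=True, truncation=True, return_tensors="pt")


loader = DataLoader(
    TextDataset(["first example", "second, longer example"]),
    batch_size=2,
    collate_fn=collate,
)
```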
