Declearn-text
MVP: Be able to run a first full example fine-tuning an existing BERT with HuggingFace on a simple task
- Store the data in proper format
- Load a tokenizer from Hugging Face
- Load a model from Hugging Face
- Run the training
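The four MVP steps above could be sketched as follows. This is a hedged sketch, not a decided design: the model name (`bert-base-uncased`), the binary label setup, and the records format are illustrative assumptions, and it presumes `transformers` and `torch` are installed.

```python
# Sketch of the MVP pipeline. Framework imports are deferred so the
# data helper stays dependency-free; all names here are placeholders.

def to_records(texts, labels):
    # Step 1: store the data in a simple, framework-agnostic records format.
    return [{"text": t, "label": l} for t, l in zip(texts, labels)]


def run_centralised_finetuning(records, model_name="bert-base-uncased"):
    # Steps 2-4: load a tokenizer and model from Hugging Face, then train.
    import torch
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2  # assumed binary task
    )
    encodings = tokenizer(
        [r["text"] for r in records], truncation=True, padding=True
    )

    class RecordsDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(records)

        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx]) for k, v in encodings.items()}
            item["labels"] = torch.tensor(records[idx]["label"])
            return item

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1),
        train_dataset=RecordsDataset(),
    )
    trainer.train()
```

The records helper is the part the dataset API changes below would have to generalise; the rest delegates entirely to the HF `Trainer`.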
What needs to change:
- The dataset API, essentially:
- Accept data that is not tabular
- Allow for preprocessing steps, including at inference -> this will possibly also require modifying the model API
- The split util could also be revamped, but that is lower priority
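One minimal shape the revised dataset API could take, sketched with hypothetical names (the class and its methods are illustrative, not part of declearn today): a dataset that accepts arbitrary non-tabular samples and carries an ordered list of preprocessing steps that can be replayed identically at inference time.

```python
# Hypothetical sketch of the dataset API change; nothing here is decided.

class PreprocessedDataset:
    """Hold non-tabular samples plus ordered preprocessing steps."""

    def __init__(self, samples, steps=None):
        self.samples = list(samples)    # e.g. raw text, not tabular rows
        self.steps = list(steps or [])  # callables applied in order

    def preprocess(self, sample):
        # The same chain is reusable on new inputs at inference time,
        # which is why the model API may need a hook to carry it along.
        for step in self.steps:
            sample = step(sample)
        return sample

    def __iter__(self):
        return (self.preprocess(s) for s in self.samples)


# Usage: lowercase then strip, replayed on a fresh inference input.
dataset = PreprocessedDataset(
    ["  Hello ", "WORLD"], steps=[str.lower, str.strip]
)
assert list(dataset) == ["hello", "world"]
assert dataset.preprocess("  NEW Sample ") == "new sample"
```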
Approach: rely as much as possible on tools developed by the main frameworks
- Rationale: data processing is complex and not the priority of our library, so we want to delegate as much of the work as possible to other tools, in a way that is as robust as possible to future changes
- So we essentially use the same approach as the vector API, but try to make it more minimal
- TensorFlow and Torch provide very practical tools to interface with data -> integrate those into lightweight Dataset subclasses
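A lightweight subclass along the lines described above might look like this. The base interface (`generate_batches`) and class names are assumptions for the sake of the sketch; the point is that the Torch variant delegates batching and shuffling to `torch.utils.data.DataLoader` rather than reimplementing them.

```python
# Hedged sketch: wrap a framework data loader behind a small
# declearn-style Dataset interface. Names are illustrative.

class Dataset:
    """Minimal interface a declearn-style dataset could expose."""

    def generate_batches(self, batch_size):
        raise NotImplementedError


class TorchDataset(Dataset):
    """Delegate batching to torch.utils.data.DataLoader."""

    def __init__(self, torch_dataset):
        self.torch_dataset = torch_dataset

    def generate_batches(self, batch_size):
        # Deferred import: only pay the torch dependency when used.
        from torch.utils.data import DataLoader

        yield from DataLoader(self.torch_dataset, batch_size=batch_size)


class ListDataset(Dataset):
    """Pure-Python fallback with the same interface, for comparison."""

    def __init__(self, samples):
        self.samples = samples

    def generate_batches(self, batch_size):
        for i in range(0, len(self.samples), batch_size):
            yield self.samples[i : i + batch_size]


batches = list(ListDataset([1, 2, 3, 4, 5]).generate_batches(2))
assert batches == [[1, 2], [3, 4], [5]]
```

Keeping the wrapper this thin is what makes the approach robust to upstream changes: only the delegation line touches the framework.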
Allow for full HF integration?
- Using the pipeline
- Use the SaaS aspect and the ease of pushing to HF repos?
MVP todo:
- Select fine-tuning task and implement centralised version using HF + Torch
- Explore commonalities between frameworks' pre-processing tools
- Decide and implement dataset API changes
Edited by BIGAUD Nathan