[refactoring] make a base class for SNPSample and make it a pluggable

E. Madison Bray requested to merge embray/pluggable-datasource into master

This has been on my TODO list for a while and is a necessary step toward further refactoring of the configuration format.

This does not go all the way to making the SNPSource completely replaceable via plugins; for the time being the "dnadna format" (a hierarchy of .npz files) is still hard-coded as the only one supported during pre-processing/training.

This does lay the groundwork for allowing users to easily plug in their own data source format, and it will not require many more changes. But in the interest of getting version 1.0 finished, I've left making this fully configurable as a later exercise.
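As a rough illustration of what "pluggable" could look like once this is finished, here is a minimal sketch of a base class that registers its subclasses by format name. It is not the code in this merge request: the `registry` attribute, the `format_name` keyword, the `from_format` helper, and both toy subclasses (including their constructor arguments) are hypothetical names chosen for the example.

```python
class SNPSource:
    """Hypothetical base class that records subclasses by format name."""

    # Maps a format name to the class that handles it (assumed design).
    registry = {}

    def __init_subclass__(cls, format_name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only subclasses that declare a format name are registered.
        if format_name is not None:
            SNPSource.registry[format_name] = cls

    @classmethod
    def from_format(cls, format_name, *args, **kwargs):
        """Instantiate the source class registered under ``format_name``."""
        try:
            source_cls = cls.registry[format_name]
        except KeyError:
            raise ValueError(
                f"unknown data source format: {format_name!r}") from None
        return source_cls(*args, **kwargs)


class ToyNpzSource(SNPSource, format_name="dnadna"):
    """Stand-in for the built-in .npz hierarchy format."""

    def __init__(self, root_dir):
        self.root_dir = root_dir


class ToyCustomSource(SNPSource, format_name="custom"):
    """Stand-in for a user-supplied format."""

    def __init__(self, path):
        self.path = path


# A config file could then select the source by name instead of hard-coding it.
source = SNPSource.from_format("dnadna", "simulations/")
```

With something along these lines, supporting a new format would only require defining a new subclass; the pre-processing and training code would look the class up by name.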

One major API change to NpzSNPSource and all other SNPSource classes is that you no longer call them to retrieve a sample, as in source(0, 0); instead you use normal indexing brackets, as in source[0, 0]. I figured this was probably less confusing, and it is closer to how PyTorch Datasets work.
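To make the call-versus-index change concrete, here is a small self-contained sketch. It is an assumption-laden toy, not the real classes: `InMemorySNPSource` is a hypothetical stand-in for NpzSNPSource, the samples are plain strings, and the meaning of the two indices is left unspecified since the description above does not state it.

```python
from abc import ABC, abstractmethod


class SNPSource(ABC):
    """Hypothetical base class; not the actual implementation."""

    @abstractmethod
    def __getitem__(self, index):
        """Return the sample stored at ``index``, a tuple of two ints."""


class InMemorySNPSource(SNPSource):
    """Toy source backed by a dict, standing in for NpzSNPSource."""

    def __init__(self, samples):
        self._samples = dict(samples)

    def __getitem__(self, index):
        # source[0, 0] arrives here as the single tuple (0, 0).
        first, second = index
        try:
            return self._samples[(first, second)]
        except KeyError:
            raise IndexError(f"no sample at index {index!r}") from None


source = InMemorySNPSource({(0, 0): "sample-0-0", (0, 1): "sample-0-1"})

sample = source[0, 0]   # new style: indexing
# source(0, 0)          # old style: raises TypeError here, since the
#                       # class no longer defines __call__
```

Python passes `source[0, 0]` to `__getitem__` as one tuple argument, which is the same mechanism NumPy arrays use for multi-dimensional indexing.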

Although it's not obvious how, this is laying some groundwork for the new config formats discussed in #68 (closed).