Question regarding the use of SNPsample.validate
- Question: @embray, we had a discussion with @pjobic about the fact that each time we call `copy_with`, or in fact `SNPsample(...)`, it calls the `__init__` function, which calls `_validate`. This means that the same datasets are going to be validated multiple times.
  I think its usage has changed a bit since the specific code he was referring to, because at the time there was a `copy_with` line in the `__getitem__` function of `DNATrainingDataset`, which is not there anymore. But we still find it in `transforms.py`, where we call `copy_with` or `SNPsample()` multiple times. For example, we check each time that the position length and the SNP dimension fit. Should we really validate each time? Isn't it a waste? We already have tests for the functions we coded.
- Issue: A related problem with the validate function, if it is applied after each transform, is that it is too restrictive. For example, it checks that pos > 0 (and < 1 if normalized). However, people might want to apply custom transforms that change this. A classic transform is padding with a given value, commonly -1. So would padding make validate raise an error (depending on the data format)?
  On Friday we saw that there was some padding going on, and it did not raise an error (and the tensor had -1, not 255). That is because it wasn't applied during a transform but during `collate_batch` (and at that point the dtype is changed so that the value is -1), so after all calls to `_validate`. So padding may not be the main issue if we keep it in the collate functions, but one could think of other custom transforms.
- I'll open an issue specific to the collate functions
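
To make the two points above concrete, here is a minimal sketch. It is not the real code: `Sample`, `crop`, `pad`, and the `validate` keyword are hypothetical stand-ins I made up for illustration, plain lists replace tensors, and the checks only approximate what `_validate` does. It shows validation running again on every `copy_with`, a padding transform with -1 tripping the position check, and one possible way out (an opt-out flag), which is just an idea and not something that exists today.

```python
class Sample:
    """Hypothetical stand-in for SNPsample (plain lists instead of tensors)."""

    validate_calls = 0  # counter for illustration only

    def __init__(self, snp, pos, validate=True):
        self.snp = snp  # one row per individual
        self.pos = pos  # one position per SNP column
        if validate:
            self._validate()

    def _validate(self):
        # Approximation of the checks discussed above, not the real ones.
        Sample.validate_calls += 1
        if any(len(row) != len(self.pos) for row in self.snp):
            raise ValueError("position length and SNP dimension do not match")
        if any(p < 0 for p in self.pos):
            raise ValueError("positions must be non-negative")

    def copy_with(self, snp=None, pos=None, validate=True):
        # Goes through __init__, so validation runs again unless disabled.
        return Sample(
            self.snp if snp is None else snp,
            self.pos if pos is None else pos,
            validate=validate,
        )


def crop(sample, n, validate=True):
    """Hypothetical transform: keep the first n SNPs."""
    return sample.copy_with(
        snp=[row[:n] for row in sample.snp],
        pos=sample.pos[:n],
        validate=validate,
    )


def pad(sample, n, value=-1, validate=True):
    """Hypothetical transform: pad SNPs and positions up to length n."""
    extra = n - len(sample.pos)
    return sample.copy_with(
        snp=[row + [value] * extra for row in sample.snp],
        pos=sample.pos + [value] * extra,
        validate=validate,
    )


sample = Sample([[0, 1, 1], [1, 0, 1]], [0.1, 0.4, 0.9])  # validated once
cropped = crop(sample, 2)  # same data, validated a second time

padding_error = None
try:
    pad(cropped, 4)  # padded positions are -1, so the range check fails
except ValueError as exc:
    padding_error = str(exc)

# Skipping validation lets the custom transform through.
padded = pad(cropped, 4, validate=False)
```

With something like this, transforms that we already cover with tests could skip validation, and it would only run once when a sample is first loaded. Whether a flag is the right design (versus validating only in the dataset, for example) is open for discussion.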