[bug][#33] fixes issue #33 and includes a regression test

E. Madison Bray requested to merge embray/issue-33 into master

As I wrote on the issue, the multi-process DataLoader in PyTorch appears to be slightly buggy: if one of the sub-processes raises an exception, the main process re-raises it in a way that assumes every exception takes exactly one argument, and that this argument is a string.

I think this should be fixed upstream, but in the meantime we need to make sure that any exception that can be raised from a Dataset follows this (seemingly undocumented) requirement.
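To illustrate the constraint, here is a minimal sketch in plain Python. It does not call PyTorch; `rebuild_from_message` is a hypothetical stand-in for the re-raise behaviour described above (rebuilding the exception from its type and a single string), and both exception classes are made-up names, not the ones used in this merge request.

```python
class NonUniformShapeError(Exception):
    """Hypothetical exception whose constructor requires three arguments."""

    def __init__(self, index, shape, expected):
        super().__init__(f"SNP {index} has shape {shape}, expected {expected}")
        self.index = index
        self.shape = shape
        self.expected = expected


class SafeNonUniformShapeError(Exception):
    """Hypothetical variant that can be built from a single string."""

    def __init__(self, message):
        super().__init__(message)

    @classmethod
    def from_shapes(cls, index, shape, expected):
        # Format the details up front so the constructor only needs a string.
        return cls(f"SNP {index} has shape {shape}, expected {expected}")


def rebuild_from_message(exc):
    """Assumed model of the parent process: re-create the exception from
    its type and one string argument."""
    return type(exc)(str(exc))


if __name__ == "__main__":
    try:
        rebuild_from_message(NonUniformShapeError(3, (4, 12), (4, 10)))
    except TypeError as err:
        # The original error is lost; only a constructor TypeError surfaces.
        print("multi-argument exception could not be rebuilt:", err)

    safe = SafeNonUniformShapeError.from_shapes(3, (4, 12), (4, 10))
    print("rebuilt:", rebuild_from_message(safe))
```

Keeping the constructor to one string argument and moving the formatting into a classmethod is one way to satisfy the requirement without losing the details in the message.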

In writing the test for this I realized another problem. To track whether or not the elements of the dataset are of uniform size, the Dataset makes a note of the shape of the first SNP it sees and compares future SNPs to that one. However, when using multiprocessing there is a small chance that this comparison always succeeds even for a non-uniform dataset:

For example, say we have two workers, and the dataset is divided up evenly between the two workers with one getting odd SNPs and the other getting even. If the odd SNPs are all the same shape and the even SNPs are all the same shape but a different shape from the odd ones, then neither worker will ever see the conflict. Therefore we need to use a shared value to synchronize between all workers.

Once the shared value has been set, each worker can replace it with the plain known value: it only needs to be set once, and after that no further synchronization is needed.
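The idea can be sketched as follows. This is not the code from this merge request: `ShapeTracker`, `make_shared_shape`, the `UNSET` sentinel, and the assumption that every SNP has a 2-D shape are all hypothetical. Two tracker objects wrapping the same `multiprocessing.Array` stand in for two worker processes, so the sketch runs in a single process.

```python
import multiprocessing as mp

UNSET = -1  # hypothetical sentinel meaning "no shape recorded yet"


def make_shared_shape(ndim=2):
    """Create a lock-protected shared array holding the reference shape.
    Assumes every SNP has exactly `ndim` dimensions."""
    return mp.Array("q", [UNSET] * ndim)


class ShapeTracker:
    """Per-worker view of the shared reference shape."""

    def __init__(self, shared):
        self._shared = shared
        self._known = None  # local copy, filled in on first use

    def is_uniform(self, shape):
        if self._known is None:
            # Synchronize only until the reference shape is known.
            with self._shared.get_lock():
                if self._shared[0] == UNSET:
                    for i, dim in enumerate(shape):
                        self._shared[i] = dim
                self._known = tuple(self._shared[i] for i in range(len(self._shared)))
        # From here on the comparison uses the local copy; no locking needed.
        return tuple(shape) == self._known


if __name__ == "__main__":
    shared = make_shared_shape()
    odd_worker = ShapeTracker(shared)
    even_worker = ShapeTracker(shared)

    print(odd_worker.is_uniform((4, 10)))   # first shape seen becomes the reference
    print(even_worker.is_uniform((4, 12)))  # conflict is now visible to this worker
```

If each tracker were given its own array instead of the shared one, both calls would succeed, which is the undetected conflict described in the two-worker example above.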
