Incorrect handling of `shape` in `FederatedDataset` and other classes
Context:
For the `MedicalFolderDataset` class, we allow the `shape` field on TinyDB to be a dict with the following semantics: `{modality: shape_array}`. Note: usually the `shape` is directly the `shape_array`, not inserted into any collection.
Problem:
Currently, our `FederatedDataset` does not handle this correctly. For example, the `sample_sizes` function calls `sample_sizes.append(val[0]["shape"][0])`, which will fail when `shape` is a dict.
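To make the failure concrete, here is a minimal reproduction using plain Python data. The entry layout (`val[0]["shape"]`) is taken from the call quoted above; the modality names `T1`/`T2`, the numbers, and the helper `first_dim` are made up for illustration and are not part of the library.

```python
# Hypothetical stand-ins for the per-node entries that sample_sizes iterates over.
tabular_entry = [{"shape": [100, 5]}]
medical_entry = [{"shape": {"T1": [30, 256, 256], "T2": [28, 256, 256]}}]


def first_dim(val):
    # Mirrors the expression used in sample_sizes.
    return val[0]["shape"][0]


print(first_dim(tabular_entry))  # 100

try:
    first_dim(medical_entry)
except KeyError as err:
    # Indexing a dict with 0 looks up the key 0, which does not exist.
    print("fails for dict shapes:", repr(err))
```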
Bigger problem:
It's hard to decide how to solve this.
On the one hand, different modalities may have different numbers of samples and/or shapes. This means that the `shape` field cannot be a single `shape_array` value as it is for other data types.
On the other hand, the `FederatedDataset` doesn't have any knowledge of the specifics of the experiment (e.g. which modality is used by the researcher).
Similar problem and possible solution (I don't like this solution)
I had the exact same problem in the `Strategy`, and I solved it by creating a custom `Strategy` which takes the modality as a parameter and has minimal differences w.r.t. the original strategy.
We could solve this issue with a similar approach:
- the researcher inherits from `FederatedDataset` to create a custom one that knows about data modalities
- the researcher provides this class to the `Experiment`, and at that point they can also provide the correct modality
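For reference, the subclassing approach could look roughly like the sketch below. The `FederatedDataset` here is a simplified stand-in written for this example, not the library class, and `ModalityFederatedDataset`, its `modality` argument, and the sample data are all hypothetical.

```python
class FederatedDataset:
    """Simplified stand-in for the library class (assumption, not the real API)."""

    def __init__(self, data):
        # data maps node id -> list of dataset entries, each with a "shape" field
        self._data = data

    def sample_sizes(self):
        return [val[0]["shape"][0] for val in self._data.values()]


class ModalityFederatedDataset(FederatedDataset):
    """Hypothetical subclass that is told which modality the researcher uses."""

    def __init__(self, data, modality):
        super().__init__(data)
        self._modality = modality

    def sample_sizes(self):
        sizes = []
        for val in self._data.values():
            shape = val[0]["shape"]
            if isinstance(shape, dict):
                # {modality: shape_array}: pick the shape of the chosen modality
                shape = shape[self._modality]
            sizes.append(shape[0])
        return sizes


data = {
    "node_1": [{"shape": {"T1": [30, 256, 256], "T2": [28, 256, 256]}}],
    "node_2": [{"shape": [100, 5]}],
}
fds = ModalityFederatedDataset(data, modality="T1")
print(fds.sample_sizes())  # [30, 100]
```

Even in this reduced form, the researcher has to write and pass around an extra class just to select a modality.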
However, there are many reasons why this is a bad solution. An obvious one is that the `MedicalFolderDataset` is provided by us, so it should be better integrated with the rest of the classes in the library. It shouldn't require the researcher to write millions of lines of boilerplate.
Conclusion
I'm not sure what to suggest to solve this. Help!