Why do we need `device` in order to instantiate `LearnedParams`?
Issue
It is mandatory to specify `device` to get a `LearnedParams` instance.
Master branch code showing that `device` is mandatory:
`training.py`, line 256:

```python
self._learned_params = LearnedParams(
    self.config['learned_params'], simulation.scenario_params,
    device=device, training_params=preprocessed_scenario_params,
    validate=False)
```
`training.py`, line 369:

```python
dataset = DNATrainingDataset(
    simulation.data_source, self.learned_params,
    training_series=training_series, **self.dataset_params)
```
I could be wrong, but it feels absurd/counterintuitive to me. Why would I need to put the weight parameters for the loss function on the GPU before even using them in the training phase? At this point I do not even know whether I will be working on CPU or GPU, yet I am already required to specify it.
To be honest, I am not convinced by `dnadna/params.py`. Some things there are called "parameters" but are in fact targets, even if those targets could be parameters of a mathematical model. It looks like we have two conflicting naming schemes. I think that a parameter is something we optimize, not something that is used to optimize our model:
- `nets.parameters()` are parameters
- `targets` are not parameters
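The distinction above can be made concrete with a minimal, generic PyTorch sketch (this is not dnadna code): the optimizer only ever touches `net.parameters()`, while targets are plain data fed to the loss and never updated.

```python
import torch
import torch.nn as nn

# Parameters: tensors with requires_grad=True that the optimizer updates.
net = nn.Linear(4, 2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)  # targets: ground truth we fit to, never optimized

loss = nn.functional.mse_loss(net(inputs), targets)
loss.backward()   # gradients flow to net's parameters only
optimizer.step()  # only net's parameters change; targets are untouched
```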
Solution
https://discuss.pytorch.org/t/move-the-loss-function-to-gpu/20060/6

Conclusion: we could remove the `device` argument from the instantiation and do the device transfer in the training loop instead.
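A minimal sketch of that idea (the names `loss_weights` etc. are my own illustration, not the actual dnadna API): build tensors device-agnostically, and move them to the right device only inside the training loop, once the device is actually known.

```python
import torch

# Decide the device at training time, not at construction time.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

loss_weights = torch.ones(3)  # constructed on CPU; no device needed yet

for epoch in range(2):
    # .to() is a no-op once the tensor is already on the target device,
    # so the transfer effectively costs nothing after the first iteration.
    loss_weights = loss_weights.to(device)
    # ... forward pass, weighted loss, backward, optimizer step ...
```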
Another great solution: one of the best tutorials out there. Pay attention to the details in this tutorial; it is really well made, especially the order of the code and the natural (unstated, but easy to guess) split of the deep-learning files.
Question
This part is more of a complaint than a question.
I will ask again:
- Is dnadna made for us, so that we can use common, clean, well-thought-out code?
- Or is it a black box for non-geek users?
Most of the time I get the first answer, which I agree with, but the way it is coded is not really flexible or satisfying. It is more a black box for non-geek users than clean code for us.
When I first arrived on the project, I thought that having a semi-black-box was a great idea, but I have changed my mind. I would rather have full control of what is going on in the code, even at the expense of more bugs or more parameters to specify.
For instance, let me give some examples of non-flexibility:
- If I want to test, say, 10 different networks, the data loader (for the exact same dataset) has to reload each time (and that takes 2-5 minutes for a medium-to-big dataset).
- If I am 'debugging' a network, I have to load the entire dataset to test it on a single example, because `get_example_datum` is based on `training_loader` (`get_example_datum` could load `training_loader.dataset[0][1]` itself, without `training_loader`). I could be wrong: it is not obvious how the data is loaded, and it might be correlated with those 5 minutes needed to load the entire dataset. In my opinion, we should just have a mapping from index to NPZ file. (In the end, that is what is done, but before that we load some big .csv files with pandas, which might be the reason for these long loading times; I am wondering whether it is really necessary.)
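The "mapping of index to NPZ file" idea can be sketched as follows (the file layout and the key names `snp`/`target` are assumptions for illustration, not dnadna's actual format): the dataset stores only file paths up front and loads each `.npz` lazily in `__getitem__`, so inspecting `dataset[0]` never touches the rest of the data.

```python
import glob
import numpy as np
import torch
from torch.utils.data import Dataset

class NpzDataset(Dataset):
    """Lazy index -> NPZ-file mapping; nothing heavy happens in __init__."""

    def __init__(self, pattern):
        self.paths = sorted(glob.glob(pattern))  # cheap: filenames only

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Each file is opened only when its index is requested.
        with np.load(self.paths[idx]) as f:
            # 'snp' and 'target' are hypothetical key names
            return torch.from_numpy(f['snp']), torch.from_numpy(f['target'])
```

Debugging a network on a single example then costs one file read: `x, y = NpzDataset('data/*.npz')[0]`.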
- No information about a run (exact code, version, etc.) is saved anywhere except the .yml file. Hence, if someone else looks at a run they cannot tell what exactly was trained, and if the author forgets, the information is completely lost.
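Recording such provenance is cheap; here is one possible sketch (the filename and the set of fields are my own choice, not a dnadna convention) that snapshots the git commit, interpreter version, and command line at the start of a run.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone

def save_run_info(out_path):
    """Write reproducibility metadata for the current run to a JSON file."""
    try:
        commit = subprocess.check_output(
            ['git', 'rev-parse', 'HEAD'], text=True).strip()
        dirty = bool(subprocess.check_output(
            ['git', 'status', '--porcelain'], text=True).strip())
    except (subprocess.CalledProcessError, FileNotFoundError):
        commit, dirty = None, None  # not running from a git checkout
    info = {
        'commit': commit,
        'uncommitted_changes': dirty,
        'python': sys.version,
        'argv': sys.argv,
        'timestamp': datetime.now(timezone.utc).isoformat(),
    }
    with open(out_path, 'w') as f:
        json.dump(info, f, indent=2)
    return info
```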
- The training loop is very hard-coded and inflexible: it is hard to add a loss function the way it is coded (and `_compute_loss_metrics` does things it should not do, like so much other stuff in dnadna).
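One generic way to make the loss pluggable instead of hard-coded (this is a standard PyTorch pattern, not dnadna's actual training loop): have the training step accept any callable `(predictions, targets) -> scalar tensor`, so adding a loss means passing a function rather than editing `_compute_loss_metrics`.

```python
import torch
import torch.nn as nn

def train_step(net, optimizer, loss_fn, inputs, targets):
    """One optimization step with an arbitrary, caller-supplied loss."""
    optimizer.zero_grad()
    loss = loss_fn(net(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

net = nn.Linear(4, 1)
opt = torch.optim.SGD(net.parameters(), lr=0.01)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# Swapping losses is now a one-argument change, no loop surgery needed:
l_mse = train_step(net, opt, nn.functional.mse_loss, x, y)
l_mae = train_step(net, opt, nn.functional.l1_loss, x, y)
```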
These examples are pure user-level facts (global code behavior), and I did not go into the details (I would like to mention that they cannot be resolved by spamming tests). But I think the flexibility/"well-thought-out-ness" of the code's details is of the same kind.
I may be wrong about how to use dnadna, and I would be very happy to be informed. In addition, I do not know how you use it, but I guess that my usage is not so different from yours.