Reproducibility of the training #26

@ginnocen

Maja spotted a reproducibility issue in the training results. This is a known issue in DNN training.
I would consider the following options to try to cure it, as suggested in https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development:

  • Try to run the script as follows:
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python steer_analysis.py
This reduces the randomness coming from parallelization. It also makes the training very slow, so use something like 200-300 events and just 10 epochs for the training test.
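
If helpful, the GPU part of this can also be switched off from inside the script by hiding the GPUs before TensorFlow is imported; below is a minimal sketch, assuming TensorFlow 2.x and that no keras/TF import has happened earlier in the script. Note that PYTHONHASHSEED only takes effect when it is set before the Python interpreter starts, so that part still has to come from the command line.

# Minimal sketch: force CPU-only execution from inside the script
# (assumes tensorflow/keras has not been imported before this point).
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # hide all GPUs -> CPU-only run

import tensorflow as tf  # imported only after the variable is set
print(tf.config.list_physical_devices("GPU"))  # should print an empty list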

  • Set the NumPy random seed at the very beginning of the macro, in every piece of code that uses np, before you initialize or load any Keras or TF module:
import os
import sys
import random
from array import array
import numpy as np
np.random.seed(1001)
import matplotlib

Also make sure that the seed is not re-initialized anywhere else in the code. Be aware that calling np.random.seed()
does not seed e.g. the "random" module in case it is used. What I find cleaner is to use one and the same generator object for the randomization everywhere (see the sketch below).
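
A minimal sketch of what that could look like, assuming TensorFlow 2.x (tf.random.set_seed; in TF 1.x the call was tf.set_random_seed). The helper name and the seed value are placeholders, not part of the actual analysis code.

# Minimal sketch: seed every source of randomness once, at the very top of
# the macro, before any Keras/TF object is created. Names/seed are placeholders.
import random

import numpy as np
import tensorflow as tf

SEED = 1001

def seed_everything(seed=SEED):
    """Seed the stdlib 'random' module, NumPy and TensorFlow in one place."""
    random.seed(seed)         # stdlib 'random' (NOT covered by np.random.seed)
    np.random.seed(seed)      # NumPy global state, used internally by Keras
    tf.random.set_seed(seed)  # TensorFlow op-level/graph-level seed

seed_everything()

# Alternative, arguably cleaner: keep a single generator object and pass it
# around, so the whole analysis draws from one source of randomness.
rng = np.random.default_rng(SEED)
shuffled_indices = rng.permutation(100)  # e.g. reproducible shuffling of indices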

  • Use the option shuffle=False when calling the fit function:

model.fit(shuffle=False)
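
For context, a minimal end-to-end sketch with a toy tf.keras model; the data, architecture and hyperparameters here are placeholders, not the actual analysis code:

# Minimal sketch: disable per-epoch shuffling so the batch order is identical
# in every epoch and every run. Data/model/hyperparameters are placeholders.
import numpy as np
import tensorflow as tf

x_train = np.random.rand(256, 10).astype("float32")
y_train = np.random.randint(0, 2, size=(256, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

model.fit(x_train, y_train, epochs=10, batch_size=32, shuffle=False)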

I managed to get consistent and stable results for this version of the code 33d7945 with the modifications described above.
