Maja spotted an issue with the reproducibility of the training results. This is a known issue in DNN training.
I would consider the following options to try to cure it (as suggested in https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development):
- Try running the script like this:
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python steer_analysis.py
This reduces the randomness introduced by parallelization. It also makes the training very slow, so use only something like 200-300 events for the training and just 10 epochs.
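For reference, much of the same effect can also be obtained from inside the script. Below is a minimal sketch (assuming TensorFlow 2; note that PYTHONHASHSEED only fully takes effect when set before the Python interpreter starts, so the command-line form above is the safer one):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # hide all GPUs, force a CPU-only run
os.environ["PYTHONHASHSEED"] = "0"       # only fully effective if set before Python starts

import tensorflow as tf

# limit TensorFlow to single-threaded execution to remove the
# non-determinism coming from parallel op scheduling
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)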
- Set the random seed in every script where you use np, and do it at the very beginning of the macro, before you initialize or import Keras or TF:
import os
import sys
import random
from array import array
import numpy as np
np.random.seed(1001)  # seed NumPy before any Keras/TF import
import matplotlib
Also make sure that the seed is not re-initialized anywhere else in the code. Be aware that calling np.random.seed()
does not seed e.g. the "random" module in case it is used. What I find cleaner is to use the same random-number generator object for the randomization everywhere (see the sketch below).
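As an illustration of that last point, a minimal sketch (the seed value is just an example) that seeds both modules once and then reuses a single generator object for all randomization:
import random
import numpy as np

SEED = 1001  # example value; any fixed integer works

random.seed(SEED)     # seeds Python's built-in "random" module
np.random.seed(SEED)  # seeds NumPy's legacy global state

# a single generator object to pass around and reuse everywhere
rng = np.random.default_rng(SEED)

# example usage: draw through rng instead of the global np.random functions
shuffled_indices = rng.permutation(100)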
- Use the option shuffle=False when calling fit:
model.fit(shuffle=False)
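For completeness, a minimal sketch of where the option goes (the model, data, and hyperparameters here are placeholders, not the ones from steer_analysis.py):
import numpy as np
from tensorflow import keras

# placeholder data standing in for the real training sample
x_train = np.random.rand(200, 10).astype("float32")
y_train = np.random.randint(0, 2, size=(200, 1))

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# shuffle=False keeps the batch order identical between runs,
# removing one more source of run-to-run variation
model.fit(x_train, y_train, epochs=10, batch_size=32, shuffle=False)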
With the modifications described above, I managed to get consistent and stable results for this version of the code: 33d7945.