Reproducibility of the training #26

@ginnocen

Maja spotted a reproducibility issue in the training results. This is a known issue in DNN training.
I would consider the following options to try to cure it, as suggested in https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development:

  • Try to run the script as follows:
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python steer_analysis.py
This reduces the randomness coming from parallelization. It also makes the training very slow, so use something like 200-300 events and just 10 epochs for the training test.
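
If helpful, the GPU part of this can also be switched off from inside the script by hiding the GPUs before TensorFlow is imported; below is a minimal sketch, assuming TensorFlow 2.x and that no keras/TF import has happened earlier in the script. Note that PYTHONHASHSEED only takes effect when it is set before the Python interpreter starts, so that part still has to come from the command line.

# Minimal sketch: force CPU-only execution from inside the script
# (assumes tensorflow/keras has not been imported before this point).
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # hide all GPUs -> CPU-only run

import tensorflow as tf  # imported only after the variable is set
print(tf.config.list_physical_devices("GPU"))  # should print an empty list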

  • Set the NumPy random seed at the very beginning of the macro, in every piece of code that uses np, before you initialize or load any Keras or TF module:
import os
import sys
import random
from array import array
import numpy as np
np.random.seed(1001)
import matplotlib

Also make sure that the seed is not re-initialized anywhere else in the code. Be aware that calling np.random.seed()
does not seed e.g. the "random" module in case it is used. What I find cleaner is to use one and the same generator object for the randomization everywhere (see the sketch below).
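
A minimal sketch of what that could look like, assuming TensorFlow 2.x (tf.random.set_seed; in TF 1.x the call was tf.set_random_seed). The helper name and the seed value are placeholders, not part of the actual analysis code.

# Minimal sketch: seed every source of randomness once, at the very top of
# the macro, before any Keras/TF object is created. Names/seed are placeholders.
import random

import numpy as np
import tensorflow as tf

SEED = 1001

def seed_everything(seed=SEED):
    """Seed the stdlib 'random' module, NumPy and TensorFlow in one place."""
    random.seed(seed)         # stdlib 'random' (NOT covered by np.random.seed)
    np.random.seed(seed)      # NumPy global state, used internally by Keras
    tf.random.set_seed(seed)  # TensorFlow op-level/graph-level seed

seed_everything()

# Alternative, arguably cleaner: keep a single generator object and pass it
# around, so the whole analysis draws from one source of randomness.
rng = np.random.default_rng(SEED)
shuffled_indices = rng.permutation(100)  # e.g. reproducible shuffling of indices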

  • Use the option shuffle=False when calling the fit function:

model.fit(shuffle=False)
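
For context, a minimal end-to-end sketch with a toy tf.keras model; the data, architecture and hyperparameters here are placeholders, not the actual analysis code:

# Minimal sketch: disable per-epoch shuffling so the batch order is identical
# in every epoch and every run. Data/model/hyperparameters are placeholders.
import numpy as np
import tensorflow as tf

x_train = np.random.rand(256, 10).astype("float32")
y_train = np.random.randint(0, 2, size=(256, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

model.fit(x_train, y_train, epochs=10, batch_size=32, shuffle=False)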

I managed to get consistent and stable results for this version of the code 33d7945 with the modifications described above.
