As far as I understand the code (lines 152-161 in maxdiff.py), the entropy is computed only over the states from the MPPI rollouts, i.e. over the window [t, t + horizon]. Now consider a simple 2D robot that keeps moving in a circle. Because the states visited before t (the interval [0, t)) are not taken into account, every rollout window still looks diverse, so the diffusion term is satisfied even though the robot only revisits the same states. This is not the exploration behavior we want.
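To make the concern concrete, here is a minimal sketch, not the code from maxdiff.py. It assumes the entropy term behaves like the log-determinant of the covariance of the states in the rollout window (I have not confirmed this is exactly what lines 152-161 compute). The names `logdet_cov`, `circle_states` and `visited_cells`, the grid cell size, and the loop length are all hypothetical and chosen only for illustration.

```python
import numpy as np

def logdet_cov(states):
    """Log-determinant of the sample covariance (assumed entropy proxy)."""
    # Small diagonal term keeps the determinant finite for degenerate data.
    cov = np.cov(states.T) + 1e-6 * np.eye(states.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def circle_states(t_start, n_steps, steps_per_loop=50, radius=1.0):
    """2D positions of a robot that repeats the same circle forever."""
    k = np.arange(t_start, t_start + n_steps) % steps_per_loop
    angle = 2.0 * np.pi * k / steps_per_loop
    return radius * np.stack([np.cos(angle), np.sin(angle)], axis=1)

def visited_cells(states, cell=0.25):
    """Set of grid cells touched by the given states (hypothetical novelty measure)."""
    return {tuple(c) for c in np.floor(states / cell).astype(int)}

horizon = 50

# Entropy proxy over the window [t, t + horizon] at an early and a late time.
early = logdet_cov(circle_states(0, horizon))
late = logdet_cov(circle_states(500, horizon))

# Cells in the late window that were not already visited during [0, t).
history = visited_cells(circle_states(0, 500))
window = visited_cells(circle_states(500, horizon))
new_cells = window - history

print("windowed entropy proxy, early vs late:", early, late)
print("new cells visited in late window:", len(new_cells))
```

Under these assumptions the windowed score stays the same no matter how long the robot has been circling, while the late window contains no cell that was not already visited in [0, t). A score restricted to [t, t + horizon] therefore cannot distinguish the first loop from the hundredth.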