Context
The _Cum feature differences consecutive rows of a column (cumulative counts → per-interval incident counts) before scoring. It was added ad hoc for COVID case forecasting. Today it:
- is welded to neg_bin_dynamic: it lives only in that objfunc's eval_point, so cumulative data can only be paired with NegBinomial noise. That coupling is an accident of history (NegBinomial was the non-negative model used in forecasting), not a principled tie.
- is triggered implicitly by the substring _Cum appearing in a data-column name (if '_Cum' in col_name), a magic-string convention with no explicit declaration.
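The implicit trigger described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual eval_point code: the helper names `to_incident` and `column_for_scoring` are hypothetical, and dropping the first row (which has no predecessor) is an assumption.

```python
def to_incident(cumulative):
    """Difference consecutive rows: cumulative counts -> per-interval counts."""
    # Assumption: the first row has no predecessor and is dropped.
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def column_for_scoring(col_name, rows):
    """Mimic the magic-string trigger: difference only if '_Cum' is in the name."""
    if '_Cum' in col_name:
        return to_incident(rows)
    return list(rows)


rows = [0, 3, 7, 12]
print(column_for_scoring('Cases_Cum', rows))  # [3, 4, 5]
print(column_for_scoring('Cases', rows))      # [0, 3, 7, 12]
```

Nothing in the column declaration says "this is cumulative"; renaming the column is enough to change what gets scored.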
Why it's actually orthogonal
Cumulative→incident differencing is a data / prediction transform, independent of the noise family. Cumulative data could just as reasonably be compared with Gaussian or Laplace noise. Conceptually it belongs to how the prediction is formed from the simulation, not to the observation noise model.
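To illustrate the orthogonality, here is a small sketch in which the differencing step is composed with two different noise families. All names (`score`, `gaussian_nll`, `laplace_nll`) are hypothetical, the likelihoods are the textbook forms rather than the project's objfuncs, and differencing both the simulated and the observed series is an assumption.

```python
import math


def to_incident(cumulative):
    """Cumulative counts -> per-interval incident counts (first row dropped)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def gaussian_nll(pred, obs, sigma):
    """Negative log-likelihood under independent Gaussian noise."""
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + (o - p) ** 2 / (2 * sigma ** 2)
               for p, o in zip(pred, obs))


def laplace_nll(pred, obs, b):
    """Negative log-likelihood under independent Laplace noise."""
    return sum(math.log(2 * b) + abs(o - p) / b for p, o in zip(pred, obs))


def score(sim, data, nll, scale, cumulative=False):
    """Apply the (optional) transform first, then any noise family."""
    if cumulative:
        sim, data = to_incident(sim), to_incident(data)
    return nll(sim, data, scale)


sim_cum = [0, 3, 7, 12]
data_cum = [0, 2, 7, 11]
print(score(sim_cum, data_cum, gaussian_nll, 1.0, cumulative=True))
print(score(sim_cum, data_cum, laplace_nll, 1.0, cumulative=True))
```

The transform never inspects which likelihood follows it, which is the sense in which it belongs to how the prediction is formed rather than to the noise model.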
Why this is not done in #410
Per-observable noise (#410, ADR-0021) deliberately keeps _Cum byte-exact and isolated (a neg_bin_dynamic-only prediction override, orthogonal to the (family × σ-source) map). Generalizing it there — making it fire for any family — would silently change chi_sq's behavior on any _Cum-named column, breaking #410's non-negotiable strict-superset backward-compatibility guarantee. So it was explicitly left for a follow-up.
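A small numeric example shows why firing the differencing for every family would not be a no-op. The `chi_sq` below is the textbook sum of squared standardized residuals, used only as a stand-in; the project's chi_sq may differ by a constant factor, and the data are made up for illustration.

```python
def to_incident(cumulative):
    """Cumulative counts -> per-interval incident counts (first row dropped)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def chi_sq(pred, obs, sigma=1.0):
    """Textbook chi-square: sum of squared standardized residuals."""
    return sum(((o - p) / sigma) ** 2 for p, o in zip(pred, obs))


sim_cum = [0, 3, 7, 12]
data_cum = [0, 2, 7, 11]

# Behavior of chi_sq on a '_Cum'-named column today: scored as-is.
print(chi_sq(sim_cum, data_cum))                            # 2.0
# Behavior if differencing fired for any family: a different objective value.
print(chi_sq(to_incident(sim_cum), to_incident(data_cum)))  # 3.0
```

The same configuration would yield a different objective value with no change on the user's side, which is exactly what a strict-superset guarantee rules out.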
Proposal
Lift _Cum into an explicit, opt-in, family-independent prediction transform:
- Declare it with the noise_model config from #410/ADR-0021, e.g. a cumulative flag/transform field, set independently of the noise family.
- Replace the _Cum-substring trigger with that explicit declaration.
- Existing neg_bin_dynamic + _Cum configs keep working (recognize the legacy _Cum column-name convention as the opt-in, or document a one-line migration).
Relationships
- #410's per-observable engine (ADR-0021), which is where the explicit transform would be surfaced.
- Per-observable noise models: lift the single global objfunc to per-observable selection (PEtab v2 prerequisite) #410.
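To make the Proposal concrete, one possible shape for the explicit declaration with a legacy fallback is sketched here. Everything in it is hypothetical: `ObservableSpec`, its fields, `is_cumulative`, and the family name 'gaussian' are illustrative and are not the schema defined by #410/ADR-0021.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservableSpec:
    column: str
    family: str                        # e.g. 'chi_sq', 'neg_bin_dynamic'
    cumulative: Optional[bool] = None  # None means "not declared"


def is_cumulative(spec):
    """Resolve whether to difference, preferring the explicit declaration."""
    if spec.cumulative is not None:
        return spec.cumulative
    # Legacy fallback: only the historical neg_bin_dynamic + '_Cum' pairing
    # opts in implicitly, so chi_sq on a '_Cum' column is left unchanged.
    return spec.family == 'neg_bin_dynamic' and '_Cum' in spec.column


print(is_cumulative(ObservableSpec('Cases_Cum', 'neg_bin_dynamic')))       # True
print(is_cumulative(ObservableSpec('Cases_Cum', 'chi_sq')))                # False
print(is_cumulative(ObservableSpec('Deaths', 'gaussian', cumulative=True)))  # True
```

Restricting the fallback to the legacy pairing is one way to keep existing configs working; the alternative mentioned in the proposal, a documented one-line migration, would drop the fallback entirely.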