
Generalize the _Cum cumulative→incident transform: decouple it from neg_bin, make it explicit #418

@wshlavacek


Context

The _Cum feature differences consecutive rows of a column (cumulative counts → per-interval incident counts) before scoring. It was added ad hoc for COVID case forecasting. Today it:

  • is welded to neg_bin_dynamic — it lives only in that objfunc's eval_point, so cumulative data can only be paired with NegBinomial noise. That coupling is an accident of history (NegBinomial was the non-negative model used in forecasting), not a principled tie.
  • is triggered implicitly by the substring _Cum appearing in a data-column name (if '_Cum' in col_name), a magic-string convention with no explicit declaration.
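Here is a minimal sketch of the two behaviors described above, written in Python on the assumption that this matches the codebase. It shows the differencing itself and the substring trigger. The names `cumulative_to_incident` and `uses_cum_transform` are hypothetical. Dropping the first row, which has no predecessor, is also an assumption, because the issue does not say how the real code handles it.

```python
def cumulative_to_incident(values):
    """Difference consecutive rows: cumulative counts -> per-interval incident counts.

    Assumption for this sketch: the first row has no predecessor and is dropped,
    so the output is one element shorter than the input.
    """
    return [b - a for a, b in zip(values, values[1:])]


def uses_cum_transform(col_name):
    """Today's implicit trigger: the substring '_Cum' anywhere in the column name."""
    return '_Cum' in col_name


# Illustrative numbers only: a cumulative series and the incident counts it implies.
cumulative_cases = [0, 3, 7, 7, 12]
print(cumulative_to_incident(cumulative_cases))  # [3, 4, 0, 5]
print(uses_cum_transform('Cases_Cum'))           # True
print(uses_cum_transform('Cases'))               # False
print(uses_cum_transform('Cumulative_Cases'))    # False: no '_Cum' substring
print(uses_cum_transform('Rate_Cumulant'))       # True: any name containing '_Cum' fires it
```

The last line shows the fragility of a magic-string convention. A column can opt in by accident simply because of how it is named.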

Why it's actually orthogonal

Cumulative→incident differencing is a data / prediction transform, independent of the noise family. Cumulative data could just as reasonably be compared with Gaussian or Laplace noise. Conceptually it belongs to how the prediction is formed from the simulation, not to the observation noise model.
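The following is a minimal sketch of the decoupling, built on several assumptions. An explicit per-column declaration selects the transform, not the column name. The transform runs before the noise family is chosen and independently of it. `TRANSFORMS`, `NOISE_NLL` and `column_nll` are hypothetical names, not the objfunc API. The NB2 (mean, dispersion) parametrization of the negative binomial is an assumption. The sketch also assumes that both the simulated and the observed series are cumulative and differences both of them, which the issue does not spell out.

```python
import math


def cumulative_to_incident(values):
    # Assumption: the first row has no predecessor and is dropped.
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical registry: a column opts in by naming a transform explicitly.
TRANSFORMS = {
    'identity': lambda xs: list(xs),
    'cumulative_to_incident': cumulative_to_incident,
}


def gaussian_nll(y, mu, sigma):
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def laplace_nll(y, mu, b):
    return math.log(2 * b) + abs(y - mu) / b


def neg_bin_nll(y, mu, r):
    # NB2 parametrization (mean mu > 0, dispersion r > 0): an assumption of this sketch.
    return -(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


NOISE_NLL = {'gaussian': gaussian_nll, 'laplace': laplace_nll, 'neg_bin': neg_bin_nll}


def column_nll(sim, data, transform, family, scale):
    """Apply the declared transform, then score with any noise family."""
    f = TRANSFORMS[transform]
    pred, obs = f(sim), f(data)  # the transform never looks at the noise family
    nll = NOISE_NLL[family]
    return sum(nll(y, mu, scale) for y, mu in zip(obs, pred))


# Illustrative cumulative series (made-up numbers).
sim = [0, 2, 6, 9, 13]
data = [0, 3, 7, 7, 12]
for family, scale in [('gaussian', 1.0), ('laplace', 1.0), ('neg_bin', 10.0)]:
    print(family, column_nll(sim, data, 'cumulative_to_incident', family, scale))
```

The design point is that `column_nll` never inspects the column name. Changing `family` leaves the transform untouched, and changing `transform` leaves the noise model untouched. With `'identity'`, every family scores the raw columns as it would without `_Cum`.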

Why this is not done in #410

Per-observable noise (#410, ADR-0021) deliberately keeps _Cum byte-exact and isolated (a neg_bin_dynamic-only prediction override, orthogonal to the (family × σ-source) map). Generalizing it there — making it fire for any family — would silently change chi_sq's behavior on any _Cum-named column, breaking #410's non-negotiable strict-superset backward-compatibility guarantee. So it was explicitly left for a follow-up.

Proposal

Lift _Cum into an explicit, opt-in, family-independent prediction transform:
