refactor(mypy): un-ignore trainers/fabric (ratchet complete) by lyskov-ai · Pull Request #296 · RosettaCommons/foundry

lyskov-ai · 2026-06-03T04:28:14Z

Type-checking: fix the type errors in foundry.trainers.fabric (the last and largest module on the list) and remove the now-empty [[tool.mypy.overrides]] block (1 → 0 remaining) — all of src/foundry + src/foundry_cli now type-checks with no per-module exemptions. No behavior change.

Most errors stemmed from one root: self.state, a heterogeneous, dynamically-keyed bag, was inferred as dict[str, dict | int | None]. Fixed with a single class-level state: dict[str, Any] annotation — a TypedDict was rejected because the state is built dynamically (copy/setdefault/update) and merged with arbitrary checkpoint keys. The rest are local narrowing / cast fixes on the dataloader and precision paths.
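A minimal sketch of the state annotation described above. The class name `TrainerSketch`, its keys, and its methods are hypothetical stand-ins, not the actual foundry.trainers.fabric code; the point is only that a class-level `dict[str, Any]` annotation keeps mypy from inferring a narrow union value type for a bag that is built with copy/setdefault/update.

```python
from typing import Any


class TrainerSketch:
    # Class-level annotation: mypy uses this declared type instead of
    # inferring a value type from the first assignment in __init__.
    state: dict[str, Any]

    def __init__(self) -> None:
        # Hypothetical default keys, for illustration only.
        default_state: dict[str, Any] = {"model": None, "global_step": 0}
        self.state = default_state.copy()

    def merge_checkpoint(self, checkpoint: dict[str, Any]) -> None:
        # Checkpoints may carry keys the trainer never declared, which is
        # why a fixed-key TypedDict fits poorly here.
        self.state.update(checkpoint)
        self.state.setdefault("best_metric", None)

    def advance(self) -> int:
        # With a union value type such as `dict | int | None`, mypy would
        # reject this `+= 1`; with Any it is accepted.
        self.state["global_step"] += 1
        return self.state["global_step"]
```

The trade-off is that `Any` turns off checking for values read from the bag, so the annotation buys flexibility at the cost of per-key safety.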

Per-module details are in the commit message.

Fix each module's single pre-existing type error with a pure annotation or setattr change (no behavior change) and remove it from the [[tool.mypy.overrides]] ignore_errors list:

- callbacks/train_logging: loss_trackers: dict[str, MeanMetric]
- callbacks/metrics_logging: seen_examples: set[str]
- common: setattr(wrapper, "_has_run", True) for the @wraps wrapper
- hydra/resolvers: attribute_path: str | None (body already guards)
- inference_engines/base: base_overrides: dict[str, Any]

13 modules remain on the ignore list. mypy now type-checks the 5 newly-included modules cleanly.

Co-authored-by: Sergey Lyskov <sergey.lyskov@jhu.edu>
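The setattr change on the @wraps wrapper can be illustrated roughly as follows. The decorator name `run_once` and its behavior are assumptions made for the sketch, not the code in foundry's common module; only the `setattr(wrapper, "_has_run", True)` call comes from the commit message.

```python
from functools import wraps
from typing import Any, Callable


def run_once(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: call `fn` the first time only."""

    @wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if getattr(wrapper, "_has_run", False):
            return None
        # `wrapper._has_run = True` behaves the same at runtime, but mypy
        # reports attr-defined because the wrapper's type declares no such
        # attribute. setattr() sidesteps the static attribute check.
        setattr(wrapper, "_has_run", True)
        return fn(*args, **kwargs)

    return wrapper
```

Because setattr with a constant name is equivalent to the attribute assignment at runtime, the change stays within the "no behavior change" constraint.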

Resolve the type errors in the modules listed below and remove them from the [[tool.mypy.overrides]] ignore_errors list. Mostly narrowing / annotation fixes; two deliberate type-honesty fixes are flagged below.

- utils/weights: lowercase `any` -> `Any` in _PatternPolicyMixin (4x); assert-narrow fallback_policy at the call site (matches the get_policy idiom)
- model/layers/blocks: class-level w/b: torch.Tensor for the registered buffers (avoids nn.Module's Tensor | Module __getattr__ fallback)
- utils/components: is-None narrowing + tip_names local in get_name_mask's TIP branch (exists() can't narrow for mypy); drop orphaned exists import
- utils/logging: str(field) for the tree key; assign to a new hparams local rather than reassigning the typed cfg param
- foundry_cli/download_checkpoints: guard on `hasher is not None`; total_size = 0.0 for the float accumulation
- training/schedulers: SchedulerConfig.scheduler is now a required field (was = None, but documented as required and assumed non-None everywhere)
- utils/xpu/xpu_accelerator: name @property -> @staticmethod to match lightning's Accelerator ABC

6 hard-tier modules remain on the ignore list.

Co-authored-by: Sergey Lyskov <sergey.lyskov@jhu.edu>
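A small sketch of why an `exists()` helper does not narrow for mypy while an explicit None check does. The `exists` function below is a hypothetical stand-in for the project helper, and `count_tip_names` is an invented example, not get_name_mask itself.

```python
from __future__ import annotations


def exists(value: object) -> bool:
    """Hypothetical stand-in for a project-level `exists` helper."""
    return value is not None


def count_tip_names(tip_names: list[str] | None) -> int:
    # `if exists(tip_names): return len(tip_names)` runs fine, but mypy
    # still sees `list[str] | None` inside the branch: a plain
    # bool-returning helper carries no narrowing information.
    if tip_names is None:
        return 0
    # After the explicit check, mypy narrows tip_names to list[str].
    return len(tip_names)
```

A helper annotated with `TypeGuard` could also narrow, but the explicit check needs no change to the helper's signature.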

Fix the 11 type errors in foundry.metrics.metric and remove it from the [[tool.mypy.overrides]] ignore_errors list (5 hard-tier modules remain).

- str(name) coercion of DictConfig.items() keys (str|bytes|int|... union)
- exists() -> 'is not None' narrowing; drop orphaned atomworks import
- widen compute_from_kwargs -> dict|list and kwargs_to_compute_args -> dict|None to match the actual returns / documented contract (callers already handle them)
- three type: ignore[arg-type] on nested_dict.get/getitem for an upstream atomworks annotation bug (param typed dict[tuple,...] but navigated as nested dict[str,Any]); warn_unused_ignores will flag them if upstream is fixed

No behavior change. All gates green (ruff, mypy 41 files, pytest 27 passed).

Co-authored-by: Sergey Lyskov <sergey.lyskov@jhu.edu>
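The scoped `type: ignore[arg-type]` pattern might look roughly like this. `nested_get_sketch` is a hypothetical stand-in that mimics the kind of annotation mismatch described (a parameter typed with tuple keys but walked as a nested string-keyed dict); it is not the atomworks function or its real signature.

```python
from __future__ import annotations

from typing import Any


def nested_get_sketch(data: dict[tuple[str, ...], Any], path: tuple[str, ...]) -> Any:
    """Hypothetical helper whose parameter annotation is narrower than its use."""
    node: Any = data
    for key in path:
        # Walks one string key at a time, i.e. treats data as dict[str, Any].
        node = node[key]
    return node


features: dict[str, Any] = {"a": {"b": 3}}

# The call is correct at runtime; only the callee's annotation is off, so
# the ignore is scoped to the one error code and carries its reason.
value = nested_get_sketch(features, ("a", "b"))  # type: ignore[arg-type]  # upstream annotation mismatch
```

With mypy's `warn_unused_ignores` enabled, the comment becomes an error once the callee's annotation is corrected, so the ignore cannot outlive the problem it documents.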

Clear the three remaining foundry.utils.* modules off the mypy ignore_errors list (47 errors: ddp 12, rigid 16, datasets 19). Type-honesty and annotation fixes only, no behavior change: narrow DictConfig|dict params to DictConfig where attribute access requires it (item access kept where a plain-dict default is real), honest int|None / Tensor|None widenings, variable renames to avoid type-reuse, str() coercion of DictConfig keys, the file's own if/elif/else narrowing pattern, and documented type: ignore / cast for genuine torch and atomworks stub limitations. Two hard-tier modules remain (callbacks/health_logging, trainers/fabric). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
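The "variable renames to avoid type-reuse" item can be sketched as below. `parse_ranks` is an invented function for illustration and does not appear in foundry.utils.

```python
from __future__ import annotations


def parse_ranks(spec: str) -> list[int]:
    # Reassigning `spec = spec.split(",")` would change the variable from
    # str to list[str], which mypy rejects as an incompatible assignment.
    # A new local keeps each name at a single type without changing behavior.
    parts = spec.split(",")
    return [int(part) for part in parts if part.strip()]
```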

Clear foundry.callbacks.health_logging off the mypy ignore_errors list by fixing its 23 type errors (annotation / type-honesty only, no behavior change):

- import the stdlib 'types' module directly instead of relying on 'from typing import types' (worked at runtime but fragile/untyped)
- replace 'callable'-used-as-a-type with Mapping[str, Callable[..., Any]] on the stat/histogram dict params and Callable[..., bool] | None on the filter params; annotate the two MappingProxyType default constants to match
- annotate the _hooks / _temp_cache / _cache instance vars
- make implicit-Optional defaults explicit (... | None) on the two plot_tensor_* helpers, matching their is-not-None guards
- in plot_tensor_hist, replace two type-changing param reassignments with equivalent always-set locals (display_values, step_labels)

Only trainers/fabric remains on the ignore list.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
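A sketch of the callable-to-Callable change under simplified assumptions. `DEFAULT_STATS`, `summarize`, and the `keep` parameter are hypothetical names; only the annotation shapes (`Mapping[str, Callable[..., Any]]`, `Callable[..., bool] | None`, a MappingProxyType default) follow the commit message.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from types import MappingProxyType
from typing import Any

# Read-only default; annotated as Mapping so the parameter type matches.
DEFAULT_STATS: Mapping[str, Callable[..., Any]] = MappingProxyType(
    {"min": min, "max": max}
)


def summarize(
    values: list[float],
    stats: Mapping[str, Callable[..., Any]] = DEFAULT_STATS,
    keep: Callable[..., bool] | None = None,  # explicit Optional, not `= None` on a bare type
) -> dict[str, Any]:
    # The builtin `callable` is a function, not a type, so it cannot be
    # used in an annotation; Callable[...] states the intended contract.
    if keep is not None:
        values = [v for v in values if keep(v)]
    return {name: fn(values) for name, fn in stats.items()}
```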

Clear foundry.trainers.fabric (the last and largest module) off the mypy ignore_errors list and remove the now-empty override block. The ratchet ignore list is now empty: all of src/foundry + src/foundry_cli type-checks with no per-module exemptions. Fixes are annotation / type-honesty only, no behavior change:

- annotate self.state as dict[str, Any] (a heterogeneous, dynamically-keyed training-state bag, also merged with arbitrary checkpoint keys); this collapses ~69 union-attr/operator/arg-type errors. Also annotate default_state and declare _current_train_return (set by subclass training_step implementations).
- dataloader types: Fabric.setup_dataloaders is stub-typed to return DataLoader | list[DataLoader], so cast its single-loader results to DataLoader and change train_loop/validation_loop params from _FabricDataLoader to DataLoader (drop the now-unused import).
- precision: widen the param to str | int | None (the body sets it None when an XPU plugin takes over), cast to the guarded Literal at the XPUMixedPrecision call, and add one documented type: ignore[arg-type] where our public API is wider than Fabric's precision Literal.
- narrow the parameter-freezing guard to direct attribute access; type get_latest_checkpoint as Path | None (matching its returns) with a cast at the single caller; drop a stale type: ignore.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
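The cast and `Path | None` fixes might be sketched as follows, using stdlib stand-ins only. `Loader`, `setup_loaders`, and this body of `get_latest_checkpoint` are hypothetical: they imitate the described return shapes and are not Lightning Fabric's API or the foundry implementation (in particular, picking the last file by sorted name is an assumption).

```python
from __future__ import annotations

from pathlib import Path
from typing import cast


class Loader:
    """Hypothetical stand-in for a dataloader type."""


def setup_loaders(*loaders: Loader) -> Loader | list[Loader]:
    # Mimics a stub that returns one loader for one input, a list otherwise.
    return loaders[0] if len(loaders) == 1 else list(loaders)


def get_latest_checkpoint(ckpt_dir: Path) -> Path | None:
    # The return annotation admits the "nothing found" case instead of
    # claiming Path unconditionally.
    candidates = sorted(ckpt_dir.glob("*.ckpt"))
    return candidates[-1] if candidates else None


# One loader in, so the caller knows the result is a single Loader; cast()
# records that knowledge for mypy and does nothing at runtime.
train_loader = cast(Loader, setup_loaders(Loader()))
```

`cast` is unchecked, so it is only appropriate where the call site guarantees the narrower type, as with the single-loader call here.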

lyskov-ai and others added 6 commits June 3, 2026 03:24

lyskov-ai requested a review from woodsh17 June 3, 2026 04:28

lyskov-ai wants to merge 6 commits into RosettaCommons:production from lyskov-ai:0009-ratchet-mypy-fabric
