Add dvx audit — blob classification, lineage, and cache analysis#9

Draft
ryan-williams wants to merge 2 commits into main from audit

@ryan-williams
Member

Summary

  • New dvx audit command: classifies workspace blobs as input/generated/foreign, checks cache status, detects orphans, outputs colored DOT graphs
  • Adds reproducible field to DVCFileInfo (from meta.reproducible); generated blobs default to reproducible unless opted out
  • Renames evictable → reproducible in specs (positive default, opt-out via meta.reproducible: false)
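
The positive default described above could be sketched roughly as follows. This is an illustration, not the code in this PR: the function name `is_reproducible` and its parameters are hypothetical, and treating blobs without a computation as non-reproducible is an assumption the summary does not state.

```python
from typing import Any, Mapping, Optional


def is_reproducible(meta: Optional[Mapping[str, Any]], has_computation: bool) -> bool:
    """Hypothetical helper: decide whether a blob counts as reproducible.

    Assumed rule: a blob with a recorded computation (a generated blob) is
    reproducible unless its meta sets `reproducible: false`.
    """
    if not has_computation:
        # Assumption: a blob with no recorded computation cannot be regenerated.
        return False
    explicit = (meta or {}).get("reproducible")
    # A missing key falls back to the positive default.
    return True if explicit is None else bool(explicit)
```

With this shape, only an explicit `reproducible: false` changes the outcome for a generated blob; an absent `meta` block and an empty one behave the same.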

Usage

dvx audit                          # workspace summary
dvx audit some/artifact            # per-artifact lineage
dvx audit --orphans                # find orphans
dvx audit --json                   # JSON output
dvx audit --graph | dot -Tsvg      # colored DAG
dvx audit -r myremote              # also check remote cache
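
For readers unfamiliar with DOT, the kind of text `--graph` might hand to `dot -Tsvg` can be sketched as below. The palette, the `to_dot` function, and the node/edge layout are all assumptions for illustration; the PR does not specify the colors or the exact DOT structure that dvx emits.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical palette; the colors dvx actually uses are not stated in this PR.
COLORS: Dict[str, str] = {
    "input": "lightblue",
    "generated": "lightgreen",
    "foreign": "lightgray",
}


def _quote(name: str) -> str:
    """Wrap a node name in double quotes, escaping embedded quotes for DOT."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(kinds: Dict[str, str], edges: Iterable[Tuple[str, str]]) -> str:
    """Render nodes (path -> classification) and (dependency, output) edges as DOT."""
    lines = ["digraph audit {", "  node [style=filled];"]
    for path, kind in sorted(kinds.items()):
        lines.append(f'  {_quote(path)} [fillcolor="{COLORS[kind]}"];')
    for dep, out in edges:
        lines.append(f"  {_quote(dep)} -> {_quote(out)};")
    lines.append("}")
    return "\n".join(lines)
```

Calling `to_dot({"raw.csv": "input", "model.pkl": "generated"}, [("raw.csv", "model.pkl")])` yields a two-node graph with one edge, which Graphviz can render directly.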

Test plan

  • All 85 existing tests pass
  • Manual testing in repos with .dvc files
  • Test --orphans with actual orphaned cache entries
  • Test --graph output with graphviz
  • Test --json output consumed by UI

🤖 Generated with Claude Code

ryan-williams force-pushed the audit branch 2 times, most recently from 791f22c to d4912b6 on March 10, 2026 01:19
New command classifies workspace blobs as input/generated/foreign,
checks local cache status, detects orphaned cache entries, and
outputs colored DOT dependency graphs. Adds `reproducible` field
to `DVCFileInfo` (read from `meta.reproducible`); blobs with
`computation` default to reproducible unless explicitly opted out.

Renames `evictable` → `reproducible` in specs (positive default,
opt-out via `meta.reproducible: false`).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
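
One plausible reading of "orphaned cache entries" is a set difference: hashes present in the local cache that no `.dvc` file references. The sketch below assumes that definition; `find_orphan_hashes` is a hypothetical name and is not the `find_orphans` function in this PR.

```python
from typing import Iterable, Set


def find_orphan_hashes(cached: Iterable[str], referenced: Iterable[str]) -> Set[str]:
    """Return cache hashes that no tracked artifact refers to.

    Assumption: a cache entry is an orphan when its hash appears in the
    cache but in none of the workspace's .dvc files.
    """
    return set(cached) - set(referenced)
```

For example, with cached hashes `["aaa", "bbb", "ccc"]` and referenced hashes `["aaa", "ccc"]`, only `"bbb"` would be reported.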
Introduces `RepoView` protocol with two implementations:
- `FilesystemRepoView`: production, delegates to existing I/O
- `SnapshotRepoView`: loads from `dvc-contents.json` + `cache-files.txt`

Refactors `scan_workspace`, `audit_artifact`, `find_orphans` to accept
an optional `view` param. CLI gains `-S`/`--snapshot` for offline audit
against captured repo state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
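
The protocol-plus-two-implementations shape described above might look roughly like this. It is a sketch under stated assumptions: the method names, the `InMemorySnapshotView` class, and the `count_missing` helper are hypothetical, and the formats assumed for `dvc-contents.json` (a JSON object mapping path to parsed `.dvc` contents) and `cache-files.txt` (one hash per line) are guesses, not taken from the PR.

```python
import json
from typing import Dict, Protocol, Set


class RepoView(Protocol):
    """Hypothetical read-only interface that audit code could depend on."""

    def dvc_contents(self) -> Dict[str, dict]: ...

    def cache_hashes(self) -> Set[str]: ...


class InMemorySnapshotView:
    """Snapshot-style view built from captured text rather than live I/O."""

    def __init__(self, contents_json: str, cache_listing: str) -> None:
        # Assumed format: JSON object mapping .dvc path -> parsed contents.
        self._contents: Dict[str, dict] = json.loads(contents_json)
        # Assumed format: one cache hash per line; blank lines are ignored.
        self._cache: Set[str] = {
            line.strip() for line in cache_listing.splitlines() if line.strip()
        }

    def dvc_contents(self) -> Dict[str, dict]:
        return self._contents

    def cache_hashes(self) -> Set[str]:
        return self._cache


def count_missing(view: RepoView) -> int:
    """Count tracked blobs whose hash is absent from the cache."""
    cache = view.cache_hashes()
    return sum(1 for info in view.dvc_contents().values() if info.get("md5") not in cache)
```

Because `count_missing` depends only on the protocol, the same function would work against a filesystem-backed view or a snapshot, which is the point of making the `view` parameter optional.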