feature: safetensors and merkle hashing implemented#91
Conversation
📝 WalkthroughWalkthroughThis PR adds shared artifact hashing and Merkle helpers, introduces safetensors checkpoint save/load support, updates training and manifest generation to record checkpoint artifact metadata, switches evaluation and telemetry to shared hashing utilities, and adds Merkle tests plus related dependency, ignore, and output-format updates. ChangesArtifact checkpoint and verification flow
Sequence Diagram(s)sequenceDiagram
participant Training as reproducibility.py
participant Artifacts as artifacts module
participant Weights as mid_checkpoint.safetensors
participant Merkle as mid_checkpoint.merkle.json
participant Checkpoint as mid_checkpoint.pt
participant Global as global_manifest.py
participant Eval as eval.py
Training->>Artifacts: save_model_safetensors(model)
Artifacts->>Weights: write weights state
Training->>Artifacts: write_merkle_manifest(weights)
Artifacts->>Merkle: write chunk hashes and merkle_root
Training->>Checkpoint: store hash, merkle_root, chunk_size
Global->>Checkpoint: read checkpoint metadata
Global->>Global: record artifact_id, merkle_root, chunk_size, chunk_count
Eval->>Artifacts: load_model_safetensors(model)
alt safetensors missing
Eval->>Checkpoint: torch.load fallback
end
Eval->>Eval: record model_checkpoint_source in manifest
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/artifacts.py`:
- Around line 82-101: In build_merkle_manifest, the chunk list is built from one
read pass but size_bytes and the file-level sha256 are computed afterward, so
the manifest can mix metadata from different file versions if the artifact
changes mid-read. Update the same read loop in artifacts.py to accumulate the
total size and a running full-file hash while building chunks, then use those
accumulated values for size_bytes and sha256 instead of calling path.stat() and
compute_sha256(file_path=path) after the scan.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6ec4fc20-07fc-4b74-aa4e-ada35bd5abeb
📒 Files selected for processing (9)
.gitignorerequirements.txtsrc/artifacts.pysrc/eval.pysrc/global_manifest.pysrc/gpu_reproducibility_test.pysrc/reproducibility.pysrc/telemetry.pytests/test_artifacts.py
|
For @Archit381: Accept all incoming changes, refactored some parts hence the conflict. |
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
Please make the following changes
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/reproducibility.py (1)
13-20:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winMissing
hashlibimport causes runtimeNameError.The code uses
hashlib.sha256()at multiple locations (lines 54, 137, 170, 212, 246), buthashlibis never imported. This will crash at runtime when any checkpoint loading function is called.🐛 Proposed fix: Add hashlib import
import torch import torch.nn.functional as F +import hashlib import json import math import random🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/reproducibility.py` around lines 13 - 20, The code uses hashlib.sha256() at multiple locations but the hashlib module is never imported, causing a runtime NameError. Add the hashlib import statement to the imports section at the top of the file, alongside the other standard library imports, before the artifacts import block shown in the diff.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@requirements.txt`:
- Line 1: The torch dependency in requirements.txt is pinned to version 2.10.0,
which is affected by two known vulnerabilities (GHSA-rrmf-rvhw-rf47 and
PYSEC-2026-139). Update the torch version specification from 2.10.0 to a version
above 2.12.0 to address both security advisories. This is a simple version
update in the requirements.txt file where torch is declared.
- Around line 1-3: Add the `safetensors` package to requirements.txt to satisfy
the runtime dependency. The codebase imports and uses safetensors.torch in
src/artifacts.py (specifically in the functions called from
src/reproducibility.py and src/eval.py for checkpoint operations), but the
package is missing from the requirements file. Add an entry for safetensors with
an appropriate version constraint to ensure model checkpoints can be written and
loaded successfully in clean environments.
In `@src/eval.py`:
- Around line 34-43: The variable checkpoint_source is referenced later in the
code but is never initialized in the checkpoint loading block. When loading the
checkpoint from mid_checkpoint.pt using torch.load, explicitly assign
checkpoint_source to a string value that identifies the artifact source (for
example, set it to "mid_checkpoint" or similar). This assignment must occur in
the checkpoint loading block so that checkpoint_source is defined before it is
used in manifest creation at line 73.
In `@src/reproducibility.py`:
- Around line 52-58: The file_hash variable computed in the checkpoint loading
section is never used for validation, logging, or comparison, making it dead
code. Either remove the file_hash computation entirely, or integrate it into a
verification workflow (such as comparing against a stored hash, logging it for
audit purposes, or validating tamper detection). Apply the same fix consistently
across all occurrences in the bad_seed_auditor, secret_noise_auditor,
sabotage_auditor, and broken_seal_auditor functions to ensure the code is
complete and the security measure is either properly implemented or cleanly
removed.
In `@src/telemetry.py`:
- Around line 1-4: The TelemetryLogger.hash_model method calls the undefined
symbol model_parameters_sha256, which is not imported at the top of the file,
causing a NameError at runtime. Add model_parameters_sha256 to the import
statements in the imports section (lines 1-4) so that it is available when
TelemetryLogger.hash_model attempts to use it at lines 36-37.
---
Outside diff comments:
In `@src/reproducibility.py`:
- Around line 13-20: The code uses hashlib.sha256() at multiple locations but
the hashlib module is never imported, causing a runtime NameError. Add the
hashlib import statement to the imports section at the top of the file,
alongside the other standard library imports, before the artifacts import block
shown in the diff.
🪄 Autofix (Beta)
❌ Autofix failed (check again to retry)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 28854447-0505-4a5c-a49d-13ed262af34f
📒 Files selected for processing (5)
requirements.txtsrc/eval.pysrc/global_manifest.pysrc/reproducibility.pysrc/telemetry.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/global_manifest.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/reproducibility.py (1)
13-20:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winMissing
hashlibimport causes runtimeNameError.The code uses
hashlib.sha256()at multiple locations (lines 54, 137, 170, 212, 246), buthashlibis never imported. This will crash at runtime when any checkpoint loading function is called.🐛 Proposed fix: Add hashlib import
import torch import torch.nn.functional as F +import hashlib import json import math import random🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/reproducibility.py` around lines 13 - 20, The code uses hashlib.sha256() at multiple locations but the hashlib module is never imported, causing a runtime NameError. Add the hashlib import statement to the imports section at the top of the file, alongside the other standard library imports, before the artifacts import block shown in the diff.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@requirements.txt`:
- Line 1: The torch dependency in requirements.txt is pinned to version 2.10.0,
which is affected by two known vulnerabilities (GHSA-rrmf-rvhw-rf47 and
PYSEC-2026-139). Update the torch version specification from 2.10.0 to a version
above 2.12.0 to address both security advisories. This is a simple version
update in the requirements.txt file where torch is declared.
- Around line 1-3: Add the `safetensors` package to requirements.txt to satisfy
the runtime dependency. The codebase imports and uses safetensors.torch in
src/artifacts.py (specifically in the functions called from
src/reproducibility.py and src/eval.py for checkpoint operations), but the
package is missing from the requirements file. Add an entry for safetensors with
an appropriate version constraint to ensure model checkpoints can be written and
loaded successfully in clean environments.
In `@src/eval.py`:
- Around line 34-43: The variable checkpoint_source is referenced later in the
code but is never initialized in the checkpoint loading block. When loading the
checkpoint from mid_checkpoint.pt using torch.load, explicitly assign
checkpoint_source to a string value that identifies the artifact source (for
example, set it to "mid_checkpoint" or similar). This assignment must occur in
the checkpoint loading block so that checkpoint_source is defined before it is
used in manifest creation at line 73.
In `@src/reproducibility.py`:
- Around line 52-58: The file_hash variable computed in the checkpoint loading
section is never used for validation, logging, or comparison, making it dead
code. Either remove the file_hash computation entirely, or integrate it into a
verification workflow (such as comparing against a stored hash, logging it for
audit purposes, or validating tamper detection). Apply the same fix consistently
across all occurrences in the bad_seed_auditor, secret_noise_auditor,
sabotage_auditor, and broken_seal_auditor functions to ensure the code is
complete and the security measure is either properly implemented or cleanly
removed.
In `@src/telemetry.py`:
- Around line 1-4: The TelemetryLogger.hash_model method calls the undefined
symbol model_parameters_sha256, which is not imported at the top of the file,
causing a NameError at runtime. Add model_parameters_sha256 to the import
statements in the imports section (lines 1-4) so that it is available when
TelemetryLogger.hash_model attempts to use it at lines 36-37.
---
Outside diff comments:
In `@src/reproducibility.py`:
- Around line 13-20: The code uses hashlib.sha256() at multiple locations but
the hashlib module is never imported, causing a runtime NameError. Add the
hashlib import statement to the imports section at the top of the file,
alongside the other standard library imports, before the artifacts import block
shown in the diff.
🪄 Autofix (Beta)
❌ Autofix failed (check again to retry)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 28854447-0505-4a5c-a49d-13ed262af34f
📒 Files selected for processing (5)
requirements.txtsrc/eval.pysrc/global_manifest.pysrc/reproducibility.pysrc/telemetry.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/global_manifest.py
🛑 Comments failed to post (5)
requirements.txt (2)
1-1:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash # Query OSV for known vulnerabilities affecting torch 2.10.0 curl -s https://api.osv.dev/v1/query \ -H 'Content-Type: application/json' \ -d '{"version":"2.10.0","package":{"name":"torch","ecosystem":"PyPI"}}' | jq . # Expected: advisories list confirms whether this exact version is affected and patched ranges.Repository: AOSSIE-Org/OpenVerifiableLLM
Length of output: 7025
Update
torchto address known vulnerabilities in version 2.10.0.torch==2.10.0 is affected by at least two known vulnerabilities:
- GHSA-rrmf-rvhw-rf47 (CVE-2025-3000): Memory corruption in
torch.jit.script(low severity, affects versions up to 2.12.0)- PYSEC-2026-139 (CVE-2026-4538): Deserialization vulnerability in pt2 Loading Handler (affects 2.10.0 specifically)
Upgrade to a version above 2.12.0 to address both advisories.
🧰 Tools
🪛 OSV Scanner (2.3.8)
[HIGH] 1-1: torch 2.10.0: undefined
(PYSEC-2026-139)
[HIGH] 1-1: torch 2.10.0: PyTorch is vulnerable to memory corruption through its torch.jit.script function
(GHSA-rrmf-rvhw-rf47)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements.txt` at line 1, The torch dependency in requirements.txt is pinned to version 2.10.0, which is affected by two known vulnerabilities (GHSA-rrmf-rvhw-rf47 and PYSEC-2026-139). Update the torch version specification from 2.10.0 to a version above 2.12.0 to address both security advisories. This is a simple version update in the requirements.txt file where torch is declared.Source: Linters/SAST tools
1-3:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash # Verify safetensors runtime usage is backed by requirements.txt rg -n --type=py '\bsafetensors\b|save_model_safetensors|load_model_safetensors' rg -n '^safetensors([<>=!~].*)?$' requirements.txt # Expected: at least one safetensors requirement line in requirements.txtRepository: AOSSIE-Org/OpenVerifiableLLM
Length of output: 820
Add
safetensorstorequirements.txtto satisfy the runtime checkpoint contract.The codebase imports and uses
safetensors.torchinsrc/artifacts.py(lines 204, 227) and calls these functions fromsrc/reproducibility.pyandsrc/eval.py, but there is nosafetensorsentry inrequirements.txt. This will cause runtime failures when writing or loading model checkpoints in clean environments.Suggested fix
torch==2.10.0 numpy==2.4.3 tqdm==4.67.3 +safetensors>=0.4.5,<1.0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.torch==2.10.0 numpy==2.4.3 tqdm==4.67.3 safetensors>=0.4.5,<1.0🧰 Tools
🪛 OSV Scanner (2.3.8)
[HIGH] 1-1: torch 2.10.0: undefined
(PYSEC-2026-139)
[HIGH] 1-1: torch 2.10.0: PyTorch is vulnerable to memory corruption through its torch.jit.script function
(GHSA-rrmf-rvhw-rf47)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements.txt` around lines 1 - 3, Add the `safetensors` package to requirements.txt to satisfy the runtime dependency. The codebase imports and uses safetensors.torch in src/artifacts.py (specifically in the functions called from src/reproducibility.py and src/eval.py for checkpoint operations), but the package is missing from the requirements file. Add an entry for safetensors with an appropriate version constraint to ensure model checkpoints can be written and loaded successfully in clean environments.src/eval.py (1)
34-43:
⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
checkpoint_sourceis used before assignment, causing a runtime crash.At Line 73,
checkpoint_sourceis undefined, so manifest creation fails withNameError. The load block should set it explicitly when choosing the artifact source.Suggested fix
- checkpoint_path = "mid_checkpoint.pt" - with open(checkpoint_path, "rb") as f: - file_hash = hashlib.sha256(f.read()).hexdigest() - - checkpoint = torch.load(checkpoint_path, weights_only=False, map_location=DEVICE) - model.load_state_dict(checkpoint['model']) + checkpoint = {} + if os.path.exists(CHECKPOINT_WEIGHTS_PATH): + load_model_safetensors(model, CHECKPOINT_WEIGHTS_PATH, device=DEVICE) + checkpoint_source = "safetensors" + else: + checkpoint_path = "mid_checkpoint.pt" + checkpoint = torch.load(checkpoint_path, weights_only=False, map_location=DEVICE) + model.load_state_dict(checkpoint["model"]) + checkpoint_source = "pt"Also applies to: 70-74
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/eval.py` around lines 34 - 43, The variable checkpoint_source is referenced later in the code but is never initialized in the checkpoint loading block. When loading the checkpoint from mid_checkpoint.pt using torch.load, explicitly assign checkpoint_source to a string value that identifies the artifact source (for example, set it to "mid_checkpoint" or similar). This assignment must occur in the checkpoint loading block so that checkpoint_source is defined before it is used in manifest creation at line 73.src/reproducibility.py (1)
52-58:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winComputed
file_hashis never used.The file-level hash is computed as a "security measure" but is never used for validation, logging, or comparison. This appears to be dead code or incomplete implementation. Either remove it or integrate it into the verification flow.
The same pattern repeats in
bad_seed_auditor(line 137),secret_noise_auditor(line 170),sabotage_auditor(line 212), andbroken_seal_auditor(line 246).🧰 Tools
🪛 Ruff (0.15.17)
[error] 54-54: Undefined name
hashlib(F821)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/reproducibility.py` around lines 52 - 58, The file_hash variable computed in the checkpoint loading section is never used for validation, logging, or comparison, making it dead code. Either remove the file_hash computation entirely, or integrate it into a verification workflow (such as comparing against a stored hash, logging it for audit purposes, or validating tamper detection). Apply the same fix consistently across all occurrences in the bad_seed_auditor, secret_noise_auditor, sabotage_auditor, and broken_seal_auditor functions to ensure the code is complete and the security measure is either properly implemented or cleanly removed.src/telemetry.py (1)
1-4:
⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
TelemetryLogger.hash_modelcalls an undefined symbol.
model_parameters_sha256is not imported, so this path raisesNameErrorat runtime.Suggested fix
import json import torch import os import hashlib +from artifacts import model_parameters_sha256Also applies to: 36-37
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/telemetry.py` around lines 1 - 4, The TelemetryLogger.hash_model method calls the undefined symbol model_parameters_sha256, which is not imported at the top of the file, causing a NameError at runtime. Add model_parameters_sha256 to the import statements in the imports section (lines 1-4) so that it is available when TelemetryLogger.hash_model attempts to use it at lines 36-37.
Done with both. |
|
@coderabbitai autofix |
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. Autofix skipped. No unresolved CodeRabbit review comments with fix instructions found. |
|
Some of these are cve, to be fixed later. |
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. Autofix skipped. No unresolved CodeRabbit review comments with fix instructions found. |
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. Autofix skipped. No unresolved CodeRabbit review comments with fix instructions found. |
Extends #89
TLDR;
Implemented stable model artifact verification with safetensors and Merkle chunking.
Why this matters:
Moves the pipeline from reproducibility (same inputs, same outputs) to attestation (a compact, independently checkable proof that an artifact is what we claim). This is the Merkle sealing primitive in working form.
safetensors is the load-bearing choice: unlike pickle .pt, it's byte-canonical, so the artifact hash and Merkle root are stable across machines and runs. We keep .pt for RNG/optimizer replay, separating what we seal from what we resume.
Merkle chunking over a flat hash gives localized tamper attribution, inclusion proofs, and partial verification, scaling toward sharded checkpoints.
Canonical JSON plus recorded root/chunk-size/chunk-count makes the manifest a self-contained root of trust a verifier can check independently.
Consolidating hashing into artifacts.py ensures producer and verifier hash identically; tamper-fail tests confirm the harness catches changes, not just passes.
This layer assumes deterministic bytes; bitwise GPU repro produces them, this PR verifies them.
Changes:
src/artifacts.pyfor SHA-256 hashing, canonical JSON hashing, model parameter hashing, safetensors save/load, Merkle manifests, and Merkle proof verification.mid_checkpoint.safetensorsplusmid_checkpoint.merkle.json, while keeping.ptcheckpoints for RNG/optimizer replay..ptfallback.safetensorsdependency and Merkle unit tests.Validation:
git diff --checkpassed.AI Usage Disclosure:
I have used the following AI models and tools:
Used codex & coderabbit to add tests and perform validation as a pair programmer
Checklist
Summary by CodeRabbit
Release Notes
New Features
safetensors-based checkpoint save/load for stable, portable model serialization.Improvements
Tests
Chores
safetensorsdependency.