feat: trim heavy fields from raw_header before storage#37
Conversation
Strip dah (73%), validator_set (16%), and commit.signatures (12%) from Celestia ExtendedHeader JSON before persisting to SQLite. No consumer reads these fields — ev-node only extracts the timestamp. Reduces per-header storage from ~87KB to ~2KB (~98% reduction). Applied to all three ingestion paths: celestia-node JSON-RPC, celestia-app gRPC, and direct blockstore DB backfill. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThese changes optimize storage by replacing full block headers with minimal JSON representations containing only essential fields (height, time, data hash, and commit information), reducing per-header storage footprint across the backfill and fetch modules. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
pkg/fetch/celestia_node_test.go (1)
234-285: Prefer table-driven structure for the new trim test scenarios.These cases are good, but Lines 234-285 can be consolidated into a table-driven test to match repo test conventions and make future trim cases cheaper to add.
As per coding guidelines:
**/*_test.go: "Use table-driven tests pattern for test cases".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fetch/celestia_node_test.go` around lines 234 - 285, Convert the two separate tests into a single table-driven test that iterates over cases (e.g., name, input []byte, expectedPresent []string, expectedAbsent []string, expectUnchanged bool) and uses TrimRawHeader to produce trimmed output; for each case, unmarshal and assert presence/absence of top-level keys and nested commit fields (block_id present, signatures absent) for the valid header case and assert unchanged output for the invalid-JSON case; update or remove TestTrimRawHeader and TestTrimRawHeaderInvalidJSON and keep a reference to TrimRawHeader in the table-driven test so future scenarios can be added by appending cases.pkg/backfill/db/source.go (1)
732-735: Propagate marshal failure instead of returning nil silently.Line 733 currently drops serialization errors and can emit headers with empty
RawHeaderwithout signal.Proposed change
- RawHeader: buildMinimalRawHeader(height, block.Time, block.DataHash, meta.BlockHash), + rawHeader, err := buildMinimalRawHeader(height, block.Time, block.DataHash, meta.BlockHash) + if err != nil { + return nil, nil, fmt.Errorf("build minimal raw header at height %d: %w", height, err) + } + RawHeader: rawHeader,-func buildMinimalRawHeader(height uint64, t time.Time, dataHash, blockHash []byte) []byte { +func buildMinimalRawHeader(height uint64, t time.Time, dataHash, blockHash []byte) ([]byte, error) { obj := map[string]any{ "header": map[string]any{ "height": fmt.Sprintf("%d", height), "time": t.Format(time.RFC3339Nano), "data_hash": hex.EncodeToString(dataHash), }, "commit": map[string]any{ "height": fmt.Sprintf("%d", height), "block_id": map[string]any{ "hash": hex.EncodeToString(blockHash), }, }, } raw, err := json.Marshal(obj) if err != nil { - return nil + return nil, err } - return raw + return raw, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/backfill/db/source.go` around lines 732 - 735, The current code swallows json.Marshal errors (raw, err := json.Marshal(obj)) and returns nil, which lets callers produce empty RawHeader; change the function to propagate the serialization error instead of returning nil silently—replace the bare "return nil" with returning the error (e.g., return nil, err or wrap it with context) and update the function signature/call sites if needed so callers handle the error and do not emit headers with empty RawHeader.pkg/fetch/celestia_app.go (1)
216-220: StandardizeRawHeaderJSON schema across ingestion sources for consistency and future-proofing.The three ingestion paths produce different JSON shapes:
celestia_appmarshals the protobuf header directly (flat fields), whilecelestia_nodeandbackfillpersist envelope-style JSON with top-levelheaderandcommitobjects. Although current consumers treatRawHeaderas opaque (json.RawMessage), the comments incelestia_node.goindicate future consumers may depend on structure. To prevent accidental coupling and simplify future migrations, adopt a single canonical shape across all sources.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fetch/celestia_app.go` around lines 216 - 220, celestia_app currently marshals the protobuf header directly (json.Marshal(hdr)), producing a flat JSON shape that differs from the envelope-style used by celestia_node and backfill; change the serialization to emit the canonical envelope shape { "header": <hdr>, "commit": <commit> } to match the other ingestion paths. Concretely, construct an envelope object (e.g., a small struct or map with keys "header" set to hdr and "commit" set to the corresponding commit object or null/empty if not available) and json.Marshal that envelope instead of hdr so RawHeader uses the same schema as the other sources.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Line 13: The go.mod lists github.com/gorilla/websocket v1.4.2 as a direct
dependency though no code imports it; remove the explicit direct entry or run
`go mod tidy` to let the tool convert it to an indirect dependency (or drop it
entirely if unused). Locate the github.com/gorilla/websocket line in go.mod and
either delete that line and run `go mod tidy` or run `go mod tidy` directly to
have the module moved to a // indirect entry or removed as appropriate.
---
Nitpick comments:
In `@pkg/backfill/db/source.go`:
- Around line 732-735: The current code swallows json.Marshal errors (raw, err
:= json.Marshal(obj)) and returns nil, which lets callers produce empty
RawHeader; change the function to propagate the serialization error instead of
returning nil silently—replace the bare "return nil" with returning the error
(e.g., return nil, err or wrap it with context) and update the function
signature/call sites if needed so callers handle the error and do not emit
headers with empty RawHeader.
In `@pkg/fetch/celestia_app.go`:
- Around line 216-220: celestia_app currently marshals the protobuf header
directly (json.Marshal(hdr)), producing a flat JSON shape that differs from the
envelope-style used by celestia_node and backfill; change the serialization to
emit the canonical envelope shape { "header": <hdr>, "commit": <commit> } to
match the other ingestion paths. Concretely, construct an envelope object (e.g.,
a small struct or map with keys "header" set to hdr and "commit" set to the
corresponding commit object or null/empty if not available) and json.Marshal
that envelope instead of hdr so RawHeader uses the same schema as the other
sources.
In `@pkg/fetch/celestia_node_test.go`:
- Around line 234-285: Convert the two separate tests into a single table-driven
test that iterates over cases (e.g., name, input []byte, expectedPresent
[]string, expectedAbsent []string, expectUnchanged bool) and uses TrimRawHeader
to produce trimmed output; for each case, unmarshal and assert presence/absence
of top-level keys and nested commit fields (block_id present, signatures absent)
for the valid header case and assert unchanged output for the invalid-JSON case;
update or remove TestTrimRawHeader and TestTrimRawHeaderInvalidJSON and keep a
reference to TrimRawHeader in the table-driven test so future scenarios can be
added by appending cases.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
go.modpkg/backfill/db/source.gopkg/fetch/celestia_app.gopkg/fetch/celestia_node.gopkg/fetch/celestia_node_test.go
- Move gorilla/websocket to indirect dep (not imported by committed code) - Propagate marshal error from buildMinimalRawHeader instead of returning nil - Standardize celestia_app RawHeader to canonical envelope shape matching celestia_node and backfill sources - Consolidate TrimRawHeader tests into table-driven structure Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip dah (73%), validator_set (16%), and commit.signatures (12%) from Celestia ExtendedHeader JSON before persisting to SQLite. No consumer reads these fields — ev-node only extracts the timestamp.
Reduces per-header storage from ~87KB to ~2KB (~98% reduction). Applied to all three ingestion paths: celestia-node JSON-RPC, celestia-app gRPC, and direct blockstore DB backfill.
Overview
Summary by CodeRabbit
Performance Improvements
Tests