
feat: trim heavy fields from raw_header before storage #37

Merged
tac0turtle merged 2 commits into main from marko/reduce_space on Feb 25, 2026

@tac0turtle
Contributor

@tac0turtle tac0turtle commented Feb 25, 2026

Strip dah (73%), validator_set (16%), and commit.signatures (12%) from Celestia ExtendedHeader JSON before persisting to SQLite. No consumer reads these fields — ev-node only extracts the timestamp.

Reduces per-header storage from ~87KB to ~2KB (~98% reduction). Applied to all three ingestion paths: celestia-node JSON-RPC, celestia-app gRPC, and direct blockstore DB backfill.

Overview

Summary by CodeRabbit

  • Performance Improvements

    • Optimized header data storage by removing large, unused fields from block headers, reducing overall data sizes and improving operational efficiency.
    • Refined header marshaling to focus on essential header and commit information.
  • Tests

    • Added comprehensive tests to validate header optimization behavior and edge cases.

Strip dah (73%), validator_set (16%), and commit.signatures (12%) from
Celestia ExtendedHeader JSON before persisting to SQLite. No consumer
reads these fields — ev-node only extracts the timestamp.

Reduces per-header storage from ~87KB to ~2KB (~98% reduction).
Applied to all three ingestion paths: celestia-node JSON-RPC,
celestia-app gRPC, and direct blockstore DB backfill.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Feb 25, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b222e37 and a4e252f.

📒 Files selected for processing (3)
  • pkg/backfill/db/source.go
  • pkg/fetch/celestia_app.go
  • pkg/fetch/celestia_node_test.go

📝 Walkthrough

Walkthrough

These changes optimize storage by replacing full block headers with minimal JSON representations containing only essential fields (height, time, data hash, and commit information), reducing per-header storage footprint across the backfill and fetch modules.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Header Minimization in Backfill Module: pkg/backfill/db/source.go | Adds buildMinimalRawHeader helper function to construct lightweight JSON headers with height, time, data_hash, and commit data instead of storing full protobuf-derived blocks. Imports encoding/hex and encoding/json utilities. |
| Header Minimization in App Fetcher: pkg/fetch/celestia_app.go | Updates mapBlockResponse to wrap block header and synthesized commit object into an envelope structure before marshaling, reducing stored JSON size compared to marshaling full block data. |
| Header Trimming in Node Fetcher: pkg/fetch/celestia_node.go | Introduces TrimRawHeader function and heavyHeaderKeys list to strip large unused fields ("dah", "validator_set") from ExtendedHeader JSON. Updates mapHeader to store trimmed headers instead of full raw data, significantly reducing per-header storage. |
| Header Trimming Tests: pkg/fetch/celestia_node_test.go | Adds TestTrimRawHeader with test cases validating preservation of header and commit fields, removal of heavy fields, handling of invalid JSON, and output size reduction. |
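
For readers who have not opened the diff, the trimming step summarized above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the name `trimRawHeaderSketch` and the key list are assumptions taken from the PR description (`dah`, `validator_set`, `commit.signatures`), and the real `TrimRawHeader` in pkg/fetch/celestia_node.go may differ in detail.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// heavyTopLevelKeys mirrors the fields named in the PR description.
// Assumption: the real heavyHeaderKeys list may contain other entries.
var heavyTopLevelKeys = []string{"dah", "validator_set"}

// trimRawHeaderSketch is a hypothetical stand-in for TrimRawHeader. It drops
// the heavy top-level keys and commit.signatures, and returns the input
// unchanged when it cannot be parsed or re-encoded.
func trimRawHeaderSketch(raw []byte) []byte {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(raw, &top); err != nil {
		return raw // invalid JSON: store as-is rather than lose data
	}
	for _, k := range heavyTopLevelKeys {
		delete(top, k)
	}
	if c, ok := top["commit"]; ok {
		var commit map[string]json.RawMessage
		if err := json.Unmarshal(c, &commit); err == nil && commit != nil {
			delete(commit, "signatures")
			if b, err := json.Marshal(commit); err == nil {
				top["commit"] = b
			}
		}
	}
	out, err := json.Marshal(top)
	if err != nil {
		return raw
	}
	return out
}

func main() {
	in := []byte(`{"header":{"height":"1","time":"2026-02-25T00:00:00Z"},` +
		`"commit":{"height":"1","signatures":[{"signature":"abc"}]},` +
		`"validator_set":{"validators":[]},"dah":{"row_roots":["aa"]}}`)
	fmt.Println(string(trimRawHeaderSketch(in)))
}
```

Decoding into `map[string]json.RawMessage` keeps the surviving `header` object byte-for-byte apart from whitespace compaction, so fields the sketch does not know about are carried through untouched.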

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A minimalist's dream, headers lean and light,
Trim the fat, keep the heart, storage shines bright,
Three paths converge on the same worthy quest,
Essential fields only—we'll keep out the rest!
~The CodeRabbit's Refactor

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective of the changeset: trimming heavy fields from raw_header before storage to reduce storage size across three ingestion paths. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
pkg/fetch/celestia_node_test.go (1)

234-285: Prefer table-driven structure for the new trim test scenarios.

These cases are good, but Lines 234-285 can be consolidated into a table-driven test to match repo test conventions and make future trim cases cheaper to add.

As per coding guidelines: **/*_test.go: "Use table-driven tests pattern for test cases".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fetch/celestia_node_test.go` around lines 234 - 285, Convert the two
separate tests into a single table-driven test that iterates over cases (e.g.,
name, input []byte, expectedPresent []string, expectedAbsent []string,
expectUnchanged bool) and uses TrimRawHeader to produce trimmed output; for each
case, unmarshal and assert presence/absence of top-level keys and nested commit
fields (block_id present, signatures absent) for the valid header case and
assert unchanged output for the invalid-JSON case; update or remove
TestTrimRawHeader and TestTrimRawHeaderInvalidJSON and keep a reference to
TrimRawHeader in the table-driven test so future scenarios can be added by
appending cases.
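
A rough sketch of the table-driven shape described in this prompt, using the suggested case fields (name, input, expectedPresent, expectedAbsent, expectUnchanged). `trimSketch`, `trimCases`, and `checkCase` are hypothetical names introduced only for illustration, and `trimSketch` is a simplified stand-in for `TrimRawHeader`. It is written as a plain program so it runs on its own; in the real `_test.go` file the body of `checkCase` would presumably sit inside `t.Run(c.name, ...)`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// trimSketch is a simplified, hypothetical stand-in for TrimRawHeader.
func trimSketch(raw []byte) []byte {
	var top map[string]any
	if err := json.Unmarshal(raw, &top); err != nil {
		return raw // invalid JSON is returned unchanged
	}
	delete(top, "dah")
	delete(top, "validator_set")
	if commit, ok := top["commit"].(map[string]any); ok {
		delete(commit, "signatures")
	}
	out, err := json.Marshal(top)
	if err != nil {
		return raw
	}
	return out
}

// trimCase is one row of the table; new scenarios are added by appending rows.
type trimCase struct {
	name            string
	input           []byte
	expectedPresent []string // top-level keys that must survive
	expectedAbsent  []string // top-level keys that must be removed
	expectUnchanged bool     // output must equal input byte for byte
}

func trimCases() []trimCase {
	return []trimCase{
		{
			name: "valid header",
			input: []byte(`{"header":{"height":"1"},"commit":{"block_id":{"hash":"AA"},` +
				`"signatures":[{}]},"dah":{},"validator_set":{}}`),
			expectedPresent: []string{"header", "commit"},
			expectedAbsent:  []string{"dah", "validator_set"},
		},
		{name: "invalid JSON", input: []byte(`{not json`), expectUnchanged: true},
	}
}

// checkCase runs one row and reports the first violated expectation.
func checkCase(c trimCase) error {
	out := trimSketch(c.input)
	if c.expectUnchanged {
		if !bytes.Equal(out, c.input) {
			return fmt.Errorf("%s: output differs from input", c.name)
		}
		return nil
	}
	var top map[string]json.RawMessage
	if err := json.Unmarshal(out, &top); err != nil {
		return fmt.Errorf("%s: unmarshal output: %w", c.name, err)
	}
	for _, k := range c.expectedPresent {
		if _, ok := top[k]; !ok {
			return fmt.Errorf("%s: missing key %q", c.name, k)
		}
	}
	for _, k := range c.expectedAbsent {
		if _, ok := top[k]; ok {
			return fmt.Errorf("%s: key %q should be removed", c.name, k)
		}
	}
	if rawCommit, ok := top["commit"]; ok {
		var commit map[string]json.RawMessage
		if err := json.Unmarshal(rawCommit, &commit); err != nil {
			return fmt.Errorf("%s: unmarshal commit: %w", c.name, err)
		}
		if _, ok := commit["block_id"]; !ok {
			return fmt.Errorf("%s: commit.block_id missing", c.name)
		}
		if _, ok := commit["signatures"]; ok {
			return fmt.Errorf("%s: commit.signatures should be removed", c.name)
		}
	}
	return nil
}

func main() {
	for _, c := range trimCases() {
		if err := checkCase(c); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:", c.name)
	}
}
```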
pkg/backfill/db/source.go (1)

732-735: Propagate marshal failure instead of returning nil silently.

Line 733 currently drops serialization errors and can emit headers with empty RawHeader without signal.

Proposed change
- RawHeader: buildMinimalRawHeader(height, block.Time, block.DataHash, meta.BlockHash),
+ rawHeader, err := buildMinimalRawHeader(height, block.Time, block.DataHash, meta.BlockHash)
+ if err != nil {
+   return nil, nil, fmt.Errorf("build minimal raw header at height %d: %w", height, err)
+ }
+ RawHeader: rawHeader,
-func buildMinimalRawHeader(height uint64, t time.Time, dataHash, blockHash []byte) []byte {
+func buildMinimalRawHeader(height uint64, t time.Time, dataHash, blockHash []byte) ([]byte, error) {
   obj := map[string]any{
     "header": map[string]any{
       "height":    fmt.Sprintf("%d", height),
       "time":      t.Format(time.RFC3339Nano),
       "data_hash": hex.EncodeToString(dataHash),
     },
     "commit": map[string]any{
       "height": fmt.Sprintf("%d", height),
       "block_id": map[string]any{
         "hash": hex.EncodeToString(blockHash),
       },
     },
   }
   raw, err := json.Marshal(obj)
   if err != nil {
-    return nil
+    return nil, err
   }
-  return raw
+  return raw, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/backfill/db/source.go` around lines 732 - 735, The current code swallows
json.Marshal errors (raw, err := json.Marshal(obj)) and returns nil, which lets
callers produce empty RawHeader; change the function to propagate the
serialization error instead of returning nil silently—replace the bare "return
nil" with returning the error (e.g., return nil, err or wrap it with context)
and update the function signature/call sites if needed so callers handle the
error and do not emit headers with empty RawHeader.
pkg/fetch/celestia_app.go (1)

216-220: Standardize RawHeader JSON schema across ingestion sources for consistency and future-proofing.

The three ingestion paths produce different JSON shapes: celestia_app marshals the protobuf header directly (flat fields), while celestia_node and backfill persist envelope-style JSON with top-level header and commit objects. Although current consumers treat RawHeader as opaque (json.RawMessage), the comments in celestia_node.go indicate future consumers may depend on structure. To prevent accidental coupling and simplify future migrations, adopt a single canonical shape across all sources.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/fetch/celestia_app.go` around lines 216 - 220, celestia_app currently
marshals the protobuf header directly (json.Marshal(hdr)), producing a flat JSON
shape that differs from the envelope-style used by celestia_node and backfill;
change the serialization to emit the canonical envelope shape { "header": <hdr>,
"commit": <commit> } to match the other ingestion paths. Concretely, construct
an envelope object (e.g., a small struct or map with keys "header" set to hdr
and "commit" set to the corresponding commit object or null/empty if not
available) and json.Marshal that envelope instead of hdr so RawHeader uses the
same schema as the other sources.
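
One way the canonical envelope could look, sketched under assumptions: the field names and string encodings follow the `buildMinimalRawHeader` snippet quoted in the backfill comment above (height as a decimal string, RFC 3339 time, hex-encoded hashes), while the struct types and the `buildEnvelope` and `exampleEnvelope` helpers are hypothetical and not taken from celestia_app.go. The marshal error is returned to the caller rather than swallowed, in line with the backfill nitpick.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// Hypothetical envelope types; the real code may use maps or protobuf types.
type headerFields struct {
	Height   string `json:"height"`
	Time     string `json:"time"`
	DataHash string `json:"data_hash"`
}

type blockID struct {
	Hash string `json:"hash"`
}

type commitFields struct {
	Height  string  `json:"height"`
	BlockID blockID `json:"block_id"`
}

type rawHeaderEnvelope struct {
	Header headerFields `json:"header"`
	Commit commitFields `json:"commit"`
}

// buildEnvelope emits {"header": ..., "commit": ...} and propagates any
// serialization error with the height attached for context.
func buildEnvelope(height uint64, t time.Time, dataHash, blockHash []byte) ([]byte, error) {
	h := strconv.FormatUint(height, 10)
	env := rawHeaderEnvelope{
		Header: headerFields{
			Height:   h,
			Time:     t.Format(time.RFC3339Nano),
			DataHash: hex.EncodeToString(dataHash),
		},
		Commit: commitFields{
			Height:  h,
			BlockID: blockID{Hash: hex.EncodeToString(blockHash)},
		},
	}
	raw, err := json.Marshal(env)
	if err != nil {
		return nil, fmt.Errorf("marshal raw header at height %d: %w", height, err)
	}
	return raw, nil
}

// exampleEnvelope builds an envelope from fixed, made-up inputs.
func exampleEnvelope() ([]byte, error) {
	ts := time.Date(2026, time.February, 25, 20, 26, 0, 0, time.UTC)
	return buildEnvelope(42, ts, []byte{0xab, 0xcd}, []byte{0x01, 0x02})
}

func main() {
	raw, err := exampleEnvelope()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(raw))
}
```

Using a small struct rather than `map[string]any` fixes the key order in the output, which makes stored headers from different sources easier to compare by eye.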
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 13: The go.mod lists github.com/gorilla/websocket v1.4.2 as a direct
dependency though no code imports it; remove the explicit direct entry or run
`go mod tidy` to let the tool convert it to an indirect dependency (or drop it
entirely if unused). Locate the github.com/gorilla/websocket line in go.mod and
either delete that line and run `go mod tidy` or run `go mod tidy` directly to
have the module moved to a // indirect entry or removed as appropriate.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87217cd and b222e37.

📒 Files selected for processing (5)
  • go.mod
  • pkg/backfill/db/source.go
  • pkg/fetch/celestia_app.go
  • pkg/fetch/celestia_node.go
  • pkg/fetch/celestia_node_test.go

- Move gorilla/websocket to indirect dep (not imported by committed code)
- Propagate marshal error from buildMinimalRawHeader instead of returning nil
- Standardize celestia_app RawHeader to canonical envelope shape matching
  celestia_node and backfill sources
- Consolidate TrimRawHeader tests into table-driven structure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tac0turtle tac0turtle merged commit c8b5ebc into main Feb 25, 2026
3 of 4 checks passed
@tac0turtle tac0turtle deleted the marko/reduce_space branch February 25, 2026 20:26