Skip to content

fix: Store raw entries during local JSONL ingestion#105

Merged
evansenter merged 4 commits into
mainfrom
fix/raw-entries-local-ingestion
Feb 7, 2026
Merged

fix: Store raw entries during local JSONL ingestion#105
evansenter merged 4 commits into
mainfrom
fix/raw-entries-local-ingestion

Conversation

@evansenter

Copy link
Copy Markdown
Owner

Summary

  • Local ingestion (ingest_file) now stores raw JSONL lines in raw_entries table
  • Matches behavior of upload_entries (client push path) which already preserved raw entries
  • Uses the original JSONL line directly (avoids redundant json.dumps round-trip)
  • Deduplication via INSERT OR IGNORE on (session_id, timestamp) prevents conflicts with client-pushed entries

Context

Server-side local JSONL ingestion parsed events but didn't preserve raw entries. Client pushes did. This meant server-local session data couldn't be re-parsed from the DB if the parser improved — you'd need the original JSONL files on disk.

Test plan

  • make check — 410 passed, 10 skipped
  • After merge + restart, force re-ingest (ingest_logs(force=True)) and verify raw_entries count increases for local sessions

🤖 Generated with Claude Code

Local ingestion (ingest_logs/ingest_file) now stores raw JSONL entries
in the raw_entries table, matching the behavior of upload_entries for
client pushes. This ensures server-side session data can be re-parsed
if the parser improves, closing the gap where only client-pushed data
had raw entries preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Feb 7, 2026

Copy link
Copy Markdown

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds raw JSONL entry storage to the local ingest_file path, closing a gap where only the upload_entries (client push) path preserved raw entries in the raw_entries table. The implementation is clean and correctly mirrors the existing upload path.

Issues Found

Critical

None

Important

  • tests/test_ingest.py:521-533 - The existing test_ingest_file test does not assert on the new raw_entries_added return value. The upload path has test coverage for this (test_server.py:449-450), but local ingestion does not. A test should verify that result["raw_entries_added"] equals the expected count (3, matching the 3 entries with sessionId and timestamp in the fixture) and that the entries are actually queryable from storage.

Suggestions

  • ingest.py:608 - Minor: the line variable at this point has already been .strip()'d (line 595), so any trailing newline is removed. This is correct behavior, but worth noting that the stored entry_json will not have a trailing newline while upload_entries stores json.dumps(raw) output. If the same session is ingested via both paths, the UNIQUE(session_id, timestamp, entry_json) constraint would treat them as distinct entries due to potential formatting differences (key ordering, whitespace). This is an existing design characteristic of the schema, not introduced by this PR, but could be worth documenting or addressing in a follow-up if dedup across ingestion paths matters.

Verdict

REQUEST_CHANGES - The implementation is correct and well-structured, but should include test coverage for the new raw_entries_added behavior in local ingestion to match the existing test coverage for the upload path.


Automated review by Claude Code

- test_ingest_file: assert raw_entries_added count
- test_ingest_file_stores_raw_entries: verify DB contents and valid JSON
- test_ingest_file_raw_entries_dedup: verify INSERT OR IGNORE dedup
- test_ingest_logs: assert raw_entries_added in return dict
- test_server.test_ingest_logs: assert raw_entries_added key exists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Feb 7, 2026

Copy link
Copy Markdown

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds raw JSONL entry storage to the local ingest_file path, closing a gap where only upload_entries (client push) preserved raw entries. The implementation is clean, well-tested, and correctly mirrors the upload path.

Issues Found

Critical

None

Important

None

Suggestions

  • ingest.py:581 - The skipped return dict ({"entries_processed": 0, "events_added": 0, "skipped": True}) does not include the new raw_entries_added key, while the non-skipped return dict at line 631 does. The ingest_logs caller handles this with .get("raw_entries_added", 0) and only reaches that code in the non-skipped branch, so this won't cause a runtime error today. However, for return-type consistency (and to avoid a KeyError if any future caller accesses result["raw_entries_added"] on a skipped result), consider adding "raw_entries_added": 0 to the skipped return dict as well.
  • tests/test_ingest.py:554 - import json inside test_ingest_file_stores_raw_entries is redundant — json is already imported at the top of the file (line 3). Should use the module-level import.

Previously Addressed (Filtered)

  • tests/test_ingest.py:521-533 - Missing assertion on raw_entries_added in test_ingest_file (Implemented: added in second commit along with three dedicated test methods)

1 item from prior feedback rounds was not re-raised.

Verdict

REQUEST_CHANGES - Two minor suggestions: inconsistent return dict shape for the skipped path, and a redundant inline import in tests.


Automated review by Claude Code

- Add raw_entries_added: 0 to skipped return dict for consistency
- Remove redundant import json inside test function

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@evansenter

Copy link
Copy Markdown
Owner Author

Feedback Addressed

Implemented

  • [Suggestion] ingest.py:581 — Added raw_entries_added: 0 to skipped return dict for consistency
  • [Suggestion] tests/test_ingest.py:554 — Removed redundant inline import json, using module-level import

Skipped

  • [Suggestion] ingest.py:608line vs json.dumps(raw) formatting difference: the UNIQUE constraint is on (session_id, timestamp), not entry_json, so dedup works correctly regardless of formatting differences between ingestion paths

@claude

claude Bot commented Feb 7, 2026

Copy link
Copy Markdown

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR closes a gap where local JSONL ingestion (ingest_file) did not preserve raw entries in the raw_entries table, while the client push path (upload_entries) already did. The implementation correctly stores raw JSONL lines using INSERT OR IGNORE for deduplication, adds raw_entries_added to all return paths, and includes thorough test coverage (DB content verification, deduplication, and return value assertions).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

  • ingest.py:581 — Added raw_entries_added: 0 to skipped return dict for consistency (Implemented)
  • tests/test_ingest.py:554 — Removed redundant inline import json, using module-level import (Implemented)
  • tests/test_ingest.py:521-533 — Missing assertion on raw_entries_added in test_ingest_file (Implemented in second commit)
  • ingest.py:608line vs json.dumps(raw) formatting difference across ingestion paths (Skipped: UNIQUE constraint is on (session_id, timestamp), not entry_json)

4 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - All prior feedback has been addressed. The implementation is clean, correctly mirrors the upload path, and has solid test coverage including deduplication verification.


Automated review by Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Feb 7, 2026

Copy link
Copy Markdown

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR closes a gap where local JSONL ingestion (ingest_file) did not preserve raw entries in the raw_entries table, while the client push path (upload_entries) already did. The implementation stores the original JSONL line (avoiding a redundant json.dumps round-trip), uses INSERT OR IGNORE for deduplication, adds raw_entries_added to all return paths consistently, and includes thorough test coverage.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

  • ingest.py:581 — Added raw_entries_added: 0 to skipped return dict for consistency (Implemented)
  • tests/test_ingest.py:554 — Removed redundant inline import json, using module-level import (Implemented)
  • tests/test_ingest.py:521-533 — Missing assertion on raw_entries_added in test_ingest_file (Implemented in second commit)
  • ingest.py:608line vs json.dumps(raw) formatting difference across ingestion paths (Skipped: UNIQUE constraint is on (session_id, timestamp), not entry_json)

4 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - All prior feedback has been addressed. The implementation is clean, correctly mirrors the upload path behavior, and has solid test coverage including DB content verification and deduplication testing.


Automated review by Claude Code

@evansenter evansenter merged commit 574534d into main Feb 7, 2026
3 checks passed
@evansenter evansenter deleted the fix/raw-entries-local-ingestion branch February 7, 2026 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant