test(jsonl): add jsonl_parser schema coverage and harden null usage#22
test(jsonl): add jsonl_parser schema coverage and harden null usage#22clean6378-max-it wants to merge 1 commit intomasterfrom
Conversation
Add tests/test_jsonl_parser.py covering _parse_tool_result dispatch, content helpers, user/assistant/system paths, file activity, parse_session integration, quick_session_info, and malformed entries. Treat non-dict message.usage in _process_assistant as empty to avoid crashes on null usage.
📝 WalkthroughWalkthroughA comprehensive test suite for the JSONL parser module is added with coverage of tool result parsing, content handling, message processing, file tracking, and session parsing. A defensive guard is added to ensure usage data is always a dict in assistant message processing. ChangesParser Test Coverage & Defensive Guard
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
utils/jsonl_parser.py (1)
140-142:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
messagebefore dereferencing in both user/assistant processors.If an entry contains
"message": null, both processors call.get(...)onNoneand abortparse_sessionfor a recoverable malformed line. Apply the same dict-guard pattern here too.💡 Suggested fix
def _process_user(entry: dict, messages: list, metadata: dict): @@ - msg = entry.get("message", {}) + msg = entry.get("message", {}) + if not isinstance(msg, dict): + msg = {} content = msg.get("content", []) @@ def _process_assistant(entry: dict, messages: list, metadata: dict): @@ - msg = entry.get("message", {}) + msg = entry.get("message", {}) + if not isinstance(msg, dict): + msg = {} model = msg.get("model", "")Also applies to: 173-174
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@utils/jsonl_parser.py` around lines 140 - 142, The processors in utils/jsonl_parser.py currently assume entry["message"] is a dict and call entry.get("message", {}) without guarding against message being None; update both user and assistant processing blocks to first coerce message to a dict (e.g., msg = entry.get("message") or {} ) before calling msg.get("content", []) and passing to _extract_text, and apply the same pattern where similar code exists (the other processor and the parse_session usage) so a null "message" no longer raises when dereferenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_jsonl_parser.py`:
- Around line 718-730: Add two regression tests in tests/test_jsonl_parser.py
alongside test_missing_usage_dict_no_crash: one that passes a record with
"message": null and another that passes a record with a non-dict usage (e.g.,
"usage": "invalid" or a number) to ensure parse_session handles these malformed
shapes without crashing and still sets metadata.total_input_tokens to 0;
reference the existing test function name test_missing_usage_dict_no_crash and
the parser entrypoint parse_session to locate where to add the new tests and
mirror the try/finally cleanup pattern used there.
---
Outside diff comments:
In `@utils/jsonl_parser.py`:
- Around line 140-142: The processors in utils/jsonl_parser.py currently assume
entry["message"] is a dict and call entry.get("message", {}) without guarding
against message being None; update both user and assistant processing blocks to
first coerce message to a dict (e.g., msg = entry.get("message") or {} ) before
calling msg.get("content", []) and passing to _extract_text, and apply the same
pattern where similar code exists (the other processor and the parse_session
usage) so a null "message" no longer raises when dereferenced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4b102107-1dd1-4277-936f-9231732d0eb4
📒 Files selected for processing (2)
tests/test_jsonl_parser.pyutils/jsonl_parser.py
| def test_missing_usage_dict_no_crash(self): | ||
| path = _write_jsonl([ | ||
| { | ||
| "type": "assistant", | ||
| "timestamp": "2026-01-01T00:00:00Z", | ||
| "message": {"model": "m", "content": [], "usage": None}, | ||
| }, | ||
| ]) | ||
| try: | ||
| s = parse_session(path) | ||
| assert s["metadata"]["total_input_tokens"] == 0 | ||
| finally: | ||
| os.unlink(path) |
There was a problem hiding this comment.
Add explicit message: null and non-dict usage regression tests.
Coverage currently locks in usage=None and missing message, but not "message": null or non-null non-dict usage values. Those malformed shapes are common in JSONL and should be pinned.
🧪 Suggested test additions
class TestParseSession:
@@
def test_missing_usage_dict_no_crash(self):
@@
s = parse_session(path)
assert s["metadata"]["total_input_tokens"] == 0
finally:
os.unlink(path)
+
+ `@pytest.mark.parametrize`("bad_usage", ["oops", [], 123])
+ def test_non_dict_usage_no_crash(self, bad_usage):
+ path = _write_jsonl([
+ {
+ "type": "assistant",
+ "timestamp": "2026-01-01T00:00:00Z",
+ "message": {"model": "m", "content": [], "usage": bad_usage},
+ },
+ ])
+ try:
+ s = parse_session(path)
+ assert s["metadata"]["total_input_tokens"] == 0
+ finally:
+ os.unlink(path)
@@
class TestMalformedPartialEntries:
@@
def test_assistant_missing_message_key(self):
@@
assert s["messages"][0]["role"] == "assistant"
finally:
os.unlink(path)
+
+ def test_assistant_message_null_no_crash(self):
+ path = _write_jsonl([
+ {"type": "assistant", "timestamp": "2026-01-01T00:00:00Z", "message": None},
+ ])
+ try:
+ s = parse_session(path)
+ assert len(s["messages"]) == 1
+ assert s["messages"][0]["role"] == "assistant"
+ finally:
+ os.unlink(path)Also applies to: 797-807
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_jsonl_parser.py` around lines 718 - 730, Add two regression tests
in tests/test_jsonl_parser.py alongside test_missing_usage_dict_no_crash: one
that passes a record with "message": null and another that passes a record with
a non-dict usage (e.g., "usage": "invalid" or a number) to ensure parse_session
handles these malformed shapes without crashing and still sets
metadata.total_input_tokens to 0; reference the existing test function name
test_missing_usage_dict_no_crash and the parser entrypoint parse_session to
locate where to add the new tests and mirror the try/finally cleanup pattern
used there.
Closes #21
Summary
Adds comprehensive pytest coverage for
utils/jsonl_parser.py(CCC1) and hardens assistant parsing whenmessage.usageis JSONnullor otherwise not a plain object.Motivation
jsonl_parser.pyis the structural core for untrusted, schema-evolving Claude Code JSONL. Direct tests lock in_parse_tool_resultdispatch, helper behavior, and integration paths so silent regressions from schema drift are caught early.Changes
tests/test_jsonl_parser.pycovering:_parse_tool_resultfor all major result shapes and fallbacks_normalize_content,_extract_text,_extract_images_infer_title,_strip_system_tags_process_user,_process_assistant,_process_system,_track_file_activityparse_sessionintegration (empty files, unknown types, sidechains, snapshots, wall time, malformed lines)quick_session_infofor small vs large sessions_process_assistanttreats non-dictusageas{}sousage: nulldoes not crash.Testing