Summary
truncate_synth_buf_at_block() can delete previously emitted Weka synth-buffer segments while returning None, which ConversationReconstructor.turn_delta() interprets as no disturbance. When that happens, the next replay turn may be emitted as an append-only delta with reset_context=False even though the previously sent context was pulled back or fully cleared.
This corrupts Weka trace replay for turns where the hash-id prefix relationship is non-monotonic, especially when the LCP with the previous turn is zero or when truncation lands exactly on a segment boundary and drops later segments.
Affected code
utils/aiperf/src/aiperf/dataset/loader/weka_synth_buf.py
The relevant flow is:
- ConversationReconstructor.advance_turn() computes lcp = longest_common_prefix(prev_hash_ids, curr_hash_ids).
- It calls truncate_synth_buf_at_block(..., target_blocks=lcp, ...) and stores the return value in _last_disturbance_at.
- turn_delta() treats a disturbance as reset-worthy only if _last_disturbance_at is not None and _last_disturbance_at < _emitted_segment_count.
- If _last_disturbance_at is None, turn_delta() uses the strict append path and returns only segments[_emitted_segment_count:] with reset_context=False.
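The decision described in the flow above can be restated as a small sketch. This is not the code from weka_synth_buf.py: turn_delta_sketch and its parameters are hypothetical stand-ins for the reconstructor's internal state, and segments are modelled as plain strings.

```python
from typing import List, Optional, Tuple


def turn_delta_sketch(
    segments: List[str],
    emitted_segment_count: int,
    last_disturbance_at: Optional[int],
) -> Tuple[List[str], bool]:
    """Return (messages_to_send, reset_context) for one turn (hypothetical model)."""
    disturbed = (
        last_disturbance_at is not None
        and last_disturbance_at < emitted_segment_count
    )
    if disturbed:
        # Previously emitted content changed: resend the full rebuilt context.
        return list(segments), True
    # Strict append path: only the segments that have not been emitted yet.
    return segments[emitted_segment_count:], False
```

Under this model, a None return from the truncation helper always selects the append path, regardless of what was removed from the buffer.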
The problem is that truncate_synth_buf_at_block() currently returns None for cases that do delete prior context, for example:
- target_blocks <= 0: it clears segments and returns None.
- Boundary truncation where no surviving segment is modified, but segments after the boundary are deleted.
- Truncation that lands exactly at the start of a segment and deletes that segment and everything after it.
Those are real disturbances if any deleted segment was previously emitted.
Why this is a bug
Weka reconstruction emits per-turn raw_messages as deltas unless reset_context=True. If the synth buffer removes previously emitted content, the downstream session must be reset and sent the full rebuilt context for the current turn.
Returning None from the truncation helper hides that removal. If _emitted_segment_count still points beyond the shortened buffer, the append path can emit an empty delta with reset_context=False. The worker then continues from stale conversation state instead of replaying the current Weka prompt. This silently changes the request stream and can preserve a larger or unrelated prior prompt across a context pull-back.
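A minimal illustration of the empty-delta case, using plain Python values rather than the real reconstructor. The segment contents and variable names are made up for the example; only the slicing expression mirrors the append path described above.

```python
# Hypothetical state after one emitted turn of three segments.
segments = ["system", "user-1", "assistant-1"]
emitted_segment_count = len(segments)  # all 3 segments were already sent

# The next turn shares no prefix (lcp == 0): the buffer is cleared and rebuilt
# with one new segment, but the truncation helper reports None.
segments.clear()
last_disturbance_at = None
segments.append("user-2 (unrelated prompt)")

# Append path, taken because last_disturbance_at is None.
delta = segments[emitted_segment_count:]
reset_context = False

print(delta)  # [] because slicing past the end of a list yields an empty list
```

The new prompt is never sent, and because reset_context stays False, the downstream session keeps the three stale segments.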
In cache/prefix-sensitive agentic replay, this is particularly bad because the trace is supposed to reproduce the recorded hash-id structure. A bad non-reset turn changes the actual payload while the trace metadata still reports the intended hash IDs.
Expected behavior
truncate_synth_buf_at_block() should report the earliest deleted or modified segment index whenever truncation removes content that may have already been emitted.
Concretely:
- If target_blocks <= 0 and the buffer was non-empty, clear it and return 0.
- If a boundary cut deletes segments after the boundary and no in-place strip was already reported, return the first deleted segment index.
- If truncation lands at the start of segment i and deletes segment i onward, return i.
- Continue returning None only when no surviving or deleted segment represents a disturbance.
Then turn_delta() will take the reset path when deleted content intersects prior emitted content, emit the full rebuilt message list, and set reset_context=True.
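One possible shape for the corrected return contract is sketched below. It is a simplified model, not a patch against the real helper: Segment, its blocks field, and truncate_sketch are hypothetical, and each segment is reduced to a block count so that an in-place strip is just a smaller count.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    blocks: int  # hypothetical: number of hash blocks this segment covers


def truncate_sketch(segments: List[Segment], target_blocks: int) -> Optional[int]:
    """Truncate to target_blocks; return the earliest modified or deleted index."""
    if target_blocks <= 0:
        had_content = bool(segments)
        segments.clear()
        return 0 if had_content else None

    kept = 0
    for i, seg in enumerate(segments):
        if kept + seg.blocks <= target_blocks:
            kept += seg.blocks  # segment survives untouched
            continue
        remaining = target_blocks - kept
        if remaining == 0:
            # Cut lands exactly at the start of segment i: delete i onward.
            del segments[i:]
            return i
        # Cut lands inside segment i: strip it in place, drop the rest.
        seg.blocks = remaining
        del segments[i + 1:]
        return i
    return None  # nothing was modified or deleted
```

With block counts [2, 3, 4], a target of 5 deletes only the last segment and reports index 2, while a target of 9 or more leaves the buffer intact and reports None. Whether the reported index triggers a reset is still decided by comparing it with _emitted_segment_count in turn_delta().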
Suggested regression coverage
Add unit tests around ConversationReconstructor / truncate_synth_buf_at_block() for:
- A turn with lcp == 0 after at least one prior emitted turn. The next turn_delta() should return reset_context=True and non-empty rebuilt messages.
- Boundary truncation that deletes one or more previously emitted segments without slicing the boundary segment. This should also force reset_context=True.
- Truncation that deletes only segments that have not yet been emitted. This case may remain append-only.
Impact
This affects Weka trace replay correctness. It does not necessarily crash the run; it can silently send the wrong prompt/context for affected turns, which makes latency, token, and prefix-cache measurements untrustworthy for those traces.