aiperf: Weka synth buffer should reset context when truncation deletes emitted segments #1665

## Summary

`truncate_synth_buf_at_block()` can delete previously emitted Weka synth-buffer segments while returning `None`, which `ConversationReconstructor.turn_delta()` interprets as no disturbance. When that happens, the next replay turn may be emitted as an append-only delta with `reset_context=False` even though the previously sent context was pulled back or fully cleared.

This corrupts Weka trace replay for turns where the hash-id prefix relationship is non-monotonic, especially when the LCP with the previous turn is zero or when truncation lands exactly on a segment boundary and drops later segments.

## Affected code

`utils/aiperf/src/aiperf/dataset/loader/weka_synth_buf.py`

The relevant flow is:

1. `ConversationReconstructor.advance_turn()` computes `lcp = longest_common_prefix(prev_hash_ids, curr_hash_ids)`.
2. It calls `truncate_synth_buf_at_block(..., target_blocks=lcp, ...)` and stores the return value in `_last_disturbance_at`.
3. `turn_delta()` treats a disturbance as reset-worthy only if `_last_disturbance_at is not None and _last_disturbance_at < _emitted_segment_count`.
4. If `_last_disturbance_at` is `None`, `turn_delta()` uses the strict append path and returns only `segments[_emitted_segment_count:]` with `reset_context=False`.
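
The decision in steps 3 and 4 can be sketched with a small model. This is not the real class: `ReconstructorSketch`, `replay_after_full_clear`, the string segments, and the bookkeeping refresh at the end of `turn_delta()` are assumptions made for illustration. Only the comparison and the two return paths are taken from the flow above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReconstructorSketch:
    """Simplified, hypothetical stand-in for ConversationReconstructor."""

    segments: list[str] = field(default_factory=list)
    _emitted_segment_count: int = 0
    _last_disturbance_at: Optional[int] = None

    def turn_delta(self) -> tuple[list[str], bool]:
        """Return (messages, reset_context) for the current turn."""
        disturbed = (
            self._last_disturbance_at is not None
            and self._last_disturbance_at < self._emitted_segment_count
        )
        if disturbed:
            # Reset path: resend the full rebuilt context.
            messages = list(self.segments)
        else:
            # Strict append path: only segments not yet emitted.
            messages = self.segments[self._emitted_segment_count:]
        # Assumption: bookkeeping is refreshed after every emitted turn.
        self._emitted_segment_count = len(self.segments)
        self._last_disturbance_at = None
        return messages, disturbed


def replay_after_full_clear(reported: Optional[int]) -> tuple[list[str], bool]:
    """Emit one turn, replace the whole buffer, then emit the next turn."""
    r = ReconstructorSketch(segments=["system", "user-1", "tool-1"])
    r.turn_delta()  # first turn emits all three segments
    r.segments[:] = ["user-2"]  # lcp == 0: the old context is gone
    r._last_disturbance_at = reported  # what the truncation helper returned
    return r.turn_delta()


stale = replay_after_full_clear(None)  # helper reported no disturbance
fixed = replay_after_full_clear(0)     # helper reported index 0
print(stale)  # ([], False)
print(fixed)  # (['user-2'], True)
```

In this model, a `None` report after a full clear yields an empty append-only delta, while a report of `0` yields the rebuilt context with `reset_context=True`.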

The problem is that `truncate_synth_buf_at_block()` currently returns `None` for cases that do delete prior context, for example:

- `target_blocks <= 0`: it clears `segments` and returns `None`.
- Boundary truncation where no surviving segment is modified, but segments after the boundary are deleted.
- Truncation that lands exactly at the start of a segment and deletes that segment and everything after it.

Those are real disturbances if any deleted segment was previously emitted.

## Why this is a bug

Weka reconstruction emits per-turn `raw_messages` as deltas unless `reset_context=True`. If the synth buffer removes previously emitted content, the downstream session must be reset and sent the full rebuilt context for the current turn.

Returning `None` from the truncation helper hides that removal. If `_emitted_segment_count` still points beyond the shortened buffer, the append path can emit an empty delta with `reset_context=False`. The worker then continues from stale conversation state instead of replaying the current Weka prompt. This silently changes the request stream and can preserve a larger or unrelated prior prompt across a context pull-back.

In cache/prefix-sensitive agentic replay, this is particularly bad because the trace is supposed to reproduce the recorded hash-id structure. A bad non-reset turn changes the actual payload while the trace metadata still reports the intended hash IDs.

## Expected behavior

`truncate_synth_buf_at_block()` should report the earliest deleted or modified segment index whenever truncation removes content that may have already been emitted.

Concretely:

- If `target_blocks <= 0` and the buffer was non-empty, clear it and return `0`.
- If a boundary cut deletes segments after the boundary and no in-place strip was already reported, return the first deleted segment index.
- If truncation lands at the start of segment `i` and deletes segment `i` onward, return `i`.
- Continue returning `None` only when truncation neither modifies a surviving segment nor deletes any segment.

Then `turn_delta()` will take the reset path when deleted content intersects prior emitted content, emit the full rebuilt message list, and set `reset_context=True`.
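
A minimal sketch of the return contract listed above, under a deliberately simplified buffer model. The function name `truncate_sketch`, its two-argument signature, and the representation of each segment as a list of block ids are assumptions for illustration; the real helper takes additional arguments and a different segment type.

```python
from typing import Optional


def truncate_sketch(segments: list[list[int]], target_blocks: int) -> Optional[int]:
    """Keep the first `target_blocks` blocks; return the earliest deleted or
    modified segment index, or None if the buffer is unchanged.

    Hypothetical model: each segment is a list of block ids.
    """
    if target_blocks <= 0:
        had_content = bool(segments)
        segments.clear()
        # A non-empty buffer that gets cleared is disturbed from index 0.
        return 0 if had_content else None

    kept = 0
    for i, seg in enumerate(segments):
        if kept + len(seg) <= target_blocks:
            kept += len(seg)  # segment survives whole
            continue
        room = target_blocks - kept
        if room == 0:
            # Cut lands exactly at the start of segment i: delete i onward.
            del segments[i:]
        else:
            # Cut lands inside segment i: strip it in place, drop the rest.
            del seg[room:]
            del segments[i + 1:]
        return i
    return None  # every segment fits within target_blocks


buf = [[1, 2], [3, 4], [5]]
print(truncate_sketch(buf, 2), buf)  # 1 [[1, 2]]
```

The boundary case is the one that matters here: the surviving segment `[1, 2]` is untouched, yet the helper still reports index `1` because segments from that index onward were deleted.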

## Suggested regression coverage

Add unit tests around `ConversationReconstructor` / `truncate_synth_buf_at_block()` for:

1. A turn with `lcp == 0` after at least one prior emitted turn. The next `turn_delta()` should return `reset_context=True` and non-empty rebuilt messages.
2. Boundary truncation that deletes one or more previously emitted segments without slicing the boundary segment. This should also force `reset_context=True`.
3. Truncation that deletes only segments that have not yet been emitted. This may remain append-only with `reset_context=False`.
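
Assuming the fixed helper returns the first deleted or modified segment index, the three cases reduce to expectations on the reset comparison described under "Affected code". The table below is a sketch of those expectations, not a test against the real classes: `needs_reset` and the example index values are hypothetical.

```python
from typing import Optional


def needs_reset(disturbance_at: Optional[int], emitted_count: int) -> bool:
    """Mirror of the comparison turn_delta() is described as using."""
    return disturbance_at is not None and disturbance_at < emitted_count


# (description, index reported by truncation, segments already emitted, expected)
CASES = [
    ("lcp == 0 after an emitted turn", 0, 3, True),
    ("boundary cut deletes emitted segments", 2, 4, True),
    ("only unemitted segments deleted", 4, 4, False),
    ("no truncation at all", None, 4, False),
]

for description, reported, emitted, expected in CASES:
    assert needs_reset(reported, emitted) is expected, description
```

The third row marks the edge of the comparison: a reported index equal to the emitted count means everything removed was never sent, so the append path stays valid.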

## Impact

This affects Weka trace replay correctness. It does not necessarily crash the run; it can silently send the wrong prompt/context for affected turns, which makes latency, token, and prefix-cache measurements untrustworthy for those traces.
