
KLAUS-378: latency_to_first_prediction uses start instead of adjusted_start #57

@michberger

Description


Many of us who regularly use the Nextflow pipeline to generate behavior data have noticed that the values in the latency_to_first_prediction and latency_to_last_prediction columns don't make sense: they report behaviors at frame numbers greater than the number of frames in the video, and they disagree with the values in the merged_bouts tables produced by the NF pipeline.

Michelle asked Claude to analyze the code in the repository. It discovered a bug in the code and then confirmed the error against example data I provided.

The Bug: latency_to_first_prediction Uses start Instead of adjusted_start

What the column means (per the README)

latency_to_first_prediction: Frame number of first behavior prediction in the time bin. Frame is relative to the experiment start, not the time bin.

So the intent is: find the frame number (relative to the experiment start) of the first behavior event within that bin.


What the code actually does

At line 1083:

```python
results["latency_to_first_prediction"] = behavior_bins["start"].min()
```

The key distinction is between two columns that live on each bout:

Column | Meaning
-- | --
`start` | Frame number within the video file where the bout begins
`adjusted_start` | Frame number relative to the experiment start (`= time_to_frame(video_timestamp, experiment_start) + start`)

adjusted_start is what correctly represents the frame offset from experiment start; start resets to zero at the beginning of each individual video file.
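To make the relationship concrete, here is a minimal sketch of how the two columns relate. The fps, timestamps, and the `time_to_frame` implementation below are illustrative stand-ins, not the pipeline's actual code:

```python
# Hypothetical sketch; FPS and timestamps are invented, and time_to_frame
# is a stand-in for the pipeline's own helper.
FPS = 30

def time_to_frame(video_timestamp: float, experiment_start: float) -> int:
    """Frames elapsed between experiment start and this video's start."""
    return int(round((video_timestamp - experiment_start) * FPS))

# A bout starting at frame 120 of a video that began 300 s into the experiment:
experiment_start = 0.0
video_timestamp = 300.0
start = 120                                                       # within-video frame
adjusted_start = time_to_frame(video_timestamp, experiment_start) + start
print(adjusted_start)  # 300 s * 30 fps + 120 = 9120
```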


Why it works for the 0–5 minute bin but fails for later bins

For the first bin (0–5 min), the first video starts at time zero, so time_to_frame(video_timestamp, experiment_start) = 0. This means adjusted_start == start, so the result is accidentally correct.

For later bins (5–20 min and 20–55 min), bouts come from video segments that started later in the experiment. Their adjusted_start correctly reflects the offset from experiment start, but start only holds the within-video frame number — which could be any small number relative to that video's beginning. So behavior_bins["start"].min() returns a frame number that is meaningless in the context of the full experiment timeline.

There's a second compounding issue: when a bout is split at a bin boundary (lines 1011–1019), the second_half["start"] is updated to:

```python
second_half["start"] = second_half["start"] + cur_cut - second_half["adjusted_start"]
```

This calculation adjusts start by the cut offset, but start is a within-video frame number while cur_cut is an experiment-relative frame number — so the arithmetic mixes two different reference frames, producing a corrupted start value on split bouts.


The fix

Line 1083 should use adjusted_start instead of start:

```python
# Current (wrong):
results["latency_to_first_prediction"] = behavior_bins["start"].min()

# Fixed:
results["latency_to_first_prediction"] = behavior_bins["adjusted_start"].min()
```

Similarly, lines 1084–1086 for latency_to_last_prediction should use adjusted_start + duration (which is just adjusted_end):

```python
# Current (wrong):
results["latency_to_last_prediction"] = (
    behavior_bins["start"] + behavior_bins["duration"]
).max()

# Fixed:
results["latency_to_last_prediction"] = behavior_bins["adjusted_end"].max()
```

This ensures both latency values are always expressed in experiment-relative frames, matching what the documentation describes and what you actually want.
