Add Hi Goodtimes statistical filter 1 check #2732

Open
subagonsouth wants to merge 5 commits into IMAP-Science-Operations-Center:dev from subagonsouth:2724-hi-goodtimes---statistical-filter-1

Conversation

@subagonsouth
Contributor

Change Summary

Summary

Implements Statistical Filter 1 for Hi Goodtimes processing, which detects isotropic count rate increases per Algorithm Document Section 2.3.2.3.

New Functions

  • _compute_qualified_counts_per_sweep(): Computes qualified calibration product counts per 8-spin interval using vectorized np.add.at() with the (esa_sweep, esa_step) multi-index
  • _build_per_sweep_datasets(): Builds per-sweep datasets with qualified counts for multiple Pointings
  • _compute_median_and_sigma_per_esa(): Computes median and sigma per ESA step using xr.concat and xr.median to handle datasets with potentially different ESA step coordinates
  • _identify_cull_pattern(): Identifies 2D cull patterns using scipy.ndimage.convolve1d for efficient pattern detection
  • mark_statistical_filter_1(): Main entry point that applies the three-pass filter algorithm
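The counting step described above can be sketched as follows. This is a minimal illustration of the vectorized np.add.at() accumulation the summary describes; the function name and signature here are assumptions, not the PR's actual code.

```python
import numpy as np

def compute_qualified_counts_per_sweep(esa_sweep: np.ndarray,
                                       esa_step: np.ndarray) -> np.ndarray:
    """Accumulate one count per qualified event into a (sweep, step) grid."""
    n_sweeps = int(esa_sweep.max()) + 1
    n_steps = int(esa_step.max()) + 1
    counts_2d = np.zeros((n_sweeps, n_steps), dtype=np.float64)
    # np.add.at performs unbuffered addition, so repeated (sweep, step)
    # pairs each contribute a count -- unlike plain counts_2d[idx] += 1.
    np.add.at(counts_2d, (esa_sweep, esa_step), 1)
    return counts_2d
```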

Algorithm

The filter applies three passes using 2D convolution:

  1. Consecutive runs: Marks intervals where counts exceed median + 1.8σ for 3+ consecutive intervals AND the adjacent ESA step also exceeds threshold at the same time
  2. Isolated intervals: Marks good intervals sandwiched between two bad intervals (orphan detection)
  3. Extreme outliers: Marks remaining intervals exceeding median + 5σ
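The three passes above can be sketched roughly as below. The thresholds (1.8σ, 5σ, 3 consecutive intervals) come from the PR description; everything else, including the function name and the use of a centered moving sum plus dilation, is an illustrative assumption. The adjacent-ESA-step condition of pass 1 is omitted here for brevity.

```python
import numpy as np
from scipy.ndimage import convolve1d

def statistical_filter_1(counts, median, sigma, n_sigma=1.8,
                         extreme_sigma=5.0, min_consecutive=3):
    """Return a boolean cull mask over a (n_sweeps, n_esa_steps) grid."""
    above = counts > (median + n_sigma * sigma)  # per-ESA-step threshold

    # Pass 1: runs of min_consecutive above-threshold intervals along the
    # sweep (time) axis. A centered moving sum equal to the window length
    # means the whole window was above threshold.
    window = np.ones(min_consecutive)
    run_sum = convolve1d(above.astype(float), window, axis=0, mode="constant")
    run_centers = run_sum >= min_consecutive
    # Dilate so every member of a detected run is marked, not just centers.
    bad = convolve1d(run_centers.astype(float), window, axis=0,
                     mode="constant") > 0

    # Pass 2: orphan detection, a good interval sandwiched between bad ones.
    sandwiched = np.zeros_like(bad)
    sandwiched[1:-1] = ~bad[1:-1] & bad[:-2] & bad[2:]
    bad |= sandwiched

    # Pass 3: extreme outliers above median + extreme_sigma * sigma.
    bad |= counts > (median + extreme_sigma * sigma)
    return bad
```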

Implementation Highlights

  • Uses idiomatic xarray operations (xr.concat, xr.median, broadcasting) instead of loops
  • Leverages scipy.ndimage.convolve1d for efficient pattern matching along both time (esa_sweep) and energy (esa_step) dimensions
  • Handles datasets with different ESA step coordinates by using xr.concat which aligns and fills missing values with NaN
  • Returns xr.DataArray objects with proper coordinates for seamless integration
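The alignment behavior mentioned above can be seen with a toy example (coordinates here are made up, not the PR's data): xr.concat outer-joins the shared dimension's coordinate and fills non-overlapping positions with NaN, which the subsequent median can skip.

```python
import xarray as xr

# Two per-pointing arrays whose esa_step coordinates only partially overlap.
a = xr.DataArray([1.0, 2.0], coords={"esa_step": [1, 2]}, dims="esa_step")
b = xr.DataArray([3.0, 4.0], coords={"esa_step": [2, 3]}, dims="esa_step")

# xr.concat aligns on the union of esa_step values and fills gaps with NaN,
# so reductions like .median() can skip the missing values.
stacked = xr.concat([a, b], dim="pointing")
med = stacked.median(dim="pointing")  # NaN-skipping by default for floats
```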

Test Coverage

Adds comprehensive tests for all new functions including edge cases for empty data, NaN handling, and pattern detection verification.

File changes

  • imap_processing/hi/hi_goodtimes.py
  • imap_processing/tests/hi/test_hi_goodtimes.py

Closes: #2724

Copilot AI left a comment

Pull request overview

Implements Statistical Filter 1 for IMAP-Hi “goodtimes” processing to detect isotropic qualified-count excursions per ESA sweep/step, and adds unit/integration tests for the new filter and helpers.

Changes:

  • Added Statistical Filter 1 implementation (mark_statistical_filter_1) plus helper functions to compute per-(esa_sweep, esa_step) qualified counts, aggregate across pointings, and detect cull patterns via 2D convolutions.
  • Added tests covering helper behavior and end-to-end filter behavior (including extreme outliers and error cases).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File changes reviewed:

  • imap_processing/hi/hi_goodtimes.py: Adds Statistical Filter 1 logic, including per-sweep qualified count computation and convolution-based cull mask detection.
  • imap_processing/tests/hi/test_hi_goodtimes.py: Adds unit tests for new helpers and integration tests for Statistical Filter 1 behavior.


Comment on lines 1147 to 1149
# Select only the esa_step values present in the data
ds_reshaped = ds_reshaped.sel(esa_step=ds_reshaped.coords["esa_step"])

Copilot AI Feb 19, 2026

The comment and code here don’t match: the sel(esa_step=ds_reshaped.coords['esa_step']) call is a no-op because it selects the coordinate values already present. If the intent is to restrict qualified_count/dataset to only ESA steps present in the original data, please select from the original unique esa_step values (or remove this block to avoid confusion).

Suggested change
# Select only the esa_step values present in the data
ds_reshaped = ds_reshaped.sel(esa_step=ds_reshaped.coords["esa_step"])

@laspsandoval left a comment

I went through very carefully and everything looks good. Just some comments/questions about constant values. I'm approving since there isn't anything significant that needs to be changed. Nice work!


# Count qualified events per (esa_sweep, esa_step) using 2D array
n_sweeps = int(esa_sweep.max()) + 1
n_esa_steps = int(esa_step.max()) + 1
Contributor

Does esa_sweep start at zero each time?

Contributor Author

Thanks for this comment. It made me realize that I have a bug. The esa_sweep dimension is somewhat arbitrary: it just counts how many times the ESA steps through the full range of energy steps (nominally 9 steps). The bug is that which energy esa_step N corresponds to can change from one pointing to the next, so I should be using esa_energy_step here to make sure that the correct energy steps get aligned.

# Count qualified events per (esa_sweep, esa_step) using 2D array
n_sweeps = int(esa_sweep.max()) + 1
n_esa_steps = int(esa_step.max()) + 1
counts_2d = np.zeros((n_sweeps, n_esa_steps), dtype=np.float64)
Contributor

# For every pair of indices (sweep_i, step_i), add 1 to that cell.
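The suggested comment points at the key subtlety of np.add.at(): with repeated indices, plain fancy-indexed assignment is buffered and counts each index only once, while np.add.at() is unbuffered. A toy comparison:

```python
import numpy as np

idx = np.array([0, 0, 1])  # index 0 appears twice

buffered = np.zeros(3)
buffered[idx] += 1  # buffered: index 0 is only incremented once

unbuffered = np.zeros(3)
np.add.at(unbuffered, idx, 1)  # unbuffered: every occurrence counts
```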

extreme_threshold_sigma: float = 5.0,
min_consecutive_intervals: int = 3,
cull_code: int = CullCode.LOOSE,
min_pointings: int = 4,
Contributor

Curious why you chose 4?

def _identify_cull_pattern(
current_counts: xr.DataArray,
median_per_esa: xr.DataArray,
sigma_per_esa: xr.DataArray,
Contributor

Why these defaults?

Contributor Author

Good point. These are from the algorithm document, but should be added to the Hi Constants.

# Dilate the consecutive detection to mark all positions in runs
# convolve1d centers the kernel, so we dilate to capture run edges
run_kernel = np.ones(min_consecutive)
run_positions = convolve1d(
Contributor

Nice way of doing this.

@subagonsouth subagonsouth force-pushed the 2724-hi-goodtimes---statistical-filter-1 branch from d3cfcab to 0035d4f Compare February 23, 2026 22:27