feat(tests): add HETA 1.2.0 parquet size checks and GeoJSON parity validation by ari-nz · Pull Request #640 · aignostics/python-sdk

ari-nz · 2026-05-12T08:36:51Z

Summary

Adds validation for the 3 new parquet outputs introduced in HETA 1.2.0 (tissue_qc, tissue_segmentation, cell_classification). cell_detection parquet outputs are intentionally excluded as they are being removed from the pipeline.

- Updates SPOT_0_EXPECTED_RESULT_FILES and SPOT_1_EXPECTED_RESULT_FILES to include the 3 new parquet entries (12 files total)
- Updates cli_test.py and gui_test.py to assert 12 result files instead of 9
- Adds parquet↔GeoJSON parity checks: len(pd.read_parquet(...)) must equal len(geojson["features"]) for each paired output
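The parity rule in the last item can be sketched as follows. This is a minimal illustration, not the PR's test code: `count_geojson_features` and `assert_row_parity` are hypothetical helper names, and the parquet row count is passed in as a plain integer (in the PR it comes from `len(pd.read_parquet(...))`) so that the sketch runs with the standard library alone.

```python
import json
from pathlib import Path


def count_geojson_features(geojson_path: Path) -> int:
    """Return the number of entries in the top-level "features" array."""
    with geojson_path.open("r", encoding="utf-8") as f:
        return len(json.load(f)["features"])


def assert_row_parity(parquet_row_count: int, geojson_path: Path) -> None:
    """Fail if the parquet row count differs from the GeoJSON feature count."""
    feature_count = count_geojson_features(geojson_path)
    assert parquet_row_count == feature_count, (
        f"Parquet has {parquet_row_count} rows but {geojson_path.name} "
        f"has {feature_count} features"
    )
```

Loading the whole GeoJSON document with `json.load` is fine for small files; for large polygon exports a streaming parser is the gentler choice.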

Test plan

- Long-running e2e tests download all 12 output files and assert sizes within ±10%
- Parity check validates row counts match GeoJSON feature counts for all 3 paired outputs on both staging and production
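As a rough illustration of the ±10% size assertion, the check boils down to a relative-difference comparison. The function name and the inclusive bound are assumptions made for this sketch; the structure of the `SPOT_x_EXPECTED_RESULT_FILES` entries is not shown on this page.

```python
def size_within_tolerance(actual_bytes: int, expected_bytes: int, tolerance: float = 0.10) -> bool:
    """Return True when actual_bytes deviates from expected_bytes by at most `tolerance` (relative)."""
    return abs(actual_bytes - expected_bytes) <= tolerance * expected_bytes


# Example: a 1,050-byte download against a 1,000-byte expectation passes,
# while a 1,200-byte download does not.
ok = size_within_tolerance(1050, 1000)
too_big = size_within_tolerance(1200, 1000)
```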

Copilot

Pull request overview

Adds end-to-end test updates for HETA 1.2.0 outputs by expanding expected result artifacts to include the new parquet polygon exports and validating parquet↔GeoJSON feature parity.

Changes:

- Extend SPOT_0_EXPECTED_RESULT_FILES / SPOT_1_EXPECTED_RESULT_FILES to include tissue_qc, tissue_segmentation, and cell_classification parquet outputs (now 12 expected files).
- Update GUI/CLI e2e tests to assert 12 downloaded result files instead of 9.
- Add parquet↔GeoJSON parity assertions by comparing parquet row counts to GeoJSON features counts for the three paired outputs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `tests/constants_test.py` | Updates expected output file lists and byte-size tolerances to include the three new parquet outputs for both production and staging. |
| `tests/aignostics/application/gui_test.py` | Adjusts expected result file count to 12 and adds parquet↔GeoJSON parity validation after download. |
| `tests/aignostics/application/cli_test.py` | Adjusts expected result file count to 12 and adds parquet↔GeoJSON parity validation after execution/download. |

ari-nz · 2026-06-01T15:24:50Z

+        assert len(files_in_results_dir) == 12, (
+            f"Expected 12 files in {results_dir}, but found {len(files_in_results_dir)}: "


Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

+def _build_minimal_wsi_input_item(gs_url: str, crc32c: str, expires_seconds: int) -> platform.InputItem:
+    """Build a minimal WSI InputItem supplying only the CRC32C and image URL."""
+    return platform.InputItem(
+        external_id=gs_url,
+        input_artifacts=[
+            platform.InputArtifact(
+                name="whole_slide_image",
+                download_url=platform.generate_signed_url(url=gs_url, expires_seconds=expires_seconds),
+                metadata={
+                    "checksum_base64_crc32c": crc32c,
+                    "media_type": "image/tiff",
+                },
+            )
+        ],
+    )




codecov · 2026-05-19T18:29:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.
see 11 files with indirect coverage changes

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

+        import pyarrow.parquet as pq
+
+        for parquet_filename, geojson_filename in parquet_geojson_pairs:
+            parquet_path = results_dir / parquet_filename
+            geojson_path = results_dir / geojson_filename
+            parquet_row_count = pq.read_metadata(parquet_path).num_rows


+    import pyarrow.parquet as pq
+
+    for parquet_filename, geojson_filename in parquet_geojson_pairs:
+        parquet_path = results_dir / parquet_filename
+        geojson_path = results_dir / geojson_filename
+        parquet_row_count = pq.read_metadata(parquet_path).num_rows


Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


blanca-pablos · 2026-05-21T10:23:03Z

+@pytest.mark.stress_only
+@pytest.mark.long_running
+@pytest.mark.timeout(timeout=TEST_APP_STRESS_SUBMIT_AND_FIND_SUBMIT_TIMEOUT_SECONDS)
+def test_platform_test_app_stress_submit() -> None:


When do these run? They could get very expensive if running through. Should we consider cancelling after acknowledging they have been submitted or so?

blanca-pablos · 2026-05-21T10:24:44Z

+    ]
+    import pyarrow.parquet as pq
+
+    for parquet_filename, geojson_filename in parquet_geojson_pairs:


too complicated maybe but a rough area check could be nice for the segmentation ones

- test-app: 0.0.6 → 1.0.0 (new version uses same he-tme input schema)
- he-tme: 1.1.0 → 1.1.1 on staging
- Remove SPECIAL_APPLICATION_ID/VERSION from staging (no longer needed)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…alization artifact

- Re-add SPECIAL_APPLICATION_ID/VERSION to staging pointing to test-app 1.0.0 so e2e_test.py imports resolve on staging
- Remove normalization:wsi input artifact from _get_spots_payload_for_special; test-app 1.0.0 only requires whole_slide_image, matching the he-tme schema

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Remove SPECIAL_APPLICATION_ID/VERSION from staging constants entirely
- Guard the import in e2e_test.py with try/except so staging doesn't NameError
- Add skipif(SPECIAL_APPLICATION_ID is None) to both special-app tests so they are silently skipped on staging but still run on production (0.99.0)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Simpler than a try/except guard: staging defines SPECIAL_APPLICATION_ID and SPECIAL_APPLICATION_VERSION as None, the regular import works, and the existing skipif(SPECIAL_APPLICATION_ID is None) handles the rest.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
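The None-constant approach described in this commit message can be sketched as follows. The commits use pytest's `skipif`; this sketch swaps in `unittest.skipIf` from the standard library so it runs without extra dependencies, and the test class and its body are hypothetical.

```python
import unittest

# Stand-ins for the per-environment constants: on staging both are None,
# so a plain import works and no try/except guard is needed.
SPECIAL_APPLICATION_ID = None
SPECIAL_APPLICATION_VERSION = None


@unittest.skipIf(SPECIAL_APPLICATION_ID is None, "special application not configured in this environment")
class SpecialApplicationTest(unittest.TestCase):
    def test_submit(self) -> None:
        # Only reached when the constants are set (e.g. on production).
        self.assertIsNotNone(SPECIAL_APPLICATION_VERSION)


def run_suite() -> unittest.TestResult:
    """Run the test case and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SpecialApplicationTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With both constants set to `None`, the single test is reported as skipped rather than failed or errored.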

…e-tme 1.2.0

- Replace SPOT_1 with breast cancer slide 1603ba4c (BREAST/BREAST_CANCER, 6649×6578 at 0.25 MPP); preserve old 9375e3ed data as SPOT_4
- Add VIPS 10x resolution ambiguity note for SPOT_2, SPOT_3, SPOT_4
- Bump HETA_APPLICATION_VERSION to 1.2.0, TEST_APPLICATION_VERSION to 1.0.0
- Remove SPECIAL_APPLICATION concept; restore stress tests against test-app 1.0.0
- Unify payload builders via _build_wsi_input_item / _build_minimal_wsi_input_item
- Update SPOT_1_EXPECTED_RESULT_FILES sizes from staging run 43a3bcd2
- Reduce PIPELINE_NODE_ACQUISITION_TIMEOUT_MINUTES to 25

…lidation

- Use pyarrow.parquet.read_metadata() instead of pd.read_parquet() to get row count from Parquet footer without loading polygon data
- Use ijson streaming to count GeoJSON features without loading the full feature array into memory
- Replace hard-coded file counts with len(SPOT_x_EXPECTED_RESULT_FILES) to avoid drift when the constants change
- Sync qupath/gui_test.py to use len(SPOT_0_EXPECTED_RESULT_FILES) instead of the stale literal 9
- Remove unused _build_minimal_wsi_input_item dead code from e2e_test.py
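The "derive the count from the constant" change in this commit can be illustrated with a small sketch. `EXPECTED_RESULT_FILES` is a hypothetical stand-in that holds only file names; the real `SPOT_x_EXPECTED_RESULT_FILES` entries are not shown on this page and likely carry sizes and tolerances as well.

```python
from pathlib import Path

# Hypothetical stand-in for one of the SPOT_x_EXPECTED_RESULT_FILES constants.
EXPECTED_RESULT_FILES = [
    "tissue_qc_geojson_polygons.json",
    "tissue_qc_parquet_polygons.parquet",
]


def assert_result_file_count(results_dir: Path, expected_files: list[str]) -> None:
    """Compare the number of downloaded files against the constant, not a literal."""
    files = sorted(p.name for p in results_dir.iterdir() if p.is_file())
    assert len(files) == len(expected_files), (
        f"Expected {len(expected_files)} files in {results_dir}, "
        f"but found {len(files)}: {files}"
    )
```

Because the expected count is `len(expected_files)`, adding or removing an entry in the constant updates every assertion that uses it.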

…compliance

…le size constant

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

+        import pyarrow.parquet as pq
+
+        for parquet_filename, geojson_filename in parquet_geojson_pairs:
+            parquet_path = results_dir / parquet_filename
+            geojson_path = results_dir / geojson_filename
+            parquet_row_count = pq.read_metadata(parquet_path).num_rows
+            with geojson_path.open("rb") as f:
+                geojson_feature_count = sum(1 for _ in ijson.items(f, "features.item"))


+    import pyarrow.parquet as pq
+
+    for parquet_filename, geojson_filename in parquet_geojson_pairs:
+        parquet_path = results_dir / parquet_filename
+        geojson_path = results_dir / geojson_filename
+        parquet_row_count = pq.read_metadata(parquet_path).num_rows
+        with geojson_path.open("rb") as f:
+            geojson_feature_count = sum(1 for _ in ijson.items(f, "features.item"))


-SPECIAL_APPLICATION_SUBMIT_AND_FIND_DEADLINE_SECONDS_ON_40 = 60 * 60 * 3  # 3 hours
-SPECIAL_APPLICATION_SUBMIT_AND_FIND_SUBMIT_TIMEOUT_SECONDS = 60 * 30  # 30 minutes
-SPECIAL_APPLICATION_FIND_AND_VALIDATE_TIMEOUT_SECONDS = 60 * 60  # 60 minutes
+TEST_APP_STRESS_SLIDE_PER_RUN_COUNT = 100


- tissue_qc and tissue_segmentation: compare total polygon area (WKB via shapely) between parquet and GeoJSON, within 1%
- cell_classification: compare polygon count, within 1%
- Use pandas.read_parquet() instead of pyarrow.parquet directly, making the check engine-agnostic (pyarrow on 3.14, fastparquet on 3.11-3.13)
- Extract shared helper assert_parquet_geojson_parity() to conftest.py to avoid duplication between cli_test and gui_test
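To make the area comparison concrete, here is a dependency-free sketch. The PR computes areas with shapely; this sketch swaps in a plain shoelace formula over GeoJSON `Polygon` coordinates, handles only simple polygons (no `MultiPolygon`), and uses hypothetical function names throughout.

```python
from typing import Sequence

Ring = Sequence[Sequence[float]]


def ring_area(ring: Ring) -> float:
    """Unsigned area of a linear ring via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(coordinates: Sequence[Ring]) -> float:
    """Area of a GeoJSON Polygon: exterior ring minus interior rings (holes)."""
    exterior, *holes = coordinates
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


def areas_match(parquet_area: float, geojson_area: float, tolerance: float = 0.01) -> bool:
    """Relative difference measured against the GeoJSON area, as in a 1% check."""
    if geojson_area <= 0:
        return False
    return abs(parquet_area - geojson_area) / geojson_area <= tolerance
```

Comparing total area instead of row counts catches cases where both files hold the same number of polygons but the geometries themselves diverge.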

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

+SPOT_4_GS_URL = (
+    "gs://aignostics-platform-ext-a4f7e9/python-sdk-tests/he-tme/slides/9375e3ed-28d2-4cf3-9fb9-8df9d11a6627.tiff"
+)
+SPOT_4_FILENAME = "9375e3ed-28d2-4cf3-9fb9-8df9d11a6627.tiff"
+SPOT_4_CRC32C = "9l3NNQ=="
+SPOT_4_FILESIZE = 14681750
+SPOT_4_RESOLUTION_MPP = 0.46499982
+SPOT_4_WIDTH = 3728
+SPOT_4_HEIGHT = 3640


+    """Assert parquet/GeoJSON output parity for the three he-tme polygon artifact pairs.
+
+    - tissue_qc and tissue_segmentation: total polygon area within 1%
+    - cell_classification: polygon count within 1%
+


    )
    assert result.exit_code == 0
-    assert "Zipped 11 files" in normalize_output(result.output)
+    assert "Zipped 16 files" in normalize_output(result.output)



…on test

With he-tme 1.2.0 (new version), many runs in staging are PENDING/PROCESSING, causing the client-side has_output=True filter to page through all of them before finding 20 completed runs, which exceeds the 60s timeout. Drop has_output and reduce limit to 5: item_count (total items submitted) already serves as a proxy, and the first API page returns enough runs instantly.
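The selection logic this commit message describes can be sketched independently of the SDK. `RunSummary`, `first_run_fitting_one_page`, and the value of `RESULTS_PAGE_SIZE` are assumptions made for illustration; only the `item_count` proxy idea comes from the commit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

RESULTS_PAGE_SIZE = 20  # assumed value; the real constant's value is not shown here


@dataclass
class RunSummary:
    """Hypothetical, simplified view of a run returned by a listing call."""

    run_id: str
    item_count: int


def first_run_fitting_one_page(
    runs: Iterable[RunSummary], page_size: int = RESULTS_PAGE_SIZE
) -> Optional[RunSummary]:
    """Return the first run with at least one item and no more than one page of items."""
    for run in runs:
        if 0 < run.item_count <= page_size:
            return run
    return None
```

Filtering on `item_count` works on whatever the first page of the listing returns, so no extra paging is needed to skip runs that are still pending.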

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

+    # Find a run with fewer items than RESULTS_PAGE_SIZE.
+    # Omit has_output so the server-side filter is applied without client-side pagination:
+    # item_count already acts as a proxy (runs with no output show item_count=0 and fail
+    # the 0 < item_count <= RESULTS_PAGE_SIZE check below).
    runs = Service().application_runs(


+    for parquet_name, geojson_name in [
+        ("tissue_qc_parquet_polygons.parquet", "tissue_qc_geojson_polygons.json"),
+        ("tissue_segmentation_parquet_polygons.parquet", "tissue_segmentation_geojson_polygons.json"),
+    ]:
+        parquet_area = float(
+            shapely.area(
+                shapely.from_wkb(
+                    pd.read_parquet(results_dir / parquet_name, columns=["geometry"])["geometry"].to_numpy()
+                )
+            ).sum()
+        )
+        geojson_area = 0.0
+        with (results_dir / geojson_name).open("rb") as f:
+            for feature in ijson.items(f, "features.item"):
+                geojson_area += float(shapely.area(shapely.geometry.shape(feature["geometry"])))
+        assert geojson_area > 0, f"No area computed from {geojson_name}"
+        diff_pct = abs(parquet_area - geojson_area) / geojson_area
+        assert diff_pct <= tolerance, (
+            f"Total polygon area differs by >{tolerance * 100:.0f}% between "
+            f"{parquet_name} ({parquet_area:.2f}) and {geojson_name} ({geojson_area:.2f})"
+        )


+    parquet_count = len(pd.read_parquet(results_dir / "cell_classification_parquet_polygons.parquet", columns=[]))
+    with (results_dir / "cell_classification_geojson_polygons.json").open("rb") as f:
+        geojson_count = sum(1 for _ in ijson.items(f, "features.item"))
+    delta = abs(parquet_count - geojson_count)
+    assert delta <= max(1, round(parquet_count * tolerance)), (
+        f"Polygon count differs by >{tolerance * 100:.0f}% between "
+        f"cell_classification_parquet_polygons.parquet ({parquet_count}) "
+        f"and cell_classification_geojson_polygons.json ({geojson_count})"
+    )
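The count tolerance in the assertion above has a floor of one, which matters for small files. A short sketch of the arithmetic, with hypothetical helper names and an assumed default tolerance of 1% taken from the docstring:

```python
def allowed_count_delta(parquet_count: int, tolerance: float = 0.01) -> int:
    """Largest absolute count difference the assertion accepts."""
    return max(1, round(parquet_count * tolerance))


def counts_match(parquet_count: int, geojson_count: int, tolerance: float = 0.01) -> bool:
    """True when the two counts differ by no more than the allowed delta."""
    return abs(parquet_count - geojson_count) <= allowed_count_delta(parquet_count, tolerance)


# With 1,000 polygons the allowed delta is 10; with 10 polygons it is still 1,
# so an off-by-one difference passes even though it is a 10% deviation.
large_ok = counts_match(1000, 1010)
small_ok = counts_match(10, 11)
```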



sonarqubecloud · 2026-06-02T09:06:42Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.9% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot AI review requested due to automatic review settings May 12, 2026 08:36

ari-nz requested review from a team and helmut-hoffer-von-ankershoffen as code owners May 12, 2026 08:36

ari-nz added the skip:test:long_running Skip long-running tests (≥5min) label May 12, 2026

Copilot started reviewing on behalf of ari-nz May 12, 2026 08:37

Copilot AI reviewed May 12, 2026

ari-nz force-pushed the chore/app-version-bumps branch from bd5f44a to 1a3e050 May 12, 2026 13:41

ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from 47de64d to 4bf84bb May 12, 2026 13:42

ari-nz removed the skip:test:long_running Skip long-running tests (≥5min) label May 19, 2026

Copilot AI review requested due to automatic review settings May 19, 2026 11:27

Copilot started reviewing on behalf of ari-nz May 19, 2026 11:27

Copilot AI reviewed May 19, 2026

ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from c65c3fc to afc60d1 May 19, 2026 14:19

Copilot AI review requested due to automatic review settings May 19, 2026 17:07

ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from afc60d1 to 8896207 May 19, 2026 17:07

Copilot started reviewing on behalf of ari-nz May 19, 2026 17:08

Copilot AI reviewed May 19, 2026

ari-nz changed the base branch from chore/app-version-bumps to main May 19, 2026 17:41

Copilot AI review requested due to automatic review settings May 20, 2026 09:23

Copilot started reviewing on behalf of ari-nz May 20, 2026 09:23

Copilot AI reviewed May 20, 2026

ari-nz requested review from alexa-ca and Copilot May 20, 2026 15:16

Copilot started reviewing on behalf of ari-nz May 20, 2026 15:18

ari-nz enabled auto-merge May 20, 2026 15:19

alexa-ca approved these changes May 20, 2026

Copilot AI reviewed May 20, 2026

blanca-pablos reviewed May 21, 2026

ari-nz and others added 9 commits June 1, 2026 16:54

feat(tests): add HETA 1.2.0 parquet size checks and GeoJSON parity va…

3a6c89a

…lidation

fix(tests): add blank line after lazy pyarrow import for ruff format …

387709e

…compliance

fix(tests): update stale assertions — 16 schemata files and SPOT_1 fi…

a6ad9d6

…le size constant

fix(tests): update SPOT_0 expected file sizes from he-tme 1.2.0 run

7ad6745

ari-nz force-pushed the feat/heta-1.2.0-parquet-validation branch from 03e47d1 to 7ad6745 June 1, 2026 14:55

Copilot AI review requested due to automatic review settings June 1, 2026 14:55

Copilot started reviewing on behalf of ari-nz June 1, 2026 14:55

Copilot AI reviewed Jun 1, 2026

Copilot AI review requested due to automatic review settings June 1, 2026 15:06

Copilot started reviewing on behalf of ari-nz June 1, 2026 15:06

fix(tests): update SPOT_1 expected file sizes from he-tme 1.2.0 run

8abb21a

Copilot AI reviewed Jun 1, 2026

fix(tests): replace exact schemata count with range check (12–16)

8143311

blanca-pablos approved these changes Jun 1, 2026

Copilot AI review requested due to automatic review settings June 1, 2026 18:08

Copilot started reviewing on behalf of ari-nz June 1, 2026 18:08

Copilot AI reviewed Jun 1, 2026

ci: Remove the sonar cloud duplication issue

109ebff

ari-nz merged commit 5a46352 into main Jun 2, 2026
23 checks passed

ari-nz deleted the feat/heta-1.2.0-parquet-validation branch June 2, 2026 09:38

4 participants
