feat(bulk-submit): add FHIR Bulk Data Submit ($bulk-submit) Data Consumer by aacruzgon · Pull Request #153 · HeliosSoftware/hfs

aacruzgon · 2026-06-12T17:11:30Z

FHIR Bulk Data Submit (`$bulk-submit`)

Implements HFS as a Data Consumer for the Argonaut Bulk Data Submit operation. A Data Provider POSTs $bulk-submit referencing a Bulk Export Manifest; HFS asynchronously fetches the manifest and NDJSON files, ingests them into the resource store, and exposes results through a status manifest. The synchronous ingestion engine (BulkSubmitProvider) is reused, with an async worker, lease, and fencing layer that mirrors the existing $export subsystem.
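The lease and fencing layer mentioned above can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: `JobLease`, `try_claim`, and `commit_progress` are hypothetical names, and the real job state lives in SQLite/PostgreSQL tables rather than in a struct.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory stand-in for one durable job row.
#[derive(Debug)]
pub struct JobLease {
    owner: Option<String>,
    expires_at: Option<Instant>,
    /// Incremented on every successful claim; a stale worker holds an older value.
    fencing_token: i64,
    processed_entries: i32,
}

impl JobLease {
    pub fn new() -> Self {
        Self { owner: None, expires_at: None, fencing_token: 0, processed_entries: 0 }
    }

    /// Claim the job if it is unowned or its lease has expired.
    /// Returns the new fencing token on success.
    pub fn try_claim(&mut self, worker: &str, now: Instant, ttl: Duration) -> Option<i64> {
        let expired = match self.expires_at {
            Some(deadline) => now >= deadline,
            None => true,
        };
        if self.owner.is_some() && !expired {
            return None;
        }
        self.owner = Some(worker.to_string());
        self.expires_at = Some(now + ttl);
        self.fencing_token += 1;
        Some(self.fencing_token)
    }

    /// Apply a progress update only when the caller still holds the latest token.
    pub fn commit_progress(&mut self, token: i64, processed: i32) -> Result<(), String> {
        if token != self.fencing_token {
            return Err(format!(
                "stale fencing token {}, current is {}",
                token, self.fencing_token
            ));
        }
        self.processed_entries = processed;
        Ok(())
    }

    pub fn processed_entries(&self) -> i32 {
        self.processed_entries
    }
}
```

The point of the token is that a worker whose lease expired mid-job cannot overwrite progress written by the worker that re-claimed the job, which matters when several pods poll the same table.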

Why

HFS supported Bulk Data Export but had no inbound bulk ingestion path. This branch adds the Data Consumer side of the Argonaut spec so providers can push large datasets into HFS, including protected-file retrieval (SMART Backend Services) and encrypted payloads (JWE), with per-tenant concurrency and durable, leased job state suitable for multi-pod deployments.

Changes

REST (`helios-rest`)

$bulk-submit kickoff, $bulk-submit-status poll/manifest, cancel, and HFS-served artifact endpoints, registered before the resource catch-all.
Shared bulk_common Parameters helpers extracted and reused by both bulk export and submit.
HttpSubmitInputFetcher for manifest/file retrieval with gzip and optional JWE decryption; SMART backend-services outbound token provider (client_credentials + private_key_jwt) for requiresAccessToken files.
BulkSubmitConfig, AppState wiring, and new error mappings (Conflict, TooManyRequests, StorageError::BulkSubmit); bulk-submit operations advertised in the CapabilityStatement.

Persistence (`helios-persistence`)

Bulk-submit worker, claim/lease layer, ingest engine, SubmitInputFetcher trait, and RemoteManifest types, plus a new `Replaced` manifest status.
SQLite and PostgreSQL v8→v9 migrations plus claim/worker storage implementations (Postgres int4 columns typed as i32).
Backend capability split: BulkImport → BulkSubmitIngest + BulkSubmitRestWorker. SQLite/Postgres advertise both; S3 advertises ingest-only. Advisor parses the new names (keeps bulkimport as a compat alias).
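A rough sketch of the capability split and the `bulkimport` compatibility alias follows. The variant names come from this PR; the enum name `BackendCapability`, the display strings, the normalization rule, and the `submit_capabilities` helper are assumptions made for illustration and may not match the HFS source.

```rust
use std::fmt;
use std::str::FromStr;

/// Sketch of the split capability; only the two submit variants are modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendCapability {
    BulkSubmitIngest,
    BulkSubmitRestWorker,
}

impl fmt::Display for BackendCapability {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display strings are assumed, not copied from HFS.
        let s = match self {
            Self::BulkSubmitIngest => "bulk-submit-ingest",
            Self::BulkSubmitRestWorker => "bulk-submit-rest-worker",
        };
        f.write_str(s)
    }
}

impl FromStr for BackendCapability {
    type Err = String;

    fn from_str(input: &str) -> Result<Self, Self::Err> {
        // Drop separators and case so "BulkSubmitIngest" and "bulk-submit-ingest" both parse.
        let key = input
            .chars()
            .filter(|c| c.is_ascii_alphanumeric())
            .collect::<String>()
            .to_ascii_lowercase();
        match key.as_str() {
            // "bulkimport" is kept as an alias for the ingest capability.
            "bulksubmitingest" | "bulkimport" => Ok(Self::BulkSubmitIngest),
            "bulksubmitrestworker" => Ok(Self::BulkSubmitRestWorker),
            _ => Err(format!("unknown capability: {}", input)),
        }
    }
}

/// Submit capabilities per backend as described in the PR text.
pub fn submit_capabilities(backend: &str) -> &'static [BackendCapability] {
    match backend {
        "sqlite" | "postgres" => &[
            BackendCapability::BulkSubmitIngest,
            BackendCapability::BulkSubmitRestWorker,
        ],
        "s3" => &[BackendCapability::BulkSubmitIngest],
        _ => &[],
    }
}
```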

Auth (`helios-auth`)

Retain raw scopes and add grants_operation for system/bulk-submit.
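As a hedged illustration of why the raw scopes are retained: an operation scope such as system/bulk-submit does not follow the resource-style `system/Patient.rs` shape, so a check against the unparsed tokens is one simple way to honor it. `ScopeSet` and the exact-match rule below are assumptions, not the helios-auth API.

```rust
/// Hypothetical scope set that keeps the raw tokens as received.
pub struct ScopeSet {
    raw: Vec<String>,
}

impl ScopeSet {
    /// Split a space-delimited OAuth `scope` string into raw tokens.
    pub fn parse(scope: &str) -> Self {
        Self {
            raw: scope.split_whitespace().map(str::to_owned).collect(),
        }
    }

    /// True when the raw scopes contain exactly `system/<operation>`.
    pub fn grants_operation(&self, operation: &str) -> bool {
        let wanted = format!("system/{}", operation);
        self.raw.iter().any(|s| *s == wanted)
    }
}
```

Exact matching is deliberate in this sketch: a token like `system/bulk-submit-status` does not grant `bulk-submit`.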

Server, CI, tooling, docs

hfs binary builds/spawns bulk-submit workers independently of export; bulk-submit-jwe feature passthrough.
New bulk-submit-smoke and inert inferno-bulk-submit-data workflows; docker/bulk-submit/ local compose example.
README HFS_BULK_SUBMIT_* env vars; persistence README capability matrix split into ingest vs REST worker rows.
Repo tooling: slimmed CLAUDE.md to architecture + skill pointers, added AGENTS.md, .claude/ (config, hooks, schemas, skills) and .agents/skills/.

Testing

cargo test -p helios-persistence — capability matrix, display, and advisor parse tests (SQLite/Postgres worker storage, S3 ingest-only).
cargo test -p helios-rest — $bulk-submit endpoint integration tests and outbound OAuth mock-IdP test.
Heavier coverage deferred to CI: bulk-submit-smoke external smoke workflow and the inert inferno-bulk-submit-data scaffold, plus the standard ci.yml lint/test-rust gates.

Notes

Migrations: SQLite and PostgreSQL schema v8→v9 add bulk-submit worker state — forward-only.
Backend availability: $bulk-submit REST is available on sqlite, postgres, and their -elasticsearch composites; other backends return 501. S3 supports only the synchronous ingest provider and never owns REST-worker job state.
Capability rename keeps bulkimport as a parse alias for BulkSubmitIngest, so existing advisor input stays valid.
Config: new HFS_BULK_SUBMIT_* env vars (see README/skill). JWE decryption requires the bulk-submit-jwe feature.
Auth: all surfaces require the system/bulk-submit scope; status/cancel/file endpoints also enforce submission ownership.

Add shared project configuration under .claude/: settings.json wiring the hfs-policy and codex plan/review hooks, the hook scripts and JSON schemas they use, the operational SKILL.md definitions, and a local .gitignore for transient state/ and debug/ output.

The bulk_manifests counter columns (total_entries, processed_entries, failed_entries) are declared INTEGER (int4), but the Postgres backend read them as i64 in get_manifest and bound i64 params in the counts UPDATE. tokio-postgres rejects the int4<->i64 width conversion and panicked the request handler, dropping the connection mid-response. This surfaced as `curl: (52) Empty reply from server` when polling $bulk-submit-status, failing the external smoke workflow. SQLite was unaffected because it is dynamically typed. Read and bind these three columns as i32 to match the schema. Tests: cargo test -p helios-persistence --lib (655 passed); fmt and CI-style clippy clean.
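The failure mode can be modeled without a database. In the postgres-types crate, `i32` decodes only from INT4 and `i64` only from INT8; the sketch below imitates that type check with a hypothetical `Decode` trait and `try_get` function (neither is the real tokio-postgres API) so the mismatch is visible as an error rather than a panic.

```rust
use std::convert::TryFrom;

/// Simplified model of a PostgreSQL integer column type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PgType {
    Int4,
    Int8,
}

/// Hypothetical stand-in for the type check that typed decoding performs.
pub trait Decode: Sized {
    fn accepts(ty: PgType) -> bool;
    fn decode(raw: &[u8]) -> Result<Self, String>;
}

impl Decode for i32 {
    fn accepts(ty: PgType) -> bool {
        ty == PgType::Int4
    }
    fn decode(raw: &[u8]) -> Result<Self, String> {
        // The binary wire format is big-endian.
        <[u8; 4]>::try_from(raw)
            .map(i32::from_be_bytes)
            .map_err(|e| e.to_string())
    }
}

impl Decode for i64 {
    fn accepts(ty: PgType) -> bool {
        ty == PgType::Int8
    }
    fn decode(raw: &[u8]) -> Result<Self, String> {
        <[u8; 8]>::try_from(raw)
            .map(i64::from_be_bytes)
            .map_err(|e| e.to_string())
    }
}

/// Refuse to decode when the Rust type does not match the column type.
pub fn try_get<T: Decode>(column: PgType, raw: &[u8]) -> Result<T, String> {
    if !T::accepts(column) {
        return Err(format!(
            "cannot decode {:?} column as {}",
            column,
            std::any::type_name::<T>()
        ));
    }
    T::decode(raw)
}
```

Reading an INTEGER column as `i64` fails this check even though the value would fit, which is why the fix changes the Rust type rather than the schema.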

query_pairs.rs defined a first_value helper that was never used; the live copy imported by the bulk handlers lives in bulk_common.rs. The dead duplicate tripped CI-style clippy (-D warnings dead_code). Remove it; no behavior change. Tests: cargo test -p helios-rest --lib (220 passed); CI-style clippy clean.

…umns

The first int4/i64 fix covered get_manifest and process_entries, but the async worker path had the same mismatch on other INTEGER columns:

- bulk_entry_results.line_number (INTEGER): read as i64 in get_entry_results and bound as i64 in the entry-result INSERT.
- bulk_manifests.processed_entries / failed_entries (INTEGER): bound as i64 in update_manifest_progress.

tokio-postgres rejects the int4<->i64 conversion. update_manifest_progress runs after the worker ingests each file, so the bind error aborted run_job and left the manifest stuck in `processing`; it was then re-leased and failed again indefinitely, so the submission never reached `complete` and $bulk-submit-status returned 202 forever (smoke poll loop exhausted, exit 1). SQLite is dynamically typed and unaffected.

Read/bind these three INTEGER columns as i32. BIGINT columns (fencing_token, last_processed_line, line_count, byte_count) and COUNT/SUM aggregates remain i64.

Tests: cargo test -p helios-persistence --lib bulk_submit (28 passed); fmt and CI-style clippy clean. The postgres round trip itself is covered only by the external smoke workflow (no local Docker).

Expand the bulk-submit external smoke workflow to use the same FHIR version, backend, and output matrix shape as bulk export. Keep unsupported primary stores in the matrix with endpoint-unavailable expectations while running the full ingest flow for SQLite/PostgreSQL and their Elasticsearch composites. Tests: parsed the workflow YAML, compared backend/output rows with bulk export, ran git diff --check, and ran cargo check -p helios-hfs --no-default-features --features R4,R4B,R5,sqlite,elasticsearch,postgres,mongodb,s3.

Document that the smoke matrix tests REST worker availability, while S3 only exposes the submit ingest provider capability. Tests: not run (comment-only workflow update).

Remove the import synonym from the submit conformance scaffold so the workflow language matches the split submit capability model. Tests: not run (comment-only workflow update).

Document Bulk Submit ingest separately from full REST worker support so S3 is described as provider-only while SQLite and Postgres own job state. Tests: not run (documentation-only change).

Report SQLite and Postgres as full bulk-submit REST worker backends, report S3 as ingest-only, and accept the new split capability names in advisor input. Keep bulkimport as a compatibility alias for BulkSubmitIngest. Tests: cargo test -p helios-persistence --lib test_parse_bulk_submit_capabilities

Advertise Postgres bulk export, bulk-submit ingest, and full bulk-submit REST worker support through Backend::supports and Backend::capabilities. Tests: cargo test -p helios-persistence --features postgres --test postgres_tests test_postgres_expected_capabilities

Move S3 backend capability reporting into a static declaration list and replace the old bulk import label with BulkSubmitIngest. This keeps S3 reporting provider-level ingest support without claiming REST worker support. Tests: cargo test -p helios-persistence --features s3 --test s3_tests test_s3_capabilities_declared

Advertise SQLite bulk export, bulk-submit ingest, and full bulk-submit REST worker support through backend capability reporting and its local capability test. Tests: cargo test -p helios-persistence --lib test_backend_capabilities

Replace the ambiguous BulkImport backend capability with separate BulkSubmitIngest and BulkSubmitRestWorker variants, including display strings for reporting. Tests: cargo test -p helios-persistence --lib test_backend_capability_display

Update the shared backend capability matrix so SQLite and Postgres advertise full bulk-submit REST worker support, while S3 advertises ingest-only support. Tests: cargo check -p helios-persistence

Extend Postgres capability tests to cover bulk export, bulk-submit ingest, and full bulk-submit REST worker reporting. Tests: cargo test -p helios-persistence --features postgres --test postgres_tests test_postgres_expected_capabilities

Update the S3 capability declaration test to assert BulkSubmitIngest without BulkSubmitRestWorker, and avoid AWS SDK initialization for this declaration-only check. Tests: cargo test -p helios-persistence --features s3 --test s3_tests test_s3_capabilities_declared

Clarify in the bulk-data-submit skill (both .claude and .agents copies) that the backend capability splits into BulkSubmitIngest and BulkSubmitRestWorker, with SQLite/Postgres advertising both and S3 ingest-only, mirroring the persistence README.

codecov · 2026-06-12T17:37:16Z

Codecov Report

❌ Patch coverage is 82.81700% with 477 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/persistence/src/core/bulk_submit_worker.rs | 71.40% | 163 Missing ⚠️ |
| crates/rest/src/handlers/bulk_submit.rs | 84.24% | 129 Missing ⚠️ |
| crates/rest/src/config.rs | 77.00% | 63 Missing ⚠️ |
| ...tes/persistence/src/backends/sqlite/bulk_submit.rs | 81.34% | 50 Missing ⚠️ |
| crates/rest/src/lib.rs | 10.00% | 36 Missing ⚠️ |
| crates/rest/src/bulk_submit_fetcher.rs | 82.00% | 18 Missing ⚠️ |
| crates/rest/src/state.rs | 86.66% | 6 Missing ⚠️ |
| crates/auth/src/scope/mod.rs | 91.66% | 4 Missing ⚠️ |
| crates/persistence/src/backends/sqlite/backend.rs | 50.00% | 3 Missing ⚠️ |
| crates/persistence/src/core/bulk_submit.rs | 33.33% | 2 Missing ⚠️ |

... and 2 more

Relax the exact serde/serde_json pins so tokio-postgres can resolve to the patched 0.7.18 (postgres-types 0.2.14 requires serde_core >= 1.0.221). Pin serde to =1.0.224 rather than the latest 1.0.228: serde >= 1.0.225 renames its internal `__private` module to a version-suffixed form, which breaks the FhirSerde derive macro's reference to `serde::__private::ser::FlatMapSerializer`. 1.0.224 still satisfies postgres-types and keeps the plain module. Pin serde_json to =1.0.144 (the minimum tokio-postgres 0.7.18 requires) rather than 1.0.150: later patches normalize decimal exponents `E` -> `e`, breaking exact original-string preservation in DecimalElement. Tests: helios-serde (20 pass, incl. test_json_decimal_out_of_range), helios-persistence --features postgres lib (665 pass).

Resolve the postgres dependency chain to versions that fix three advisories flagged by cargo audit:

- postgres-protocol 0.6.11 -> 0.6.12
  - RUSTSEC-2026-0179 (high): unbounded SCRAM iteration count DoS
  - RUSTSEC-2026-0180: panic decoding a malformed hstore value
- tokio-postgres 0.7.13 -> 0.7.18
  - RUSTSEC-2026-0178: panic on a DataRow with fewer fields than columns
- postgres-types 0.2.9 -> 0.2.14 (pulled in by tokio-postgres 0.7.18)

Also reflects the serde 1.0.220 -> 1.0.224 and serde_json 1.0.143 -> 1.0.144 moves required to satisfy the upgraded postgres crates.

cargo audit flags RUSTSEC-2026-0176 and RUSTSEC-2026-0177 against pyo3, which is only pulled in by pysof. pysof is excluded from the default workspace build and shipped separately as a Python wheel, and neither advisory path is reachable in its code (it calls neither nth/nth_back on PyList/PyTuple iterators nor PyCFunction::new_closure). The proper fix (pyo3 >= 0.29) is blocked upstream: pysof depends on pythonize, whose latest release (0.28) only supports pyo3 0.28. Add both advisories to the --ignore list with a documented rationale; remove once pythonize ships a pyo3 0.29 build.

BulkSubmitConfig::validate had no unit coverage while the parallel BulkExportConfig suite did. Add tests for every error branch (output backend, S3 bucket requirement, access-token posture, signing alg, zero-valued concurrency/batch/heartbeat, lease-vs-heartbeat, cleanup interval) plus multi-error accumulation and ServerConfig propagation. Tests: cargo test -p helios-rest --lib (config::tests) passes.
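The multi-error accumulation these tests exercise can be sketched as below. `SubmitSettings`, its field names, and the specific lease-vs-heartbeat rule are hypothetical; the real BulkSubmitConfig has more fields and may word or order its checks differently.

```rust
/// Hypothetical subset of the bulk-submit settings; field names are illustrative.
pub struct SubmitSettings {
    pub max_concurrent_jobs: u32,
    pub batch_size: u32,
    pub heartbeat_secs: u64,
    pub lease_secs: u64,
}

impl SubmitSettings {
    /// Collect every problem instead of stopping at the first one.
    pub fn validate(&self) -> Result<(), Vec<String>> {
        let mut errors = Vec::new();
        if self.max_concurrent_jobs == 0 {
            errors.push("max_concurrent_jobs must be greater than zero".to_string());
        }
        if self.batch_size == 0 {
            errors.push("batch_size must be greater than zero".to_string());
        }
        if self.heartbeat_secs == 0 {
            errors.push("heartbeat_secs must be greater than zero".to_string());
        } else if self.lease_secs <= self.heartbeat_secs {
            // Assumed rule: a lease must outlive at least one heartbeat interval.
            errors.push("lease_secs must exceed heartbeat_secs".to_string());
        }
        if errors.is_empty() {
            Ok(())
        } else {
            Err(errors)
        }
    }
}
```

Returning all errors at once lets an operator fix a misconfigured deployment in one pass instead of restarting once per mistake.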

The Conflict/TooManyRequests variants and the StorageError::BulkSubmit to RestError mapping added for $bulk-submit had zero patch coverage. Add Display tests, into_response status-code assertions (409/429), and mapping tests for every BulkSubmitError variant (404/409/422/500). Tests: cargo test -p helios-rest --lib (error::tests) passes.
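A minimal sketch of this kind of mapping is shown below. The PR text does not list the BulkSubmitError variants, so `SubmitError`, `ApiError`, and every variant name here are invented for illustration; only the status codes (404/409/422/429/500) are taken from the description above.

```rust
/// Hypothetical storage-level errors for a bulk submission.
#[derive(Debug)]
pub enum SubmitError {
    NotFound,
    AlreadyFinished,
    InvalidManifest(String),
    Backend(String),
}

/// Hypothetical REST-level error with an HTTP status.
#[derive(Debug, PartialEq, Eq)]
pub enum ApiError {
    NotFound,
    Conflict(String),
    TooManyRequests,
    Unprocessable(String),
    Internal(String),
}

impl ApiError {
    pub fn status_code(&self) -> u16 {
        match self {
            ApiError::NotFound => 404,
            ApiError::Conflict(_) => 409,
            ApiError::Unprocessable(_) => 422,
            ApiError::TooManyRequests => 429,
            ApiError::Internal(_) => 500,
        }
    }
}

impl From<SubmitError> for ApiError {
    fn from(err: SubmitError) -> Self {
        match err {
            SubmitError::NotFound => ApiError::NotFound,
            SubmitError::AlreadyFinished => {
                ApiError::Conflict("submission already finished".to_string())
            }
            SubmitError::InvalidManifest(msg) => ApiError::Unprocessable(msg),
            SubmitError::Backend(msg) => ApiError::Internal(msg),
        }
    }
}
```

Keeping the mapping in a single `From` impl means each storage variant has exactly one place to be tested against its status code.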

Add unit tests for the bulk_common helpers shared by $export and $bulk-submit: parse_query_pairs, collect_multi, first_value, parse_instant, prefer_handling, has_respond_async, and pairs_from_parameters (scalar value[x] and valueReference shapes). Tests: cargo test -p helios-rest --lib (handlers::bulk_common::tests) passes.
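To make the helper names concrete, here is a simplified sketch of what a few of them might look like. The signatures and behavior are assumptions (no percent-decoding, comma-splitting for multi-valued parameters, case-insensitive Prefer matching) and are not copied from bulk_common.rs.

```rust
/// Split a raw query string into (name, value) pairs without percent-decoding.
pub fn parse_query_pairs(query: &str) -> Vec<(String, String)> {
    query
        .split('&')
        .filter(|part| !part.is_empty())
        .map(|part| match part.split_once('=') {
            Some((k, v)) => (k.to_string(), v.to_string()),
            None => (part.to_string(), String::new()),
        })
        .collect()
}

/// First value for `name`, if the parameter is present at all.
pub fn first_value<'a>(pairs: &'a [(String, String)], name: &str) -> Option<&'a str> {
    pairs
        .iter()
        .find(|(k, _)| k == name)
        .map(|(_, v)| v.as_str())
}

/// All values for `name`, flattening repeated parameters and comma lists.
pub fn collect_multi(pairs: &[(String, String)], name: &str) -> Vec<String> {
    pairs
        .iter()
        .filter(|(k, _)| k == name)
        .flat_map(|(_, v)| v.split(','))
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}

/// True when a Prefer header value contains the `respond-async` preference.
pub fn has_respond_async(prefer: &str) -> bool {
    prefer
        .split(',')
        .any(|p| p.trim().eq_ignore_ascii_case("respond-async"))
}
```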

Add tests for JwtClientCredentialsTokenProvider::new: it rejects an unsupported signing alg and malformed ES384/RS384 PEMs, and accepts a valid ES384 key, exercising client_assertion to mint a well-formed 3-segment compact JWS (using a throwaway test key). Tests: cargo test -p helios-rest --lib (bulk_submit_oauth::tests) passes.

Add tests for HttpSubmitInputFetcher: the err() backend-error constructor and build_get's anonymous, provider-header, and token-required-without-provider error paths (no network required). Tests: cargo test -p helios-rest --lib (bulk_submit_fetcher::tests) passes.
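The three build_get paths named above can be approximated by a small decision function. This is a guess at the shape of the logic, not the fetcher's code: `FetchAuth`, `TokenProvider`, `select_auth`, and `StaticToken` are hypothetical, and the real fetcher builds an HTTP request rather than returning an enum.

```rust
/// How a single manifest or file GET should be authorized.
#[derive(Debug, PartialEq, Eq)]
pub enum FetchAuth {
    Anonymous,
    Bearer(String),
}

/// Hypothetical token source, e.g. a SMART Backend Services client.
pub trait TokenProvider {
    fn token(&self) -> Result<String, String>;
}

/// Decide the Authorization header for one fetch.
pub fn select_auth(
    requires_access_token: bool,
    provider: Option<&dyn TokenProvider>,
) -> Result<FetchAuth, String> {
    match (requires_access_token, provider) {
        // Public file: send no Authorization header.
        (false, _) => Ok(FetchAuth::Anonymous),
        // Protected file with a configured provider: attach a bearer token.
        (true, Some(p)) => p.token().map(FetchAuth::Bearer),
        // Protected file but nothing configured to obtain a token.
        (true, None) => {
            Err("file requires an access token but no token provider is configured".to_string())
        }
    }
}

/// Fixed-token provider, useful only for tests.
pub struct StaticToken(pub String);

impl TokenProvider for StaticToken {
    fn token(&self) -> Result<String, String> {
        Ok(self.0.clone())
    }
}
```

Because the decision is a pure function of two inputs, all three paths can be tested without a network, which matches the "no network required" note above.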

aacruzgon added 30 commits June 3, 2026 11:21

chore(deps): lock new bulk-submit dependencies (aes-gcm, base64, jsonwebtoken)

892eb87

feat(persistence): add Replaced manifest status for bulk submit

6d57fb4

feat(persistence): add SubmitInputFetcher trait and RemoteManifest types

f38a2d7

feat(persistence): add bulk-submit worker, claim/lease layer, and ingest engine

7797b09

feat(persistence): export bulk-submit worker and input modules

0f57b46

feat(persistence): sqlite v8->v9 migration for bulk-submit worker state

68a4159

feat(persistence): implement sqlite bulk-submit claim and worker storage

1f5b623

feat(persistence): postgres v8->v9 migration for bulk-submit worker state

f7a7f46

feat(persistence): implement postgres bulk-submit claim and worker storage

c53e2e1

feat(auth): retain raw scopes and add grants_operation for system/bulk-submit

85aac0d

chore(rest): add jsonwebtoken and optional aes-gcm/base64 deps for bulk submit

86faa3a

feat(rest): add BulkSubmitConfig

439eb11

feat(rest): map StorageError::BulkSubmit and add Conflict/TooManyRequests

4a83f2b

feat(rest): wire bulk-submit subsystem into AppState

9103852

refactor(rest): extract shared bulk export/submit Parameters helpers

7d479dd

refactor(rest): use shared bulk_common helpers in bulk export

9f9fff3

feat(rest): add $bulk-submit handlers (kickoff/status/poll/cancel/file)

2a91a6b

feat(rest): advertise bulk-submit operations in CapabilityStatement

fce3210

feat(rest): register bulk_common and bulk_submit handler modules

5e86caf

feat(rest): register $bulk-submit routes before resource catch-all

454819d

feat(rest): add create_app_with_auth_and_bulk and BulkSubmitBundle

ecc9cb0

feat(rest): add HttpSubmitInputFetcher with gzip and optional JWE decryption

99e86d3

feat(rest): add SMART backend-services outbound token provider for file fetch

729205b

test(rest): add $bulk-submit endpoint integration tests

2a10987

test(rest): add outbound OAuth mock-IdP integration test

9dbd3b3

chore(hfs): add bulk-submit-jwe feature passthrough

fd99d43

feat(hfs): build/spawn bulk-submit workers and wire submit independently of export

ff5ac11

fix(hts): drop redundant inner cfg attribute (clippy duplicated_attributes)

3009136

docs: document Bulk Data Submit ($bulk-submit) [skip ci]

aff1ae9

docs(readme): add HFS_BULK_SUBMIT_* environment variables [skip ci]

7d9a795

aacruzgon added 18 commits June 11, 2026 10:32

ci(bulk-submit): clarify smoke capability split

549eecb

ci(bulk-submit): clarify inferno scaffold wording

e0eb25d

docs(persistence): split bulk submit capability docs [skip ci]

d936f14

feat(persistence): report sqlite submit capabilities

8131363

test(persistence): align submit capability matrix

8f8b634

Merge remote-tracking branch 'origin/main' into feature/bulk-submit

e63fcca

# Conflicts: # .github/workflows/bulk-submit-smoke.yml # CLAUDE.md

aacruzgon and others added 10 commits June 12, 2026 14:14

Merge branch 'main' into feature/bulk-submit

1f0fe8f

# Conflicts: # .github/workflows/ci.yml # CLAUDE.md # Cargo.lock # Cargo.toml # crates/rest/src/state.rs

Merge branch 'main' into feature/bulk-submit

3a179ab

# Conflicts: # .agents/skills/bulk-data-submit/SKILL.md # .agents/skills/work-with-hts/SKILL.md # .claude/skills/bulk-data-submit/SKILL.md # .claude/skills/work-with-hts/SKILL.md

smunini merged commit e7c073a into main Jun 13, 2026
18 checks passed

smunini merged 70 commits into main from feature/bulk-submit
