feat(bulk-submit): add FHIR Bulk Data Submit ($bulk-submit) Data Consumer#153
Merged
Conversation
Add shared project configuration under .claude/: settings.json wiring the hfs-policy and codex plan/review hooks, the hook scripts and JSON schemas they use, the operational SKILL.md definitions, and a local .gitignore for transient state/ and debug/ output.
The bulk_manifests counter columns (total_entries, processed_entries, failed_entries) are declared INTEGER (int4), but the Postgres backend read them as i64 in get_manifest and bound i64 params in the counts UPDATE. tokio-postgres rejects the int4<->i64 width conversion and panicked the request handler, dropping the connection mid-response. This surfaced as `curl: (52) Empty reply from server` when polling $bulk-submit-status, failing the external smoke workflow. SQLite was unaffected because it is dynamically typed. Read and bind these three columns as i32 to match the schema. Tests: cargo test -p helios-persistence --lib (655 passed); fmt and CI-style clippy clean.
query_pairs.rs defined a first_value helper that was never used; the live copy imported by the bulk handlers lives in bulk_common.rs. The dead duplicate tripped CI-style clippy (-D warnings dead_code). Remove it; no behavior change. Tests: cargo test -p helios-rest --lib (220 passed); CI-style clippy clean.
…umns The first int4/i64 fix covered get_manifest and process_entries, but the async worker path had the same mismatch on other INTEGER columns: - bulk_entry_results.line_number (INTEGER): read as i64 in get_entry_results and bound as i64 in the entry-result INSERT. - bulk_manifests.processed_entries / failed_entries (INTEGER): bound as i64 in update_manifest_progress. tokio-postgres rejects the int4<->i64 conversion. update_manifest_progress runs after the worker ingests each file, so the bind error aborted run_job and left the manifest stuck `processing`; it was then re-leased and failed again indefinitely, so the submission never reached `complete` and $bulk-submit-status returned 202 forever (smoke poll loop exhausted, exit 1). SQLite is dynamically typed and unaffected. Read/bind these three INTEGER columns as i32. BIGINT columns (fencing_token, last_processed_line, line_count, byte_count) and COUNT/SUM aggregates remain i64. Tests: cargo test -p helios-persistence --lib bulk_submit (28 passed); fmt and CI-style clippy clean. The postgres round trip itself is covered only by the external smoke workflow (no local Docker).
Expand the bulk-submit external smoke workflow to use the same FHIR version, backend, and output matrix shape as bulk export. Keep unsupported primary stores in the matrix with endpoint-unavailable expectations while running the full ingest flow for SQLite/PostgreSQL and their Elasticsearch composites. Tests: parsed the workflow YAML, compared backend/output rows with bulk export, ran git diff --check, and ran cargo check -p helios-hfs --no-default-features --features R4,R4B,R5,sqlite,elasticsearch,postgres,mongodb,s3.
Document that the smoke matrix tests REST worker availability, while S3 only exposes the submit ingest provider capability. Tests: not run (comment-only workflow update).
Remove the import synonym from the submit conformance scaffold so the workflow language matches the split submit capability model. Tests: not run (comment-only workflow update).
Document Bulk Submit ingest separately from full REST worker support so S3 is described as provider-only while SQLite and Postgres own job state. Tests: not run (documentation-only change).
Report SQLite and Postgres as full bulk-submit REST worker backends, report S3 as ingest-only, and accept the new split capability names in advisor input. Keep bulkimport as a compatibility alias for BulkSubmitIngest. Tests: cargo test -p helios-persistence --lib test_parse_bulk_submit_capabilities
Advertise Postgres bulk export, bulk-submit ingest, and full bulk-submit REST worker support through Backend::supports and Backend::capabilities. Tests: cargo test -p helios-persistence --features postgres --test postgres_tests test_postgres_expected_capabilities
Move S3 backend capability reporting into a static declaration list and replace the old bulk import label with BulkSubmitIngest. This keeps S3 reporting provider-level ingest support without claiming REST worker support. Tests: cargo test -p helios-persistence --features s3 --test s3_tests test_s3_capabilities_declared
Advertise SQLite bulk export, bulk-submit ingest, and full bulk-submit REST worker support through backend capability reporting and its local capability test. Tests: cargo test -p helios-persistence --lib test_backend_capabilities
Replace the ambiguous BulkImport backend capability with separate BulkSubmitIngest and BulkSubmitRestWorker variants, including display strings for reporting. Tests: cargo test -p helios-persistence --lib test_backend_capability_display
Update the shared backend capability matrix so SQLite and Postgres advertise full bulk-submit REST worker support, while S3 advertises ingest-only support. Tests: cargo check -p helios-persistence
Extend Postgres capability tests to cover bulk export, bulk-submit ingest, and full bulk-submit REST worker reporting. Tests: cargo test -p helios-persistence --features postgres --test postgres_tests test_postgres_expected_capabilities
Update the S3 capability declaration test to assert BulkSubmitIngest without BulkSubmitRestWorker, and avoid AWS SDK initialization for this declaration-only check. Tests: cargo test -p helios-persistence --features s3 --test s3_tests test_s3_capabilities_declared
Clarify in the bulk-data-submit skill (both .claude and .agents copies) that the backend capability splits into BulkSubmitIngest and BulkSubmitRestWorker, with SQLite/Postgres advertising both and S3 ingest-only, mirroring the persistence README.
# Conflicts: # .github/workflows/bulk-submit-smoke.yml # CLAUDE.md
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Relax the exact serde/serde_json pins so tokio-postgres can resolve to the patched 0.7.18 (postgres-types 0.2.14 requires serde_core >= 1.0.221). Pin serde to =1.0.224 rather than the latest 1.0.228: serde >= 1.0.225 renames its internal `__private` module to a version-suffixed form, which breaks the FhirSerde derive macro's reference to `serde::__private::ser::FlatMapSerializer`. 1.0.224 still satisfies postgres-types and keeps the plain module. Pin serde_json to =1.0.144 (the minimum tokio-postgres 0.7.18 requires) rather than 1.0.150: later patches normalize decimal exponents `E` -> `e`, breaking exact original-string preservation in DecimalElement. Tests: helios-serde (20 pass, incl. test_json_decimal_out_of_range), helios-persistence --features postgres lib (665 pass).
Resolve the postgres dependency chain to versions that fix three advisories
flagged by cargo audit:
- postgres-protocol 0.6.11 -> 0.6.12
RUSTSEC-2026-0179 (high): unbounded SCRAM iteration count DoS
RUSTSEC-2026-0180: panic decoding a malformed hstore value
- tokio-postgres 0.7.13 -> 0.7.18
RUSTSEC-2026-0178: panic on a DataRow with fewer fields than columns
- postgres-types 0.2.9 -> 0.2.14 (pulled in by tokio-postgres 0.7.18)
Also reflects the serde 1.0.220 -> 1.0.224 and serde_json 1.0.143 -> 1.0.144
moves required to satisfy the upgraded postgres crates.
cargo audit flags RUSTSEC-2026-0176 and RUSTSEC-2026-0177 against pyo3, which is only pulled in by pysof. pysof is excluded from the default workspace build and shipped separately as a Python wheel, and neither advisory path is reachable in its code (it calls neither nth/nth_back on PyList/PyTuple iterators nor PyCFunction::new_closure). The proper fix (pyo3 >= 0.29) is blocked upstream: pysof depends on pythonize, whose latest release (0.28) only supports pyo3 0.28. Add both advisories to the --ignore list with a documented rationale; remove once pythonize ships a pyo3 0.29 build.
BulkSubmitConfig::validate had no unit coverage while the parallel BulkExportConfig suite did. Add tests for every error branch (output backend, S3 bucket requirement, access-token posture, signing alg, zero-valued concurrency/batch/heartbeat, lease-vs-heartbeat, cleanup interval) plus multi-error accumulation and ServerConfig propagation. Tests: cargo test -p helios-rest --lib (config::tests) passes.
The Conflict/TooManyRequests variants and the StorageError::BulkSubmit to RestError mapping added for $bulk-submit had zero patch coverage. Add Display tests, into_response status-code assertions (409/429), and mapping tests for every BulkSubmitError variant (404/409/422/500). Tests: cargo test -p helios-rest --lib (error::tests) passes.
Add unit tests for the bulk_common helpers shared by $export and $bulk-submit: parse_query_pairs, collect_multi, first_value, parse_instant, prefer_handling, has_respond_async, and pairs_from_parameters (scalar value[x] and valueReference shapes). Tests: cargo test -p helios-rest --lib (handlers::bulk_common::tests) passes.
Add tests for JwtClientCredentialsTokenProvider::new: it rejects an unsupported signing alg and malformed ES384/RS384 PEMs, and accepts a valid ES384 key, exercising client_assertion to mint a well-formed 3-segment compact JWS (using a throwaway test key). Tests: cargo test -p helios-rest --lib (bulk_submit_oauth::tests) passes.
Add tests for HttpSubmitInputFetcher: the err() backend-error constructor and build_get's anonymous, provider-header, and token-required-without-provider error paths (no network required). Tests: cargo test -p helios-rest --lib (bulk_submit_fetcher::tests) passes.
# Conflicts: # .github/workflows/ci.yml # CLAUDE.md # Cargo.lock # Cargo.toml # crates/rest/src/state.rs
# Conflicts: # .agents/skills/bulk-data-submit/SKILL.md # .agents/skills/work-with-hts/SKILL.md # .claude/skills/bulk-data-submit/SKILL.md # .claude/skills/work-with-hts/SKILL.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
FHIR Bulk Data Submit (
$bulk-submit)Implements HFS as a Data Consumer for the Argonaut Bulk Data Submit operation. A Data Provider POSTs
$bulk-submitreferencing a Bulk Export Manifest; HFS asynchronously fetches the manifest and NDJSON files, ingests them into the resource store, and exposes results through a status manifest. The synchronous ingestion engine (BulkSubmitProvider) is reused, with an async worker, lease, and fencing layer that mirrors the existing$exportsubsystem.Why
HFS supported Bulk Data Export but had no inbound bulk ingestion path. This branch adds the Data Consumer side of the Argonaut spec so providers can push large datasets into HFS, including protected-file retrieval (SMART Backend Services) and encrypted payloads (JWE), with per-tenant concurrency and durable, leased job state suitable for multi-pod deployments.
Changes
REST (
helios-rest)$bulk-submitkickoff,$bulk-submit-statuspoll/manifest, cancel, and HFS-served artifact endpoints, registered before the resource catch-all.bulk_commonParameters helpers extracted and reused by both bulk export and submit.HttpSubmitInputFetcherfor manifest/file retrieval with gzip and optional JWE decryption; SMART backend-services outbound token provider (client_credentials+private_key_jwt) forrequiresAccessTokenfiles.BulkSubmitConfig, AppState wiring, and new error mappings (Conflict,TooManyRequests,StorageError::BulkSubmit); bulk-submit operations advertised in the CapabilityStatement.Persistence (
helios-persistence)SubmitInputFetchertrait, andRemoteManifesttypes;Replacedmanifest status.i32).BulkImport→BulkSubmitIngest+BulkSubmitRestWorker. SQLite/Postgres advertise both; S3 advertises ingest-only. Advisor parses the new names (keepsbulkimportas a compat alias).Auth (
helios-auth)grants_operationforsystem/bulk-submit.Server, CI, tooling, docs
hfsbinary builds/spawns bulk-submit workers independently of export;bulk-submit-jwefeature passthrough.bulk-submit-smokeand inertinferno-bulk-submit-dataworkflows;docker/bulk-submit/local compose example.HFS_BULK_SUBMIT_*env vars; persistence README capability matrix split into ingest vs REST worker rows.CLAUDE.mdto architecture + skill pointers, addedAGENTS.md,.claude/(config, hooks, schemas, skills) and.agents/skills/.Testing
cargo test -p helios-persistence— capability matrix, display, and advisor parse tests (SQLite/Postgres worker storage, S3 ingest-only).cargo test -p helios-rest—$bulk-submitendpoint integration tests and outbound OAuth mock-IdP test.bulk-submit-smokeexternal smoke workflow and the inertinferno-bulk-submit-datascaffold, plus the standardci.ymllint/test-rust gates.Notes
$bulk-submitREST is available onsqlite,postgres, and their-elasticsearchcomposites; other backends return501. S3 supports only the synchronous ingest provider and never owns REST-worker job state.bulkimportas a parse alias forBulkSubmitIngest, so existing advisor input stays valid.HFS_BULK_SUBMIT_*env vars (see README/skill). JWE decryption requires thebulk-submit-jwefeature.system/bulk-submitscope; status/cancel/file endpoints also enforce submission ownership.