diff --git a/docs/specs/case-sensitive-arn-id-collision-design.md b/docs/specs/case-sensitive-arn-id-collision-design.md
new file mode 100644
index 0000000..557d90e
--- /dev/null
+++ b/docs/specs/case-sensitive-arn-id-collision-design.md
@@ -0,0 +1,168 @@
+# Case-Sensitive ARN ID Collision Design
+
+## Purpose
+
+This checkpoint documents a confirmed deterministic ID collision before
+changing IAMScope ID behavior. The goal is to pin the bug, compatibility risk,
+candidate migration shape, and required tests before any production code
+change.
+
+## Confirmed Bug
+
+`iamscope/identity/deterministic_ids.py::canonical_id` lowercases all string
+fields before hashing.
+
+As a result, `node_id(provider, node_type, provider_id)` collides for
+case-distinct provider IDs. AWS IAM role and user names are case-sensitive, so
+these two distinct ARNs currently produce the same `node_id`:
+
+- `arn:aws:iam::000000000000:role/CaseRole`
+- `arn:aws:iam::000000000000:role/caserole`
+
+`edge_id` can also be affected because the shared `canonical_id` function
+lowercases `src_provider_id` and `dst_provider_id` before hashing. A pair of
+edges that differs only by source or destination ARN case can therefore collide
+even when AWS treats the principals or resources as distinct.
+
+`constraint_id` needs separate review. Some constraint fields, such as provider,
+constraint type, scope type, account IDs, and Organizations IDs, may be
+intentionally case-insensitive or fixed-case. Other fields, such as policy or
+statement identifiers, may have different semantics. This design does not
+change `constraint_id`.
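The collision mechanism described in the Confirmed Bug section can be sketched without the real module. This is a minimal stand-in, assuming the hash is SHA-256 over NUL-separated fields, as the algorithm name `sha256_null_separated_...` used in this document suggests. `canonical_id_sketch`, the field order, and the `sts:AssumeRole` edge type are hypothetical and are not taken from `deterministic_ids.py`.

```python
import hashlib


def canonical_id_sketch(*fields: str) -> str:
    """Hypothetical stand-in: lowercase every field, join with NUL, hash."""
    normalized = "\0".join(field.lower() for field in fields)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


upper = "arn:aws:iam::000000000000:role/CaseRole"
lower = "arn:aws:iam::000000000000:role/caserole"
target = "arn:aws:iam::000000000000:role/Target"

# Node-style inputs: (provider, node_type, provider_id).
node_collides = (
    canonical_id_sketch("aws", "IAMRole", upper)
    == canonical_id_sketch("aws", "IAMRole", lower)
)

# Edge-style inputs that differ only in the case of the source ARN.
# The field order here is illustrative, not the real edge_id formula.
edge_collides = (
    canonical_id_sketch("aws", "sts:AssumeRole", upper, target)
    == canonical_id_sketch("aws", "sts:AssumeRole", lower, target)
)
```

Both flags are true in this sketch because lowercasing erases the only difference between the inputs before the hash is computed.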
+
+## Minimal Local Reproduction
+
+```python
+from iamscope.identity.deterministic_ids import node_id
+
+a = node_id("aws", "IAMRole", "arn:aws:iam::000000000000:role/CaseRole")
+b = node_id("aws", "IAMRole", "arn:aws:iam::000000000000:role/caserole")
+assert a == b  # current bug
+```
+
+The expected fixed behavior is that the two IDs differ because the provider IDs
+refer to case-distinct IAM roles.
+
+## Why It Matters
+
+The current collision can cause IAMScope to deduplicate distinct principals or
+resources in the fact graph. That can corrupt downstream evidence in both
+directions:
+
+- a distinct case-sensitive principal can disappear during graph construction;
+- edges can attach to the wrong merged node;
+- reasoners can evaluate a path against the wrong principal or target;
+- findings can cite evidence from a merged shape that does not exist as a
+  single AWS identity;
+- comparisons across runs can look stable while hiding a graph identity loss.
+
+This is especially risky for IAM users and roles because IAM names are
+case-sensitive within an AWS account.
+
+## Compatibility Risk
+
+Changing deterministic ID behavior is high compatibility risk. The current ID
+module already documents that algorithm changes can break ARF-RT references,
+observation logs, probe overlays, and cross-run comparisons.
+
+A direct in-place fix would change `node_id` and potentially `edge_id` values
+for every node or edge whose canonical input contains uppercase characters.
+That would make old and new scenarios non-comparable unless the algorithm
+version and migration rules are explicit.
+
+The design must therefore treat this as an ID algorithm migration, not as a
+silent bug fix.
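The compatibility claim above, that an in-place fix changes IDs only for inputs containing uppercase characters, can be illustrated with two stand-in formulas. Both functions are hypothetical sketches (SHA-256 over NUL-separated fields is an assumption), not the real `canonical_id` or the final fix.

```python
import hashlib


def _digest(fields):
    """Hash NUL-joined fields with SHA-256 (assumed layout)."""
    return hashlib.sha256("\0".join(fields).encode("utf-8")).hexdigest()


def lowercasing_id(*fields):
    """Hypothetical stand-in for the current behavior: lowercase everything."""
    return _digest([f.lower() for f in fields])


def case_preserving_id(*fields):
    """Hypothetical in-place fix: hash the fields exactly as given."""
    return _digest(fields)


inputs = [
    # All-lowercase input: the ID is the same under both formulas.
    ("aws", "iamrole", "arn:aws:iam::000000000000:role/caserole"),
    # Uppercase characters in the ARN: the ID changes under the fix.
    ("aws", "iamrole", "arn:aws:iam::000000000000:role/CaseRole"),
]

# True where a stored ID would silently change if the fix were applied in place.
changed = [lowercasing_id(*f) != case_preserving_id(*f) for f in inputs]
```

Here `changed` evaluates to `[False, True]`, which is why a mixed population of old and new IDs cannot be compared safely without an explicit algorithm version.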
+
+## ARF-RT and Downstream Blast Radius
+
+Known downstream surfaces that depend on stable IDs include:
+
+- scenario `nodes[].node_id`;
+- scenario `edges[].edge_id`;
+- edge-constraint bindings keyed by `edge_id`;
+- findings evidence that cites node and edge IDs;
+- ARF-RT edge remapping and wrapper summaries;
+- probe overlays keyed by `edge_id`;
+- observation logs and review artifacts that cite IDs;
+- cross-run comparisons and `findings_diff` output;
+- any external reviewer notes that reference existing scenario IDs.
+
+The migration must make old and new ID spaces explicit so consumers do not
+mistakenly compare v2 and v3 IDs as if they were the same identifier family.
+
+## Candidate Fix
+
+Introduce a new deterministic ID algorithm version, tentatively:
+
+`sha256_null_separated_v3_case_sensitive_provider_ids`
+
+The candidate v3 behavior should use field-aware canonicalization:
+
+- keep provider and structural type fields normalized where they are intended to
+  be case-insensitive, such as `provider`, `node_type`, `edge_type`, and
+  `region`;
+- preserve case for provider-owned identity fields, especially
+  `provider_id`, `src_provider_id`, and `dst_provider_id`;
+- keep feature canonicalization deterministic and unchanged unless a separate
+  review finds a feature-level case collision;
+- review `constraint_id` separately before deciding whether it should stay on
+  the existing canonicalization behavior or move to a field-aware formula.
+
+The code fix should avoid a broad "never lowercase anything" change. The safer
+boundary is to make each deterministic ID formula choose the canonicalization
+rule for each field it owns.
+
+## Migration and Versioning Plan
+
+1. Add focused regression tests that pin the current collision as design
+   evidence, preferably as `xfail` or another non-enforced marker until the
+   algorithm migration is approved.
+2. Define the v3 formula in code and update the public `ID_ALGORITHM` metadata
+   value in the same slice as the behavior change.
+3. Emit scenario metadata that clearly identifies the v3 algorithm.
+4. Treat v2 and v3 scenarios as different ID spaces. Cross-version comparison
+   tools should refuse ID-based equality unless both artifacts use the same
+   `id_algorithm`.
+5. Add migration notes for ARF-RT, probe overlays, observation logs, and
+   findings diffs.
+6. Preserve deterministic sort order after IDs change.
+7. Do not rewrite historical public artifacts in this repo. If historical
+   scenarios are regenerated later, label them as regenerated under the new
+   algorithm.
+
+## Tests Required Before Code Fix
+
+Before changing production ID behavior, add tests that cover:
+
+- `node_id` differs for case-distinct IAM role ARNs.
+- `node_id` differs for case-distinct IAM user ARNs.
+- `node_id` remains stable for exact repeated inputs.
+- `edge_id` differs when only `src_provider_id` case differs.
+- `edge_id` differs when only `dst_provider_id` case differs.
+- `edge_id` still changes when `features_digest` changes.
+- deterministic scenario metadata records the new `id_algorithm`.
+- `findings_diff` or equivalent comparison code refuses to treat v2 and v3 IDs
+  as directly comparable unless explicit migration support is added.
+- ARF-RT/probe-overlay tests either reject cross-version edge IDs or use an
+  explicit remapping path.
+- `constraint_id` behavior is covered by a separate review, including which
+  fields are case-sensitive and which remain normalized.
+
+## Non-Goals
+
+This checkpoint does not:
+
+- change production ID behavior;
+- change `node_id`, `edge_id`, `constraint_id`, or `finding_id` algorithms;
+- add live AWS validation;
+- run Terraform;
+- change reasoner behavior;
+- change benchmark semantics;
+- add composite scores;
+- add pass/fail benchmark labels;
+- claim broad IAMScope correctness;
+- claim production readiness.
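For illustration only, and without changing any production behavior, the field-aware canonicalization from the Candidate Fix section and the cross-version guard from step 4 of the migration plan might look like the sketch below. `node_id_v3_sketch` and `assert_comparable` are hypothetical names, the `"v2-placeholder"` string is not the real v2 algorithm name, and the SHA-256 over NUL-separated fields layout is an assumption.

```python
import hashlib

# Tentative v3 name taken from the Candidate Fix section.
ID_ALGORITHM_V3 = "sha256_null_separated_v3_case_sensitive_provider_ids"


def _hash_fields(fields):
    """Hash NUL-joined fields with SHA-256 (assumed layout)."""
    return hashlib.sha256("\0".join(fields).encode("utf-8")).hexdigest()


def node_id_v3_sketch(provider: str, node_type: str, provider_id: str) -> str:
    """Hypothetical v3 node formula.

    Structural fields are lowercased; the provider-owned ID keeps its case.
    """
    return _hash_fields([provider.lower(), node_type.lower(), provider_id])


def assert_comparable(left_meta: dict, right_meta: dict) -> None:
    """Refuse ID-based comparison unless both artifacts name the same algorithm."""
    left = left_meta.get("id_algorithm")
    right = right_meta.get("id_algorithm")
    if left is None or left != right:
        raise ValueError(f"ID spaces differ: {left!r} vs {right!r}")


a = node_id_v3_sketch("aws", "IAMRole", "arn:aws:iam::000000000000:role/CaseRole")
b = node_id_v3_sketch("AWS", "iamrole", "arn:aws:iam::000000000000:role/CaseRole")
c = node_id_v3_sketch("aws", "IAMRole", "arn:aws:iam::000000000000:role/caserole")
# a == b: structural fields are still case-insensitive.
# a != c: the provider ID now contributes its exact case to the hash.
```

A comparison tool would call `assert_comparable` on the two scenarios' metadata before matching any `node_id` or `edge_id` values, so a v2 artifact and a v3 artifact fail loudly instead of producing a misleading diff.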
+
+## Exact Next Slice
+
+Recommended next slice: add xfail case-sensitive ARN ID collision regression tests.
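One possible shape for that slice is sketched below. To stay standard-library only, it swaps in `unittest.expectedFailure` for pytest's `xfail` marker, and it uses a local `node_id` stand-in that reproduces the lowercasing collision. In the repository the stand-in would be replaced by the real import from `iamscope.identity.deterministic_ids`. The class name, test names, and `run_sketch_suite` are hypothetical.

```python
import hashlib
import unittest


def node_id(provider: str, node_type: str, provider_id: str) -> str:
    """Local stand-in that lowercases every field, mimicking the reported bug."""
    fields = [provider.lower(), node_type.lower(), provider_id.lower()]
    return hashlib.sha256("\0".join(fields).encode("utf-8")).hexdigest()


class CaseSensitiveArnIdTests(unittest.TestCase):
    @unittest.expectedFailure  # non-enforced until the ID migration is approved
    def test_node_id_differs_for_case_distinct_role_arns(self):
        a = node_id("aws", "IAMRole", "arn:aws:iam::000000000000:role/CaseRole")
        b = node_id("aws", "IAMRole", "arn:aws:iam::000000000000:role/caserole")
        self.assertNotEqual(a, b)

    def test_node_id_is_stable_for_repeated_input(self):
        arn = "arn:aws:iam::000000000000:role/CaseRole"
        self.assertEqual(
            node_id("aws", "IAMRole", arn),
            node_id("aws", "IAMRole", arn),
        )


def run_sketch_suite() -> unittest.TestResult:
    """Run the sketch tests and return the raw result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CaseSensitiveArnIdTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Once the v3 formula lands, the expected-failure test starts passing and is reported as an unexpected success, which is the signal to remove the marker and make the assertion enforced.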