
docs(power): clarify replay safety and step semantics #5

Open
embano1 wants to merge 2 commits into aws:main from embano1:codex/fix-power-replay-semantics-docs

Conversation


@embano1 embano1 commented Mar 8, 2026

Update the durable-functions power guidance to distinguish deterministic orchestration code from non-atomic durable operation bodies.

  • generalize replay-safety guidance across steps, waits, and concurrent branches
  • document logger replay caveats and note that context.logger can wrap an existing logger
  • correct StepSemantics defaults and examples for TypeScript and Python
  • document the at-most-once-per-retry fallback for non-idempotent steps with retries disabled

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@embano1 embano1 requested review from bfreiberg and yaythomas March 8, 2026 09:20
Author

embano1 commented Mar 8, 2026

FYI, I also have another branch ready (based on these changes), which reduces the power size by ~50%.

Author

embano1 commented Mar 8, 2026

@singledigit can you use this updated power to verify if it would have caught the bugs in your scanner function: https://github.com/singledigit/durable-function-video-scanner/tree/main

Scanner fix: generate stable Transcribe/Rekognition submission identifiers once from durable state, not from wall-clock time inside the callback submitter. Default: derive them from scanId plus a deterministic suffix, or generate them in an earlier durable step and reuse them in the submitter.
Scanner fix: replace handler-level logger.* calls with context.logger (and childContext.logger / submitter ctx.logger where available) for all logging that can occur during replay.
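A minimal sketch of the first scanner fix, assuming the identifier is derived from `scanId` plus a fixed suffix. The helper names (`stableJobId`, `unstableJobId`) and the suffix scheme are hypothetical and are not taken from the scanner repository or the SDK:

```typescript
// Hypothetical helpers for illustration only.
type JobKind = "transcribe" | "rekognition";

// Deterministic: the same scanId and kind always yield the same identifier,
// so a re-attempted submitter addresses the same external job.
function stableJobId(scanId: string, kind: JobKind): string {
  return `${scanId}-${kind}`;
}

// Anti-pattern for contrast: the clock value differs on every attempt,
// so each re-attempt would submit a brand-new job.
function unstableJobId(scanId: string, now: () => number = Date.now): string {
  return `${scanId}-${now()}`;
}

const firstAttempt = stableJobId("scan-123", "transcribe");
const reAttempt = stableJobId("scan-123", "transcribe");
// firstAttempt and reAttempt are the same string.
```

If the external service restricts identifier characters or length, the derived value would need normalizing, but it should still be a pure function of durable state.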

Comment thread aws-lambda-durable-functions-power/steering/getting-started.md Outdated
Author

embano1 commented Mar 28, 2026

@yaythomas can we prioritize this one to address some issues with the power?


## Rule 2: Durable Operation Bodies Are Not Guaranteed To Be Atomic

**Functions passed to durable context APIs must assume the operation is not guaranteed to be atomic with respect to external side effects, and may be re-attempted before the durable runtime has fully recorded the result.**
Contributor

  1. what about at most once guarantee?

  2. what does "durable context APIs" mean? methods on the DurableContext? or the durable handler?

  3. "Functions" means something specific in coding, strictly speaking java doesn't have functions.

  4. Style: avoid passive

Suggestion: Code in a durable operation must assume that it could re-run on replay, unless it is in a Step with an AT MOST ONCE execution guarantee. This means that external side effects caused by such code could execute more than once.


### What This Means

- Non-deterministic computation inside a durable operation body is acceptable because the result can be checkpointed
Contributor

this is not quite why it's "acceptable".

suggestion:
once code inside a durable step completes it saves to a checkpoint, and on subsequent replays the operation returns the saved result. in this way the result of non-deterministic code becomes deterministic on replay because the non-deterministic code does not re-run and the durable execution framework uses the checkpoint result instead.
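The checkpoint behaviour described in this suggestion can be sketched with a toy model. `MiniContext` below is a hypothetical in-memory stand-in, not the SDK's `DurableContext`, and the `Map` only simulates a store that survives across invocations:

```typescript
// Toy model only: MiniContext is a hypothetical stand-in for a durable context.
class MiniContext {
  private readonly checkpoints: Map<string, unknown>;

  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  // Runs the body the first time; on later replays returns the saved result
  // without calling the body again.
  step<T>(name: string, body: () => T): T {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T;
    }
    const result = body();
    this.checkpoints.set(name, result);
    return result;
  }
}

const store = new Map<string, unknown>(); // outlives a single invocation in this sketch
let bodyRuns = 0;

function handler(ctx: MiniContext): number {
  return ctx.step("pick-number", () => {
    bodyRuns += 1;
    return Math.random(); // non-deterministic, but evaluated only once
  });
}

const firstRun = handler(new MiniContext(store));
const replay = handler(new MiniContext(store)); // same store, so this is a replay
// firstRun and replay are equal, and the body ran once.
```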

### What This Means

- Non-deterministic computation inside a durable operation body is acceptable because the result can be checkpointed
- External side effects started from that body should still be safe under re-attempt whenever possible
Contributor

re-entrancy is strictly speaking the term described in this sentence.

but the general thrust of some of the copy added is for idempotency: same result, no duplicate effects, running it N times has the same effect as running it once

in general, yes, idempotency is a good design pattern to follow here.

however, part of the point of checkpointing is to provide the idempotency. so this sentence is recommending with "should" to avoid taking advantage of something durable functions provide as a key feature with AT MOST ONCE, which is deterministic checkpointing when wrapping non-idempotent code.
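To make the idempotency definition in this comment concrete (running it N times has the same effect as running it once), here is a small sketch. `PaymentService` and its idempotency-key handling are hypothetical; real services expose idempotency in their own ways:

```typescript
// Hypothetical in-memory service used only to illustrate idempotency keys.
class PaymentService {
  private readonly processed = new Map<string, number>();
  chargeCount = 0;

  // A repeated call with the same key returns the original outcome
  // instead of charging again.
  charge(idempotencyKey: string, amount: number): number {
    const existing = this.processed.get(idempotencyKey);
    if (existing !== undefined) {
      return existing;
    }
    this.chargeCount += 1;
    this.processed.set(idempotencyKey, amount);
    return amount;
  }
}

const payments = new PaymentService();
const orderId = "order-42"; // assumed to come from durable input, not from a clock or RNG

// Three attempts, for example after re-execution of the surrounding code.
for (let attempt = 0; attempt < 3; attempt++) {
  payments.charge(`charge-${orderId}`, 100);
}
// payments.chargeCount is 1: the side effect happened once.
```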

- Non-deterministic computation inside a durable operation body is acceptable because the result can be checkpointed
- External side effects started from that body should still be safe under re-attempt whenever possible
- If the side effect needs an identifier for idempotency, derive it from durable inputs/state or generate it once from durable state and reuse it
- If a **step** cannot be made idempotent and duplicate execution is unacceptable, use `StepSemantics.AtMostOncePerRetry` (TypeScript) or `StepSemantics.AT_MOST_ONCE_PER_RETRY` (Python) with retries disabled so the behavior is effectively zero-or-once rather than more than once
Contributor

arguably the step is idempotent once it checkpoints.

the inside of the step isn't.

Also, now that we have Java, we should probably avoid listing per language (TypeScript vs Python) each time, and instead reference the general concept and refer that to a single source of truth.

1. **Write handler** with durable operations
2. **Test locally** with `LocalDurableTestRunner`
3. **Validate replay rules** (no non-deterministic code outside steps)
3. **Validate replay rules** (determinism outside durable operations; stable identity and idempotent side effects inside durable operation bodies)
Contributor

what about determinism outside of durable operations? the first sentence is negative (i.e., don't do this) and the second is positive (do this), but this is not clear from the text.

Is there a replay section this can link to instead?

7. **Choose correct semantics** (AT_LEAST_ONCE vs AT_MOST_ONCE)
7. **Choose correct semantics** (`AtLeastOncePerRetry` vs `AtMostOncePerRetry`)
8. **Use stable identity for external work** - derive identifiers from durable inputs/state, not `Date.now()`, randomness, or fresh UUIDs created inside the step body
9. **Use `AtMostOncePerRetry` with zero retries for non-idempotent steps** when duplicate execution is unacceptable and you can accept zero-or-once behavior
Contributor

also repeating text. presumably for a power/agent's use we can keep DRY?

Use AtMostOncePerRetry with retries disabled for steps with non-idempotent external side effects: the step will execute at most once, and if it fails it will not be retried.


Wait for external systems to respond (human approval, webhook, async job):

The submitter function passed to `waitForCallback(...)` and the check function passed to `waitForCondition(...)` are durable operation bodies. They are not guaranteed to be atomic with respect to external side effects, so if they start or address external work, use stable identity and idempotent behavior. See [replay-model-rules.md](replay-model-rules.md).
Contributor

well, they are specifically a step. "durable operation body" is this new concept introduced in this PR to capture any given durable operation that has a step inside it.

as this stands, it's unclear what it means to be a "durable operation body."

similar concerns re the "stable identity & idempotent behaviour" I outlined above

2. **Cannot nest durable operations** - use `runInChildContext` to group operations
3. **Closure mutations are lost on replay** - return values from steps
4. **Side effects outside steps repeat** - use `context.logger` (replay-aware)
1. **All non-deterministic code outside durable operations MUST be moved into durable operations** (`context.step`, `waitForCallback`, `waitForCondition`, `parallel`/`map` branches)
Contributor

passive voice, prefer active
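For readers following the hunk above, the rule "closure mutations are lost on replay, return values from steps" can be illustrated with a toy replay model. `runStep` is a hypothetical helper that only mimics the skip-on-replay behaviour; it is not an SDK function:

```typescript
// Toy replay model (hypothetical): a step body runs only when no checkpoint exists.
function runStep<T>(checkpoints: Map<string, unknown>, name: string, body: () => T): T {
  if (checkpoints.has(name)) {
    return checkpoints.get(name) as T;
  }
  const result = body();
  checkpoints.set(name, result);
  return result;
}

const saved = new Map<string, unknown>();

// Anti-pattern: the total is written to a closure variable inside the step.
function mutatingHandler(): number {
  let total = 0;
  runStep(saved, "sum-mutating", () => {
    total = 1 + 2;
  });
  return total;
}

// Preferred: the step returns the value, so the checkpoint carries it.
function returningHandler(): number {
  return runStep(saved, "sum-returning", () => 1 + 2);
}

const mutatingFirst = mutatingHandler();    // body ran, closure variable was set
const mutatingReplay = mutatingHandler();   // body skipped, mutation is lost
const returningFirst = returningHandler();  // body ran
const returningReplay = returningHandler(); // value restored from the checkpoint
```

In this model the mutating handler yields 3 on the first run and 0 on replay, while the returning handler yields 3 both times.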

3. **Closure mutations are lost on replay** - return values from steps
4. **Side effects outside steps repeat** - use `context.logger` (replay-aware)
1. **All non-deterministic code outside durable operations MUST be moved into durable operations** (`context.step`, `waitForCallback`, `waitForCondition`, `parallel`/`map` branches)
2. **Durable operation bodies are not guaranteed to be atomic** - prefer stable identity and idempotent behavior for external side effects; for non-idempotent steps, consider at-most-once-per-retry semantics with zero retries
Contributor

this repeats yet again content from before.

it also refers to "durable operation bodies" but then describes semantics that are only available on step.

a step body may run successfully but fail to checkpoint, causing it to re-execute on replay. For steps where duplicate execution is unacceptable, use AtMostOncePerRetry with retries disabled

3. **Closure mutations that won't persist**: Variables mutated inside steps are NOT preserved across replays — return values from steps instead
4. **Side effects outside steps that repeat on replay**: Use `context.logger` for logging (it is replay-aware and deduplicates automatically)
1. **Non-deterministic code outside durable operations**: `Date.now()`, `Math.random()`, UUID generation, API calls, database queries must all be inside durable operations
2. **Non-atomic durable operation bodies**: Functions passed to `context.step()`, `waitForCallback()`, `waitForCondition()`, and `parallel()`/`map()` branches may be re-attempted before persistence is fully committed — prefer stable identity and idempotent external effects; for non-idempotent steps, use at-most-once-per-retry semantics with zero retries when duplicate execution is unacceptable
Contributor

this idea is getting repeated a lot. can we consolidate?

If keeping this, maybe something like:

Durable operation bodies are not atomic: Code passed to context.step(),
waitForCallback(), waitForCondition(), and parallel()/map() branches can
succeed but fail to checkpoint, so the runtime may re-execute them on replay.
Prefer idempotent external side effects with stable identity. For steps where
duplicate execution is unacceptable, use AtMostOncePerRetry with retries
disabled.
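The "succeed but fail to checkpoint" case and the zero-or-once trade-off can be sketched with a toy simulation. Everything here is an assumption for illustration: `runStep`, the `Semantics` strings, and the idea of recording a start marker before the body runs are a simplified model, not the SDK's implementation of `AtMostOncePerRetry`:

```typescript
// Toy model under stated assumptions; not the SDK's implementation.
type Semantics = "at-least-once" | "at-most-once";
interface StepRecord { started: boolean; done: boolean; result?: unknown }

class InterruptedStepError extends Error {}

function runStep<T>(
  log: Map<string, StepRecord>,
  name: string,
  semantics: Semantics,
  body: () => T,
  crashBeforeCheckpoint = false,
): T {
  const record = log.get(name);
  if (record?.done) {
    return record.result as T;
  }
  if (record?.started && semantics === "at-most-once") {
    // A previous attempt began but never recorded a result: do not run again.
    throw new InterruptedStepError(`${name} was interrupted; not re-running`);
  }
  if (semantics === "at-most-once") {
    log.set(name, { started: true, done: false }); // start marker written before the body
  }
  const result = body();
  if (crashBeforeCheckpoint) {
    throw new Error("simulated crash after the side effect, before the checkpoint");
  }
  log.set(name, { started: true, done: true, result });
  return result;
}

// Runs the step twice: the first attempt crashes before checkpointing,
// the second attempt is the replay. Retries are treated as disabled.
function simulate(semantics: Semantics): number {
  const log = new Map<string, StepRecord>();
  let emailsSent = 0;
  const sendEmail = () => { emailsSent += 1; };
  try { runStep(log, "send-email", semantics, sendEmail, true); } catch { /* invocation dies */ }
  try { runStep(log, "send-email", semantics, sendEmail); } catch { /* surfaces as a step failure */ }
  return emailsSent;
}

const atLeastOnceSends = simulate("at-least-once");
const atMostOnceSends = simulate("at-most-once");
```

In this model the at-least-once run sends the email twice and the at-most-once run sends it once. Had the crash happened after the start marker but before the body, the at-most-once count would be zero, which is the zero-or-once behaviour discussed in the thread.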

Author

embano1 commented Apr 27, 2026

Gents, I can address the feedback but before please let me know whether the SKILL is maintained/consolidated elsewhere or if we're keeping this one here as the source of truth.

@yaythomas
Contributor

regarding feedback, I propose to revise the power to remove detail.

now that we have https://docs.aws.amazon.com/durable-execution/ (with working diagrams ;-) ), the goal is to maximize reasoning density while minimizing token noise. Replicating standard documentation is probably not a great use of context space because LLMs are already trained on public API signatures, and static look-up data is better handled by RAG (Retrieval-Augmented Generation) or live web-search tools.

Idea is to provide a Search Pointer or a link rather than a snippet. This also then takes care of most of the Java changes pending in another PR so we don't have to duplicate/replicate even more content that gets out of date.

Proposed structure:

  1. Rules/Constraints. Gotchas and mandatory style guides.
  2. Skeleton Master Patterns. One or two brief code blocks showing the ideal structure.
  3. Index. Links to repos or specific sections of external docs.

Author

embano1 commented Apr 29, 2026

Agree (and thx for the quick fix on the diagrams :)

As mentioned above, there's lots of opportunity to shrink the Power.

FYI, I also have another branch ready (based on these changes), which reduces the power size by ~50%.

@bfreiberg just checking that if I implement the feedback, we don't break anything downstream for Power consumers (and also how it is consumed in other repos regarding drift and versioning)?

@bfreiberg
Contributor

Thanks for all your feedback. I've consulted Claude itself with the problem, and I think it comes down to how important the code examples are. Are they supplemental information or do we need them to guarantee that the agent doesn't hallucinate API actions and parameters?

Can links work? Only conditionally. Claude doesn't automatically fetch URLs it sees in a SKILL.md — there's no skill-specific link-following behavior. It'll only pull a URL if WebFetch (or an equivalent network tool) is available and it decides to fetch. That means links fail silently on the Claude API (no network), on offline Claude Code, and anytime WebFetch is disabled. Private repos add auth problems on top.

Is it preferable? No, for a few concrete reasons:

  • Reliability: bundled files always work; links break on 404s, repo moves, auth, or no network.
  • Token cost: bundled files sit on disk at zero context cost until Claude reads them. A fetched URL dumps the whole response into context.
  • Progressive disclosure: the skills model is built around Claude using bash to read local files on demand — that's the intended shape.

Idiomatic pattern: official Anthropic skills (pptx, xlsx, pdf, claude-api) bundle everything. For code examples, put them in an examples/ or references/ subdirectory of the skill and reference them with relative paths from SKILL.md. Claude reads them only when needed.

When linking does make sense: supplementary pointers to living docs the user controls (internal wiki, a dashboard, an API reference that changes often) — and only as "go look here for more," not as the primary source the skill depends on. For pinned code examples, always bundle.

Author

embano1 commented Apr 29, 2026

Thx, which raises another question: should we migrate our Power to a SKILL which can then be consumed by Kiro Powers (like the Serverless Plugin) and non-AWS coding agents (like Codex and Claude)?

Contributor

yaythomas commented Apr 29, 2026

In general, per the review comments I raised ibid: at least we should stop repeating the same replay rules in every steering file. State them once, reference them.

Funnily enough, I consulted two different GenAIs with the problem before I made my post, and neither preferred "always bundle", and both steered towards what I posted to begin with (a core reference, link out the rest).

I've since asked Mighty Claude itself to revisit given @bfreiberg's Claude's response... I'll just share the full agent output on that.

Based on the discussion and how skills/powers actually work, here's how I'd structure it:

Tier 1: Bundle in the SKILL.md itself (~2-3KB)

What LLMs get wrong and can't look up — this is the irreplaceable content:

  • The 4 critical replay rules (non-deterministic code in steps, no nesting, closure mutations lost, side effects repeat)
  • Step semantics (AtLeastOnce vs AtMostOnce)
  • The qualified ARN invocation requirement (non-obvious, breaks silently)
  • IAM policy name (AWSLambdaBasicDurableExecutionRolePolicy)
  • context.logger for replay-safe logging
  • SDK package names for all 3 languages (install commands)
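As one illustration of the replay-safe logging item in this list, the sketch below contrasts a plain logger with a replay-aware one. `ReplayAwareLogger` is a hypothetical class written for this example; the real `context.logger` may detect replay and suppress output differently:

```typescript
// Hypothetical sketch of a replay-aware logger; not the SDK's context.logger.
class ReplayAwareLogger {
  readonly lines: string[] = [];
  private readonly replaying: boolean;

  constructor(replaying: boolean) {
    this.replaying = replaying;
  }

  // Emits only when the handler is not replaying already-completed work.
  info(message: string): void {
    if (!this.replaying) {
      this.lines.push(message);
    }
  }
}

const plainLines: string[] = []; // stands in for a handler-level logger

function handler(logger: ReplayAwareLogger): void {
  plainLines.push("starting scan"); // plain logging: emitted on every invocation
  logger.info("starting scan");     // replay-aware logging: emitted once
}

const firstInvocation = new ReplayAwareLogger(false);
handler(firstInvocation);
const replayInvocation = new ReplayAwareLogger(true);
handler(replayInvocation);

const awareLines = [...firstInvocation.lines, ...replayInvocation.lines];
// plainLines has two entries, awareLines has one.
```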

Tier 2: Bundle in references/ subdirectory, loaded on demand (~15-20KB total)

Skeleton patterns that prevent hallucination

  • One canonical handler pattern per language (TS, Python, Java) — ~3 files
  • One canonical test pattern per language — ~3 files
  • One deployment/IaC reference (SAM + CDK snippets) — 1 file
  • Error handling patterns (saga, retry config) — 1 file

This is where the current 114KB can shrink to ~15-20KB.

Tier 3: Links as supplementary pointers (in SKILL.md footer)

Living content that changes often:

On the Power vs Skill question

@embano1 is right to raise this. If you want portability across Kiro, Claude
Code, and Codex:

  • SKILL.md format is the portable choice — it works in Claude Code natively
    and can be consumed by Kiro powers
  • A Kiro Power can wrap a skill (the Power becomes a thin shell that loads the
    skill)
  • This also solves the "maintained where?" question — the skill is the source
    of truth, the power is a distribution mechanism

What to cut from the current power

Looking at the steering files, the biggest wins:

  • deployment-iac.md (14KB) — most of this is CloudFormation/CDK boilerplate
    that's better served by the official docs. Keep only the gotchas (qualified
    ARN, IAM policy).
  • testing-patterns.md (18KB) — keep one skeleton per language, cut the rest.
    The testing docs site is comprehensive.
  • getting-started.md (11KB) — the onboarding wizard in POWER.md already covers
    this. Redundant.
  • Deduplicate replay rules — yaythomas's review comments repeatedly flag that
    the same replay/determinism rules are copy-pasted across step-operations,
    wait-operations, concurrent-operations, and getting-started. State once,
    reference everywhere.

That alone gets you close to the ~50% reduction @embano1 mentioned, probably
more.

TL;DR for the PR

The team should converge on: bundle the rules and skeletons that prevent hallucination, link to everything else, and migrate to SKILL.md format for portability. The current power has too much duplicated content across steering files and too much reference material that's better served by the official docs site.

@yaythomas
Contributor

this is getting long for a PR, I've started a discussion thread regarding this here: #156

Looking forward to seeing you both (and any other interested parties) there! :-)


3 participants