initrd: add TPM DA lockout gating, counter auth fix, bad_auth tooling and documentation#2117

Draft
tlaurion wants to merge 26 commits into master from tpm1_fixes

Conversation

@tlaurion
Collaborator

@tlaurion tlaurion commented May 14, 2026

Summary

Fixes a TPM1 counter auth regression (PR #2068) where increment_tpm_counter was changed from hardcoded -pwdc '' (empty counter auth) to -pwdc "${tpm_passphrase:-}" (owner passphrase), while counters continued to be created with -pwdc "". This caused every increment to compute SHA1(owner_pass) against a counter created with SHA1(""), producing persistent TPM_AUTHFAIL.

Per TCG TPM Main Spec Part 3, TPM_CreateCounter uses owner auth (-pwdo) but TPM_IncrementCounter uses the counter's own authData, not the owner password. The correct design for Heads' rollback counter is empty auth.
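
The mismatch can be reproduced without a TPM. The following is a minimal sketch, assuming (as the summary above states) that the auth value is the SHA-1 digest of the password. `auth_digest` is a hypothetical helper and the passphrase is a placeholder; neither comes from tpmr.sh.

```shell
#!/bin/sh
# Hypothetical helper: hex SHA-1 digest of a password string.
auth_digest() {
	printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

# Counter was created with -pwdc "" (empty auth).
created_with=$(auth_digest "")
# The regression sent the owner passphrase instead (placeholder value).
presented=$(auth_digest "example-owner-pass")

if [ "$created_with" = "$presented" ]; then
	echo "auth matches"
else
	echo "auth mismatch: the TPM would answer TPM_AUTHFAIL"
fi
```

Any non-empty owner passphrase yields a different digest than the empty string, so every increment fails until the empty counter auth is restored.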

The repeated auth failures (3 per boot) triggered TPM 1.2 dictionary-attack lockout (TPM_DEFEND_LOCK_RUNNING), which persisted through forceclear on some implementations.

Adds DA state diagnostics, preflight guard and testing tooling.

Changes

initrd/bin/tpmr.sh — auth fix + DA state + bad_auth

  • tpm1_counter_increment(): detect -pwdc '', call tpm directly (bypass _tpm_auth_retry)
  • tpm1_reset(): detect defend lock, cycle physical presence, retry takeown
  • tpm1_da_state(): TPM1 DA via TPM_CAP_DA_LOGIC, output DA: line
  • tpm2_da_state(): TPM2 DA via getcap properties-variable, unlock estimate
  • tpm1_bad_auth() / tpm2_bad_auth(): deliberate wrong-auth for testing
  • Add 'defend' and '0x98e|0x149' to auth detection patterns
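
As a rough illustration of the last bullet, auth-related output can be classified with a single case-insensitive extended regex. This is a sketch only: `is_auth_failure` is a hypothetical name, and the `authfail` alternative is an assumed pre-existing pattern; only `defend` and `0x98e|0x149` are taken from the change list above.

```shell
#!/bin/sh
# Hypothetical classifier: succeeds when the tool output looks auth-related.
# 'authfail' is an assumption; 'defend' and '0x98e|0x149' come from the PR text.
is_auth_failure() {
	printf '%s\n' "$1" | grep -Eqi 'authfail|defend|0x98e|0x149'
}

for msg in "Defend lock running" "tpm2 error 0x98e" "No such file or directory"; do
	if is_auth_failure "$msg"; then
		echo "auth-related: $msg"
	else
		echo "other error:  $msg"
	fi
done
```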

initrd/etc/functions.sh — counter auth + DA preflight

Other

  • initrd/bin/tpm-reset.sh: TPM reset frontend
  • initrd/bin/oem-factory-reset.sh: -pwdc '' for consistency
  • doc/tpm.md: DA diagnosis, testing, escalation, physical presence

Copilot AI review requested due to automatic review settings May 14, 2026 19:09

Copilot AI left a comment

Pull request overview

This PR improves Heads’ TPM error handling for TPM1 by correctly classifying tpmtotp “Defend lock running” output as an authorization-related failure (so it doesn’t immediately hard-fail), and adds a recovery path in tpm1_reset() to attempt clearing TPM1 defend-lock after forceclear by cycling physical presence.

Changes:

  • Extend auth-failure grep patterns to include defend (and unify inclusion of TPM2 auth hex codes in the shared retry helper).
  • Enhance tpm1_reset() to detect “defend lock” after takeown and retry after cycling physical presence.
  • Expand TPM documentation to describe tool selection, auth retry detection, and TPM1 defend-lock behavior.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| initrd/bin/tpmr.sh | Treat “defend lock” as auth-related and add TPM1 defend-lock recovery logic during reset. |
| doc/tpm.md | Document TPM toolchain selection and the updated auth retry / defend-lock behavior. |

Comment thread doc/tpm.md Outdated
Comment thread doc/tpm.md Outdated
Comment thread doc/tpm.md

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

Comment thread doc/tpm.md Outdated
Comment thread doc/tpm.md Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.

Comment thread doc/tpm.md Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 2 changed files in this pull request and generated no new comments.

@notgivenby
Contributor

The fix did not work…will copy logs.

@tlaurion
Collaborator Author

> The fix did not work…will copy logs.

Found the bug and where the regression comes from. Damn that one was not easy. Pushing fix and updating the other pr

@tlaurion tlaurion requested a review from Copilot May 16, 2026 01:08
@tlaurion tlaurion changed the title from "initrd/bin/tpmr.sh: fix TPM1 auth failure detection and defend lock recovery" to "initrd: fix TPM1 counter auth regression and defend lock cascade failure" on May 16, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

Comment thread initrd/etc/functions.sh
…iew fixes

Add preflight dictionary attack (DA) lockout guard to
increment_tpm_counter, querying da_state before every counter
increment. DIE on active lockout, WARN when count nears threshold.
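
The guard's decision can be sketched as a pure function. This is illustrative only: `da_preflight_action` is a hypothetical name, and treating "nears threshold" as "one failure away from the maximum" is an assumption, not something the commit message specifies.

```shell
#!/bin/sh
# Hypothetical decision helper for the preflight guard.
# Args: locked flag (0/1), current failure count, maximum allowed failures.
da_preflight_action() {
	locked=$1 count=$2 max=$3
	if [ "$locked" -eq 1 ]; then
		echo die    # active lockout: do not attempt the increment
	elif [ $((count + 1)) -ge "$max" ]; then
		echo warn   # assumed meaning of "nears threshold"
	else
		echo ok
	fi
}

da_preflight_action 0 4 5
```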

Add tpm1_da_state and tpm2_da_state for unified DA state query:
  - TPM1: reads TPM_CAP_DA_LOGIC (0x19); actionDependValue > 0 with
    state=1 means locked; the DA: line's timer= field maps to
    actionDependValue
  - TPM2: reads getcap properties-variable (no single DA query);
    count >= max_auth means locked; estimate = (counter - max_auth + 1)
    * interval; the DA: line carries timer= when locked, not when clean
  - Both output machine-parsable DA: line for the preflight guard
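
The TPM2 estimate formula above can be written out as shell arithmetic. A minimal sketch, assuming the three inputs have already been read from the TPM as decimal integers; `tpm2_unlock_estimate` is a hypothetical name and the sample numbers are made up.

```shell
#!/bin/sh
# Hypothetical helper implementing:
#   estimate = (counter - max_auth + 1) * interval
# Prints nothing when the counter is below the threshold (not locked).
tpm2_unlock_estimate() {
	counter=$1 max_auth=$2 interval=$3
	if [ "$counter" -ge "$max_auth" ]; then
		echo $(( (counter - max_auth + 1) * interval ))
	fi
}

# Made-up values: 5 failures, threshold 3, 600 s interval.
tpm2_unlock_estimate 5 3 600
```

With these sample values the function prints 1800, that is, three intervals of 600 seconds.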

Fix da_timer sed pattern: use sed -n with /p so da_timer stays
empty when DA: line has no timer= field (TPM2 count < threshold).
Without -n, non-matching lines echoed the full line as da_timer.
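
The difference is easy to reproduce. In this sketch the two DA: lines are invented samples (the real field layout may differ); only the `sed` behavior is the point.

```shell
#!/bin/sh
# Invented sample lines; the actual DA: line layout may differ.
clean_line='DA: state=clean count=0'
locked_line='DA: state=locked count=5 timer=1800'

extract='s/.*timer=\([0-9][0-9]*\).*/\1/'

# Without -n: a non-matching line is echoed unchanged.
buggy=$(printf '%s\n' "$clean_line" | sed "$extract")
# With -n and the p flag: only lines where the substitution happened are printed.
fixed=$(printf '%s\n' "$clean_line" | sed -n "${extract}p")
locked=$(printf '%s\n' "$locked_line" | sed -n "${extract}p")

echo "buggy=[$buggy] fixed=[$fixed] locked=[$locked]"
```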

Add tpm1_bad_auth and tpm2_bad_auth for testing:
  - Uses NV index auth (-P) not owner auth (-C o -P) for TPM2
    because NV auth failure produces TPM2_RC_AUTH_FAIL (0x98e) and
    increments LOCKOUT_COUNTER; owner auth may not increment on
    some TPM2 implementations
  - Intentionally wrong password TPM_DEFEND_LOCK_TEST_WRONG_PASSWORD
  - Shows DA state before and after; uses || true to survive set -e
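
The `|| true` idiom mentioned in the last item can be shown in isolation. This sketch uses `false` as a stand-in for the deliberately failing TPM command.

```shell
#!/bin/sh
# 'false' stands in for the intentionally failing wrong-auth command.
result=$(
	set -e
	echo "before"
	false || true   # expected failure: swallowed, so set -e does not abort
	echo "after"
)
printf '%s\n' "$result"
```

Without `|| true`, the subshell would stop at the failing command and "after" would never be printed.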

Add tpm-reset.sh as a TPM reset frontend via tpmr.sh wrapper.

Add DA documentation in doc/tpm.md covering diagnosis, testing,
escalation, TPM1 vs TPM2 DA parameter configurability.

Review fixes:
  - Fix TPM2 property names: TPM_PT_ -> TPM2_PT_ prefix in doc/tpm.md
  - Remove misplaced design decision comment from tpm2_da_state
    (belongs in tpm2_bad_auth where it already exists)
  - Add DEBUG logging at every decision point across preflight guard
    and all DA state functions for runtime traceability
  - Document design decisions inline: timer logic, estimate formula,
    empty auth retry bypass, NV vs owner auth

Signed-off-by: Thierry Laurion <insurgo@riseup.net>

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 5 changed files in this pull request and generated 3 comments.

Comment thread initrd/etc/functions.sh Outdated
Comment thread doc/tpm.md Outdated
Comment thread doc/tpm.md Outdated
- Fix TPM2 time remaining table entry: estimate is derived from
  LOCKOUT_COUNTER vs MAX_AUTH_FAIL times LOCKOUT_INTERVAL, not
  LOCKOUT_RECOVERY (which is the lockout-auth-blocked-after-failure
  timer, not the remaining-until-unlock)
- Reword migration WARN: 'older Heads version' not 'older firmware'
  (the migration case is caused by previous Heads code, not platform
  firmware)
- Remove fragile PR #2117 reference from preventing-future-lockouts
  section: describe the fix generically (restoring empty counter auth)
  so the doc is correct regardless of branch context

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@tlaurion tlaurion changed the title from "initrd: fix TPM1 counter auth regression and defend lock cascade failure" to "initrd: add TPM DA lockout gating, counter auth fix, bad_auth tooling and documentation" on May 17, 2026
When TPM1 does not support TPM_CAP_DA_LOGIC (0x19), tpm getcapability
may print raw TSS error text to stdout instead of returning empty.
The empty-string check missed this because error text is non-empty.

Fix: check that the output contains the 'State' field (expected for
valid DA capability data) before echoing. If missing, return
'unavailable' and suppress the raw TSS garbage.
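
A sketch of that check, with invented sample strings standing in for real tool output; `filter_da_output` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: pass the output through only if it contains the
# 'State' field, otherwise report 'unavailable'.
filter_da_output() {
	case "$1" in
	*State*) printf '%s\n' "$1" ;;
	*) echo "unavailable" ;;
	esac
}

filter_da_output "State: 0"                      # invented valid sample
filter_da_output "Error Bad mode from GetCapability"  # invented error sample
filter_da_output ""                              # empty output
```

Both the empty string and non-empty error text fall through to `unavailable`, which is the case the plain empty-string check missed.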

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
@tlaurion tlaurion marked this pull request as draft May 17, 2026 18:35
tlaurion added 22 commits May 17, 2026 14:36
TPM_CAP_DA_LOGIC (0x19) was added late in TPM 1.2 spec rev 103.
Older Infineon TPMs (X230-era SLB9635/9645) and some Atmel chips
do not implement it and return TPM_BAD_MODE (exit 44).

Document this limitation in:
- tpm1_da_state function comment: specific TPM models affected
- increment_tpm_counter preflight guard: note that da_state may
  return unavailable on older TPM1
- doc/tpm.md: explain why some TPM1 hardware shows 'unavailable'

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…able

When TPM_CAP_DA_LOGIC fails, also query TPM_CAP_VERSION_VAL (0x06) and
log the TPM firmware version to debug.log. This helps identify which
TPM1 chips lack rev 103+ DA capability support.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…iagnostics

Log TPM2 firmware version (TPM2_PT_FIRMWARE_VERSION_1, decoded as
major.minor) to debug.log in tpm2_da_state.  Consistent with the
TPM1 version probe already added to tpm1_da_state.

This helps identify TPM chips and firmware revisions that may have
limited or missing DA capability support.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…o 0x06)

TPM_CAP_VERSION_VAL (0x1a) returns the full version info structure
but, like TPM_CAP_DA_LOGIC (0x19), may be unsupported on older TPM1
chips.  TPM_CAP_VERSION (0x06) returns a basic uint32 encoding and
is more widely supported.  Try 0x1a first, fall back to 0x06.
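
The try-then-fall-back order can be sketched with a stub in place of the real capability query. `query_cap` and `probe_version` are hypothetical names, and the stub simulates an older chip that rejects 0x1a.

```shell
#!/bin/sh
# Stub standing in for the real capability query (hypothetical).
# Simulates an older TPM1 that only answers 0x06.
query_cap() {
	case "$1" in
	0x06) echo "basic-version-data" ;;
	*) return 1 ;;
	esac
}

# Try the richer capability first, then the widely supported one.
probe_version() {
	query_cap 0x1a 2>/dev/null || query_cap 0x06 2>/dev/null || echo "unknown"
}

probe_version
```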

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
DA: line always emits timer= field (with empty value when unavailable),
not conditionally.  Fix comment to match actual behavior.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…ry operator expected'

The mount_usb() function used (...) to group the fallback
assignment, creating a subshell. A variable assignment made inside
a subshell does not propagate to the parent shell, so USB_FAILED
stayed unset when mount-usb.sh failed, producing the error at line 24.

Fix: use { } brace grouping (no subshell) so USB_FAILED=1
is assigned in the current shell.  Also fix the same pattern on
line 27 (second mount attempt).
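
The two grouping forms behave differently, as this short sketch shows; `false` stands in for the failing mount-usb.sh call.

```shell
#!/bin/sh
unset USB_FAILED

# Parentheses fork a subshell: the assignment is lost when it exits.
false || ( USB_FAILED=1 )
after_subshell=${USB_FAILED:-unset}

# Braces group in the current shell: the assignment sticks.
false || { USB_FAILED=1; }
after_braces=${USB_FAILED:-unset}

echo "after subshell: $after_subshell, after braces: $after_braces"
```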

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
exit 1 inside { } brace grouping actually exits the script
(unlike the original ( ) subshell where it was silently swallowed).
Change to return 1 so aborting USB selection returns from mount_usb()
gracefully instead of killing the entire script.
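
A sketch of the resulting pattern, with a hypothetical `pick_device` function and `false` standing in for an aborted selection:

```shell
#!/bin/sh
# Hypothetical function; 'false' simulates the user aborting the selection.
pick_device() {
	false || { echo "selection aborted"; return 1; }
	echo "not reached"
}

if pick_device >/dev/null; then
	status=ok
else
	status=aborted
fi

# The caller keeps running because the function returned instead of exiting.
echo "script still running, status=$status"
```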

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…ailable

On X230-era TPM1 chips that lack TPM_CAP_DA_LOGIC, 'TPM DA state:
unavailable' was too terse.  Add user-facing explanation of why
the query failed and how lockout can still be detected.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
… unavailable

Extract revMajor/revMinor and VendorID from TPM_CAP_VERSION_VAL
(0x1a) to report whether the TPM is pre-rev 103 (too old for
CAP_DA_LOGIC) or a newer revision where the capability is still
unexpectedly absent.  User sees TPM spec revision and vendor
alongside 'unavailable'.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
revMajor from TPM_CAP_VERSION_VAL is hex (e.g. 0x0D = 13).
Convert to decimal before display and comparison, otherwise bash
reads 0D as invalid octal and the -lt comparison silently fails.
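
One portable way to do the conversion is `printf '%d'` with a `0x` prefix. This is a sketch of the idea, not necessarily the form used in tpmr.sh.

```shell
#!/bin/sh
rev_hex="0D"                             # sample value from the message above
rev_dec=$(printf '%d' "0x${rev_hex}")    # 13

if [ "$rev_dec" -lt 103 ]; then
	echo "revision $rev_dec is older than 103"
fi
```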

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…lity version requirements

Reference TCG TPM Main Specification Part 2 rev 103 as the revision
where TPM_CAP_DA_LOGIC was added.  Reference TPM_CAP_VERSION_VAL
(0x1a) in tpmtotp's getcapability.c as the source of spec version
reporting.  Update doc/tpm.md with same references.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Skip TPM_CAP_DA_LOGIC entirely when TPM_CAP_VERSION_VAL reports
revMajor < 103 (pre-rev 103 TPMs don't support it).  This avoids
the unnecessary TPM_BAD_MODE error and one TPM round trip on older
TPMs like the STMicroelectronics rev 13 chip on X230.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…, skip migration fallback

When da_state is unavailable (pre-rev 103 TPM1) and the counter
increment itself fails, check the output for defend lock patterns
before attempting the owner-passphrase migration fallback.  On
defend lock, DIE immediately with TPM reset guidance instead of
confusing the user with an owner passphrase prompt.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…d path

The preflight guard must run for both TPM1 and TPM2.  TPM1 degrades
gracefully (no DA: line → skip), TPM2 always works.  Keep comments
clear about the pre-rev 103 TPM1 limitation.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
When tpm1_da_state returns unavailable (pre-rev 103 TPM1), bad_auth
now explains that the DA counter can't be read and the test proceeds
by attempting the increment.  User is told to repeat bad_auth until
the increment itself fails to confirm lockout.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…n bad_auth guidance

User-facing messages now say 'TPM 1.2 spec rev' instead of 'TPM spec rev'
to avoid confusion with TPM 2.0.  bad_auth guidance text rewritten to
describe the limitation clearly without raw parenthetical formatting.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…03 TPM1

Per UX best practices: one line per state, no raw codes, explicit
'expected' for intentional failures.  da_state shows one-line status
with vendor/rev on second line.  bad_auth shows action + outcome
without redundant DA state dumps.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…er TPM1

Replace technical output ('TPM DA state: unavailable (TPM 1.2 pre-rev 103)')
with plain language: 'TPM 1.2 too old to report DA lockout state.'
bad_auth now reads: 'testing lockout by attempting increment...'
'Repeat until the increment fails with lockout.'
Vendor/rev info moved to DEBUG log only.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…nd pre-rev 103 fallback

Document the TPM1 version-first probe (skip CAP_DA_LOGIC if rev < 103),
the preflight guard defend lock fallback for pre-rev 103 TPMs, and the
firmware version logging in tpm2_da_state.  Update tpmr.sh function
comments to reflect current behavior and non-developer UX patterns.

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
- Only skip DA_LOGIC when revMajor is a valid hex value strictly < 103
- If revMajor is unparseable (format mismatch), try DA_LOGIC anyway
- Use BusyBox-compatible basic regex (no \+ ERE operator)
- If DA_LOGIC fails despite rev >= 103, show version in error message
- Clearer DEBUG logging at each decision point
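
The first three points can be combined in one small predicate. This is a sketch under assumptions: `should_skip_da_logic` is a hypothetical name, and the input is assumed to be the bare hex digits of revMajor without a `0x` prefix.

```shell
#!/bin/sh
# Hypothetical predicate: succeed only when $1 is valid hex AND strictly < 103.
# Anything unparseable returns failure, so the caller tries DA_LOGIC anyway.
should_skip_da_logic() {
	# Basic regex: one hex digit followed by zero or more (no \+ needed).
	printf '%s\n' "$1" | grep -q '^[0-9A-Fa-f][0-9A-Fa-f]*$' || return 1
	[ "$(printf '%d' "0x$1")" -lt 103 ]
}

for rev in 0D 67 zz ""; do
	if should_skip_da_logic "$rev"; then
		echo "rev '$rev': skip DA_LOGIC"
	else
		echo "rev '$rev': try DA_LOGIC"
	fi
done
```

Here `0D` (13) is skipped, while `67` (103), the unparseable `zz`, and the empty string all fall through to the DA_LOGIC attempt.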

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
- Add da_state and bad_auth to the subcommand reference table
- Fix misleading NVRAM index claim (0x3135106223 is not fixed for TPM2)
- Fix bad_auth debug.log claim: increment output goes to console
- Fix _tpm_auth_retry comment: 'owner auth' -> 'authorization' (handles counter auth too)

Signed-off-by: Thierry Laurion <insurgo@riseup.net>
3 participants