initrd: add TPM DA lockout gating, counter auth fix, bad_auth tooling and documentation #2117
Draft
tlaurion wants to merge 26 commits into
Pull request overview
This PR improves Heads’ TPM error handling for TPM1 by correctly classifying tpmtotp “Defend lock running” output as an authorization-related failure (so it doesn’t immediately hard-fail), and adds a recovery path in tpm1_reset() to attempt clearing TPM1 defend-lock after forceclear by cycling physical presence.
Changes:
- Extend auth-failure grep patterns to include defend (and unify inclusion of TPM2 auth hex codes in the shared retry helper).
- Enhance tpm1_reset() to detect “defend lock” after takeown and retry after cycling physical presence.
- Expand TPM documentation to describe tool selection, auth retry detection, and TPM1 defend-lock behavior.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| initrd/bin/tpmr.sh | Treat “defend lock” as auth-related and add TPM1 defend-lock recovery logic during reset. |
| doc/tpm.md | Document TPM toolchain selection and the updated auth retry / defend-lock behavior. |
Contributor
The fix did not work… will copy logs.
Collaborator, Author
Found the bug and where the regression comes from. Damn, that one was not easy. Pushing the fix and updating the other PR.
…iew fixes
Add preflight dictionary attack (DA) lockout guard to
increment_tpm_counter, querying da_state before every counter
increment. DIE on active lockout, WARN when count nears threshold.
Add tpm1_da_state and tpm2_da_state for unified DA state query:
- TPM1: reads TPM_CAP_DA_LOGIC (0x19); actionDependValue>0 +
state=1 = locked; the DA: line timer= field maps to actionDependValue
- TPM2: reads getcap properties-variable (no single DA query);
count >= max_auth = locked; estimate = (counter - max_auth + 1)
* interval; DA: line timer= present when locked, absent when clean
- Both output a machine-parsable DA: line for the preflight guard
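The TPM2 estimate formula above can be sketched as a small shell helper. This is only an illustration: the function name and its three positional arguments are hypothetical, and the actual tpm2_da_state reads these values from the TPM rather than taking them as parameters.

```shell
# Sketch only: tpm2_unlock_estimate and its arguments are hypothetical names.
# It applies the formula from the commit message:
#   estimate = (counter - max_auth + 1) * interval
tpm2_unlock_estimate() {
    counter=$1     # current failed-auth count (LOCKOUT_COUNTER)
    max_auth=$2    # lockout threshold (MAX_AUTH_FAIL)
    interval=$3    # seconds per counter decrement (LOCKOUT_INTERVAL)

    if [ "$counter" -ge "$max_auth" ]; then
        # Locked: print the estimated number of seconds until unlock
        echo $(( (counter - max_auth + 1) * interval ))
    fi
    # Below the threshold: print nothing (no estimate needed)
}
```

For example, `tpm2_unlock_estimate 5 3 600` prints 1800, while `tpm2_unlock_estimate 2 3 600` prints nothing.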
Fix da_timer sed pattern: use sed -n with /p so da_timer stays
empty when DA: line has no timer= field (TPM2 count < threshold).
Without -n, non-matching lines echoed the full line as da_timer.
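The difference between the two sed invocations can be reproduced in isolation. The sketch below assumes a hypothetical DA: line layout (the field names other than timer= are invented for illustration) and is not the exact expression used in functions.sh.

```shell
# Sketch only: the DA: line layout used in the examples is an assumption.

# Fixed form: -n suppresses default output, and the p flag prints the
# result only when the substitution actually matched.
extract_da_timer() {
    printf '%s\n' "$1" | sed -n 's/.*timer=\([0-9][0-9]*\).*/\1/p'
}

# Buggy form: without -n, a non-matching line is echoed unchanged,
# so the caller receives the whole DA: line as the "timer".
extract_da_timer_buggy() {
    printf '%s\n' "$1" | sed 's/.*timer=\([0-9][0-9]*\).*/\1/'
}
```

With a line such as `DA: state=0 count=1 max=3`, the fixed helper prints nothing, whereas the buggy one prints the full line.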
Add tpm1_bad_auth and tpm2_bad_auth for testing:
- Uses NV index auth (-P) not owner auth (-C o -P) for TPM2
because NV auth failure produces TPM2_RC_AUTH_FAIL (0x98e) and
increments LOCKOUT_COUNTER; owner auth may not increment on
some TPM2 implementations
- Intentionally wrong password TPM_DEFEND_LOCK_TEST_WRONG_PASSWORD
- Shows DA state before and after; uses || true to survive set -e
Add tpm-reset.sh as a TPM reset frontend via tpmr.sh wrapper.
Add DA documentation in doc/tpm.md covering diagnosis, testing,
escalation, TPM1 vs TPM2 DA parameter configurability.
Review fixes:
- Fix TPM2 property names: TPM_PT_ -> TPM2_PT_ prefix in doc/tpm.md
- Remove misplaced design decision comment from tpm2_da_state
(belongs in tpm2_bad_auth where it already exists)
- Add DEBUG logging at every decision point across preflight guard
and all DA state functions for runtime traceability
- Document design decisions inline: timer logic, estimate formula,
empty auth retry bypass, NV vs owner auth
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
- Fix TPM2 time remaining table entry: estimate is derived from LOCKOUT_COUNTER vs MAX_AUTH_FAIL times LOCKOUT_INTERVAL, not LOCKOUT_RECOVERY (which is the lockout-auth-blocked-after-failure timer, not the remaining-until-unlock)
- Reword migration WARN: 'older Heads version' not 'older firmware' (the migration case is caused by previous Heads code, not platform firmware)
- Remove fragile PR #2117 reference from preventing-future-lockouts section: describe the fix generically (restoring empty counter auth) so the doc is correct regardless of branch context
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
When TPM1 does not support TPM_CAP_DA_LOGIC (0x19), tpm getcapability may print raw TSS error text to stdout instead of returning empty. The empty-string check missed this because error text is non-empty.
Fix: check that the output contains the 'State' field (expected for valid DA capability data) before echoing. If missing, return 'unavailable' and suppress the raw TSS garbage.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
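A minimal sketch of that validation, assuming the capability output is already captured in a variable. The helper name and the sample strings in the usage note are hypothetical; only the 'State' check comes from the commit message.

```shell
# Sketch only: filter_da_output is a hypothetical helper name.
# It echoes the captured output only when it looks like valid DA data.
filter_da_output() {
    raw=$1
    if printf '%s\n' "$raw" | grep -q 'State'; then
        # Looks like DA capability data: pass it through
        printf '%s\n' "$raw"
    else
        # Empty output or raw TSS error text: report a stable token instead
        echo "unavailable"
    fi
}
```

Checking for an expected field rather than for an empty string covers both failure modes: no output at all, and non-empty error text.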
TPM_CAP_DA_LOGIC (0x19) was added late in TPM 1.2 spec rev 103. Older Infineon TPMs (X230-era SLB9635/9645) and some Atmel chips do not implement it and return TPM_BAD_MODE (exit 44).
Document this limitation in:
- tpm1_da_state function comment: specific TPM models affected
- increment_tpm_counter preflight guard: note that da_state may return unavailable on older TPM1
- doc/tpm.md: explain why some TPM1 hardware shows 'unavailable'
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…able
When TPM_CAP_DA_LOGIC fails, also query TPM_CAP_VERSION_VAL (0x06) and log the TPM firmware version to debug.log. This helps identify which TPM1 chips lack rev 103+ DA capability support.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…iagnostics
Log TPM2 firmware version (TPM2_PT_FIRMWARE_VERSION_1, decoded as major.minor) to debug.log in tpm2_da_state. Consistent with the TPM1 version probe already added to tpm1_da_state. This helps identify TPM chips and firmware revisions that may have limited or missing DA capability support.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…o 0x06)
TPM_CAP_VERSION_VAL (0x1a) returns the full version info structure but, like TPM_CAP_DA_LOGIC (0x19), may be unsupported on older TPM1 chips. TPM_CAP_VERSION (0x06) returns a basic uint32 encoding and is more widely supported. Try 0x1a first, fall back to 0x06.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
DA: line always emits timer= field (with empty value when unavailable), not conditionally. Fix comment to match actual behavior.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…ry operator expected'
The mount_usb() function used ( ... ) to group the fallback
assignment, creating a subshell. An assignment inside a subshell
does not propagate to the calling shell, so USB_FAILED
stayed unset when mount-usb.sh failed, producing the error at line 24.
Fix: use { } brace grouping (no subshell) so USB_FAILED=1
is assigned in the current shell. Also fix the same pattern on
line 27 (second mount attempt).
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
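The subshell behaviour behind this fix is easy to reproduce outside Heads. In the sketch below, the two demo functions are hypothetical and `false` stands in for a failing mount-usb.sh call.

```shell
# Sketch only: demo_* functions are hypothetical; 'false' simulates
# the failing mount command.

demo_subshell() {
    USB_FAILED=""
    # ( ... ) runs in a child process: the assignment is lost on return
    false || (USB_FAILED=1)
    echo "${USB_FAILED:-unset}"
}

demo_braces() {
    USB_FAILED=""
    # { ...; } runs in the current shell: the assignment is kept
    false || { USB_FAILED=1; }
    echo "${USB_FAILED:-unset}"
}
```

`demo_subshell` prints `unset` and `demo_braces` prints `1`. Note that brace grouping needs a space after `{` and a `;` (or newline) before `}`.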
exit 1 inside { } brace grouping actually exits the script
(unlike the original ( ) subshell where it was silently swallowed).
Change to return 1 so aborting USB selection returns from mount_usb()
gracefully instead of killing the entire script.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
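The same grouping change also alters what `exit` does, which is why `return` is needed here. A minimal sketch, with a hypothetical function name and `false` standing in for the user aborting USB selection:

```shell
# Sketch only: mount_usb_sketch is a hypothetical stand-in for mount_usb().
mount_usb_sketch() {
    # Inside { }, 'exit 1' would terminate the whole script;
    # 'return 1' only leaves this function with a failure status.
    false || { echo "aborted"; return 1; }
    echo "mounted"
}
```

A caller can then handle the failure itself, for example `mount_usb_sketch || echo "USB selection aborted, continuing"`.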
…ailable
On X230-era TPM1 chips that lack TPM_CAP_DA_LOGIC, 'TPM DA state: unavailable' was too terse. Add user-facing explanation of why the query failed and how lockout can still be detected.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
… unavailable
Extract revMajor/revMinor and VendorID from TPM_CAP_VERSION_VAL (0x1a) to report whether the TPM is pre-rev 103 (too old for CAP_DA_LOGIC) or a newer revision where the capability is still unexpectedly absent. User sees TPM spec revision and vendor alongside 'unavailable'.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
revMajor from TPM_CAP_VERSION_VAL is hex (e.g. 0x0D = 13). Convert to decimal before display and comparison, otherwise bash reads 0D as invalid octal and the -lt comparison silently fails.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
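One way to do the conversion is shell arithmetic with an explicit 0x prefix. The helper name below is hypothetical, and the sketch assumes the input has already been validated as hex digits.

```shell
# Sketch only: hex_to_dec is a hypothetical helper name.
# Assumes "$1" contains only hex digits (no 0x prefix).
hex_to_dec() {
    # The 0x prefix makes the arithmetic expansion parse the value as hex
    echo $(( 0x$1 ))
}
```

`hex_to_dec 0D` prints 13, which can then be used safely with `-lt`.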
…lity version requirements
Reference TCG TPM Main Specification Part 2 rev 103 as the revision where TPM_CAP_DA_LOGIC was added. Reference TPM_CAP_VERSION_VAL (0x1a) in tpmtotp's getcapability.c as the source of spec version reporting. Update doc/tpm.md with same references.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Skip TPM_CAP_DA_LOGIC entirely when TPM_CAP_VERSION_VAL reports revMajor < 103 (pre-rev 103 TPMs don't support it). This avoids the unnecessary TPM_BAD_MODE error and one TPM round trip on older TPMs like the STMicroelectronics rev 13 chip on X230.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…, skip migration fallback
When da_state is unavailable (pre-rev 103 TPM1) and the counter increment itself fails, check the output for defend lock patterns before attempting the owner-passphrase migration fallback. On defend lock, DIE immediately with TPM reset guidance instead of confusing the user with an owner passphrase prompt.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
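The ordering described here (lockout check first, migration fallback second) might look roughly like the following. Both function names and the returned tokens are hypothetical, and the single 'defend' pattern is a simplification; the pattern list in tpmr.sh may be broader.

```shell
# Sketch only: helper names and return tokens are hypothetical.

# True when the captured tool output mentions a defend lock
is_defend_lock() {
    printf '%s\n' "$1" | grep -qi 'defend'
}

# Decide what to do after a failed counter increment
classify_increment_failure() {
    if is_defend_lock "$1"; then
        # Lockout: prompting for the owner passphrase would not help
        echo "lockout"
    else
        # Plain auth failure: the migration fallback may still succeed
        echo "try-migration"
    fi
}
```

A caller would stop with reset guidance on `lockout` and only prompt for the owner passphrase on `try-migration`.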
…d path
The preflight guard must run for both TPM1 and TPM2. TPM1 degrades gracefully (no DA: line → skip), TPM2 always works. Keep comments clear about the pre-rev 103 TPM1 limitation.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
When tpm1_da_state returns unavailable (pre-rev 103 TPM1), bad_auth now explains that the DA counter can't be read and the test proceeds by attempting the increment. User is told to repeat bad_auth until the increment itself fails to confirm lockout.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…n bad_auth guidance
User-facing messages now say 'TPM 1.2 spec rev' instead of 'TPM spec rev' to avoid confusion with TPM 2.0. bad_auth guidance text rewritten to describe the limitation clearly without raw parenthetical formatting.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…03 TPM1
Per UX best practices: one line per state, no raw codes, explicit 'expected' for intentional failures. da_state shows one-line status with vendor/rev on second line. bad_auth shows action + outcome without redundant DA state dumps.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…er TPM1
Replace technical output ('TPM DA state: unavailable (TPM 1.2 pre-rev 103)')
with plain language: 'TPM 1.2 too old to report DA lockout state.'
bad_auth now reads: 'testing lockout by attempting increment...'
'Repeat until the increment fails with lockout.'
Vendor/rev info moved to DEBUG log only.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
…nd pre-rev 103 fallback
Document the TPM1 version-first probe (skip CAP_DA_LOGIC if rev < 103), the preflight guard defend lock fallback for pre-rev 103 TPMs, and the firmware version logging in tpm2_da_state. Update tpmr.sh function comments to reflect current behavior and non-developer UX patterns.
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
- Only skip DA_LOGIC when revMajor is a valid hex value strictly < 103
- If revMajor is unparseable (format mismatch), try DA_LOGIC anyway
- Use BusyBox-compatible basic regex (no \+ ERE operator)
- If DA_LOGIC fails despite rev >= 103, show version in error message
- Clearer DEBUG logging at each decision point
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
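The first two rules above can be sketched as a single predicate. The function name is hypothetical, and this version validates the value with a `case` glob rather than the basic regex the commit mentions, which is one possible way to stay BusyBox-compatible.

```shell
# Sketch only: should_skip_da_logic is a hypothetical helper name.
# Returns success (0) only when revMajor is valid hex and strictly < 103.
should_skip_da_logic() {
    rev_hex=$1
    case "$rev_hex" in
        # Empty or containing a non-hex character: unparseable,
        # so do not skip (the caller tries DA_LOGIC anyway)
        ''|*[!0-9A-Fa-f]*) return 1 ;;
    esac
    # Convert hex to decimal, then compare numerically
    [ "$(( 0x$rev_hex ))" -lt 103 ]
}
```

With this predicate, `should_skip_da_logic 0D` succeeds, while `should_skip_da_logic 67` (decimal 103) and `should_skip_da_logic zz` both fail, so the query is attempted.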
- Add da_state and bad_auth to the subcommand reference table
- Fix misleading NVRAM index claim (0x3135106223 is not fixed for TPM2)
- Fix bad_auth debug.log claim: increment output goes to console
- Fix _tpm_auth_retry comment: 'owner auth' -> 'authorization' (handles counter auth too)
Signed-off-by: Thierry Laurion <insurgo@riseup.net>
Summary
Fixes a TPM1 counter auth regression (PR #2068) where `increment_tpm_counter` was changed from hardcoded `-pwdc ''` (empty counter auth) to `-pwdc "${tpm_passphrase:-}"` (owner passphrase), while counters continued to be created with `-pwdc ""`. This caused every increment to compute SHA1(owner_pass) against a counter created with SHA1(""), producing persistent `TPM_AUTHFAIL`.
Per TCG TPM Main Spec Part 3, `TPM_CreateCounter` uses owner auth (`-pwdo`) but `TPM_IncrementCounter` uses the counter's own authData, not the owner password. The correct design for Heads' rollback counter is empty auth.
The repeated auth failures (3 per boot) triggered TPM 1.2 dictionary-attack lockout (`TPM_DEFEND_LOCK_RUNNING`), which persisted through `forceclear` on some implementations.
Adds DA state diagnostics, preflight guard and testing tooling.
Changes
`initrd/bin/tpmr.sh`: auth fix + DA state + bad_auth
- `tpm1_counter_increment()`: detect `-pwdc ''`, call `tpm` directly (bypass `_tpm_auth_retry`)
- `tpm1_reset()`: detect defend lock, cycle physical presence, retry takeown
- `tpm1_da_state()`: TPM1 DA via `TPM_CAP_DA_LOGIC`, output DA: line
- `tpm2_da_state()`: TPM2 DA via `getcap properties-variable`, unlock estimate
- `tpm1_bad_auth()` / `tpm2_bad_auth()`: deliberate wrong-auth for testing
- Add `'defend'` and `'0x98e|0x149'` to auth detection patterns
`initrd/etc/functions.sh`: counter auth + DA preflight
- `check_tpm_counter()`: `-pwdc ''` per TCG spec
- `increment_tpm_counter()`: empty auth first, migration fallback with owner passphrase
Other
- `initrd/bin/tpm-reset.sh`: TPM reset frontend
- `initrd/bin/oem-factory-reset.sh`: `-pwdc ''` for consistency
- `doc/tpm.md`: DA diagnosis, testing, escalation, physical presence
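The auth mismatch described in the Summary can be illustrated with plain SHA1 digests. This sketch assumes `sha1sum` is available on the machine running it; the helper name and the sample passphrase are hypothetical, and it only shows why the two values can never match, not how the TPM tools derive authData internally.

```shell
# Sketch only: auth_digest is a hypothetical helper; requires sha1sum.
# Prints the SHA1 hex digest of the given passphrase (no trailing newline hashed).
auth_digest() {
    printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

created_with=$(auth_digest '')                  # counter created with -pwdc ""
incremented_with=$(auth_digest 'owner-secret')  # regression: owner passphrase

if [ "$created_with" != "$incremented_with" ]; then
    echo "auth mismatch: every increment would fail"
fi
```

Because each failed increment counts toward the dictionary-attack threshold, a mismatch like this repeated on every boot is enough to reach lockout.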