Draft
26 commits
b20e4b7
initrd: fix TPM1 counter auth regression and defend lock cascade failure
tlaurion May 16, 2026
f7376c2
initrd: add TPM DA lockout gating, bad_auth test helpers, doc and rev…
tlaurion May 17, 2026
30a4201
address review comments on PR #2117
tlaurion May 17, 2026
2149508
initrd/bin/tpmr.sh: validate tpm1_da_state output before printing
tlaurion May 17, 2026
fc3ec63
doc: document TPM1 DA state query hardware limitations
tlaurion May 17, 2026
6dcbdb4
initrd/bin/tpmr.sh: probe TPM1 firmware version when DA state unavail…
tlaurion May 18, 2026
28f60a8
initrd/bin/tpmr.sh: probe TPM firmware version in tpm2_da_state for d…
tlaurion May 18, 2026
7119d6f
initrd/bin/tpmr.sh: fix TPM1 version probe (use cap 0x1a, fall back t…
tlaurion May 18, 2026
38c3112
initrd/bin/tpmr.sh: fix timer= comment in tpm1_da_state
tlaurion May 18, 2026
edb75d1
initrd/etc/gui_functions.sh: fix USB_FAILED subshell bug causing 'una…
tlaurion May 18, 2026
8dacb57
initrd/etc/gui_functions.sh: use return 1 instead of exit 1 on USB abort
tlaurion May 19, 2026
d688ae4
initrd/bin/tpmr.sh: improve tpm1_da_state feedback when DA state unav…
tlaurion May 19, 2026
c9304a4
initrd/bin/tpmr.sh: report TPM spec revision and vendor when DA state…
tlaurion May 19, 2026
862e942
initrd/bin/tpmr.sh: fix hex-to-decimal conversion in DA state feedback
tlaurion May 19, 2026
0147ce7
initrd/bin/tpmr.sh, doc/tpm.md: add TCG spec references for DA capabi…
tlaurion May 19, 2026
b92e18d
initrd/bin/tpmr.sh: check TPM spec version before querying DA state
tlaurion May 19, 2026
61a51a1
initrd/etc/functions.sh: detect TPM1 defend lock on increment failure…
tlaurion May 19, 2026
a2ab1c3
initrd/etc/functions.sh: clean up DA guard comment, restore TPM2 guar…
tlaurion May 19, 2026
0dc11a6
initrd/bin/tpmr.sh: add DA-unavailable guidance to tpm1_bad_auth
tlaurion May 19, 2026
9ecb864
initrd/bin/tpmr.sh: label TPM 1.2 explicitly in DA state output, clea…
tlaurion May 19, 2026
a3f16cb
initrd/bin/tpmr.sh: concise UX for da_state and bad_auth on pre-rev 1…
tlaurion May 19, 2026
5a6bb88
initrd/bin/tpmr.sh: normalize TPM vendor/rev output format
tlaurion May 19, 2026
8696c49
initrd/bin/tpmr.sh: non-developer UX for da_state and bad_auth on old…
tlaurion May 19, 2026
14f09eb
doc: update inline docs and tpm.md for DA state version-first probe a…
tlaurion May 19, 2026
6988bf3
initrd/bin/tpmr.sh: strengthen version-first logic in tpm1_da_state
tlaurion May 19, 2026
f0926e9
doc: fix code/doc mismatches found in consistency review
tlaurion May 19, 2026
254 changes: 249 additions & 5 deletions doc/tpm.md
@@ -10,8 +10,35 @@ See also: [architecture.md](architecture.md), [boot-process.md](boot-process.md)
## tpmr — unified TPM abstraction

`initrd/bin/tpmr.sh` is a shell script wrapper that presents a single interface
-over both TPM 1.2 (`tpm` / `trousers`) and TPM 2.0 (`tpm2-tools`). All Heads
-scripts call `tpmr.sh` rather than invoking `tpm` or `tpm2` directly.
+over both TPM 1.2 and TPM 2.0. All Heads scripts call `tpmr.sh` rather than
+invoking TPM tools directly.

### Boot chain and TPM tool selection

```text
initrd/init (PID 1)
└─ CONFIG_BOOTSCRIPT → /bin/gui-init.sh [board config]
├─ source /etc/functions.sh [shared TPM helpers]
├─ source /etc/gui_functions.sh [whiptail wrappers]
└─ calls initrd/bin/tpmr.sh [TPM abstraction]
├─ TPM1: calls `tpm` (tpmtotp util/tpm) [CONFIG_TPM2_TOOLS != y]
│ modules/tpmtotp → output: totp hotp qrenc util/tpm
└─ TPM2: calls `tpm2` (single binary, subcommands) [CONFIG_TPM2_TOOLS=y]
modules/tpm2-tss + modules/tpm2-tools
```

TPM1 support comes exclusively from the `tpmtotp` module (`modules/tpmtotp`),
which builds `util/tpm` as part of its outputs. This binary is installed to
the initrd as `tpm` and supports subcommands such as `physicalpresence`,
`forceclear`, `takeown -pwdo`, `counter_create`, `counter_increment`, etc.

TPM2 support comes from `modules/tpm2-tss` (TSS software stack) and
`modules/tpm2-tools` (`tpm2` binary with subcommands like `getcap`,
`nvdefine`, `nvincrement`).

Both TPM1 and TPM2 boards may also enable `CONFIG_TPMTOTP=y` for the
`totp` and `hotp` utilities, which are independent of the TPM version.

### PCR sizes

@@ -38,6 +65,8 @@ scripts call `tpmr.sh` rather than invoking `tpm` or `tpm2` directly.
| `reset` | Reset the TPM |
| `kexec_finalize` | Finalize PCR state before kexec (TPM2 only) |
| `shutdown` | Orderly shutdown (TPM2 only) |
| `da_state` | Query dictionary attack lockout state |
| `bad_auth` | Deliberately trigger an auth failure to test DA lockout |

---

@@ -271,9 +300,11 @@ The rollback counter prevents **TPM swap attacks** and **/boot disk swap attacks

### How it works

-The counter is stored **in the TPM** (NVRAM index `0x3135106223`), ensuring
-hardware binding. A SHA-256 hash of the counter value is stored on **/boot**
-(`/boot/kexec_rollback.txt`). This creates a two-way binding:
+The counter value is stored **in the TPM** at a persistent NVRAM index. The
+index is recorded in `/boot/kexec_rollback.txt`; it is a well-known value on
+TPM1 and is randomly generated per TPM2 at provisioning time. A SHA-256 hash
+of the counter value is also stored on **/boot** (`/boot/kexec_rollback.txt`).
+This creates a two-way binding:

- Cannot swap TPM without breaking /boot consistency
- Cannot swap /boot without breaking TPM consistency
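
A minimal sketch of the /boot side of this binding, assuming the counter value has already been read from the TPM into a string. The helper names `counter_hash` and `rollback_matches` and the one-hash-per-file layout are illustrative assumptions, not the Heads implementation:

```sh
#!/bin/sh
# Hypothetical helpers: hash a counter value and compare it with a stored hash.

# Print the SHA-256 hex digest of the counter value given as $1.
counter_hash() {
	printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Return 0 if the first field of the first line of file $2 equals the
# SHA-256 digest of the counter value $1. A missing or empty file fails.
rollback_matches() {
	stored=$(head -n 1 "$2" 2>/dev/null | cut -d' ' -f1)
	[ -n "$stored" ] && [ "$stored" = "$(counter_hash "$1")" ]
}
```

A mismatch in either direction (a different TPM reporting another counter value, or a different /boot carrying another hash) makes `rollback_matches` fail, which is the two-way property described above.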
@@ -398,3 +429,216 @@ To verify that a new board's coreboot config matches the expected RoT:
| Auth sessions | Not used | Required for policy-based unseal |
| `kexec_finalize` | No-op | Extends PCRs, then `tpm2 shutdown` |
| `startsession` | No-op | Creates encryption session |

### TPM1 auth retry and error detection

`_tpm_auth_retry()` in `initrd/bin/tpmr.sh` provides shared retry logic for
both TPM1 and TPM2 operations that need authorization. On auth failure
(wrong passphrase), the passphrase cache is shredded and the user is
re-prompted up to 3 times before giving up.

Auth failure is detected by grepping the command output for known error
patterns. TPM1 (tpmtotp) errors go to stdout via `printf()` with
`TPM_GetErrMsg()` strings. TPM2 (tpm2-tools) errors go to stderr via
`LOG_ERR()` and may include raw TPM response codes.

| Pattern | Type | TPM version | Example error |
| --- | --- | --- | --- |
| `authorization\|auth\|bad\|permission` | English words | TPM1+TPM2 | `TPM_AUTHFAIL`, `bad passphrase` |
| `defend` | English word | TPM1 | `Defend lock running` |
| `0x98e\|0x149` | Hex codes | TPM2 | `TPM2_RC_AUTH_FAIL`, `TPM2_RC_NV_AUTHORIZATION` |
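
The detect-and-retry idea can be sketched as follows. The patterns are taken from the table above; the function names, the case-insensitive match, and the loop structure are assumptions for illustration and do not reproduce the actual `_tpm_auth_retry()`:

```sh
#!/bin/sh
# Sketch only: hypothetical names, not the real _tpm_auth_retry().

# Return 0 if the captured command output looks like an auth failure.
looks_like_auth_failure() {
	printf '%s\n' "$1" |
		grep -qiE 'authorization|auth|bad|permission|defend|0x98e|0x149'
}

# Run "$@" up to 3 times. Retry only when the combined stdout/stderr
# matches an auth failure pattern; any other failure returns immediately.
retry_on_auth_failure() {
	attempt=1
	while [ "$attempt" -le 3 ]; do
		if out=$("$@" 2>&1); then
			printf '%s\n' "$out"
			return 0
		fi
		if ! looks_like_auth_failure "$out"; then
			printf '%s\n' "$out"
			return 1
		fi
		# A real implementation would shred the cached passphrase
		# and re-prompt the user here before the next attempt.
		attempt=$((attempt + 1))
	done
	printf '%s\n' "$out"
	return 1
}
```

Capturing both streams matters because TPM1 errors arrive on stdout and TPM2 errors on stderr. Note that short patterns such as `auth` and `bad` are broad and can match unrelated messages.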

### TPM1 reset defend lock

`TPM_DEFEND_LOCK_RUNNING` (`tpm_error.h`: `TPM_BASE + TPM_NON_FATAL + 3`)
is a standard TPM 1.2 error raised when the TPM's dictionary-attack
protection is active. After too many failed authorization attempts, the
TPM enters a time-out period and refuses all authorization operations --
including `tpm takeown` even after a successful `tpm forceclear`
(forceclear clears the owner but not the dictionary attack counter on
some implementations, particularly Infineon TPMs).

tpmtotp's `tpm takeown` outputs:
```
Error Defend lock running from TPM_TakeOwnership
```

`tpm1_reset()` in `initrd/bin/tpmr.sh` detects "defend lock" in the
`takeown` output and attempts one recovery: cycling physical presence
(`physicaldisable` / `physicalenable` / `physicalpresence` /
`physicalsetdeactivated`) to re-assert PP before retrying `takeown`.
This works on some chipsets where software presence was not properly
honoured by the first `forceclear`.

If PP cycling also fails, no software-based recovery is available.
Further attempts (second forceclear, `TPM_ResetLockValue` with empty
auth, sleep+retry) will not help. Use `tpmr.sh da_state` from the
recovery shell to check the current DA state:

- **TPM1**: `actionDependValue` reports remaining lockout seconds.
- **TPM2**: the human-readable summary shows estimated unlock time
based on `recoveryTime` (seconds before one failure is forgotten).

Alternatively, reset the TPM to clear the DA state entirely:
`tpm-reset.sh` from the recovery shell, or GUI menu `Options ->
TPM/TOTP/HOTP Options -> Reset the TPM` for full reprovision.

#### DA lockout duration escalation

TPM 1.2 dictionary attack timeouts escalate with the failure count
(approximate; varies by vendor and TPM firmware version per Dell and
Microsoft documentation):

| Failures accumulated | Typical lockout time |
|---------------------|---------------------|
| 1-2 | None (counter only) |
| 3-5 | 10 seconds |
| 6-9 | 1 hour |
| 10-12 | Several hours |
| 13+ | Up to 24 hours |

Each time the TPM fully locks out and the timer expires, the DA counter
resets. If failures continue to accumulate across boots without
waiting for the timer to expire, the escalation can reach 24 hours.
This is what happened with the counter auth regression (3 failures per
boot x many boots): the DA counter reached the maximum threshold.

#### Diagnosing DA state

Use `tpmr.sh da_state` from the recovery shell to query the current DA
state. Available for both TPM1 and TPM2:

| Information | TPM1 | TPM2 |
|-------------|------|------|
| Locked? | `state`: 0=inactive, 1=locked | `TPM2_PT_LOCKOUT_COUNTER` >= `TPM2_PT_MAX_AUTH_FAIL` |
| Current failures | `currentCount` | `TPM2_PT_LOCKOUT_COUNTER` |
| Lockout threshold | `thresholdCount` | `TPM2_PT_MAX_AUTH_FAIL` |
| Lockout interval | -- | `TPM2_PT_LOCKOUT_INTERVAL` |
| Time remaining | `actionDependValue` (seconds) | Estimate from `LOCKOUT_COUNTER` vs `MAX_AUTH_FAIL` times `LOCKOUT_INTERVAL` |

The recovery shell can run `tpmr.sh da_state` at any time to check
whether the TPM is locked and how much lockout time remains.
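
As a rough illustration of how the TPM2 column can be evaluated, the sketch below summarizes DA properties from text on stdin. It assumes `NAME: 0xHEX` lines, the format printed by `tpm2 getcap properties-variable` in recent tpm2-tools, and it treats the TPM as locked once the counter reaches the threshold. The names `da_prop` and `da_summary` are hypothetical and this is not the code in `tpm2_da_state`:

```sh
#!/bin/sh
# Sketch: summarize TPM2 DA properties read from stdin.

# Find property $2 in the text $1 and print its hex value in decimal.
# Returns non-zero when the property is missing.
da_prop() {
	hex=$(printf '%s\n' "$1" |
		sed -n "s/^[[:space:]]*$2:[[:space:]]*\(0x[0-9A-Fa-f]*\).*/\1/p" |
		head -n 1)
	[ -n "$hex" ] && printf '%d\n' "$hex"
}

# Print a one-line summary, or return 1 if any property is missing.
da_summary() {
	caps=$(cat)
	count=$(da_prop "$caps" TPM2_PT_LOCKOUT_COUNTER) || return 1
	max=$(da_prop "$caps" TPM2_PT_MAX_AUTH_FAIL) || return 1
	interval=$(da_prop "$caps" TPM2_PT_LOCKOUT_INTERVAL) || return 1
	if [ "$count" -ge "$max" ]; then state=locked; else state=unlocked; fi
	echo "state=$state failures=$count/$max interval=${interval}s"
}
```

On a machine with a TPM2 this could be fed with `tpm2 getcap properties-variable | da_summary`; reading from stdin also lets the parser be exercised against saved output.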

##### TPM1 version check

`tpm1_da_state` first queries the TPM spec version via
`TPM_CAP_VERSION_VAL` (0x1a). If `revMajor < 103`, the TPM predates
the `TPM_CAP_DA_LOGIC` capability (added in TCG TPM Main Part 2 rev 103)
and the function returns immediately without attempting the DA query:

```
TPM 1.2 too old to report DA lockout state.
```

The TPM vendor and spec revision are logged to debug.log for diagnostics.
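
The version gate reduces to a numeric comparison once the revision is in decimal. The sketch below assumes the revision arrives either as decimal or as `0x`-prefixed hex; the function name `tpm1_rev_supports_da` is hypothetical, and how Heads extracts the field from the capability response is not shown here:

```sh
#!/bin/sh
# Sketch: decide whether a TPM 1.2 revision meets the rev 103 cutoff.
# Returns 0 if supported, 1 if too old, 2 if the input is not a number.
tpm1_rev_supports_da() {
	case "$1" in
	0x* | 0X*)
		# Convert hex to decimal before comparing numerically.
		rev=$(printf '%d' "$1" 2>/dev/null) || return 2
		;;
	'' | *[!0-9]*)
		return 2
		;;
	*)
		rev=$1
		;;
	esac
	[ "$rev" -ge 103 ]
}
```

Converting first avoids comparing a hex string such as `0x67` against the decimal threshold, which would fail or give the wrong answer in `[ ... -ge ... ]`.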

TPM1 chips known to lack DA state query support:
- STMicroelectronics ST33TP series (rev 13, confirmed on ThinkPad X230)
- Older Infineon SLB9635/9645 (pre-rev 103 firmware)
- Some Atmel/Microchip TPMs

On such hardware, the preflight guard in `increment_tpm_counter` cannot
detect lockout before the increment. If the TPM is locked, the increment
fails with `TPM_DEFEND_LOCK_RUNNING`, which is caught by the error
handling (see below). TPM2 is unaffected.

##### TPM2 firmware version

`tpm2_da_state` logs `TPM2_PT_FIRMWARE_VERSION_1` to debug.log when DA
properties are successfully queried. This helps identify the TPM chip
and firmware revision for diagnostic purposes.

#### DA parameter configurability

TPM2 DA parameters are configured during `tpm2_reset()` (called by
`tpm-reset.sh` and the GUI `reset_tpm()`). Heads sets:
- `maxTries=10`: auth failures before lockout
- `recoveryTime=3600`: seconds before one failure is forgotten (counter
decrements by 1 per interval)
- `lockoutRecovery=0`: seconds during which lockout-hierarchy authorization
stays blocked after a failed lockout auth attempt
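
With these values the recovery arithmetic is straightforward. The sketch below is a worked estimate under the assumption that the failure counter drops by exactly one per `recoveryTime` and that no time has yet elapsed in the current interval, so the results are upper bounds. The function names are hypothetical:

```sh
#!/bin/sh
# Parameters Heads sets during tpm2_reset (from the list above).
MAX_TRIES=10
RECOVERY_TIME=3600

# Seconds until the failure count ($1) drops below MAX_TRIES, i.e. until
# one authorization attempt is available again. Prints 0 if not locked.
da_seconds_until_unlocked() {
	failures=$1
	if [ "$failures" -lt "$MAX_TRIES" ]; then
		echo 0
		return 0
	fi
	echo $(( (failures - MAX_TRIES + 1) * RECOVERY_TIME ))
}

# Seconds until every recorded failure ($1) has been forgotten.
da_seconds_until_clean() {
	echo $(( $1 * RECOVERY_TIME ))
}
```

For example, a TPM2 that has just hit 10 failures regains one attempt after about an hour and is fully clean after about ten hours, unless it is reset first.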

TPM1 has no software-accessible command to configure DA parameters
(tpmtotp's `setcapability` does not expose DA threshold or timeout
sub-capabilities). The DA policy is determined by the TPM firmware
and cannot be changed through software on TPM1.

#### Testing DA lockout

Use `tpmr.sh bad_auth` from the recovery shell to test dictionary attack
lockout behavior by deliberately triggering an auth failure:

- **TPM1**: attempts `tpm counter_increment -pwdc <wrong>` with the counter
ID from `/boot/kexec_rollback.txt`. Each call increments the DA counter
by 1 until lockout is triggered. On pre-rev 103 TPMs, DA state can't be
read before/after — the test reports whether the increment succeeded (no
lockout) or failed (lockout active). Repeat until the increment fails.
- **TPM2**: attempts `tpm2 nvincrement -P <wrong>` with NV index auth.
Uses `-P` (not `-C o -P`) because NV index auth failure produces
`TPM2_RC_AUTH_FAIL` (0x98e) and does increment `LOCKOUT_COUNTER`;
owner auth may not increment on some implementations. Shows DA state
before and after each attempt.

Test headers and DA state queries are logged to debug.log for analysis;
the increment command output goes to the console during interactive use.

#### Preventing future lockouts

Heads' counter auth regression caused 3 TPM auth failures per boot by
passing the owner passphrase as the counter auth while the counter was
created with empty auth. Restoring empty counter auth for both creation
and increment (as per TCG spec) prevents auth failures from counter
operations. All TPM1 boards that ran the regression code are affected
identically; this is not platform-specific.

If lockout still occurs (e.g., from deliberate `bad_auth` testing or
other bugs), the increment failure path in `increment_tpm_counter`
detects it: on TPM1, the captured increment output is grepped for
`defend`/`lock`/`0x19` patterns, and the user is directed to reset
the TPM via the GUI menu. This is the catch-all for pre-rev 103 TPMs
where `da_state` can't report lockout state.
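
A sketch of that failure path, assuming the increment output has already been captured into a string. The patterns come from the paragraph above; the function names and message wording are hypothetical and are not the text Heads prints:

```sh
#!/bin/sh
# Sketch: classify captured TPM1 counter_increment output after a failure.

# Return 0 if the output suggests the dictionary attack lockout is active.
tpm1_output_shows_lockout() {
	printf '%s\n' "$1" | grep -qiE 'defend|lock|0x19'
}

# Print guidance for a failed increment based on its captured output.
explain_increment_failure() {
	if tpm1_output_shows_lockout "$1"; then
		echo "TPM appears locked out: use Options -> TPM/TOTP/HOTP Options -> Reset the TPM"
	else
		echo "TPM counter increment failed for another reason"
	fi
}
```

Because the check runs on the output of the failed command itself, it works even when the DA state cannot be queried beforehand. The `lock` pattern is broad, so a real implementation may want to log the raw output alongside the guidance.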

### TPM1 physical presence

TPM1.2 forceclear requires physical presence to be asserted. The
`tpm1_reset()` function does this with `tpm physicalpresence -s` (software
presence). On some platforms (e.g., Dell OptiPlex, some Infineon TPMs),
software physical presence may not work — the TPM firmware only accepts
hardware-asserted presence (GPIO set by BIOS). In that case, `forceclear`
returns success but may not fully reset the TPM, or `takeown` may fail
with unexpected errors.

When software physical presence fails, the LOG shows:
```
tpm1_reset: unable to set physical presence
```

This is logged but not fatal — `tpm forceclear` is still attempted.
If the TPM firmware ignores software physical presence, the reset fails
and the user must use the platform's hardware TPM reset mechanism
(typically a BIOS option or jumper).

### TPM reset methods

Heads has two TPM reset methods with different scope:

**`tpm-reset.sh`** (CLI, recovery shell):
- Prompts for new owner passphrase, calls `tpmr.sh reset`
- TPM clear + re-ownership only
- No counter creation, no /boot signing, no TOTP/HOTP generation
- Intended for headless recovery or clearing a defend lock before running
the full GUI flow

**`reset_tpm()`** (GUI, via Options -> TPM/TOTP/HOTP -> Reset the TPM in
`initrd/bin/gui-init.sh`):
- Prompts for new owner passphrase, calls `tpmr.sh reset`
- Removes stale `/boot/kexec_rollback.txt` and `/boot/kexec_primhdl_hash.txt`
- Creates new TPM rollback counter via `check_tpm_counter()`
- Increments the new counter
- Re-signs /boot with the GPG signing key
- Generates new TOTP/HOTP secrets
- Reseals TPM Disk Unlock Key (DUK) to LUKS
- Regenerates TPM2 encrypted sessions

After `tpm-reset.sh`, the TPM is cleared but the system is not fully
provisioned — the user must complete the GUI `reset_tpm()` or OEM Factory
Reset to restore counter, signing, and secrets.
2 changes: 1 addition & 1 deletion initrd/bin/oem-factory-reset.sh
@@ -868,7 +868,7 @@ generate_checksums() {
if [ "$CONFIG_TPM" = "y" ]; then
if [ "$CONFIG_IGNORE_ROLLBACK" != "y" ]; then
tpmr.sh counter_create \
--pwdc "${TPM_PASS:-}" \
+-pwdc '' \
-la -3135106223 |
tee /tmp/counter >/dev/null 2>&1 ||
whiptail_error_die "Unable to create TPM counter"
9 changes: 9 additions & 0 deletions initrd/bin/tpm-reset.sh
@@ -6,3 +6,12 @@ NOTE "This will erase all keys and secrets from the TPM"
prompt_new_owner_password

tpmr.sh reset "$tpm_owner_passphrase"

# TODO: move the TPM reset + full reprovision flow (counter creation, /boot
# signing, TOTP/HOTP generation, DUK reseal) from gui-init.sh's reset_tpm()
# into a reusable function in functions.sh. Then tpm-reset.sh and the GUI
# reset_tpm() can both call the same code, eliminating the inconsistency
# between CLI and GUI reset paths.

NOTE "TPM cleared. The TPM rollback counter was destroyed. /boot/kexec_rollback.txt still references the old counter."
NOTE "Restore full functionality from the GUI: Options -> TPM/TOTP/HOTP Options -> Reset the TPM"