Skip to content

fix sudo TTY race and terminal state corruption#4052

Open
niltonperimneto wants to merge 4 commits intomicro-editor:masterfrom
niltonperimneto:master
Open

fix sudo TTY race and terminal state corruption#4052
niltonperimneto wants to merge 4 commits intomicro-editor:masterfrom
niltonperimneto:master

Conversation

@niltonperimneto
Copy link

This was mentioned in #4050
If you want to know how I came to this bug and why I'm making this commit.

In short: two sudo dd subprocesses (truncate and write) were spawned
concurrently, racing on TCGETS2/TCSETS2 ioctls for the password
prompt. This strictly fails with sudo-rs, which enforces auditable
escalation boundaries. Serialize them by running truncation to
completion before starting the write process.

After a failed subprocess, TempStart() re-initialized tcell on top
of the corrupted TTY state left by the dead child, poisoning the
parent shell on exit. Fix this by capturing the canonical terminal
state once at startup and restoring it in TempStart() before
tcell's Init() call.

Also close pipe fds on error paths to prevent fd leaks, wire
truncation stderr so sudo errors are visible, and align build
constraints with tcell's Unix platform list (aix, zos).

Signed-off-by: Nilton Perim Neto niltonperimneto@gmail.com


// Init creates and initializes the tcell screen
func Init() error {
capturePristineState()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this capture/restore seems to address a separate issue, different from the issue of the race between concurrent sudo processes (although triggered by it, among other things)? So it would be better to do this in a separate commit, to not mix both issues together?

And while we're at it, the commit message of that separate commit could explain why exactly this separate issue is actually an issue and needs to be fixed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I just don't actually know how to split commits

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just implement both things in separate commits and use git push --force to update the history of commits in the PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise (i.e. if you just keep adding new commits on top of existing ones), the PR will be just unreviewable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is done. I rebased into two separate commits and force pushed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said, this capturePristineState() and restorePristineTermState() is nothing we do in micro, because it is done by our dependency tcell and usually this works as long as micro doesn't crash before it can restore the terminal with the help of tcell.
So from my point of view we should drop ab06490.

if screen.Screen != nil {
screen.Screen.Fini()
}
screen.RestorePristineState()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to move this into the above if screen.Screen != nil?

And then, we might introduce a nice helper screen.Fini() doing both screen.Screen.Fini() and this thing?

Hmm... BTW why restore pristine state when restarting the screen in TempStart() -> screen.Init(), not when stopping the screen in TempFini() already?

And then we won't need a special case for screen.Init() called first time, we can just capture new pristine state in screen.Init() every time? That is both simpler and more correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... Actually all this capture/restore is already done by tcell, inside screen.Screen.Init() and screen.Screen.Fini(), isn't it?

In upstream tcell:

capture: https://github.com/gdamore/tcell/blob/e62057809ee3eba9cb78a56c76d0612d8bb89fc2/tty/tty_unix.go#L83
restore: https://github.com/gdamore/tcell/blob/e62057809ee3eba9cb78a56c76d0612d8bb89fc2/tty/tty_unix.go#L108

Micro happens to use a very outdated fork of tcell, but it does that too:

capture: https://github.com/micro-editor/tcell/blob/afe1210c81d891e8e27d166e767f69df56681f47/tscreen_unix.go#L37
restore: https://github.com/micro-editor/tcell/blob/afe1210c81d891e8e27d166e767f69df56681f47/tscreen_unix.go#L69

So, this custom implementation of capture/restore seems redundant (it solves an already solved problem)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key point worth clarifying: tcell calls term.MakeRaw() on every Init() call, which snapshots whatever terminal state is current at that moment. After a failed child process corrupts the TTY, tcell's subsequent Init() faithfully snapshots the corrupted state and uses it as the baseline for Fini(). The fix captures the state once, before the first MakeRaw(), and restores it before each re-init so that tcell always starts from a clean baseline. At least it is what I was thinking when I wrote it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Yeah, seems reasonable.

I also see that, for instance, vim also restores the initial "pristine" state, not the one left by the last shell command it ran.

More precisely, in the case when I "suspend" vim via Ctrl-Z (SIGSTOP) and then resume it via fg (SIGSTOP), I see it respects the tty state left by this shell while it was suspended. For example, if I run stty -echo to disable echo in this shell before resuming vim, then after I exit vim, I see echo is still disabled.

But, in the case when I let vim itself execute some shell command, then vim ignores the tty state left by this command, and restores the original state instead. For example, if I run :!(stty -echo; cat) command in vim, then echo is disabled in this sub-shell, but after exiting vim, echo is enabled.

And that makes sense.

So, it seems your fix is doing the right thing overall.

But it seems this fix belongs in the tcell library, rather than in micro itself. It seems ugly, and maybe potentially fragile, to let micro itself mess with the tty, without tcell knowing about it?

I see the upstream tcell has these APIs:

	// Suspend pauses input and output processing.  It also restores the
	// terminal settings to what they were when the application started.
	// This can be used to, for example, run a sub-shell.
	Suspend() error

	// Resume resumes after Suspend().
	Resume() error

Unfortunately, at least from the code it looks like Resume() doesn't really restore the initial state captured in Init(), it just re-captures it once again. Maybe that is not the intended behavior, and should be fixed in tcell?

So maybe what we should do is: backport Suspend() and Resume() from upstream to our fork of tcell, and fix them to make them work as we want them to? (And then probably also upstream the fix to tcell, since, I guess, it would be useful not just for micro.)

And then in micro we would be just using Suspend() and Resume() (instead of those tricky repeated Fini()/Init() calls with a bunch of redundant re-initialization in Init() every time).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe what we should do is: backport Suspend() and Resume() from upstream to our fork of tcell, and fix them to make them work as we want them to?

Exactly! This is something, which definitely belongs to tcell instead of micro.
Therefore I vote to drop ab06490.

niltonperimneto added a commit to niltonperimneto/micro that referenced this pull request Mar 24, 2026
link: micro-editor#4052 (comment)
Signed-off-by: Nilton Perim Neto <niltonperimneto@gmail.com>
When saving a protected file, micro spawned two sudo dd subprocesses
concurrently: a truncation dd and a write dd. Both competed for the
TTY to prompt for the sudo password, racing on TCGETS2/TCSETS2 ioctls
and deadlocking stdin routing. This strictly fails with sudo-rs, which
enforces auditable escalation boundaries.

Fix by removing notrunc and the separate truncation subprocess entirely.
A single dd subprocess now handles truncation and write in one shot,
making the race impossible by construction. The conv=fsync guarantee is
preserved on platforms that support it.

Move writeBackup() before openFile() in safeWrite() for the sudo path
so the backup always exists before dd truncates the file. If openFile()
fails after the backup is created, return OverwriteError so the user
knows their data is safe in the backup. The non-sudo path is unchanged.

Link: micro-editor#4050
Signed-off-by: Nilton Perim Neto <niltonperimneto@gmail.com>
After a failed sudo subprocess, TempStart() called tcell Init()
directly on top of the corrupted TTY state left by the dead child.
Since tcell captures the current terminal state on every Init() call
via MakeRaw(), it would snapshot the corrupted state and use it as
the baseline for Fini(). On exit, this poisoned configuration was
written back to the parent shell.

Fix by capturing the canonical terminal state once at startup, before
tcell's first MakeRaw() call. In TempStart(), restore that pristine
state before calling Init() so tcell always starts from a clean
baseline regardless of what a failed child process left behind.
Also call RestorePristineState() in exit() and the panic handler as
a defense in depth measure.

Add a screen.Fini() helper that combines Screen.Fini() and
RestorePristineState() to remove duplication in exit paths.

Align build constraints with tcell's Unix platform list to cover
aix and zos, which were previously falling through to no-op stubs.

Link: micro-editor#4050
Signed-off-by: Nilton Perim Neto <niltonperimneto@gmail.com>
backupName, resolveName, err := b.writeBackup(path)
// When using sudo, create the backup before opening the file because
// openFile() truncates the target immediately. For non-sudo saves,
// openFile() does not truncate on open, so the existing order is kept.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The existing order" is gonna sound confusing. It refers to the old behavior before this change, but this is a comment in the code, not a changelog. So someone reading this comment will not know what "the existing order" refers to.

So e.g. "...For non-sudo saves, openFile() does not truncate on open, i.e. open is non-destructive, so only create the backup after open succeeds." ?

}

// Backup saved, so cancel pending periodic backup, if any
delete(requestedBackups, b)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this? We should cancel pending backups right after saving the backup in the sudo case as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right 👍 , overseen by me.

Signed-off-by: Nilton Perim Neto <niltonperimneto@gmail.com>
moved the delete(requestedBackups, b) right after writeBackup() in the sudo path
so it runs before openFile() can fail. As suggested by dmaluka

Link: micro-editor#4052
Signed-off-by: Nilton Perim Neto <niltonperimneto@gmail.com>
@JoeKar
Copy link
Member

JoeKar commented Mar 25, 2026

Here is a short list of commits I'd expect for this PRs change history:

  1. revert commit of 165a5a4
  2. commit to restore conv=fsync, as done within 6556c11
  3. commit to slightly change the backup logic, as done within 6556c11

Nothing more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants