Skip to content

Concurrent opencode run instances deadlock during init (futex, no timeout) — second overlapping instance hangs forever #29395

@paat

Description

@paat

Summary

When two opencode run processes overlap in time on the same data dir, the instance that starts while another is still alive deadlocks during initialization and never makes progress — no model call, no error, no timeout. It parks on a futex indefinitely until killed externally. A single opencode run, or strictly serial (non-overlapping) runs, always work.

This is not provider/model/auth/network related — see the minimal repro below, which deadlocks with a bogus model (init never even reaches model resolution).

Minimal reproduction

# Two runs that overlap in time. Bogus model on purpose: no valid provider,
# no auth, no network needed. The deadlock is in instance bootstrap, BEFORE
# model resolution — so neither run ever reaches an "unknown model" error.

( timeout 45 opencode run --agent plan -m bogus/nope-1 "hi"; echo "run1 exit=$?" ) &
sleep 3
( timeout 45 opencode run --agent plan -m bogus/nope-2 "hi"; echo "run2 exit=$?" ) &
wait

Observed: both print exit=124 (timed out, killed), empty output. Neither errors on the invalid model.

For contrast:

  • A single opencode run (any real model, full flags) completes in ~7s.
  • Two runs started 5s apart also fail — the second one (which boots while the first is still alive) is the one that hangs. So it's temporal overlap, not a start-time race.

Expected behavior

A second concurrent instance should either run concurrently, or fail fast with a clear "another instance is using this data dir" message — not block forever on a lock with no timeout.

Diagnostics from a hung instance

  • All threads parked in futex_do_wait (one event-loop thread in ep_poll); ~1s of CPU over 7+ minutes, ~0.2% CPU — not computing.
  • Zero TCP connections, no SYN-SENT — it never attempts any network call. (DNS resolves fine.)
  • Its ~/.local/share/opencode/log/*.log stops at loading internal plugin and never reaches the creating instancellm runtime selected lines that a successful run logs. So it deadlocks in bootstrap/plugin-init.
  • The hung instance holds opencode.db, opencode.db-wal, and opencode.db-shm open — consistent with a lock on the shared SQLite data dir that the second instance waits on indefinitely.

Environment

  • opencode v1.15.10 (latest at time of filing)
  • Linux container, single user, default XDG_DATA_HOME (~/.local/share/opencode)
  • Bun runtime (bundled)

Impact / how I hit this

A multi-provider code-review tool launches several opencode run calls in parallel against the same data dir; the overlapping ones deadlock and only get freed by a wrapping timeout. Workaround on the caller side is to serialize the opencode run invocations, or give each its own XDG_DATA_HOME (with auth.json seeded into it). But the no-timeout deadlock itself looks like an opencode bug worth fixing.

Happy to provide wchan/fd dumps or a fuller debug log on request.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions