@Victor1890

The kind of change this PR does introduce

  • a bug fix
  • a new feature
  • an update to the documentation
  • a code change that improves performance
  • other

Current behavior

In multi-server environments using the cluster adapter, there is a race condition when calling socketsJoin() followed immediately by emit() to the same room. The issue occurs because:

  1. socketsJoin() publishes a message to join sockets to a room across workers
  2. The method returns immediately without waiting for confirmation
  3. A subsequent emit() may be processed before the join operation completes on remote workers
  4. Result: Some sockets don't receive the broadcast message

Example of the problematic pattern:

// This may fail in cluster mode
io.in(socketId).socketsJoin("room1");
io.to("room1").emit("my-event", payload); // Socket may not receive this

This affects users running Socket.IO in cluster mode (multiple Node.js processes) and causes intermittent message delivery failures that are difficult to debug.
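The four steps above can be modeled with a small toy, sketched here under stated assumptions. `ToyRemoteWorker` and all of its methods are hypothetical names invented for illustration; this is not the Socket.IO cluster adapter. The model simply assumes that a broadcast can be handled before a queued join request has been applied, which is the ordering described above.

```typescript
// Toy model of one remote worker. All names are hypothetical; this is not
// the Socket.IO cluster adapter, only an illustration of the ordering problem.
class ToyRemoteWorker {
  private readonly rooms = new Map<string, Set<string>>();
  private readonly inbox: Array<() => void> = [];
  readonly received: string[] = [];

  // Steps 1-2: the join request is queued and the caller gets control back
  // immediately; room membership changes only when the worker processes it.
  requestJoin(socketId: string, room: string): Promise<void> {
    return new Promise<void>((resolve) => {
      this.inbox.push(() => {
        const members = this.rooms.get(room) ?? new Set<string>();
        members.add(socketId);
        this.rooms.set(room, members);
        resolve();
      });
    });
  }

  // Simulates the worker catching up on queued join requests.
  processInbox(): void {
    while (this.inbox.length > 0) {
      const apply = this.inbox.shift();
      if (apply) apply();
    }
  }

  // Delivers an event to every socket that is in the room right now.
  broadcast(room: string, event: string): void {
    this.rooms.get(room)?.forEach((socketId) => {
      this.received.push(`${socketId}:${event}`);
    });
  }
}

const toyWorker = new ToyRemoteWorker();
void toyWorker.requestJoin("socket-1", "room1");
toyWorker.broadcast("room1", "my-event"); // steps 3-4: join not applied yet, event is lost
toyWorker.processInbox(); // the join finally lands
toyWorker.broadcast("room1", "my-event"); // now the socket is in the room
```

In this toy, only the second broadcast reaches `socket-1`. Waiting on the promise returned by `requestJoin()` before broadcasting is the analogue of the fix proposed in this PR.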

New behavior

With the cluster adapter, socketsJoin() now returns a Promise that resolves only after the join operation has completed on all workers. socketsLeave() and disconnectSockets() follow the same pattern.

Correct usage pattern:

// Guaranteed delivery in cluster mode
await io.in(socketId).socketsJoin("room1");
io.to("room1").emit("my-event", payload); // All sockets will receive this

Key changes:

  • addSockets() in cluster adapter now uses async/await with publishAndReturnOffset()
  • Returns a Promise when using the cluster adapter, void for single-server setups
  • Backward compatible: existing code keeps working, but callers should add await to get correct ordering in cluster mode
  • Same pattern applied to delSockets() (socketsLeave) and disconnectSockets()

Test coverage:

  • Added race condition test: "avoids race condition when followed by emit (with await)"
  • Test spawns 3 workers, calls await socketsJoin(), then broadcasts
  • Verifies all 3 client sockets receive the event (100% success rate)

Other information (e.g. related issues)

Fixes #4734

Implementation notes:

  • Based on the similar pattern used in fetchSockets(), which already uses async/await
  • The cluster adapter uses Node.js IPC messages with offset tracking to ensure all workers confirm the operation
  • Error handling included for publish failures
  • 100% test coverage on cluster adapter with all 19 tests passing
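To illustrate the idea of waiting until every worker has confirmed an operation, here is a minimal sketch of an acknowledgement tracker keyed by message offset. `AckTracker`, `waitFor()`, `ack()` and `isPending()` are hypothetical names, and the string offsets are an assumption; the adapter's real internals may look quite different.

```typescript
// Hypothetical acknowledgement tracker; names and shapes are assumptions and
// do not mirror the adapter's real internals.
class AckTracker {
  private readonly pending = new Map<
    string,
    { remaining: number; resolve: () => void }
  >();

  // Registers a published message and returns a promise that resolves once
  // `expectedAcks` workers have confirmed it.
  waitFor(offset: string, expectedAcks: number): Promise<void> {
    if (expectedAcks <= 0) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.pending.set(offset, { remaining: expectedAcks, resolve });
    });
  }

  // Called when a worker reports that it has applied the message at `offset`.
  ack(offset: string): void {
    const entry = this.pending.get(offset);
    if (!entry) return; // unknown offset, or already fully acknowledged
    entry.remaining -= 1;
    if (entry.remaining === 0) {
      this.pending.delete(offset);
      entry.resolve();
    }
  }

  // True while at least one acknowledgement is still outstanding.
  isPending(offset: string): boolean {
    return this.pending.has(offset);
  }
}

// Usage: wait for three workers to confirm the message at "offset-1".
const ackTracker = new AckTracker();
const allConfirmed = ackTracker.waitFor("offset-1", 3);
ackTracker.ack("offset-1");
ackTracker.ack("offset-1");
ackTracker.ack("offset-1"); // third confirmation resolves `allConfirmed`
void allConfirmed;
```

The sketch has no timeout or rejection path. A production version would presumably need one so that a crashed worker cannot leave the promise pending forever.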

Migration guidance:

// Before (unreliable in cluster)
io.in(socketId).socketsJoin("room1");
io.to("room1").emit("event", data);

// After (guaranteed ordering)
await io.in(socketId).socketsJoin("room1");
io.to("room1").emit("event", data);

Breaking changes:

  • Return type changes from void to Promise<void> | void
  • Existing code continues to work but may have race conditions in cluster mode
  • TypeScript users will see the Promise in the return type, which signals that the call should be awaited
  • No breaking changes for single-server deployments (still returns void)
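As a rough illustration of why the union return type does not force callers to branch, the sketch below awaits both shapes the same way. `JoinResult`, `joinOnSingleServer()`, `joinOnCluster()` and `joinThenEmit()` are hypothetical stand-ins, not Socket.IO functions.

```typescript
// Hypothetical stand-ins for the two return shapes described above; these
// are not Socket.IO functions.
type JoinResult = Promise<void> | void;

function joinOnSingleServer(log: string[]): JoinResult {
  log.push("join applied (sync)");
  return;
}

function joinOnCluster(log: string[]): JoinResult {
  // Completes on a later microtask, standing in for cross-worker confirmation.
  return Promise.resolve().then(() => {
    log.push("join applied (async)");
  });
}

// Awaiting works for both shapes: awaiting a non-Promise value simply
// continues with that value, so callers need no branching.
async function joinThenEmit(
  join: (log: string[]) => JoinResult,
  log: string[],
): Promise<string[]> {
  await join(log);
  log.push("emit");
  return log;
}
```

Calling `joinThenEmit()` with either function records the join before the emit, so code written with `await` behaves the same in single-server and cluster setups.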

@Victor1890 Victor1890 marked this pull request as ready for review December 24, 2025 19:08

Development

Successfully merging this pull request may close these issues.

Issue with new socketsJoin
