
feat: support send and receive #13

Open: GordonYang1 wants to merge 1 commit into InfiniTensor:master from GordonYang1:feat/support-send-recv-minimal

GordonYang1 (Collaborator) commented May 19, 2026

Summary

This PR introduces blocking point-to-point Send/Recv support for InfiniCCL with an OpenMPI-based backend implementation, along with a dedicated example program for functionality verification.

The implementation currently uses host-staging for device buffers and blocking MPI_Send / MPI_Recv internally. The example covers basic Send/Recv correctness, zero-count behavior, ping/ping-pong cases, invalid peer validation, optional large-count testing, and heterogeneous NVIDIA + MetaX rank/device mapping.

Changes

  • Public Send/Recv API

    • add public API declarations for:
      • infiniSend()
      • infiniRecv()
    • expose Send/Recv through the common communicator interface.
  • Base Send/Recv Wrappers

    • add src/base/send.h;
    • add src/base/recv.h;
    • validate peer rank range before dispatching to backend implementation;
    • return infiniInvalidArgument for invalid peer ranks.
  • OpenMPI-based Send/Recv Implementation

    • add src/ompi/impl/send.h;
    • add src/ompi/impl/recv.h;
    • implement blocking point-to-point communication with MPI_Send and MPI_Recv;
    • use host-staging buffers for device memory transfer;
    • split large byte counts into INT_MAX-bounded MPI chunks.
  • Send/Recv Example and Test Coverage

    • add examples/send_recv.cc;
    • cover:
      • zero-count Send/Recv;
      • blocking ping from rank 0 to rank 1;
      • blocking ping from rank 0 to rank size - 1;
      • blocking ping-pong between rank 0 and rank 1;
      • invalid peer validation;
      • optional large-count test enabled by INFINI_SENDRECV_LARGE=1;
    • print rank/device mapping to make heterogeneous test logs easier to inspect.
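
The peer-rank check and the INT_MAX-bounded chunking listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Status`, `ValidatePeer`, `ForEachChunk`, and the callback signature are hypothetical names, and the callback stands in for a per-chunk `MPI_Send` or `MPI_Recv` whose `count` argument is an `int`.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <functional>

// Hypothetical status codes for this sketch. The PR itself only names
// infiniInvalidArgument; the real enum may look different.
enum Status { kSuccess = 0, kInvalidArgument = 1 };

// Accept only peers in [0, size), where size is the communicator size.
inline Status ValidatePeer(int peer, int size) {
  return (peer >= 0 && peer < size) ? kSuccess : kInvalidArgument;
}

// Walk a byte range in chunks that each fit in an int, calling
// on_chunk(offset, count) once per chunk. In a real backend the callback
// would issue one blocking MPI call on the staging buffer at that offset.
// max_chunk defaults to INT_MAX; a smaller value makes the logic testable.
// Assumption of this sketch: a zero-byte transfer issues no call at all.
inline Status ForEachChunk(
    std::size_t total_bytes,
    const std::function<Status(std::size_t, int)>& on_chunk,
    std::size_t max_chunk = INT_MAX) {
  assert(max_chunk > 0 && max_chunk <= static_cast<std::size_t>(INT_MAX));
  std::size_t offset = 0;
  while (offset < total_bytes) {
    const std::size_t n = std::min(total_bytes - offset, max_chunk);
    const Status s = on_chunk(offset, static_cast<int>(n));
    if (s != kSuccess) return s;  // stop at the first failing chunk
    offset += n;
  }
  return kSuccess;
}
```

With the default bound, a buffer of 2^31 bytes (2147483648) would be split into one chunk of INT_MAX bytes followed by one chunk of 1 byte. Sender and receiver need to use the same chunk bound so that each send is matched by a receive of the same size.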

Known Issues & Future Work

  • The current OpenMPI Send/Recv implementation is blocking and does not overlap communication with computation. Future work may introduce non-blocking point-to-point APIs and stream-aware asynchronous execution.
  • The current implementation uses temporary host-staging buffers allocated with malloc/free on each invocation. Future work may introduce reusable host buffer pools, pinned host memory, or allocator caching to reduce overhead.
  • The current implementation uses a fixed MPI tag (0) internally. Future extensions may expose tags or add request-based APIs if more advanced point-to-point patterns are needed.
  • Large-count testing is disabled by default because it requires around 2 GB per rank. Enable it explicitly with INFINI_SENDRECV_LARGE=1.
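
One possible shape for the reusable host buffer mentioned above is a grow-only staging buffer that is kept alive across calls. This is only a sketch of the idea, not part of this PR: the class name `StagingBuffer` and its methods are hypothetical, it uses plain (unpinned) host memory, and it is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical grow-only host staging buffer. It reallocates only when a
// request exceeds the current capacity, so repeated transfers of similar
// size avoid a malloc/free pair on each invocation.
class StagingBuffer {
 public:
  StagingBuffer() = default;
  StagingBuffer(const StagingBuffer&) = delete;
  StagingBuffer& operator=(const StagingBuffer&) = delete;
  ~StagingBuffer() { std::free(data_); }

  // Returns a pointer to at least `bytes` bytes, or nullptr if growing
  // fails (the previous buffer then stays valid). A request of 0 bytes
  // before any allocation also returns nullptr.
  void* Reserve(std::size_t bytes) {
    if (bytes <= capacity_) return data_;
    void* grown = std::realloc(data_, bytes);
    if (grown == nullptr) return nullptr;
    data_ = grown;
    capacity_ = bytes;
    return data_;
  }

  std::size_t capacity() const { return capacity_; }

 private:
  void* data_ = nullptr;
  std::size_t capacity_ = 0;
};
```

A pinned-memory variant would replace `std::realloc` and `std::free` with the device runtime's host allocation calls, which this sketch deliberately leaves out.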

Logs & Screenshots


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92412fdd4f


Comment thread src/ompi/impl/send.h
Comment on lines +44 to +47
    Runtime<kDev>::Memcpy(host_buf, send_buff, total_bytes,
                          Runtime<kDev>::MemcpyDeviceToHost);
    Runtime<kDev>::StreamSynchronize(
        static_cast<Runtime<kDev>::Stream>(stream));

P2: Propagate failed device-to-host sends

When the send buffer is an invalid or stale device pointer, or the selected runtime reports a stream error, this path still proceeds to MPI_Send and returns success, because the results of both Runtime<kDev>::Memcpy and StreamSynchronize are ignored. That can send uninitialized staging data and hide the actual failure. The status of each runtime call should be checked and propagated, as the existing collective implementations do with CHECK_STATUS.
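
A rough sketch of what checked staging could look like on the send side is shown below. Everything here is a stand-in: `FakeRuntime` and `FakeTransport` replace `Runtime<kDev>` and MPI, `RETURN_IF_ERROR` is a hypothetical macro in the spirit of the project's `CHECK_STATUS` (whose actual definition is not shown in this thread), and the status codes are invented for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

enum Status { kSuccess = 0, kRuntimeError = 1, kOutOfMemory = 2 };

// Hypothetical early-return macro; the project's CHECK_STATUS may differ.
#define RETURN_IF_ERROR(expr)          \
  do {                                 \
    const Status status_ = (expr);     \
    if (status_ != kSuccess) return status_; \
  } while (0)

// Stand-in for the device runtime, with switches to simulate failures.
struct FakeRuntime {
  bool fail_copy = false;
  bool fail_sync = false;
  Status MemcpyDeviceToHost(void* dst, const void* src, std::size_t n) {
    if (fail_copy) return kRuntimeError;
    if (n > 0) std::memcpy(dst, src, n);
    return kSuccess;
  }
  Status StreamSynchronize() { return fail_sync ? kRuntimeError : kSuccess; }
};

// Stand-in for the blocking MPI send; counts how often it was reached.
struct FakeTransport {
  int sends = 0;
  Status Send(const void*, std::size_t) { ++sends; return kSuccess; }
};

struct FreeDeleter {
  void operator()(void* p) const { std::free(p); }
};

// Stage the payload on the host, then send it. Any runtime failure is
// returned before the transport is touched, and the staging buffer is
// released on every path by the unique_ptr.
Status StagedSend(FakeRuntime& rt, FakeTransport& tx, const void* send_buff,
                  std::size_t total_bytes) {
  std::unique_ptr<void, FreeDeleter> host_buf(std::malloc(total_bytes));
  if (total_bytes > 0 && !host_buf) return kOutOfMemory;
  RETURN_IF_ERROR(rt.MemcpyDeviceToHost(host_buf.get(), send_buff, total_bytes));
  RETURN_IF_ERROR(rt.StreamSynchronize());
  return tx.Send(host_buf.get(), total_bytes);
}
```

Owning the staging buffer through a smart pointer matters once early returns are added, since a bare `free` at the end of the function would be skipped on the error paths.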


Comment thread src/ompi/impl/recv.h
Comment on lines +59 to +60
    Runtime<kDev>::Memcpy(recv_buff, host_buf, total_bytes,
                          Runtime<kDev>::MemcpyHostToDevice);

P2: Propagate failed host-to-device receives

When the receive destination is an invalid/stale device pointer or the H2D copy fails, infiniRecv still frees the staging buffer and returns success because the Runtime<kDev>::Memcpy result is ignored. In that scenario the MPI receive completed but the user's device buffer was not updated, so callers get silent data corruption instead of an error status.
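
On the receive side, the same concern could be handled by returning the host-to-device copy status as the function result. The sketch below assumes hypothetical hooks (`RecvHooks`) in place of `MPI_Recv` and `Runtime<kDev>::Memcpy`, and uses a `std::vector` as the staging buffer so that it is released on every return path. Note that the vector signals allocation failure by throwing, which a real implementation may prefer to map to a status code.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

enum Status { kSuccess = 0, kRuntimeError = 1 };

// Hypothetical hooks standing in for the blocking receive and the
// runtime's host-to-device copy.
struct RecvHooks {
  bool fail_h2d = false;
  Status Receive(void* host, std::size_t n) {
    if (n > 0) std::memset(host, 0x5A, n);  // pretend payload
    return kSuccess;
  }
  Status CopyHostToDevice(void* dst, const void* host, std::size_t n) {
    if (fail_h2d) return kRuntimeError;
    if (n > 0) std::memcpy(dst, host, n);
    return kSuccess;
  }
};

// Receive into host staging memory, then copy to the destination buffer.
// A failed copy is reported to the caller instead of being dropped.
Status StagedRecv(RecvHooks& hooks, void* recv_buff, std::size_t total_bytes) {
  std::vector<unsigned char> host_buf(total_bytes);
  const Status s = hooks.Receive(host_buf.data(), total_bytes);
  if (s != kSuccess) return s;
  return hooks.CopyHostToDevice(recv_buff, host_buf.data(), total_bytes);
}
```

With this shape, a caller whose destination buffer was not updated sees a non-success status rather than stale data behind a success code.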


GordonYang1 force-pushed the feat/support-send-recv-minimal branch from 92412fd to 41e0b95 on May 19, 2026 03:03