Conversation
BLOCK_SIZE is a generic name, which causes a conflict when including the io_uring library header. Probably because the macro definition of BLOCK_SIZE in the kernel header is included in the io_uring library header. Signed-off-by: Masahiro Yamada (KIOXIA) <masahiro31.yamada@kioxia.com>
io_uring is enabled by building with -DIOURING. In the case of aio, io_context_t was defined as the type of IOContext. io_context_t is a pointer to struct io_context. io_uring is implemented with the same idea, defining the type of IOContext as a pointer to struct io_uring. Signed-off-by: Masahiro Yamada (KIOXIA) <masahiro31.yamada@kioxia.com>
|
@microsoft-github-policy-service agree company="Kioxia" |
There was a problem hiding this comment.
Pull request overview
This pull request adds support for io_uring as an alternative to libaio for asynchronous I/O operations on Linux systems. The implementation provides a compile-time option to switch between libaio and io_uring via the IOURING CMake flag, allowing users to leverage io_uring's superior performance characteristics on recent Linux distributions. Additionally, the PR renames BLOCK_SIZE constants in several files to more descriptive names (NODE_BLOCK_SIZE, POINT_BLOCK_SIZE) to avoid naming conflicts with the QUEUE_DEPTH constant used in the io_uring implementation.
Changes:
- Adds new
linux_aligned_file_reader_uring.cppimplementation using liburing API that mirrors the existing libaio implementation - Introduces conditional compilation support in CMakeLists.txt for building with io_uring via
-DIOURING=True - Renames BLOCK_SIZE constants to domain-specific names (NODE_BLOCK_SIZE, POINT_BLOCK_SIZE) to prevent macro naming conflicts
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| src/linux_aligned_file_reader_uring.cpp | New io_uring-based implementation of aligned file reader with async I/O operations |
| include/linux_aligned_file_reader_uring.h | Header file declaring the io_uring variant of LinuxAlignedFileReader class |
| include/aligned_file_reader.h | Conditional includes and typedef for IOContext to support both libaio and io_uring |
| src/pq_flash_index.cpp | Renames BLOCK_SIZE to NODE_BLOCK_SIZE to avoid conflicts |
| src/partition.cpp | Renames BLOCK_SIZE to POINT_BLOCK_SIZE for clarity |
| CMakeLists.txt | Adds DISKANN_ASYNC_LIB_URING variable for linking liburing |
| src/CMakeLists.txt | Conditional compilation to include either libaio or io_uring implementation |
| apps/CMakeLists.txt | Updates link libraries to include io_uring library |
| apps/utils/CMakeLists.txt | Updates link libraries to include io_uring library |
| python/CMakeLists.txt | Updates link libraries to include io_uring library |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // if requests didn't complete | ||
| if (ret != 0) | ||
| { | ||
| std::cerr << "io_uring_waite_cqes() failed; returned " << ret << ", expected=" << n_ops |
There was a problem hiding this comment.
Typo in error message: "ernno" should be "errno"
| uint64_t n_tries = 0; | ||
| while (n_tries <= n_retries) | ||
| { | ||
| // send io requests here | ||
| int64_t ret = io_uring_submit(ctx); | ||
| // if requests didn't get accepted | ||
| if (ret != (int64_t)n_ops) | ||
| { | ||
| std::cerr << "io_uring_submit() failed; returned " << ret << ", expected=" << n_ops << ", ernno=" << errno | ||
| << "=" << ::strerror(-ret) << ", try #" << n_tries + 1; | ||
| std::cout << "ctx: " << ctx << "\n"; | ||
| exit(-1); | ||
| } | ||
| else | ||
| { | ||
| struct io_uring_cqe *cqes[QUEUE_DEPTH]; | ||
| unsigned int count = 0; | ||
| // wait on io_uring | ||
| int64_t ret = io_uring_wait_cqes(ctx, cqes, n_ops, nullptr, nullptr); | ||
|
|
||
| // if requests didn't complete | ||
| if (ret != 0) | ||
| { | ||
| std::cerr << "io_uring_waite_cqes() failed; returned " << ret << ", expected=" << n_ops | ||
| << ", ernno=" << errno << "=" << ::strerror(-ret) << ", try #" << n_tries + 1 << std::endl; | ||
| exit(-1); | ||
| } | ||
| else | ||
| { | ||
| io_uring_cq_advance(ctx, n_ops); | ||
| break; | ||
| } | ||
| } | ||
| n_tries++; |
There was a problem hiding this comment.
The retry logic appears broken. When io_uring_submit or io_uring_wait_cqes fails, the function calls exit(-1), so the increment of n_tries on line 79 is never reached. This means the n_retries parameter is effectively unused. Consider restructuring the error handling to allow retries before exiting, similar to the pattern in linux_aligned_file_reader.cpp where retries actually happen.
| IOContext ctx = (struct io_uring *)malloc(sizeof(struct io_uring)); | ||
| int ret = io_uring_queue_init(QUEUE_DEPTH, ctx, 0); | ||
| if (ret != 0) | ||
| { | ||
| lk.unlock(); | ||
| if (ret == -EAGAIN) | ||
| { | ||
| std::cerr << "io_setup() failed with EAGAIN" << std::endl; | ||
| } | ||
| else | ||
| { | ||
| std::cerr << "io_setup() failed; returned " << ret << ": " << ::strerror(-ret) << std::endl; | ||
| } | ||
| } |
There was a problem hiding this comment.
If io_uring_queue_init fails, the allocated memory at ctx is never freed, causing a memory leak. Add free(ctx) in the error path before returning or exiting.
| std::cerr << "io_setup() failed with EAGAIN" << std::endl; | ||
| } | ||
| else | ||
| { | ||
| std::cerr << "io_setup() failed; returned " << ret << ": " << ::strerror(-ret) << std::endl; |
There was a problem hiding this comment.
The error message refers to "io_setup()" which is the libaio function name. This should be updated to "io_uring_queue_init()" since this is the io_uring implementation. This appears to be a copy-paste error from the libaio version.
| std::cerr << "io_setup() failed with EAGAIN" << std::endl; | |
| } | |
| else | |
| { | |
| std::cerr << "io_setup() failed; returned " << ret << ": " << ::strerror(-ret) << std::endl; | |
| std::cerr << "io_uring_queue_init() failed with EAGAIN" << std::endl; | |
| } | |
| else | |
| { | |
| std::cerr << "io_uring_queue_init() failed; returned " << ret << ": " << ::strerror(-ret) << std::endl; |
| @@ -67,10 +67,10 @@ add_executable(partition_with_ram_budget partition_with_ram_budget.cpp) | |||
| target_link_libraries(partition_with_ram_budget ${PROJECT_NAME} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS}) | |||
|
|
|||
| add_executable(merge_shards merge_shards.cpp) | |||
| target_link_libraries(merge_shards ${PROJECT_NAME} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} ${DISKANN_ASYNC_LIB}) | |||
| target_link_libraries(merge_shards ${PROJECT_NAME} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} ${DISKANN_ASYNC_LIB} ${DISKANN_ASYNC_LIB_URING}) | |||
|
|
|||
| add_executable(create_disk_layout create_disk_layout.cpp) | |||
| target_link_libraries(create_disk_layout ${PROJECT_NAME} ${DISKANN_ASYNC_LIB} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS}) | |||
| target_link_libraries(create_disk_layout ${PROJECT_NAME} ${DISKANN_ASYNC_LIB} ${DISKANN_ASYNC_LIB_URING} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS}) | |||
There was a problem hiding this comment.
The DISKANN_ASYNC_LIB_URING library is unconditionally linked even when IOURING is not enabled. This will cause a linking error when building without -DIOURING=True, as the uring library won't be needed and may not be installed. The linking should be conditional on IOURING being enabled. Consider wrapping these link statements with: if(IOURING) target_link_libraries(...${DISKANN_ASYNC_LIB_URING}...) endif()
| target_link_libraries(search_memory_index ${PROJECT_NAME} ${DISKANN_ASYNC_LIB} ${DISKANN_ASYNC_LIB_URING} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} Boost::program_options) | ||
|
|
||
| add_executable(build_disk_index build_disk_index.cpp) | ||
| target_link_libraries(build_disk_index ${PROJECT_NAME} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} ${DISKANN_ASYNC_LIB} Boost::program_options) | ||
| target_link_libraries(build_disk_index ${PROJECT_NAME} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} ${DISKANN_ASYNC_LIB} ${DISKANN_ASYNC_LIB_URING} Boost::program_options) | ||
|
|
||
| add_executable(search_disk_index search_disk_index.cpp) | ||
| target_link_libraries(search_disk_index ${PROJECT_NAME} ${DISKANN_ASYNC_LIB} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} Boost::program_options) | ||
| target_link_libraries(search_disk_index ${PROJECT_NAME} ${DISKANN_ASYNC_LIB} ${DISKANN_ASYNC_LIB_URING} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} Boost::program_options) | ||
|
|
||
| add_executable(range_search_disk_index range_search_disk_index.cpp) | ||
| target_link_libraries(range_search_disk_index ${PROJECT_NAME} ${DISKANN_ASYNC_LIB} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} Boost::program_options) | ||
| target_link_libraries(range_search_disk_index ${PROJECT_NAME} ${DISKANN_ASYNC_LIB} ${DISKANN_ASYNC_LIB_URING} ${DISKANN_TOOLS_TCMALLOC_LINK_OPTIONS} Boost::program_options) |
There was a problem hiding this comment.
The DISKANN_ASYNC_LIB_URING library is unconditionally linked even when IOURING is not enabled. This will cause a linking error when building without -DIOURING=True, as the uring library won't be needed and may not be installed. The linking should be conditional on IOURING being enabled. Consider wrapping these link statements with: if(IOURING) target_link_libraries(...${DISKANN_ASYNC_LIB_URING}...) endif()
| ${DISKANN_ASYNC_LIB_URING} | ||
| ) | ||
|
|
There was a problem hiding this comment.
The DISKANN_ASYNC_LIB_URING library is unconditionally linked even when IOURING is not enabled. This will cause a linking error when building without -DIOURING=True, as the uring library won't be needed and may not be installed. The linking should be conditional on IOURING being enabled.
| ${DISKANN_ASYNC_LIB_URING} | |
| ) | |
| ) | |
| if (IOURING) | |
| target_link_libraries( | |
| _diskannpy | |
| PRIVATE | |
| ${DISKANN_ASYNC_LIB_URING} | |
| ) | |
| endif() |
| endif() | ||
| if (NOT IOURING) | ||
| list(APPEND CPP_SOURCES linux_aligned_file_reader.cpp) | ||
| else() |
There was a problem hiding this comment.
Tab character used for indentation. CMake files in this codebase use spaces for indentation. Please replace the tab with spaces to maintain consistency.
| else() | |
| else() |
| } | ||
| #endif | ||
|
|
||
| // break-up requests into chunks of size QUEUE_DEPTH each |
There was a problem hiding this comment.
The indentation uses tab characters, which is inconsistent with the rest of the codebase. The existing linux_aligned_file_reader.cpp uses spaces for indentation. Please replace all tab characters with spaces throughout this file to maintain consistency.
| { | ||
| struct io_uring_sqe *sqe = io_uring_get_sqe(ctx); | ||
| if (!sqe){ | ||
| std::cerr << "io_uring_get_sqe() failed; ernno=" << errno; |
There was a problem hiding this comment.
Typo in error message: "ernno" should be "errno"
Does this PR have a descriptive title that could go in our release notes?
Should this result in any changes to our documentation, either updating existing docs or adding new ones?
Does this PR add any new dependencies?
Does this PR modify any existing APIs?
Reference Issues/PRs
Nothing.
What does this implement/fix? Briefly explain your changes.
This patch provides a means to switch from aio to io_uring in diskann.
Generally, aio is older, and in recent Linux distributions, io_uring is often adopted due to its superior performance.
https://www.snia.org/sites/default/files/SDC/2019/presentations/Storage_Performance/Kariuki_John_Verma_Vishal_Improved_Storage_Performance_Using_the_New_Linux_Kernel_I.O_Interface.pdf
By installing the liburing-dev package and building with the -DIOURING=True macro, the parts using libaio will be replaced with liburing.
Using a system with 128 AMD cores and 8 PCIe Gen5 SSDs configured in RAID 0, we compared the QPS between aio and io_uring. We observed a performance improvement of 9% to 27% with fewer than 100 threads. However, once the thread count exceeded 100 and approached the physical CPU core count of 128, we found that aio performed better.
It appears that io_uring is not completely superior to aio, so I implemented a patch to allow switching between aio and io_uring. If you have any other good implementation methods or solutions, please feel free to suggest them.
Any other comments?
Nothing.