Skip to content

Kernel embeddings#98

Draft
badnikhil wants to merge 6 commits into
libmir:masterfrom
badnikhil:kernel_embeddings
Draft

Kernel embeddings#98
badnikhil wants to merge 6 commits into
libmir:masterfrom
badnikhil:kernel_embeddings

Conversation

@badnikhil
Copy link
Copy Markdown
Contributor

@badnikhil badnikhil commented May 25, 2026

Summary

With this PR, all DCompute runtime infrastructure is managed lazily and transparently behind the scenes. Developers only need to write their host code, allocate memory (Buffer), and launch their compute kernels directly using launch!k.


Major Changes

1. Lazy Static Init Runtime (source/dcompute/driver/cuda/runtime.d)

  • Implements a global platform initializer (shared static this()) that initializes CUDA, discovers active GPUs, allocates the default Context (Device 0), and pushes it onto the context stack.
  • Implements a thread-local queue initializer (static this()) that ensures every thread gets a lock-free, dedicated Queue (CUstream) with zero resource contention.
  • Implements a lightweight, double-checked ensureInit() guard as a defensive safety fallback for edge cases.

2. Context-Sensitive Compile-Time PTX Embedding (source/dcompute/driver/cuda/package.d)

  • Moves the compile-time PTX import() statement inside the launch!k template definition.
  • Why this is critical: Because launch! is a template, it is instantiated inside the parent project's compilation context.
  • This allows the dcompute library to compile as a standard static library without requiring any local PTX files or string import flags, while seamlessly embedding the consumer project's custom PTX at compile time.

3. Defensive Safety Triggers (source/dcompute/driver/cuda/buffer.d)

  • Inserts ensureInit() triggers inside both Buffer!T constructors.
  • just for Safety.

4. dub.json update

  • "stringImportPaths": ["."] or -J flag should be used with the path where ptx is generated .

Developer Workflow & Flow of State

1. Compilation Flow:

  1. LDC compiles your @compute modules (e.g. tests/kernel.d) directly into PTX intermediate assembly (kernels_cuda800_64.ptx).
  2. Your parent project's DUB configuration passes -J. (the current directory) to the host compilation.
  3. During host compilation, the D compiler instantiates launch!matmul. The compiler processes import("kernels_cuda800_64.ptx"), embedding the GPU bytecode directly into your executable's text segment.

2. Execution Flow:

  1. Lazy Host Setup: The moment a Buffer is instantiated, the underlying static constructors initialize CUDA, assign the default device, push the GPU context, and initialize the active thread's CUDA stream.
  2. Lazy Program Load (First Launch): The first time launch! is executed, it checks if Program.globalProgram is initialized. Seeing it is null, it passes the embedded PTX string to cuModuleLoadData, registering your custom kernels in the GPU context.
  3. High-Performance Dispatch (Subsequent Launches): On all future kernel launches, the initialization check is skipped, directly enqueuing your kernels on the active CUDA stream with zero driver overhead.

Current State & Validation

All internal unittests and client applications compile, link, and validate successfully in one command:

  • Internal DCompute Unittests: dub test --compiler=ldc2 completes and passes successfully.
  • Client Benchmark Applications: dub run --force --compiler=ldc2 builds cleanly from scratch, embeds custom matmul kernels, executes them on the GPU, and validates output against host CPU matrices.

@badnikhil
Copy link
Copy Markdown
Contributor Author

while ensureInit in buffers is not required but we don't know what a user will do . so , added a check

Comment thread source/dcompute/driver/cuda/program.d Outdated
* runtime.
*
* Example:
* Program p = Program.fromEmbedded!(import("kernel.ptx"))();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be possible to do Program.fromEmbedded!"kernel.ptx" and do a mixin+import

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants