Skip to content

coverage: add cuda.core test_utils.py tests for DLPack/StridedMemoryView#2181

Open
rluo8 wants to merge 3 commits into
NVIDIA:mainfrom
rluo8:coverage-test-utils
Open

coverage: add cuda.core test_utils.py tests for DLPack/StridedMemoryView#2181
rluo8 wants to merge 3 commits into
NVIDIA:mainfrom
rluo8:coverage-test-utils

Conversation

@rluo8

@rluo8 rluo8 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Description

Split out from #2168 per review. Adds tests targeting
cuda/core/experimental/_memoryview.pyx:

  • DLPack export round-trip over all NumPy-native dtypes and
    scalar/zero-volume shapes
  • versioned/unversioned capsule + deleter paths
  • from_dlpack: ambiguous-stream and unsupported-device errors, TypeError -> unversioned-import fallback
  • CAI path: device-pointer + stream-ordering view, deprecated init path, has_dlpack=False proxy, device-accessible view, None-dtype repr
  • DLPack C exchange-API helpers

Coverage change:
_memoryview.pyx coverage 65.82% -> 85.73%.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

@leofang leofang requested review from mdboom and seberg June 9, 2026 14:15
@leofang leofang added P1 Medium priority - Should do test Improvements or additions to tests labels Jun 9, 2026
@leofang leofang added this to the cuda.core v1.1.0 milestone Jun 9, 2026

@seberg seberg left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd wait for @mdboom to have a brief look. As we discussed yesterday, I am not sure about all tests but I guess the improved coverage is really nice.
FWIW, in these changes I don't really see trivial gains for parametrization.

That said, I left a few comments and especially the TMD tests seem like a lot of effort to add coverage that should already exists (but you need the support to see that in the coverage).
I think it may make sense to add one test (which could use funky arguments) to check that an error is given when it isn't available. But I wouldn't add monkeypatch tests just to get coverage up here, I think.
(We should cover this with real tests not with dummy tests.)

The DLPack tests are pretty complex unittests with ctypes, I guess they are fine (this is unused, if there are bugs mirrored in implementation and tests we won't even notice, though -- which is a bit deceptive; it seemed fine but I'd have to look closer to be confident).
Either way, I feel this file is big enough and those tests complicated enough, that it may be good to split DLPack tests into a new one? (test_utils_dlpack.py or a new utils folder?)


The real ``_from_tiled`` requires a device-accessible, 16-byte-aligned view
on TMA-capable hardware (sm90+), so we replace the (module-level) class the
method imports with a recorder and assert the assembled call instead.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but can't we just skip the test if the hardware isn't available? And if it is, I expect we are already testing this?

I feel this test belongs into test_tensormap as a single test that checks that tensormap instantiation fails gracefully when TMA isn't available.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review, @seberg ! I'll update based on these comments.

assert view.ptr == int(buffer.handle)
assert view.device_id == dev.device_id
assert view.shape == (8,)
dev.default_stream.sync()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this test a bit deceptive since it adds coverage without actually checking stream order correctness. (It does check for device correctness of course).
Now, testing stream-order correctness is annoying (you need either large buffers, or launch many kernels on the first stream and then e.g. a copy-to-host on after export and see that all of them finished)
So maybe it's OK, but the bot oversells the test by a lot.

"""The deprecated ``StridedMemoryView(obj)`` constructor routes a CAI-only
object through the CAI branch (warn + ``view_as_cai``), not the DLPack one."""
obj = _make_cuda_array_interface_obj(shape=(4,), strides=None, typestr="<f4", data=(0, False))
with pytest.deprecated_call(match="deprecated"):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with pytest.deprecated_call(match="deprecated"):
with pytest.deprecated_call(match="CUDA-array-interface-supporting"):

if we bother to match a string, try to make it unique.

#
# Drive the C function pointers exposed by the capsule the way a native
# consumer would, exercising the StridedMemoryView exchange-API implementation.
# ---------------------------------------------------------------------------

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add this, but I am starting to think this should be in it's own test_dlpack file, there is a lot of complexity here.

def _exchange_api_cause(exc):
"""Underlying exception raised by the noexcept C fn (surfaced by ctypes as
SystemError, with the real error chained as __cause__ or __context__)."""
return exc.value.__cause__ or exc.value.__context__

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unpacking the SystemError is a bad workaround for not supplying an errcheck function to the ctypes definition.
We shouldn't test for SystemError (unless we set them ourselves, which we don't).

api = _get_exchange_api()
out = ctypes.c_void_p(123)
with pytest.raises(SystemError) as exc:
api.managed_tensor_allocator(None, ctypes.byref(out), None, None)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
api.managed_tensor_allocator(None, ctypes.byref(out), None, None)
# Currently sets a Python error when `SetErr` isn't passed
api.managed_tensor_allocator(None, ctypes.byref(out), None, None)

This is OK, honestly, I am not sure if there is a point in even forcing this function to be non-NULL on the dlpack side if we just always make an error, but whatever.

I just prefer mentioning in the test that we know it's a weird test.

@@ -1094,3 +1094,425 @@ def test_strided_memory_view_dtype_roundtrip_all(dtype):
pytest.skip(f"NumPy does not export {np.dtype(dtype)} via DLPack: {e}")
view = StridedMemoryView.from_dlpack(src, stream_ptr=-1)
assert view.dtype == np.dtype(dtype) # .dtype triggers dtype_dlpack_to_numpy

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well delete this test and move the comment about bfloat16 to _NUMPY_NATIVE_DLPACK_DTYPES your new test seems basically identical but expands it a bit.

np.complex64,
np.complex128,
np.bool_,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
)
)
if ml_dtypes is not None:
# Supported on NumPy 2.5 and ml_dtypes (probably) 0.5.5+
_NUMPY_NATIVE_DLPACK_DTYPES += (ml_dtypes.bfloat16,)

Although then the "native" is slightly off :). But we have the try/except that should make this skipped and NumPy 2.5 + next ml_dtypes version this should actually work.
(In fact, we can probably add more dtypes if we want to.)

TypeError fallback in ``view_as_dlpack`` and an *unversioned* capsule import."""

def __init__(self, arr):
self._arr = arr

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self._arr = arr
self._arr = StridedMemoryView.from_any_interface(arr)

I would suggest to use a StridedMemoryView here rather than NumPy. Should do the same thing, but means we don't care if NumPy changes the behavior (which eventually it may) and stops exporting version 0.x (unversioned).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module P1 Medium priority - Should do test Improvements or additions to tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants