[2026 Spring][T1-1-1] mygitljf#1166
Open
mygitljf wants to merge 2 commits into
…n dispatch and tests
Wires the five T1-1-1 operators through infinicore.ntops.torch on CUDA:
- python/infinicore/ops/{rad2deg,copysign,lcm,nextafter,lgamma}.py: thin
dispatchers calling infinicore.ntops.torch.<op>.
- python/infinicore/__init__.py: re-export the five ops.
- test/infinicore/ops/{rad2deg,copysign,lcm,nextafter,lgamma}.py: framework
tests covering OUT_OF_PLACE and INPLACE(out=c) on float16/bfloat16/float32
(lcm: int8/int16/int32/int64). nextafter, copysign, lcm, lgamma run
bit-exact against torch.
Verified on NVIDIA A100 80GB PCIe with --nvidia (172/172 passed).
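As a rough illustration of what the five operators compute and of what "bit-exact" means compared with an ordinary equality check, here is a minimal scalar sketch using only the Python standard library (Python 3.9+ for `math.lcm` and `math.nextafter`). It does not call infinicore, ntops, or torch; the helper names `float_bits` and `bit_exact` are hypothetical and exist only for this example.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Return the raw IEEE-754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bit_exact(a: float, b: float) -> bool:
    """True only if a and b have identical bit patterns.

    This is stricter than ==: 0.0 == -0.0 is True, but their bits differ.
    """
    return float_bits(a) == float_bits(b)


# Scalar reference semantics for the five operators (stdlib stand-ins,
# not the tensor implementations under test).
reference = {
    "rad2deg": math.degrees(math.pi),        # radians -> degrees
    "copysign": math.copysign(3.0, -0.0),    # magnitude of 3.0, sign of -0.0
    "lcm": math.lcm(4, 6),                   # least common multiple
    "nextafter": math.nextafter(1.0, 2.0),   # next double after 1.0 toward 2.0
    "lgamma": math.lgamma(5.0),              # ln|Gamma(5)| = ln(4!)
}

if __name__ == "__main__":
    for name, value in reference.items():
        print(name, value)
    print("0.0 == -0.0:", 0.0 == -0.0)
    print("bit_exact(0.0, -0.0):", bit_exact(0.0, -0.0))
```

Comparing bit patterns rather than values is what makes a check on `copysign` or `nextafter` meaningful, since both operators exist precisely to manipulate the sign bit and the last mantissa bit.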
…ache_for_benchmark
Some Triton driver backends (e.g. MetaX MACA's MacaDriver) do not
implement Triton benchmark's `get_empty_cache_for_benchmark` /
`clear_cache` helpers. Calling them eagerly aborts the run before any
op is ever dispatched.
Probe the driver with `getattr` + `callable` and only install the
cache-clear hook when both helpers exist. Backends that expose them
(e.g. NVIDIA's CudaDriver) keep the original behavior; backends that
do not simply skip cache clearing - correctness and device-event
timing are unaffected.
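The probe described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: `make_cache_clear_hook`, `FullDriver`, and `MinimalDriver` are hypothetical names, and the assumption that `clear_cache` takes the buffer returned by `get_empty_cache_for_benchmark` is mine.

```python
def make_cache_clear_hook(driver):
    """Return a zero-argument cache-clear hook, or None if unsupported.

    Both helpers are looked up with getattr(..., None) and checked with
    callable(), so a driver that lacks either one never raises here.
    """
    get_cache = getattr(driver, "get_empty_cache_for_benchmark", None)
    clear_cache = getattr(driver, "clear_cache", None)
    if not (callable(get_cache) and callable(clear_cache)):
        return None  # backend lacks a helper: skip cache clearing
    cache = get_cache()  # assumed: allocate the scratch buffer once
    return lambda: clear_cache(cache)


class FullDriver:
    """Hypothetical stand-in for a backend that exposes both helpers."""

    def __init__(self):
        self.clears = 0

    def get_empty_cache_for_benchmark(self):
        return bytearray(16)

    def clear_cache(self, cache):
        self.clears += 1
        cache[:] = bytes(len(cache))  # zero the buffer in place


class MinimalDriver:
    """Hypothetical stand-in for a backend that implements neither helper."""


if __name__ == "__main__":
    for drv in (FullDriver(), MinimalDriver()):
        hook = make_cache_clear_hook(drv)
        print(type(drv).__name__, "installs hook:", hook is not None)
```

Returning `None` instead of a no-op function lets the caller decide whether to register a hook at all, which matches the "only install the cache-clear hook when both helpers exist" wording.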
Verification (MetaX C500, --metax):
InfiniCore run.py --bench device --num_prerun 50 --num_iterations 1000
for rad2deg copysign lcm nextafter lgamma:
Total tests run: 5, Passed: 5
[Device] PyTorch: 110695.750 ms
[Device] InfiniCore: 108326.593 ms
Device Speedup (PyTorch/InfiniCore): 1.022x
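The reported speedup is the ratio of the two device totals, which a short check reproduces. The variable names are illustrative; the two timings are copied from the log lines in this verification block.

```python
# Device totals from the benchmark log, in milliseconds.
pytorch_ms = 110695.750
infinicore_ms = 108326.593

# Speedup is defined in the log as PyTorch time / InfiniCore time.
speedup = pytorch_ms / infinicore_ms

if __name__ == "__main__":
    print(f"Device Speedup (PyTorch/InfiniCore): {speedup:.3f}x")
```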
Platform Compatibility
NVIDIA A100, MetaX C500, Iluvatar MR-V100:
On Iluvatar / CoreX, the bitcast store lowering path for fp16 / bf16 is not available, so `copysign` / `nextafter` fall back to `torch.<op>` for fp16 / bf16 on that platform via `ntops.torch.utils._is_corex_compat_device`. NVIDIA and MetaX still take the original ntops kernel path.
Test Commands
ntops correctness
ntops performance
Set `NTOPS_RUN_PERF=1` to enable. `warmup=50`, `rep=200`, median of 3.
InfiniCore correctness + performance (switch the device flag per platform)
Test Results (100% pass on all three platforms)
Correctness
| Platform | Device flag | ntops pytest | InfiniCore run.py |
|---|---|---|---|
| NVIDIA A100 80GB PCIe | `--nvidia` | 44 passed, 4 skipped | 172 / 172 (100.0%) |
| MetaX C500 | `--metax` | 44 passed, 4 skipped | 172 / 172 (100.0%) |
| Iluvatar MR-V100 | `--iluvatar` | 44 passed, 4 skipped | 172 / 172 (100.0%) |

The 4 skipped cases are `lcm` on `bool` dtype (`torch.lcm` itself does not support `bool`).
Performance
ntops
InfiniCore
NVIDIA A100:

MetaX C500:
Iluvatar MR-V100: