(Currently: based on guesswork and second-hand information from @matthiasdiener. Away at a conference, will investigate more when I have time.)
Right now:
- arraycontext sees a CPU scalar being passed and generates a `pytato` `Placeholder` with `shape=()`.
- That becomes an `ArrayArg` from Loopy's perspective.
- We then spend tons of time transferring these itty bitty things to the GPU.
Proposed remedy:
- Introduce a tag in `pytato` that says "this placeholder should become a `ValueArg`".
- Apply that in arraycontext when creating the `Placeholder`s if appropriate.
- Respect the tag in pytato codegen.
This should remove the cost of these transfers.
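
A rough sketch of the three steps above, using stand-in classes rather than the real pytato/Loopy API. Everything here is an assumption for illustration: the tag name `PassAsValueArg`, the simplified `Placeholder` dataclass, and the helpers `make_placeholder_for` and `kernel_arg_kind` are all hypothetical. In the real code the tag would presumably derive from the tag base class pytato already uses.

```python
from dataclasses import dataclass, field
from numbers import Number


# Hypothetical tag (step 1). Not an existing pytato tag.
@dataclass(frozen=True)
class PassAsValueArg:
    """Marks a shape-() placeholder that should be passed by value."""


# Minimal stand-in for a pytato Placeholder; the real class has more fields.
@dataclass(frozen=True)
class Placeholder:
    name: str
    shape: tuple
    tags: frozenset = field(default_factory=frozenset)


def make_placeholder_for(name, value):
    """Stand-in for the arraycontext side (step 2): tag host scalars."""
    if isinstance(value, Number):
        # A CPU scalar: still shape=(), but tagged for by-value passing.
        return Placeholder(name, (), frozenset({PassAsValueArg()}))
    # Anything else is assumed to be array-like with a .shape attribute.
    return Placeholder(name, tuple(value.shape))


def kernel_arg_kind(placeholder):
    """Stand-in for the codegen side (step 3): pick the Loopy argument kind."""
    if placeholder.shape == () and PassAsValueArg() in placeholder.tags:
        return "ValueArg"
    # Untagged placeholders keep today's behavior.
    return "ArrayArg"
```

The tag is opt-in, so an untagged `shape=()` placeholder still goes down the existing `ArrayArg` path; only scalars that arraycontext knows live on the host would be switched over.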