-
Notifications
You must be signed in to change notification settings - Fork 123
Description
This issue collects a few improvements for the NonStationaryConvolve2D as raised in #465 (and a couple of others):
-
Refactor bilinear interpolation
The CUDA bilinear interp is the same as the CPU one. It then makes sense to wrap the same code under a same function. This can be done by coding a pure Python function and then defining a device function from it. For example:# CPU def fun(a, b): return a + b # GPU dev_fun = cuda.jit(device=True)(fun)
-
Move CPU to auxiliary file
Since the GPU functions are in another file and taking into account the above refactor, it makes sense to move it to another file. -
Improve thread launch parameters and iteration model
Currently, there is a limit to the size of the model, imposed by the number of blocks and threads in the call. There are two approaches to fixing this. The first is by making the total number of threads depend on the model. For example, if the model is 1000 x 1000, launch 1 block of 1024 x 1024. There is a problem which appears when very large models are used: the number of threads that are required exceed the number that the GPU can provide.
Grid stride looping can help with this as allows arbitrarily sized models to be processed by a smaller number of threads. For example, set 1 block of 1024 x 1024 and you can process any model of any size. The issue with naive grid-stride looping is that it may not be optimal in the number of threads. For example, a 32 x 32 model always launches 1024 x 1024 threads.
A solution to that is to combine both techniques: use an input-dependent number of blocks/threads (that are powers of 2), but set a maximum number of total threads launched. This can even be set per GPU, but is probably ok to hardcode for a reasonable number obtained from theMAX_GRID_DIM_Xattribute (see here). -
Create operator that takes filters as input
Currently, our operator takes a generic 2d signal as input and filters as parameter. However, an alternative operator that takes the 2d signal as parameter and the filters as inputs is desirable as it could be used to estimate the best non-stationary filters that match a given data and input signal. This is kind of equivalent topylops.avo.prestack.PrestackWaveletModellingfor the 1D case. I suggest making a completely new operator and calling itNonStationaryWavelets2DorNonStationaryFilters2D.