Simplify handling of kernelized functions#622
Conversation
Change `use_kernel_forward_from_hub` to work with functions. `use_kernel_func_from_hub` is now deprecated and its implementation is now just a call to `use_kernel_forward_from_hub`.
This function attaches a function that is made kernelizeable to a module, so that it can be discovered by `kernelize`.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Coverage report —
|
| Name | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| src/kernels/__init__.py | 10 | 0 | 100% | |
| src/kernels/_system.py | 6 | 1 | 83% | 10 |
| src/kernels/_versions.py | 63 | 7 | 89% | 46, 49, 52-53, 56-57, 100 |
| src/kernels/backends.py | 194 | 55 | 72% | 40, 44, 48-51, 68, 90, 108, 117, 121, 125-127, 148, 170, 181, 188-191, 201, 205-225, 233, 256-276 |
| src/kernels/compat.py | 8 | 1 | 88% | 5 |
| src/kernels/deps.py | 54 | 4 | 93% | 58-59, 95, 98 |
| src/kernels/layer/__init__.py | 6 | 0 | 100% | |
| src/kernels/layer/_interval_tree.py | 103 | 4 | 96% | 23, 52, 147, 150 |
| src/kernels/layer/device.py | 48 | 14 | 71% | 42, 47-49, 91, 96-98, 101, 149, 152, 155-157 |
| src/kernels/layer/func.py | 81 | 7 | 91% | 81, 111, 183, 301, 307, 320, 338 |
| src/kernels/layer/globals.py | 5 | 0 | 100% | |
| src/kernels/layer/kernelize.py | 74 | 8 | 89% | 255, 281, 289-290, 296, 300, 316-318 |
| src/kernels/layer/layer.py | 210 | 16 | 92% | 167, 210, 216, 229, 337, 417-418, 430, 439, 447, 458, 487, 491, 504, 557, 587 |
| src/kernels/layer/mode.py | 14 | 0 | 100% | |
| src/kernels/layer/repos.py | 130 | 34 | 74% | 27, 33, 36-41, 61-62, 68, 71-74, 88, 92, 101-102, 108, 111-114, 121-122, 128, 131-134, 141-142, 148, 151-154, 235 |
| src/kernels/lockfile.py | 71 | 46 | 35% | 37-104, 108-131 |
| src/kernels/status.py | 49 | 2 | 96% | 23, 81 |
| src/kernels/utils.py | 301 | 60 | 80% | 65, 77-81, 87-88, 217, 221, 224, 286, 294, 333-334, 372, 403, 408, 443, 624-629, 659, 662, 664, 670, 683-684, 705-717, 721-728, 736, 740-750, 754-761, 799, 803, 822, 824 |
| src/kernels/variants.py | 262 | 19 | 93% | 56, 87, 108, 138, 247-248, 289, 291, 371-378, 384-390, 421-427, 439-445, 534-536 |
| TOTAL | 1689 | 278 | 84% |
Updated by the Test kernels workflow on commit 7a17a2b072739ce330628b1444b404aab4ddd749.
vasqu
left a comment
There was a problem hiding this comment.
My initial comments, I still need to test this on transformers side but it definitely would make our lives easier ❤️ only thing I'm personally missing is the option to also remove the hidden modules, we kernelize once tbh and should not allow rekernelizing (for now)
| mode=mode, | ||
| device_type=device_type, | ||
| use_fallback=use_fallback, | ||
| ) |
There was a problem hiding this comment.
I think we should detach them afterwards or at least add this as an optional flag to enable this, wdyt?
There was a problem hiding this comment.
Detaching sounds like a bad idea, because then you cannot kernelize the same model multiple times, which is a scenario that we want to support. (E.g. for kernelizing with different mappings or kernelizing with a different mode.)
There was a problem hiding this comment.
I have to check whether deepspeed still plays well with this. Not sure if it still takes a look at these hidden dicts, it shouldn't but we never know 😓
There was a problem hiding this comment.
The ability to detach them through a flag doesn't sound so bad, though.
There was a problem hiding this comment.
Why would you want to do this? I am trying to understand the use case. And why for kernel functions and not kernel layers? If you don't want a function to be kernelized, then you can kernelize it with a mapping that does not have a layer for that function. Having two ways to do the same thing sounds really confusing? Plus it would get permanently detached.
There was a problem hiding this comment.
If you don't want a function to be kernelized, then you can kernelize it with a mapping that does not have a layer for that function.
I was missing this aspect. Thanks for explaining.
sayakpaul
left a comment
There was a problem hiding this comment.
Thanks for shipping this! My comments are mostly test-related. That doesn't have to block this PR, we can ship them in a next PR. Additionally, I think we should probably start testing deprecation warnings are properly raised.
| self._repo_id, revision=self._resolve_revision(), trust_remote_code=self._trust_remote_code | ||
| self._repo_id, | ||
| revision=self._resolve_revision(), | ||
| trust_remote_code=self._trust_remote_code, |
There was a problem hiding this comment.
(nit): Seems like a formatting-related change?
| mode=mode, | ||
| device_type=device_type, | ||
| use_fallback=use_fallback, | ||
| ) |
There was a problem hiding this comment.
The ability to detach them through a flag doesn't sound so bad, though.
| can_torch_compile = getattr(func, "can_torch_compile", False) | ||
| has_backward = getattr(func, "has_backward", True) |
There was a problem hiding this comment.
Didn't see these two flags documented.
There was a problem hiding this comment.
It's in kernel-requirements.md.
This is a set of related changes based on internal discussions:
FuncRepositoryclasses. Kernel-side functions are not a great interface, because one typically has to set attributes to permittorch.compileand backward passes. However for functions, these are detached. It also hides that the functions are converted tonn.Moduleunderneath.use_kernel_from_hubdecorator and extenduse_kernel_forward_from_hubto provide the same functionality. This makes it clearer that when decorating a function, it gets replaced by a module instance.use_kernelized_funcfrom transformers. This annotation makes it unnecessary to assign functions as member variables, which cause issues in some applications. Instead, they are registered in a module separate from (sub-)modules.kernelizeis updated to pick up these attached functions as well.