Add support for verifying kernels#635
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
c609786 to
8fbe1b8
Compare
This change adds support for verifying kernels. Verification consists of two parts: 1. Verify that `metadata.json` is correctly signed using the detached signature in `metadata.json.sigstore`; and if so, 2. verify that the kernel file hashes are the same as the digest in `metadata.json`. The digest in `metadata.json` protects the kernel files, the detached signature in `metadata.json.sigstore` protects the metadata. For signing, we use sigstore, so that kernels can be signed using ephemeral keys. Ephemeral keys reduce the attack surface of leaked keys. In this change, only the API is added. It is not exposed at the top-level, since it is not fully stable yet. It is also not wired up in `get_kernel`, etc. yet because we still want to test code signing for some time. See `test_verify.py` for actual use of the API. Besides that we'll add a `kernels` subcommand for verifying kernels in the PR after this one.
8fbe1b8 to
c6b4532
Compare
drbh
left a comment
There was a problem hiding this comment.
looks really good! excited for this feature, just left a couple comments with small suggestions to harden the impl
sayakpaul
left a comment
There was a problem hiding this comment.
Thanks, just left some questions.
| def test_hash_variant_hashes_file_with_sha256(tmp_path): | ||
| (tmp_path / "_extension.so").write_bytes(b"hello world") | ||
|
|
||
| digest = Digest.hash_variant("sha256", tmp_path) | ||
|
|
||
| assert len(digest.files) == 1 | ||
| assert ( | ||
| digest.files["_extension.so"] == "uU0nuZNNPgilLlLX2n2r+sSE7+N6U4DukIj3rOLvzek=" | ||
| ) | ||
|
|
||
|
|
||
| def test_hash_variant_hashes_file_with_sha512(tmp_path): | ||
| (tmp_path / "_extension.so").write_bytes(b"hello world") | ||
|
|
||
| digest = Digest.hash_variant("sha512", tmp_path) | ||
|
|
||
| assert len(digest.files) == 1 | ||
| assert ( | ||
| digest.files["_extension.so"] | ||
| == "MJ7MSJwS1utMxA9QyQLytNDtd+5RGnx6m808qG1M2G+YndNbxf9JlnDaNCVbRbDP2DDoH2Bdz33FVC6TrpzXbw==" | ||
| ) |
There was a problem hiding this comment.
(nit): The two tests could be clubbed into a single test_hash_variant_hashes_file_with_algo() test with parametrize. But I will understand if we want to keep them separate. No strong opinions.
| # `str(violation)` reuses the same human-readable rendering. | ||
| rendered = {str(v) for v in exc_info.value.violations} | ||
| assert "Missing file: gone.py" in rendered | ||
| assert "Unknown file: extra.bin" in rendered |
| def __str__(self) -> str: ... | ||
|
|
||
| class DigestValidationError(Exception): | ||
| """Raised by :meth:`Digest.validate` when a digest cannot be validated against the reference.""" |
There was a problem hiding this comment.
Do we want to keep :meth: denotations? I don't think they exist in the rest of the library codebase.
There was a problem hiding this comment.
Ah yes. The stubs were generated automatically. Should be fixed now 👍 .
| from kernels.utils import select_revision_or_version | ||
| from kernels.verify import VerificationResult, verify_variant | ||
|
|
||
| TEST_POLICIES: list[policy.VerificationPolicy] = [ |
There was a problem hiding this comment.
For users / orgs outside the predefined set of policies, should users bypass this verification? What is the approach we're thinking of for trusted publishers outside HF (SGLang, NVIDIA, etc.) -- what would be the recommended way for them to sign?
There was a problem hiding this comment.
For users / orgs outside the predefined set of policies, should users bypass this verification?
Yep, similar to how they have to use trust_remote_code now.
What is the approach we're thinking of for trusted publishers outside HF (SGLang, NVIDIA, etc.) -- what would be the recommended way for them to sign?
Hooking signing up in CI is very simple:
Then we'd add these organizations' identities/GH repos to the policies to accept them. Though ideally, we'd be able to get the list from the Hub, as we do now with the trusted publishers.
Also introduce the `DigestAlgorithm` type in Python, to strongly type the algorithm.
sayakpaul
left a comment
There was a problem hiding this comment.
Excited to see this feature.
I don't think we need any documentation yet.
They are not stable between releases
Coverage report —
|
| Name | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| src/kernels/__init__.py | 10 | 0 | 100% | |
| src/kernels/_system.py | 6 | 1 | 83% | 10 |
| src/kernels/_versions.py | 63 | 7 | 89% | 46, 49, 52-53, 56-57, 100 |
| src/kernels/backends.py | 194 | 55 | 72% | 40, 44, 48-51, 68, 90, 108, 117, 121, 125-127, 148, 170, 181, 188-191, 201, 205-225, 233, 256-276 |
| src/kernels/compat.py | 8 | 1 | 88% | 5 |
| src/kernels/deps.py | 54 | 4 | 93% | 58-59, 95, 98 |
| src/kernels/layer/__init__.py | 6 | 0 | 100% | |
| src/kernels/layer/_interval_tree.py | 103 | 4 | 96% | 23, 52, 147, 150 |
| src/kernels/layer/device.py | 48 | 14 | 71% | 42, 47-49, 91, 96-98, 101, 149, 152, 155-157 |
| src/kernels/layer/func.py | 93 | 7 | 92% | 72, 100, 154, 257, 263, 272, 290 |
| src/kernels/layer/globals.py | 5 | 0 | 100% | |
| src/kernels/layer/kernelize.py | 73 | 8 | 89% | 255, 273, 281-282, 288, 292, 308-310 |
| src/kernels/layer/layer.py | 174 | 15 | 91% | 166, 209, 215, 228, 320-321, 333, 342, 350, 361, 390, 394, 407, 460, 490 |
| src/kernels/layer/mode.py | 14 | 0 | 100% | |
| src/kernels/layer/repos.py | 130 | 34 | 74% | 27, 33, 36-41, 61-62, 68, 71-74, 88, 92, 101-102, 108, 111-114, 121-122, 128, 131-134, 141-142, 148, 151-154, 235 |
| src/kernels/lockfile.py | 71 | 46 | 35% | 37-104, 108-131 |
| src/kernels/status.py | 49 | 2 | 96% | 23, 81 |
| src/kernels/utils.py | 301 | 60 | 80% | 65, 77-81, 87-88, 218, 222, 225, 287, 295, 334-335, 373, 404, 409, 444, 625-630, 660, 663, 665, 671, 684-685, 706-718, 722-729, 737, 741-751, 755-762, 800, 804, 823, 825 |
| src/kernels/variants.py | 262 | 19 | 93% | 56, 87, 108, 138, 247-248, 289, 291, 371-378, 384-390, 421-427, 439-445, 534-536 |
| src/kernels/verify.py | 83 | 0 | 100% | |
| TOTAL | 1747 | 277 | 84% |
Updated by the Test kernels workflow on commit 98463e90084d651d9357c49e65ebc5c9c5f70d62.
This change adds support for verifying kernels. Verification consists of two parts:
metadata.jsonis correctly signed using the detached signature inmetadata.json.sigstore; and if so,metadata.json.The digest in
metadata.jsonprotects the kernel files, the detached signature inmetadata.json.sigstoreprotects the metadata. For signing, we use sigstore, so that kernels can be signed using ephemeral keys. Ephemeral keys reduce the attack surface of leaked keys.In this change, only the API is added. It is not exposed at the top-level, since it is not fully stable yet. It is also not wired up in
get_kernel, etc. yet because we still want to test code signing for some time. Seetest_verify.pyfor actual use of the API. Besides that we'll add akernelssubcommand for verifying kernels in the PR after this one.