chore: add HF_TOKEN for authentication#1202
Conversation
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
|
I'm not certain if all of our action runners are being flagged as the same anonymous account right now. As a result, enabling the key might make things worse. I am going to convert back to draft for now. |
planetf1
left a comment
There was a problem hiding this comment.
Thanks for tackling this — adding authenticated HF access is a nice improvement to CI reliability.
Two inline comments below, plus one question I couldn't anchor to a line:
publish-release.yml: was it intentional to leave that workflow out of this change? It also calls quality.yml (line 62) but without a secrets: block, so release CI will still run with anonymous HuggingFace access. Since required: false nothing breaks, but the rate-limit benefit won't apply there. Happy to note it as a follow-up if leaving it anonymous is deliberate.
| # repositories: same-repo PR runs have access to this secret, so a write-scope | ||
| # token could be exfiltrated by a malicious workflow change. We expose it as the | ||
| # env var HF_TOKEN because that is the name huggingface_hub picks up automatically. | ||
| # Rotate via repo Settings -> Secrets and variables. |
There was a problem hiding this comment.
The security reasoning in the comment above is spot-on, but the right move is to apply what the comment describes rather than suppress zizmor. Setting HF_TOKEN at job scope means it's present in every step's environment — including the pre-commit hooks and the curl | sh Ollama install — when only two steps actually need it.
Moving it to step-level env: scopes the exposure correctly and removes the need for the ignore entirely:
# Remove the job-level env block, then add per step:
- name: Check HF_TOKEN
env:
HF_TOKEN: ${{ secrets.HF_TOKEN_READ_PUBLIC_ONLY }}
run: |
...
- name: Run Tests
id: tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN_READ_PUBLIC_ONLY }}
run: uv run -m pytest -v --junit-xml=/tmp/pytest-results.xml testThere was a problem hiding this comment.
also worth noting. Using fine-grained and limiting the access to repos we need would be a good way to make the token useless to bandits. Probably recommended... but pragmatically I'm guessing our prefered method is a simpler token and we just rotate it periodically or whenever it looks overused.
There was a problem hiding this comment.
agreed that would be good practice.
There was a problem hiding this comment.
You still need the ignore comment but I've moved it to the step level as well.
| exit 0 | ||
| fi | ||
| echo "HF_TOKEN is set; verifying with the Hugging Face Hub API..." | ||
| uv run python - <<'PY' |
There was a problem hiding this comment.
Small thing: if uv run itself fails to start (corrupt venv, transient error), the step exits non-zero under bash -eo pipefail and blocks CI before tests even run — worse than the anonymous state this step is guarding against. One || makes it truly advisory:
| uv run python - <<'PY' | |
| uv run python - <<'PY' || echo "::warning::HF_TOKEN verification step could not run." |
There was a problem hiding this comment.
Good catch. I had to modify the syntax to use an enclosure but have added it. I also added a flag to the step as well.
Downgrading from request-changes to comment — findings are suggestions/warnings, not blockers.
markstur
left a comment
There was a problem hiding this comment.
unfortunate that we need this, but looks appropriate with clearly readonly token
see inline comments
| print("::warning::huggingface_hub not installed in this env; skipping HF_TOKEN verification.") | ||
| sys.exit(0) | ||
| try: | ||
| info = HfApi().whoami(token=os.environ["HF_TOKEN"]) |
There was a problem hiding this comment.
you probably know this, but just sharing that whoami does not eat into the rate limit
| # repositories: same-repo PR runs have access to this secret, so a write-scope | ||
| # token could be exfiltrated by a malicious workflow change. We expose it as the | ||
| # env var HF_TOKEN because that is the name huggingface_hub picks up automatically. | ||
| # Rotate via repo Settings -> Secrets and variables. |
There was a problem hiding this comment.
also worth noting. Using fine-grained and limiting the access to repos we need would be a good way to make the token useless to bandits. Probably recommended... but pragmatically I'm guessing our prefered method is a simpler token and we just rotate it periodically or whenever it looks overused.
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
jakelorocco
left a comment
There was a problem hiding this comment.
publish-release.yml: was it intentional to leave that workflow out of this change? It also calls quality.yml (line 62) but without a secrets: block, so release CI will still run with anonymous HuggingFace access. Since required: false nothing breaks, but the rate-limit benefit won't apply there. Happy to note it as a follow-up if leaving it anonymous is deliberate.
Thank you, added the secret there as well.
| # repositories: same-repo PR runs have access to this secret, so a write-scope | ||
| # token could be exfiltrated by a malicious workflow change. We expose it as the | ||
| # env var HF_TOKEN because that is the name huggingface_hub picks up automatically. | ||
| # Rotate via repo Settings -> Secrets and variables. |
There was a problem hiding this comment.
You still need the ignore comment but I've moved it to the step level as well.
| exit 0 | ||
| fi | ||
| echo "HF_TOKEN is set; verifying with the Hugging Face Hub API..." | ||
| uv run python - <<'PY' |
There was a problem hiding this comment.
Good catch. I had to modify the syntax to use an enclosure but have added it. I also added a flag to the step as well.
psschwei
left a comment
There was a problem hiding this comment.
LGTM
(I put the hold flag on so that the requested reviewers can have a chance to take a look, but feel free to remove and merge as you see fit)
planetf1
left a comment
There was a problem hiding this comment.
LGTM — all three threads addressed. Step-level scoping is correct, the fallback is in place, and is covered too.
planetf1
left a comment
There was a problem hiding this comment.
LGTM — all three threads addressed. Step-level scoping is correct, the fallback for the verification step is in place, and publish-release.yml is covered too.
6046f96
Pull Request
Issue
Fixes N/A
Description
Adds an export of a HF_TOKEN to the cicd pipeline so that we are less likely to hit API request limits. (Goes from 500 per 5 minutes with an anonymous account to 1000 per 5 minutes with a free account).
Opted against an environment since it doesn't offer any additional protections here. It will be added as a repo level secret if approved. I've specifically named it
HF_TOKEN_READ_PUBLIC_ONLYso that someone doesn't put a write token in there. It doesn't get shown anywhere and only runs onFor debugging, there's also a step added that checks if the token is valid / being used:
output:
or
This means that we can remove the HF token at any time and our tests will still run.
Testing
Attribution
Adding a new component, requirement, sampling strategy, or tool?
If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.
NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.