Kafka Connect: Add Trivy CVE scan to CI #15430

Open
rmoff wants to merge 14 commits into apache:main from rmoff:trivy-cve-scan-kafka-connect

@rmoff
Contributor

rmoff commented Feb 24, 2026

Summary

  • Adds a Trivy vulnerability scan to the Kafka Connect CI workflow
  • Runs as part of the existing test job, on one matrix entry only (dependency CVEs are JVM-independent, so scanning on every entry would be redundant). After check completes, it builds distZip, unpacks it, and scans the bundled jars for CRITICAL/HIGH CVEs
  • On push events (main, version branches, RC tags), uploads SARIF results to GitHub's Security tab
  • On PRs, outputs scan results to CI logs for developer visibility
  • Does not fail the build — reports only, matching the approach used by other Apache projects (e.g. Superset)
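
As a rough illustration of the steps listed above, the relevant part of the workflow could look something like the following. This is a sketch, not the actual diff: the step names, the `matrix.jvm` gate, the Gradle task path, the zip location, and the `<pinned-commit-sha>` placeholders are all assumptions. Only the `scan-ref`, `scanners`, `severity`, `ignore-unfixed`, and `output` values are taken from the snippets quoted in the review comments in this thread.

```yaml
# Sketch only: step names, Gradle task path and zip location are assumptions.
steps:
  # ... existing checkout, JDK setup and `check` steps ...

  - name: Build Kafka Connect distribution
    if: matrix.jvm == 21   # assumed gate: scan on a single matrix entry
    run: ./gradlew :iceberg-kafka-connect:iceberg-kafka-connect-runtime:distZip  # assumed task path

  - name: Unpack distribution for scanning
    if: matrix.jvm == 21
    run: |
      mkdir -p /tmp/kafka-connect-scan
      # Assumed output directory; loop in case more than one zip is produced.
      for z in kafka-connect/kafka-connect-runtime/build/distributions/*.zip; do
        unzip -q -o "$z" -d /tmp/kafka-connect-scan
      done

  - name: Trivy scan of bundled jars
    if: matrix.jvm == 21
    uses: aquasecurity/trivy-action@<pinned-commit-sha>
    with:
      scan-type: 'fs'
      scan-ref: '/tmp/kafka-connect-scan'
      scanners: 'vuln'
      severity: 'CRITICAL,HIGH'
      ignore-unfixed: true
      format: 'sarif'
      output: 'trivy-results.sarif'

  - name: Upload SARIF to the Security tab
    # Only on push events (main, version branches, RC tags), not on PRs.
    if: matrix.jvm == 21 && github.event_name == 'push'
    uses: github/codeql-action/upload-sarif@<pinned-commit-sha>
    with:
      sarif_file: 'trivy-results.sarif'
```

Gating every scan step on the same matrix entry keeps the scan to one run per workflow, and gating the upload on `github.event_name` keeps PR runs limited to log output.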

Context

Discussion on dev@ mailing list: https://lists.apache.org/thread/kbf98950pzstzgon92st7mh9vrbv5yhb

Confluent Marketplace requires a Trivy scan before listing connectors. This has previously caught CVEs that needed patching (e.g. #14985). Running the scan in CI catches vulnerabilities during development and — critically — on RC tags before the release vote starts, when fixes can still be applied.

This is independent of #15212 (adding the KC artifact to the release process) and can land in either order.

Scan the built Kafka Connect distribution zip for known
vulnerabilities using Trivy. This runs alongside the existing
tests on PRs and pushes to main/version branches, and also
on release candidate tags, giving visibility into CVEs before
a release vote starts.

- Table output on all runs (visible in CI logs)
- SARIF upload to GitHub Security tab on push events

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the INFRA label Feb 24, 2026
rmoff marked this pull request as draft February 24, 2026 14:12
Match the convention used in spark-ci.yml for third-party actions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rmoff marked this pull request as ready for review February 24, 2026 14:36
@kevinjqliu
Contributor

I think it's better to do this in the Docker image publishing step, similar to what Superset is doing.

@rmoff
Contributor Author

rmoff commented Feb 26, 2026

@kevinjqliu I'm not sure I follow. How does this relate to publishing Docker images? This is to identify CVEs in the Kafka Connect connector itself. Thanks.

@kevinjqliu
Contributor

Looking at how Trivy is used in Superset, it's scanning the Docker image only. In this PR, we're unpacking the jars to scan. I think it makes more sense to build a Kafka Connect image and then use Trivy to scan the image.

Just my preference

@rymurr
Contributor

rymurr commented Feb 27, 2026

> Looking at how Trivy is used in Superset, it's scanning the Docker image only. In this PR, we're unpacking the jars to scan. I think it makes more sense to build a Kafka Connect image and then use Trivy to scan the image.
>
> Just my preference

Not really clear why we would add the infra to build a Docker image just to use Trivy. Seems like a direct scan is more parsimonious.

Contributor

rymurr left a comment

LGTM, I just want to see if we can avoid building twice without overcomplicating the CI job.

scan-ref: '/tmp/kafka-connect-scan'
scanners: 'vuln'
severity: 'CRITICAL,HIGH'
ignore-unfixed: true
Contributor

Just to make sure I understand: will the build break when the scan finds a high-severity vulnerability? A minor nit: the file could use more comments.

Contributor Author

At the moment the scan is report-only and won't fail the build.

I've also added inline comments explaining the behaviour.

Thinking about it some more, what would the ideal behaviour be? If a PR introduces a CVE, ought the build to fail? Or should it maybe post a PR comment?

Contributor

It would be ideal if the trivy scan was 'red' but didn't block the CI job.

Contributor Author

Updated in abfb8dc:

  • A failed step with continue-on-error: true shows with an orange/amber warning icon
  • The overall job still shows as green
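
For illustration, the "failed step, green job" behaviour described above can be sketched as follows. The step names, the `id`, the follow-up warning step, and the `<pinned-commit-sha>` placeholder are assumptions for this sketch rather than the contents of the actual commit.

```yaml
# Sketch only: step names, id and the follow-up step are hypothetical.
- name: Trivy scan
  id: trivy
  continue-on-error: true   # a failure here does not fail the job
  uses: aquasecurity/trivy-action@<pinned-commit-sha>
  with:
    scan-type: 'fs'
    scan-ref: '/tmp/kafka-connect-scan'
    scanners: 'vuln'
    severity: 'CRITICAL,HIGH'
    ignore-unfixed: true
    exit-code: '1'           # exit non-zero when matching CVEs are found

- name: Flag findings in the run summary
  # `outcome` is the step result before continue-on-error is applied.
  if: steps.trivy.outcome == 'failure'
  run: echo "::warning::Trivy reported CRITICAL/HIGH vulnerabilities"
```

In GitHub Actions, `steps.<id>.outcome` holds a step's result before `continue-on-error` is applied and `steps.<id>.conclusion` holds the result after, so a later step can still react to findings even though the job passes.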

Address review feedback: move the vulnerability scan steps into the
existing kafka-connect-tests job (gated on JVM 21) instead of running
a separate job that duplicates checkout, setup, and compilation.

Also adds inline comments explaining the scan behaviour and explicit
exit-code: '0' to ensure the scan is report-only (the default would
fail the build on findings).
@kevinjqliu
Contributor

> Not really clear why we would add the infra to build a Docker image just to use Trivy. Seems like a direct scan is more parsimonious.

Just looking at general patterns from the Apache repos: https://grep.app/search?f.repo.pattern=apache%2F&q=uses%3A+aquasecurity%2Ftrivy-action
All the use cases I can find use Trivy for scanning images (using image-ref:).

The only instance of scan-type: 'fs' I found has it disabled: https://github.com/apache/plc4x/blob/eb41533bfab101acb87b9acdaf81c70d2e2fa286/.github/workflows/sast.yaml#L34-L47

From the docs, it seems like Filesystem scan and Container image scan are similar in that they both scan for Vulnerabilities, Misconfigurations, Secrets, and Licenses.

I think it would be helpful here if you can run the change on your fork repo and see if the fs trivy scan catches the currently open CVE for kafka connect (#15440)

@rmoff
Contributor Author

rmoff commented Mar 2, 2026

> I think it would be helpful here if you can run the change on your fork repo and see if the fs trivy scan catches the currently open CVE for kafka connect (#15440)

That's what's shown here under the "Reproducing" section.

Change exit-code from 0 to 1 so the scan step fails visibly when
CRITICAL/HIGH CVEs are found, but add continue-on-error: true so
the overall job still passes.
rmoff requested a review from rymurr March 17, 2026 09:44
Contributor

rymurr left a comment

The changes themselves look good to me. I am just confused/questioning the github actions config...

My assumption is we want:

  • results to be uploaded regardless of pass/fail states of steps above
  • the job to fail if it detects errors
  • the 2nd scan to run regardless of whether the other failed

I think what's happening now is that the first always looks like it fails, the second always looks like it passes, and the entire job always looks like it passes.

It might be worth trying this a few times in your own fork to make sure it actually handles the edge cases the way we expect.

output: 'trivy-results.sarif'
severity: 'CRITICAL,HIGH'
ignore-unfixed: true
exit-code: '0'
Contributor

Why is this exit code 0 when the other is 1? What does this mean anyway... does it always exit with 0/1 regardless of state?

Seems this is missing continue on error too? Or is the SARIF scan just different?

rmoff and others added 8 commits March 20, 2026 16:42
Temporary commit for testing Trivy scan behaviour on fork.
Adds trivy-test-* to push branch triggers to test push events.
Improves inline comments explaining continue-on-error and exit-code.
Removes redundant explicit exit-code: '0' from SARIF scan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Temporary commit — will be reverted after testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The SARIF scan needs explicit exit-code: '0' — trivy defaults to
exiting 1 when findings exist, which failed the SARIF step in the
previous test run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The second trivy-action invocation fails during Trivy binary
installation — the binary is already installed by the first scan.
Adding skip-setup-trivy: true avoids the reinstall conflict.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The trivy-action's setup step fails when invoked twice in the same
job because the binary is already installed at the same path.
Instead, call the trivy CLI directly for the SARIF output — it's
already on PATH from the first action invocation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
trivy-action installs to $HOME/.local/bin/trivy-bin but doesn't
add it to GITHUB_PATH, so subsequent shell steps can't find it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use one trivy-action call that outputs SARIF, then parse it with
jq for human-readable CI log output. Avoids dual-invocation binary
conflict. Updated to trivy-action v0.35.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
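
A minimal sketch of the single-scan approach described in the commit above: one action call writes SARIF, and a follow-up shell step uses jq to print the findings to the CI log. The step names, the jq filter, the choice of columns, and the `<pinned-commit-sha>` placeholder are assumptions, not the commit's actual contents.

```yaml
# Sketch only: the jq filter and step names are hypothetical.
- name: Trivy scan (SARIF output)
  uses: aquasecurity/trivy-action@<pinned-commit-sha>
  with:
    scan-type: 'fs'
    scan-ref: '/tmp/kafka-connect-scan'
    severity: 'CRITICAL,HIGH'
    ignore-unfixed: true
    format: 'sarif'
    output: 'trivy-results.sarif'

- name: Print findings to the CI log
  run: |
    # One tab-separated line per SARIF result:
    # rule id, level, and the first line of the message text.
    jq -r '
      .runs[].results[]
      | [.ruleId, .level, (.message.text | split("\n")[0])]
      | @tsv
    ' trivy-results.sarif
```

Reading the log output from the same SARIF file that gets uploaded means the action only needs to run once per job.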
fs mode doesn't detect CVEs in shaded jars (e.g. jackson-core inside
parquet-jackson). rootfs mode does. This should trigger CVE findings
for scenarios #2 and #4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

kevinjqliu left a comment

Thanks for the PR! I think this is a great addition, however I'm concerned about supply chain risk due to the recent (second) compromise of trivy's infra.

See

I see we are using aquasecurity/trivy-action@e368e328979b113139d6f9068e03accaed98a518 . Glad to see we're using the commit hash.

I believe this is still ongoing; we should wait until there's a resolution before proceeding with this PR.

EDIT: looks like trivy was just pulled from the list of allowed actions by asf-infra apache/infrastructure-actions#548
https://infra.apache.org/blog/trivy_security_incident.html

@lhotari
Member

lhotari commented Mar 23, 2026

> EDIT: looks like trivy was just pulled from the list of allowed actions by asf-infra apache/infrastructure-actions#548
> https://infra.apache.org/blog/trivy_security_incident.html

@kevinjqliu PR to add it: apache/infrastructure-actions#573

@kevinjqliu
Contributor

apache/infrastructure-actions#582 is merged.
We can try using lhotari/sandboxed-trivy-action, which is a fork running Trivy in a locked-down Docker container (https://github.com/lhotari/sandboxed-trivy-action).

ASF Infra has already allowlisted it

rmoff and others added 2 commits April 7, 2026 15:53
- Replace aquasecurity/trivy-action with lhotari/sandboxed-trivy-action
  (v1.0.1, pinned to commit SHA) to address the Trivy supply chain
  compromise. The sandboxed action is ASF-allowlisted and runs Trivy
  inside a locked-down Docker container.
- Remove temporary trivy-test-* branch trigger.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous SHA (c4a7bc33...) was flagged by zizmor as an impostor
commit not found in github/codeql-action's history. Update to v4
(c10b8064...) to match the rest of the repo and use a verified SHA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rmoff
Contributor Author

rmoff commented Apr 7, 2026

> apache/infrastructure-actions#582 is merged. We can try using lhotari/sandboxed-trivy-action, which is a fork running Trivy in a locked-down Docker container (https://github.com/lhotari/sandboxed-trivy-action).
>
> ASF Infra has already allowlisted it

Thanks @kevinjqliu.
I've raised a PR to add rootfs support to the action so that we can use it: lhotari/sandboxed-trivy-action#1
