Ingest upstream vex sources #2035
Pull request overview
This PR adds initial support for ingesting upstream OpenVEX sources by (1) fetching OpenVEX JSON documents from a GitHub repository and (2) transforming OpenVEX statements into DevGuard VEX rules, alongside a small adjustment to CycloneDX VEX rule path-pattern creation.
Changes:
- Add OpenVEX-to-VEXRule transformation logic and event-type mapping in `VEXRuleService`.
- Add GitHub repository crawling + raw-file download to collect OpenVEX JSON documents in `scanService`.
- Introduce a normalized `VexReportOpenVEX` wrapper with validation helpers.
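For readers unfamiliar with the status mapping mentioned above, here is a minimal sketch of what such a mapping could look like. The four status strings are the ones defined by the OpenVEX specification; the `EventType` type, its constants, and the `mapOpenVEXStatus` function are hypothetical names invented for this illustration and are not taken from the DevGuard code in this PR.

```go
package main

import "fmt"

// EventType is a hypothetical stand-in for the rule event type;
// the real DevGuard type and its values may differ.
type EventType string

const (
	EventFalsePositive EventType = "falsePositive" // assumed name
	EventFixed         EventType = "fixed"         // assumed name
	EventDetected      EventType = "detected"      // assumed name
)

// mapOpenVEXStatus translates an OpenVEX statement status into an event type.
// It returns false when no rule should be created, which in this sketch covers
// "under_investigation" and any unknown status.
func mapOpenVEXStatus(status string) (EventType, bool) {
	switch status {
	case "not_affected":
		return EventFalsePositive, true
	case "fixed":
		return EventFixed, true
	case "affected":
		return EventDetected, true
	default:
		return "", false
	}
}

func main() {
	for _, s := range []string{"not_affected", "affected", "fixed", "under_investigation"} {
		ev, ok := mapOpenVEXStatus(s)
		fmt.Printf("status=%s event=%q mapped=%v\n", s, ev, ok)
	}
}
```

Returning a boolean alongside the event type keeps "no rule for this status" distinct from an error, so callers can skip such statements without aborting the whole document.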
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| services/vex_rule_service.go | Adds OpenVEX parsing + status mapping; adjusts CycloneDX-derived path-pattern building |
| services/vex_rule_service_test.go | Adds unit tests for OpenVEX parsing behavior |
| services/scan_service.go | Adds GitHub repo traversal and raw download helper to fetch OpenVEX JSON files |
| services/scan_service_test.go | Adds tests for GitHub OpenVEX fetching via a mocked GitHub API |
| normalize/sbom_graph.go | Introduces VexReportOpenVEX and basic OpenVEX report validation |
…d handling, time nil pointer check, http request improvements and url checks
}

// Determine default branch
repository, _, err := client.Repositories.Get(ctx, owner, repo)
I know that GitHub has rather strict rate limits. What do you think about downloading the whole repo once, maybe as a zip, and then parsing it?
Didn't know this was an option, but now that you mention it, makes sense. Will implement it instead of the download loop 👍
Should the zip then also be used to determine the target files and the branch, or can those requests be kept as they are?
repository, _, err := client.Repositories.Get(ctx, owner, repo)
//...
tree, _, err := client.Git.GetTree(
ctx,
owner,
repo,
branch,
true, // recursive
)
I think those can/should remain as they are. This would leave us at 3 requests total for each repo scan instead of 2 + number_of_vex_documents.
Actually, I think it's perfectly fine to expect the branch name as a parameter of this function. I am pretty sure you can walk over the zip file in memory. Check out utils/zip.go: there you get a reader of a whole zip file without even saving it to disk. Try to keep it in memory. Then you can walk over that and you don't need the git file tree at all.
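A rough sketch of the in-memory walk suggested here, for illustration only. The contents of utils/zip.go are not shown in this thread, so the sketch uses the standard library's `archive/zip` instead. The function names `collectOpenVEX` and `buildZip` are hypothetical, and so is the heuristic of recognizing OpenVEX documents by an `@context` that starts with `https://openvex.dev/ns`.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// collectOpenVEX walks a zip archive held entirely in memory and returns the
// raw bytes of every .json entry whose "@context" looks like OpenVEX.
// Hypothetical helper: name and detection heuristic are assumptions.
func collectOpenVEX(archive []byte) (map[string][]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, fmt.Errorf("open zip: %w", err)
	}
	docs := make(map[string][]byte)
	for _, f := range zr.File {
		if f.FileInfo().IsDir() || !strings.HasSuffix(f.Name, ".json") {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, fmt.Errorf("open %s: %w", f.Name, err)
		}
		data, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, fmt.Errorf("read %s: %w", f.Name, err)
		}
		// Only peek at "@context"; JSON files that do not parse are skipped.
		var head struct {
			Context string `json:"@context"`
		}
		if json.Unmarshal(data, &head) != nil {
			continue
		}
		if strings.HasPrefix(head.Context, "https://openvex.dev/ns") {
			docs[f.Name] = data
		}
	}
	return docs, nil
}

// buildZip creates a small in-memory archive so the sketch can run offline.
func buildZip(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range files {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte(content)); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	archive, err := buildZip(map[string]string{
		"repo-main/vex/a.openvex.json": `{"@context":"https://openvex.dev/ns/v0.2.0","statements":[]}`,
		"repo-main/package.json":       `{"name":"example"}`,
		"repo-main/README.md":          "# readme",
	})
	if err != nil {
		panic(err)
	}
	docs, err := collectOpenVEX(archive)
	if err != nil {
		panic(err)
	}
	for name := range docs {
		fmt.Println("found OpenVEX document:", name)
	}
}
```

In a real scan the byte slice would come from a single archive download for the given branch rather than from `buildZip`, which would bring the per-repo cost down to one request. A production version would likely also cap the size read per entry to guard against oversized archives.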
Changes have been implemented, have a look.
Related to Issue: #1977