feat: add tarball overlays for source archive modification #224
Tonisal-byte wants to merge 2 commits into
Add `internal/utils/tarball` package providing:
- `DetectCompression`: detect archive compression from filename
- `Extract`: decompress and extract tar archives (gzip, bzip2, xz, zstd)
- `RepackDeterministic`: create byte-reproducible archives with pinned timestamps, zeroed owner/group, GNU format, and sorted entries
- `ResolveExtractRoot`: find the single top-level directory in the extracted tree

Designed for reproducible builds, matching the `tar --sort=name --mtime=@0 --owner=0 --group=0 --format=gnu` convention used by source modification scripts in the Azure Linux project.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds support for “tarball overlays” that modify files inside source tarballs (extract/modify/repack) and updates sources preparation to account for in-place tarball changes.
Changes:
- Introduces a `tarball` utility package for compression detection, extraction, and deterministic repacking.
- Adds new overlay types (`tarball-file-remove`, `tarball-search-replace`, `tarball-patch`) plus grouping/application logic in source prep.
- Updates the `sources` file update flow to rehash tarballs modified by overlays and extends docs/tests accordingly.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| internal/utils/tarball/tarball.go | Implements compression detection, extraction, and deterministic repacking. |
| internal/utils/tarball/tarball_test.go | Adds tests for compression detection, extract-root resolution, and deterministic repack. |
| internal/projectconfig/overlay.go | Adds tarball overlay types, validation, and ModifiesTarball(). |
| internal/projectconfig/overlay_test.go | Extends validation/modification classification tests for tarball overlays. |
| internal/app/azldev/core/sources/tarballoverlays.go | Implements tarball overlay grouping, extract/modify/repack pipeline, and patch application. |
| internal/app/azldev/core/sources/tarballoverlays_internal_test.go | Adds unit tests for grouping and tarball overlay operations. |
| internal/app/azldev/core/sources/sourceprep.go | Applies tarball overlays first and rehashes modified tarballs when updating sources. |
| internal/app/azldev/core/sources/sourceprep_test.go | Updates expected error text for sources parsing failures. |
| internal/app/azldev/cmds/component/preparesources.go | Refactors option wiring into helper to reduce cyclomatic complexity. |
| docs/user/reference/config/overlays.md | Documents new tarball overlays and related fields/behavior. |
    newContent := compiled.ReplaceAll(content, []byte(replacement))
    if string(newContent) != string(content) {
        anyReplaced = true

        if writeErr := os.WriteFile(path, newContent, fileperms.PublicFile); writeErr != nil {
            return fmt.Errorf("writing %#q:\n%w", path, writeErr)
        }
    -| Regex | `regex` | Regular expression pattern to match | `spec-search-replace`, `file-search-replace` |
    -| Replacement | `replacement` | Literal replacement text; capture group references like `$1` are **not** expanded. Omit or leave empty to delete matched text. | `spec-search-replace`, `file-search-replace`, `file-rename` |
    +| Regex | `regex` | Regular expression pattern to match | `spec-search-replace`, `file-search-replace`, `tarball-search-replace` |
    +| Replacement | `replacement` | Literal replacement text; capture group references like `$1` are **not** expanded. Omit or leave empty to delete matched text. | `spec-search-replace`, `file-search-replace`, `file-rename`, `tarball-search-replace` |
    data1, _ := os.ReadFile(repackPath)
    data2, _ := os.ReadFile(repackPath2)
    case ComponentOverlayTarballPatch:
        if err := requireFileBasename("tarball", c.Tarball); err != nil {
            return err
        }

        if c.Source == "" {
            return missingField("source")
        }
    tarReader := tar.NewReader(decompressed)

    for {
        header, readErr := tarReader.Next()
    }

    func extractEntry(destDir string, header *tar.Header, tarReader *tar.Reader) error {
        cleanName := filepath.Clean(header.Name)
        return fmt.Errorf("creating parent for symlink %#q:\n%w", targetPath, err)
    }

    if err := os.Symlink(header.Linkname, targetPath); err != nil {
        }
    }

    // tarballFileRemove removes files matching a glob pattern from the extracted tree.
Are we supporting double star globs?

    // globFilesInDir finds files under root matching a glob pattern.
    // Supports doublestar patterns (e.g., "**/*.md").
    func globFilesInDir(root, pattern string) ([]string, error) {
Do we not already have helpers to do this?
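For reference on the doublestar question, a `**` pattern element is usually taken to match zero or more whole path segments. The sketch below implements that rule with only the standard library; `matchDoubleStar` and `matchSegments` are hypothetical names, and this is neither the PR's `globFilesInDir` nor an existing project helper.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchSegments reports whether the name segments match the pattern segments.
// A "**" segment matches zero or more whole segments; every other segment is
// matched with path.Match.
func matchSegments(pattern, name []string) (bool, error) {
	if len(pattern) == 0 {
		return len(name) == 0, nil
	}
	if pattern[0] == "**" {
		// Try letting "**" consume 0, 1, ..., len(name) segments.
		for i := 0; i <= len(name); i++ {
			ok, err := matchSegments(pattern[1:], name[i:])
			if err != nil || ok {
				return ok, err
			}
		}
		return false, nil
	}
	if len(name) == 0 {
		return false, nil
	}
	ok, err := path.Match(pattern[0], name[0])
	if err != nil || !ok {
		return false, err
	}
	return matchSegments(pattern[1:], name[1:])
}

// matchDoubleStar (hypothetical) matches a slash-separated relative path
// against a pattern that may contain "**" segments.
func matchDoubleStar(pattern, name string) (bool, error) {
	return matchSegments(strings.Split(pattern, "/"), strings.Split(name, "/"))
}

func main() {
	for _, name := range []string{"README.md", "docs/a/b.md", "src/main.go"} {
		ok, err := matchDoubleStar("**/*.md", name)
		fmt.Printf("%-12s match=%v err=%v\n", name, ok, err)
	}
}
```

A walker such as `filepath.WalkDir` could feed each relative path to a matcher like this. Whether to reuse an existing helper or a dedicated glob library instead is the open question raised in the comment above.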

    These overlays modify files **inside** source tarballs. The tarball is extracted into a temporary directory, modifications are applied, and the tarball is repacked with the same compression format. Extraction and repacking are handled natively; patch application requires the `patch` command on the host.

    > **Note:** Tarball overlays are applied before spec and file overlays, so subsequent overlays see the modified tarball. The `tarball-patch` overlay type requires the `patch` command to be installed on the host.
Why is this required? Would it not be simpler to handle these along with other overlays?
    | `file-remove` | Removes a file | `file` | Glob pattern for files to remove |
    | `file-rename` | Renames a file within the same directory | `file`, `replacement` | Name of file to rename |

    ### Tarball Overlays
Let's not use "tarball" as the term to associate with this. What if there are zip files? Or some other kind of archive?

    | Type | Description | Required Fields |
    |------|-------------|-----------------|
    | `tarball-file-remove` | Removes file(s) matching a glob pattern from inside a tarball | `tarball`, `file` |
Did we consider alternatively representing this as a plain file-remove overlay, but with an additional option to operate against the contents of a given archive? That would allow reuse of all the existing overlays if we refactor the logic appropriately.
What are the trade-offs between separate overlays vs. reuse existing ones?
Add three new overlay types (`tarball-file-remove`, `tarball-search-replace`, `tarball-patch`) that modify files inside source tarballs during source preparation. Operations are performed in pure Go on the host.

Includes:
- `internal/utils/tarball`: reusable deterministic tar extract/repack library
- Overlay type registration, validation, and fingerprinting
- Source prep integration with `sources` file hash rehashing
- User documentation and TOML examples
a495c4d to 3eb0c57
Adds tarball overlay support for modifying source archives by applying file overlays during repack.
This is part 2 of 2 in the tarball overlay feature stack.