
feat: add tarball overlays for source archive modification#224

Draft
Tonisal-byte wants to merge 2 commits into microsoft:main from Tonisal-byte:asalinas/tarball-overlays


@Tonisal-byte
Contributor

Adds tarball overlay support for modifying source archives: each archive is extracted, file-level overlays are applied to the extracted tree, and the result is repacked.

This is part 2 of 2 in the tarball overlay feature stack.

Stacked PR: This PR builds on #223. Please review and merge #223 first. Once #223 is merged, this PR's diff will shrink to only the overlay-specific changes.

Adds the internal/utils/tarball package, which provides:
- DetectCompression: detect archive compression from the filename
- Extract: decompress and extract tar archives (gzip, bzip2, xz, zstd)
- RepackDeterministic: create byte-reproducible archives with pinned timestamps, zeroed owner/group, GNU format, and sorted entries
- ResolveExtractRoot: find the single top-level directory in an extracted tree
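For the DetectCompression bullet, here is a minimal sketch of filename-based detection. It assumes a hypothetical `Compression` string type and a suffix table that also accepts short forms like `.tgz`; the real package's names and accepted extensions may differ. Unknown extensions return an error instead of a guess, so a repack can never silently switch codecs.

```go
package main

import (
	"fmt"
	"strings"
)

// Compression identifies the codec wrapping a tar stream.
// Hypothetical type; the real package may model this differently.
type Compression string

const (
	CompressionGzip  Compression = "gzip"
	CompressionBzip2 Compression = "bzip2"
	CompressionXz    Compression = "xz"
	CompressionZstd  Compression = "zstd"
)

// suffixes is a slice (not a map) so matching order is deterministic.
var suffixes = []struct {
	suffix string
	codec  Compression
}{
	{".tar.gz", CompressionGzip}, {".tgz", CompressionGzip},
	{".tar.bz2", CompressionBzip2}, {".tbz2", CompressionBzip2},
	{".tar.xz", CompressionXz}, {".txz", CompressionXz},
	{".tar.zst", CompressionZstd}, {".tzst", CompressionZstd},
}

// detectCompression infers the codec from the archive's filename,
// case-insensitively, and errors on anything it does not recognize.
func detectCompression(filename string) (Compression, error) {
	lower := strings.ToLower(filename)
	for _, s := range suffixes {
		if strings.HasSuffix(lower, s.suffix) {
			return s.codec, nil
		}
	}
	return "", fmt.Errorf("unrecognized archive extension: %q", filename)
}
```

In Go's standard library, `compress/gzip` can both read and write, while `compress/bzip2` only decodes; xz and zstd have no stdlib support, so repacking those formats (and bzip2) needs third-party modules.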

Designed for reproducible builds, matching the `tar --sort=name --mtime=@0 --owner=0 --group=0 --format=gnu` convention used by source modification scripts in the Azure Linux project.
Copilot AI review requested due to automatic review settings June 3, 2026 17:22

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds support for “tarball overlays” that modify files inside source tarballs (extract/modify/repack) and updates sources preparation to account for in-place tarball changes.

Changes:

  • Introduces a tarball utility package for compression detection, extraction, and deterministic repacking.
  • Adds new overlay types (tarball-file-remove, tarball-search-replace, tarball-patch) plus grouping/application logic in source prep.
  • Updates the sources-file update flow to rehash tarballs modified by overlays, and extends docs/tests accordingly.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| `internal/utils/tarball/tarball.go` | Implements compression detection, extraction, and deterministic repacking. |
| `internal/utils/tarball/tarball_test.go` | Adds tests for compression detection, extract-root resolution, and deterministic repack. |
| `internal/projectconfig/overlay.go` | Adds tarball overlay types, validation, and `ModifiesTarball()`. |
| `internal/projectconfig/overlay_test.go` | Extends validation/modification classification tests for tarball overlays. |
| `internal/app/azldev/core/sources/tarballoverlays.go` | Implements tarball overlay grouping, extract/modify/repack pipeline, and patch application. |
| `internal/app/azldev/core/sources/tarballoverlays_internal_test.go` | Adds unit tests for grouping and tarball overlay operations. |
| `internal/app/azldev/core/sources/sourceprep.go` | Applies tarball overlays first and rehashes modified tarballs when updating sources. |
| `internal/app/azldev/core/sources/sourceprep_test.go` | Updates expected error text for sources parsing failures. |
| `internal/app/azldev/cmds/component/preparesources.go` | Refactors option wiring into helper to reduce cyclomatic complexity. |
| `docs/user/reference/config/overlays.md` | Documents new tarball overlays and related fields/behavior. |
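Grouping is what lets each archive be extracted and repacked once, however many overlays target it. A minimal sketch with a hypothetical `overlay` struct: it treats any type with a `tarball-` prefix as a tarball overlay (the PR adds `ModifiesTarball()`, which presumably plays this role) and preserves both first-seen archive order and each archive's declared overlay order, since a later overlay may rely on an earlier one.

```go
package main

import "strings"

// overlay is a hypothetical, trimmed-down overlay config entry.
type overlay struct {
	Type    string
	Tarball string
}

// tarballGroup collects every overlay that targets one archive.
type tarballGroup struct {
	Tarball  string
	Overlays []overlay
}

// groupByTarball batches tarball overlays per archive, skipping other types.
// Archives appear in first-seen order; overlays keep their declared order.
func groupByTarball(overlays []overlay) []tarballGroup {
	index := map[string]int{}
	var groups []tarballGroup
	for _, o := range overlays {
		if !strings.HasPrefix(o.Type, "tarball-") {
			continue
		}
		i, ok := index[o.Tarball]
		if !ok {
			i = len(groups)
			index[o.Tarball] = i
			groups = append(groups, tarballGroup{Tarball: o.Tarball})
		}
		groups[i].Overlays = append(groups[i].Overlays, o)
	}
	return groups
}
```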

Comment thread internal/utils/tarball/tarball.go
Comment on lines +230 to +236
newContent := compiled.ReplaceAll(content, []byte(replacement))
if string(newContent) != string(content) {
anyReplaced = true

if writeErr := os.WriteFile(path, newContent, fileperms.PublicFile); writeErr != nil {
return fmt.Errorf("writing %#q:\n%w", path, writeErr)
}
| Regex | `regex` | Regular expression pattern to match | `spec-search-replace`, `file-search-replace` |
| Replacement | `replacement` | Literal replacement text; capture group references like `$1` are **not** expanded. Omit or leave empty to delete matched text. | `spec-search-replace`, `file-search-replace`, `file-rename` |
| Regex | `regex` | Regular expression pattern to match | `spec-search-replace`, `file-search-replace`, `tarball-search-replace` |
| Replacement | `replacement` | Literal replacement text; capture group references like `$1` are **not** expanded. Omit or leave empty to delete matched text. | `spec-search-replace`, `file-search-replace`, `file-rename`, `tarball-search-replace` |
Comment on lines +124 to +125
data1, _ := os.ReadFile(repackPath)
data2, _ := os.ReadFile(repackPath2)
Comment thread internal/app/azldev/core/sources/tarballoverlays_internal_test.go
Comment on lines +384 to +391
case ComponentOverlayTarballPatch:
if err := requireFileBasename("tarball", c.Tarball); err != nil {
return err
}

if c.Source == "" {
return missingField("source")
}
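The body of `requireFileBasename` isn't shown in the excerpt. Assuming its job is to ensure `tarball` names a file in the sources directory rather than a path, a minimal version could look like this; the signature mirrors the call site, but the checks and error wording are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// requireFileBasename returns an error unless value is a bare file name:
// non-empty, not "." or "..", and free of path separators. Hypothetical
// reconstruction for illustration.
func requireFileBasename(field, value string) error {
	switch {
	case value == "":
		return fmt.Errorf("missing required field %#q", field)
	case value == "." || value == "..":
		return fmt.Errorf("field %#q must name a file, got %#q", field, value)
	case strings.ContainsAny(value, `/\`):
		return fmt.Errorf("field %#q must be a file name without directories, got %#q", field, value)
	}
	return nil
}
```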
tarReader := tar.NewReader(decompressed)

for {
header, readErr := tarReader.Next()
}

func extractEntry(destDir string, header *tar.Header, tarReader *tar.Reader) error {
cleanName := filepath.Clean(header.Name)
return fmt.Errorf("creating parent for symlink %#q:\n%w", targetPath, err)
}

if err := os.Symlink(header.Linkname, targetPath); err != nil {
}
}

// tarballFileRemove removes files matching a glob pattern from the extracted tree.
Member

Are we supporting double star globs?


// globFilesInDir finds files under root matching a glob pattern.
// Supports doublestar patterns (e.g., "**/*.md").
func globFilesInDir(root, pattern string) ([]string, error) {
Member

Do we not already have helpers to do this?


These overlays modify files **inside** source tarballs. The tarball is extracted into a temporary directory, modifications are applied, and the tarball is repacked with the same compression format. Extraction and repacking are handled natively; patch application requires the `patch` command on the host.

> **Note:** Tarball overlays are applied before spec and file overlays, so subsequent overlays see the modified tarball.
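For reference, here is a sketch of how the host `patch` dependency might be invoked from Go. It assumes GNU-style options and a caller-chosen strip level (the level the PR uses isn't shown). `-d` switches into the extracted tree, `-i` names the patch file, and the combined output is kept in the error so rejected hunks are visible. `patchCommand` and `applyPatch` are hypothetical names.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// patchCommand builds a patch invocation that applies patchFile inside dir,
// stripping `strip` leading path components from names in the diff.
func patchCommand(dir, patchFile string, strip int) *exec.Cmd {
	return exec.Command("patch", "-p"+strconv.Itoa(strip), "-d", dir, "-i", patchFile)
}

// applyPatch verifies that patch exists on the host, runs it, and includes
// its combined stdout/stderr in the error when it fails.
func applyPatch(dir, patchFile string, strip int) error {
	if _, err := exec.LookPath("patch"); err != nil {
		return fmt.Errorf("tarball-patch requires the patch command on the host: %w", err)
	}
	out, err := patchCommand(dir, patchFile, strip).CombinedOutput()
	if err != nil {
		return fmt.Errorf("applying %q in %q: %w\n%s", patchFile, dir, err, out)
	}
	return nil
}
```

Because `patch -d` changes directory before doing anything else, a relative `-i` path would be resolved inside the extracted tree; passing an absolute patch path avoids that.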
Member

Why is this required? Would it not be simpler to handle these along with other overlays?

| `file-remove` | Removes a file | `file` | Glob pattern for files to remove |
| `file-rename` | Renames a file within the same directory | `file`, `replacement` | Name of file to rename |

### Tarball Overlays
Member

Let's not use "tarball" as the term to associate with this. What if there are zip files? Or some other kind of archive?


| Type | Description | Required Fields |
|------|-------------|-----------------|
| `tarball-file-remove` | Removes file(s) matching a glob pattern from inside a tarball | `tarball`, `file` |
Member

Did we consider alternatively representing this as a plain file-remove overlay, but with an additional option to operate against the contents of a given archive? That would allow reuse of all the existing overlays if we refactor the logic appropriately.

What are the trade-offs between separate overlays vs. reuse existing ones?

Add three new overlay types (tarball-file-remove, tarball-search-replace,
tarball-patch) that modify files inside source tarballs during source
preparation. Extraction, search-replace, and repacking are performed in
pure Go on the host; tarball-patch invokes the host patch command.

Includes:
- internal/utils/tarball: reusable deterministic tar extract/repack library
- Overlay type registration, validation, and fingerprinting
- Source prep integration with sources file hash rehashing
- User documentation and TOML examples
Tonisal-byte force-pushed the asalinas/tarball-overlays branch from a495c4d to 3eb0c57 on June 3, 2026 20:12
Tonisal-byte marked this pull request as draft on June 3, 2026 20:14

Labels

None yet

Projects

None yet


4 participants