Skip to content

Use bazel cquery to find out the binary path instead of guessing#67

Open
apesternikov wants to merge 8 commits into
mainfrom
ap/query_bins
Open

Use bazel cquery to find out the binary path instead of guessing#67
apesternikov wants to merge 8 commits into
mainfrom
ap/query_bins

Conversation

@apesternikov
Copy link
Copy Markdown
Contributor

@apesternikov apesternikov commented Jun 3, 2026

We were using heuristics to guess a binary for a target.
It is possible to use cquery to get the exact path instead of guessing.
As a side effect of this change it does not depend on protobuf anymore so we did a cleanup

Comment thread gitops/exec/exec.go
Comment on lines +35 to +51
func ExWithTimeout(timeout time.Duration, dir, name string, arg ...string) (output string, err error) {
log.Println("executing:", name, strings.Join(arg, " "))
var cmd *exec.Cmd
if timeout > 0 {
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
cmd = exec.CommandContext(ctx, name, arg...)
} else {
cmd = exec.Command(name, arg...)
}
if dir != "" {
cmd.Dir = dir
}
b, err := cmd.CombinedOutput()
log.Printf("%s", string(b))
return string(b), err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ExWithTimeout, errors resulting from context timeout or cancellation are not distinguished from other errors. This can make it difficult to diagnose whether a failure was due to a timeout or another issue. Consider checking if the error is context.DeadlineExceeded or context.Canceled and returning a more descriptive error message.

Recommended solution:

if ctx.Err() != nil {
    return string(b), fmt.Errorf("command timed out or was canceled: %w", ctx.Err())
}

Comment thread gitops/exec/exec.go
log.Fatalf("ERROR: %s", err)
}
return ret
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of log.Fatalf in Mustex and MustexWithTimeout will terminate the process on error, which may not be appropriate for library code and reduces flexibility for callers. Additionally, the error message does not include the command or arguments that failed, making debugging more difficult. Consider returning the error to the caller or including more context in the fatal log message.

Recommended solution:

log.Fatalf("ERROR executing %s %v: %s", name, arg, err)

Or, if possible, return the error instead of exiting.

Comment thread gitops/git/git_test.go
Comment on lines +568 to +571
origTimeout := *Timeout
defer func() {
*Timeout = origTimeout
}()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential Data Race on Global Timeout Variable

The test manipulates the global Timeout variable directly without synchronization:

origTimeout := *Timeout
defer func() { *Timeout = origTimeout }()
*Timeout = 1 * time.Microsecond

If tests are run in parallel, this can lead to data races and inconsistent results. To mitigate this, consider isolating the timeout per test or using synchronization primitives (e.g., mutex) to protect access to the global variable. Alternatively, refactor the code to avoid global state in tests.

Comment on lines 308 to 356
}

query := strings.Join(qv, " union ")
qr := bazelQuery(query)
targetsCh := make(chan string)
paths := bazelQueryPaths(query)
pathsCh := make(chan string)
var wg sync.WaitGroup
wg.Add(*pushParallelism)
for i := 0; i < *pushParallelism; i++ {
go func() {
defer wg.Done()
for target := range targetsCh {
bin := bazel.TargetToExecutable(target)
fi, err := os.Stat(bin)
if err == nil && fi.Mode().IsRegular() {
exec.Mustex("", bin)
} else {
log.Println("target", target, "is not a file, running as a command")

args := []string{"run"}

if len(bazelFlags) > 0 {
for _, bazelFlag := range bazelFlags {
bazelFlagArgs := strings.Split(bazelFlag, " ")

args = append(args, bazelFlagArgs...)
for path := range pathsCh {
executed := false
absPath, err := filepath.Abs(path)
if err == nil {
runfilesDir := absPath + ".runfiles"
if fi, err := os.Stat(runfilesDir); err == nil && fi.IsDir() {
workspaceDir := filepath.Join(runfilesDir, "_main")
if fi, err := os.Stat(workspaceDir); err != nil || !fi.IsDir() {
workspaceDir = filepath.Join(runfilesDir, "rules_gitops")
}
if fi, err := os.Stat(workspaceDir); err == nil && fi.IsDir() {
log.Printf("Executing %s in %s", absPath, workspaceDir)
if !*dryPush {
exec.Mustex(workspaceDir, absPath)
} else {
log.Printf("Skipping execution of %s in %s (dry run)", absPath, workspaceDir)
}
executed = true
}
}

args = append(args, target)

exec.Mustex("", *bazelCmd, args...)
}
if !executed {
fi, err := os.Stat(path)
if err == nil && fi.Mode().IsRegular() {
exec.Mustex("", path)
} else {
log.Fatalf("push binary path %s is not a regular file, cannot run it", path)
}
}
}
}()
}
for _, t := range qr.Results {
targetsCh <- t.Target.Rule.GetName()
for _, p := range paths {
pathsCh <- p
}
close(targetsCh)
close(pathsCh)
wg.Wait()
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Lack of Error Handling in Parallel Push Logic

In the parallel execution block for pushing binaries, errors from goroutines are not collected or handled. The use of exec.Mustex may terminate the process on error, but if it does not, failures are silently ignored, leading to undetected push failures.

Recommendation:

  • Use a channel or an error group (e.g., errgroup.Group) to collect errors from each goroutine.
  • After all goroutines complete, check for errors and handle them appropriately, possibly aborting or retrying as needed.

Example:

var eg errgroup.Group
for _, path := range paths {
    p := path
    eg.Go(func() error {
        // ...
        if err := exec.MayFail(...); err != nil {
            return err
        }
        return nil
    })
}
if err := eg.Wait(); err != nil {
    log.Fatalf("Push failed: %v", err)
}

This ensures all errors are surfaced and handled.

Comment on lines 88 to 161
flag.StringVar(&gitopsdir, "gitopsdir", "", "do not use temporary directory for gitops, use this directory instead")
}

func bazelQuery(query string) *analysis.CqueryResult {
type GitopsTarget struct {
Target string `json:"target"`
Binary string `json:"executable"`
DeploymentBranch string `json:"deployment_branch"`
}

func cleanTarget(target string) string {
if strings.HasPrefix(target, "@@//") {
return target[2:]
}
if strings.HasPrefix(target, "@//") {
return target[1:]
}
return target
}

func bazelQueryTargets(query string) []GitopsTarget {
log.Println("Executing bazel cquery ", query)
cmd := oe.Command(*bazelCmd, "cquery", query, "--output=proto")
starlarkExpr := `json.encode(struct(deployment_branch = getattr(providers(target)['//gitops:provider.bzl%GitopsArtifactsInfo'], 'deployment_branch', '') if '//gitops:provider.bzl%GitopsArtifactsInfo' in providers(target) else '', target = str(target.label), executable = target.files_to_run.executable.path if target.files_to_run.executable else ''))`
cmd := oe.Command(*bazelCmd, "cquery", "--implicit_deps=false", query, "--output=starlark", "--starlark:expr="+starlarkExpr)
stderr, err := cmd.StderrPipe()
if err != nil {
log.Fatal(err)
}
go func() {
io.Copy(os.Stderr, stderr)
}()
buildproto, err := cmd.Output()
out, err := cmd.Output()
if err != nil {
log.Fatal(err)
}
var targets []GitopsTarget
for _, line := range strings.Split(string(out), "\n") {
line = strings.TrimSpace(line)
if line == "" {
continue
}
var gt GitopsTarget
if err := json.Unmarshal([]byte(line), &gt); err != nil {
log.Fatalf("failed to unmarshal JSON line %q: %v", line, err)
}
targets = append(targets, gt)
}
return targets
}

func bazelQueryPaths(query string) []string {
log.Println("Executing bazel cquery ", query)
cmd := oe.Command(*bazelCmd, "cquery", "--implicit_deps=false", query, "--output=starlark", `--starlark:expr=target.files.to_list()[0].path if target.files.to_list() else ''`)
stderr, err := cmd.StderrPipe()
if err != nil {
log.Fatal(err)
}
qr := &analysis.CqueryResult{}
if err := proto.Unmarshal(buildproto, qr); err != nil {
go func() {
io.Copy(os.Stderr, stderr)
}()
out, err := cmd.Output()
if err != nil {
log.Fatal(err)
}
return qr
var paths []string
for _, line := range strings.Split(string(out), "\n") {
line = strings.TrimSpace(line)
if line != "" {
paths = append(paths, line)
}
}
return paths
}

func main() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Abrupt Process Termination on Errors

Throughout the code (e.g., in bazelQueryTargets, bazelQueryPaths, and the main release train logic), errors are handled using log.Fatal or exec.Mustex, which abruptly terminate the process. This approach prevents graceful error handling, makes it difficult to report partial results, and complicates debugging in batch or CI environments.

Recommendation:

  • Refactor functions to return errors instead of calling log.Fatal or using Mustex.
  • In the main logic, aggregate and report errors, allowing the process to continue where possible or to provide a comprehensive error report at the end.

Example:

func bazelQueryTargets(query string) ([]GitopsTarget, error) {
    // ...
    if err != nil {
        return nil, err
    }
    // ...
}

This enables more robust and maintainable error handling.

Comment thread gitops/prer/prer_test.sh
Comment on lines +65 to +67
escaped_workspace=$(echo "$WORKSPACE_ROOT" | sed 's/\./\\./g')
escaped_bin=$(echo "$push_bin" | sed 's/\./\\./g')
pattern="Skipping execution of $escaped_workspace/bazel-out/[^/]+/bin/$escaped_bin in "
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragile Regex Pattern for Push Binary Verification

The regex pattern constructed for verifying push binaries (lines 65-67) only escapes periods (.) in the workspace and binary paths. If either path contains other regex special characters (such as +, *, ?, (, ), [, ], {, }, |, ^, $), the pattern may not match as intended, leading to false negatives or positives.

Recommendation:
Use a more robust escaping mechanism for all regex special characters. For example, use sed -e 's/[]\^$.|?*+(){}[]/\\&/g' to escape all regex metacharacters:

escaped_workspace=$(echo "$WORKSPACE_ROOT" | sed -e 's/[]\\^$.|?*+(){}[]/\\&/g')
escaped_bin=$(echo "$push_bin" | sed -e 's/[]\\^$.|?*+(){}[]/\\&/g')

This will ensure the pattern matches the intended literal paths.

Comment thread gitops/prer/prer_test.sh
Comment on lines +56 to +71
for target in "${EXPECTED_TARGETS[@]}"; do
if ! grep -q "target $target" "$OUTPUT_FILE"; then
echo "ERROR: Expected gitops target '$target' not found in output" >&2
exit 1
fi
done

echo "Verifying push binaries..."
for push_bin in "${EXPECTED_PUSH_BINARIES[@]}"; do
escaped_workspace=$(echo "$WORKSPACE_ROOT" | sed 's/\./\\./g')
escaped_bin=$(echo "$push_bin" | sed 's/\./\\./g')
pattern="Skipping execution of $escaped_workspace/bazel-out/[^/]+/bin/$escaped_bin in "
if ! grep -Eq "$pattern" "$OUTPUT_FILE"; then
echo "ERROR: Expected push binary '$push_bin' was not executed/skipped in output" >&2
exit 1
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early Exit on First Failure Reduces Test Utility

Both verification loops (lines 56-61 and 64-71) exit immediately upon the first missing target or binary. This prevents the script from reporting all missing items in a single run, making debugging less efficient.

Recommendation:
Accumulate errors in a variable and report all missing targets and binaries at the end of the script, exiting with a non-zero status if any were missing. For example:

errors=0
# ... inside each loop ...
  errors=1
# ... after loops ...
if [ "$errors" -ne 0 ]; then
  exit 1
fi

This approach provides a comprehensive error report for faster troubleshooting.

Comment on lines 187 to +191
releaseTrain, bin, found := strings.Cut(rb, ":")
if !found {
log.Fatalf("resolved_binaries: invalid resolved_binary format: %s", rb)
}
releaseTrains[releaseTrain] = append(releaseTrains[releaseTrain], bin)
releaseTrains[releaseTrain] = append(releaseTrains[releaseTrain], GitopsTarget{Target: bin, Binary: bin})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential Issue: Incorrect Parsing of Resolved Binaries

The parsing logic for resolvedBinaries uses strings.Cut(rb, ":"), which only splits on the first colon. If the binary path contains additional colons (e.g., in Windows paths or container image references), this may lead to incorrect parsing and assignment.

Recommendation:

  • Use a more robust parsing method, such as splitting only on the first colon and treating the remainder as the binary path, or validating the input format more strictly.

Example:

releaseTrain, bin, found := strings.Cut(rb, ":")
if !found || releaseTrain == "" || bin == "" {
    log.Fatalf("resolved_binaries: invalid resolved_binary format: %s", rb)
}

Or use regex for stricter validation.

Comment on lines 248 to 254
msg := workdir.GetLastCommitMessage()
targetset := make(map[string]bool)
for _, t := range targets {
targetset[t] = true
targetset[t.Target] = true
}
oldtargets := commitmsg.ExtractTargets(msg)
for _, t := range oldtargets {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic Issue: Branch Recreation May Not Handle All Deleted Targets

The branch recreation logic only recreates the branch if the first missing target is found, then breaks. If multiple targets are deleted, only the first is considered, and the branch is recreated once. This may not fully synchronize the branch state with the current targets, potentially leaving stale commits or metadata.

Recommendation:

  • After recreating the branch, ensure that all deleted targets are handled and the branch state is fully synchronized. Consider re-checking the target set after recreation or performing a full reset.

Example:

for _, t := range oldtargets {
    if !targetset[t] {
        workdir.RecreateBranch(branch, *prInto)
        // Optionally, re-check or reset branch state here
        break
    }
}

Ensure that the branch reflects the current set of targets after recreation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant