Use bazel cquery to find out the binary path instead of guessing#67
Use bazel cquery to find out the binary path instead of guessing#67apesternikov wants to merge 8 commits into
Conversation
…ut to extract gitops metadata
…ons and test code
| func ExWithTimeout(timeout time.Duration, dir, name string, arg ...string) (output string, err error) { | ||
| log.Println("executing:", name, strings.Join(arg, " ")) | ||
| var cmd *exec.Cmd | ||
| if timeout > 0 { | ||
| ctx, cancel := context.WithTimeout(context.Background(), timeout) | ||
| defer cancel() | ||
| cmd = exec.CommandContext(ctx, name, arg...) | ||
| } else { | ||
| cmd = exec.Command(name, arg...) | ||
| } | ||
| if dir != "" { | ||
| cmd.Dir = dir | ||
| } | ||
| b, err := cmd.CombinedOutput() | ||
| log.Printf("%s", string(b)) | ||
| return string(b), err | ||
| } |
There was a problem hiding this comment.
In ExWithTimeout, errors resulting from context timeout or cancellation are not distinguished from other errors. This can make it difficult to diagnose whether a failure was due to a timeout or another issue. Consider checking if the error is context.DeadlineExceeded or context.Canceled and returning a more descriptive error message.
Recommended solution:
if ctx.Err() != nil {
return string(b), fmt.Errorf("command timed out or was canceled: %w", ctx.Err())
}| log.Fatalf("ERROR: %s", err) | ||
| } | ||
| return ret | ||
| } |
There was a problem hiding this comment.
The use of log.Fatalf in Mustex and MustexWithTimeout will terminate the process on error, which may not be appropriate for library code and reduces flexibility for callers. Additionally, the error message does not include the command or arguments that failed, making debugging more difficult. Consider returning the error to the caller or including more context in the fatal log message.
Recommended solution:
log.Fatalf("ERROR executing %s %v: %s", name, arg, err)Or, if possible, return the error instead of exiting.
| origTimeout := *Timeout | ||
| defer func() { | ||
| *Timeout = origTimeout | ||
| }() |
There was a problem hiding this comment.
Potential Data Race on Global Timeout Variable
The test manipulates the global Timeout variable directly without synchronization:
origTimeout := *Timeout
defer func() { *Timeout = origTimeout }()
*Timeout = 1 * time.MicrosecondIf tests are run in parallel, this can lead to data races and inconsistent results. To mitigate this, consider isolating the timeout per test or using synchronization primitives (e.g., mutex) to protect access to the global variable. Alternatively, refactor the code to avoid global state in tests.
| } | ||
|
|
||
| query := strings.Join(qv, " union ") | ||
| qr := bazelQuery(query) | ||
| targetsCh := make(chan string) | ||
| paths := bazelQueryPaths(query) | ||
| pathsCh := make(chan string) | ||
| var wg sync.WaitGroup | ||
| wg.Add(*pushParallelism) | ||
| for i := 0; i < *pushParallelism; i++ { | ||
| go func() { | ||
| defer wg.Done() | ||
| for target := range targetsCh { | ||
| bin := bazel.TargetToExecutable(target) | ||
| fi, err := os.Stat(bin) | ||
| if err == nil && fi.Mode().IsRegular() { | ||
| exec.Mustex("", bin) | ||
| } else { | ||
| log.Println("target", target, "is not a file, running as a command") | ||
|
|
||
| args := []string{"run"} | ||
|
|
||
| if len(bazelFlags) > 0 { | ||
| for _, bazelFlag := range bazelFlags { | ||
| bazelFlagArgs := strings.Split(bazelFlag, " ") | ||
|
|
||
| args = append(args, bazelFlagArgs...) | ||
| for path := range pathsCh { | ||
| executed := false | ||
| absPath, err := filepath.Abs(path) | ||
| if err == nil { | ||
| runfilesDir := absPath + ".runfiles" | ||
| if fi, err := os.Stat(runfilesDir); err == nil && fi.IsDir() { | ||
| workspaceDir := filepath.Join(runfilesDir, "_main") | ||
| if fi, err := os.Stat(workspaceDir); err != nil || !fi.IsDir() { | ||
| workspaceDir = filepath.Join(runfilesDir, "rules_gitops") | ||
| } | ||
| if fi, err := os.Stat(workspaceDir); err == nil && fi.IsDir() { | ||
| log.Printf("Executing %s in %s", absPath, workspaceDir) | ||
| if !*dryPush { | ||
| exec.Mustex(workspaceDir, absPath) | ||
| } else { | ||
| log.Printf("Skipping execution of %s in %s (dry run)", absPath, workspaceDir) | ||
| } | ||
| executed = true | ||
| } | ||
| } | ||
|
|
||
| args = append(args, target) | ||
|
|
||
| exec.Mustex("", *bazelCmd, args...) | ||
| } | ||
| if !executed { | ||
| fi, err := os.Stat(path) | ||
| if err == nil && fi.Mode().IsRegular() { | ||
| exec.Mustex("", path) | ||
| } else { | ||
| log.Fatalf("push binary path %s is not a regular file, cannot run it", path) | ||
| } | ||
| } | ||
| } | ||
| }() | ||
| } | ||
| for _, t := range qr.Results { | ||
| targetsCh <- t.Target.Rule.GetName() | ||
| for _, p := range paths { | ||
| pathsCh <- p | ||
| } | ||
| close(targetsCh) | ||
| close(pathsCh) | ||
| wg.Wait() | ||
| } | ||
|
|
There was a problem hiding this comment.
Issue: Lack of Error Handling in Parallel Push Logic
In the parallel execution block for pushing binaries, errors from goroutines are not collected or handled. The use of exec.Mustex may terminate the process on error, but if it does not, failures are silently ignored, leading to undetected push failures.
Recommendation:
- Use a channel or an error group (e.g.,
errgroup.Group) to collect errors from each goroutine. - After all goroutines complete, check for errors and handle them appropriately, possibly aborting or retrying as needed.
Example:
var eg errgroup.Group
for _, path := range paths {
p := path
eg.Go(func() error {
// ...
if err := exec.MayFail(...); err != nil {
return err
}
return nil
})
}
if err := eg.Wait(); err != nil {
log.Fatalf("Push failed: %v", err)
}This ensures all errors are surfaced and handled.
| flag.StringVar(&gitopsdir, "gitopsdir", "", "do not use temporary directory for gitops, use this directory instead") | ||
| } | ||
|
|
||
| func bazelQuery(query string) *analysis.CqueryResult { | ||
| type GitopsTarget struct { | ||
| Target string `json:"target"` | ||
| Binary string `json:"executable"` | ||
| DeploymentBranch string `json:"deployment_branch"` | ||
| } | ||
|
|
||
| func cleanTarget(target string) string { | ||
| if strings.HasPrefix(target, "@@//") { | ||
| return target[2:] | ||
| } | ||
| if strings.HasPrefix(target, "@//") { | ||
| return target[1:] | ||
| } | ||
| return target | ||
| } | ||
|
|
||
| func bazelQueryTargets(query string) []GitopsTarget { | ||
| log.Println("Executing bazel cquery ", query) | ||
| cmd := oe.Command(*bazelCmd, "cquery", query, "--output=proto") | ||
| starlarkExpr := `json.encode(struct(deployment_branch = getattr(providers(target)['//gitops:provider.bzl%GitopsArtifactsInfo'], 'deployment_branch', '') if '//gitops:provider.bzl%GitopsArtifactsInfo' in providers(target) else '', target = str(target.label), executable = target.files_to_run.executable.path if target.files_to_run.executable else ''))` | ||
| cmd := oe.Command(*bazelCmd, "cquery", "--implicit_deps=false", query, "--output=starlark", "--starlark:expr="+starlarkExpr) | ||
| stderr, err := cmd.StderrPipe() | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| go func() { | ||
| io.Copy(os.Stderr, stderr) | ||
| }() | ||
| buildproto, err := cmd.Output() | ||
| out, err := cmd.Output() | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| var targets []GitopsTarget | ||
| for _, line := range strings.Split(string(out), "\n") { | ||
| line = strings.TrimSpace(line) | ||
| if line == "" { | ||
| continue | ||
| } | ||
| var gt GitopsTarget | ||
| if err := json.Unmarshal([]byte(line), >); err != nil { | ||
| log.Fatalf("failed to unmarshal JSON line %q: %v", line, err) | ||
| } | ||
| targets = append(targets, gt) | ||
| } | ||
| return targets | ||
| } | ||
|
|
||
| func bazelQueryPaths(query string) []string { | ||
| log.Println("Executing bazel cquery ", query) | ||
| cmd := oe.Command(*bazelCmd, "cquery", "--implicit_deps=false", query, "--output=starlark", `--starlark:expr=target.files.to_list()[0].path if target.files.to_list() else ''`) | ||
| stderr, err := cmd.StderrPipe() | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| qr := &analysis.CqueryResult{} | ||
| if err := proto.Unmarshal(buildproto, qr); err != nil { | ||
| go func() { | ||
| io.Copy(os.Stderr, stderr) | ||
| }() | ||
| out, err := cmd.Output() | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| return qr | ||
| var paths []string | ||
| for _, line := range strings.Split(string(out), "\n") { | ||
| line = strings.TrimSpace(line) | ||
| if line != "" { | ||
| paths = append(paths, line) | ||
| } | ||
| } | ||
| return paths | ||
| } | ||
|
|
||
| func main() { |
There was a problem hiding this comment.
Issue: Abrupt Process Termination on Errors
Throughout the code (e.g., in bazelQueryTargets, bazelQueryPaths, and the main release train logic), errors are handled using log.Fatal or exec.Mustex, which abruptly terminate the process. This approach prevents graceful error handling, makes it difficult to report partial results, and complicates debugging in batch or CI environments.
Recommendation:
- Refactor functions to return errors instead of calling
log.Fatalor usingMustex. - In the main logic, aggregate and report errors, allowing the process to continue where possible or to provide a comprehensive error report at the end.
Example:
func bazelQueryTargets(query string) ([]GitopsTarget, error) {
// ...
if err != nil {
return nil, err
}
// ...
}This enables more robust and maintainable error handling.
| escaped_workspace=$(echo "$WORKSPACE_ROOT" | sed 's/\./\\./g') | ||
| escaped_bin=$(echo "$push_bin" | sed 's/\./\\./g') | ||
| pattern="Skipping execution of $escaped_workspace/bazel-out/[^/]+/bin/$escaped_bin in " |
There was a problem hiding this comment.
Fragile Regex Pattern for Push Binary Verification
The regex pattern constructed for verifying push binaries (lines 65-67) only escapes periods (.) in the workspace and binary paths. If either path contains other regex special characters (such as +, *, ?, (, ), [, ], {, }, |, ^, $), the pattern may not match as intended, leading to false negatives or positives.
Recommendation:
Use a more robust escaping mechanism for all regex special characters. For example, use sed -e 's/[]\^$.|?*+(){}[]/\\&/g' to escape all regex metacharacters:
escaped_workspace=$(echo "$WORKSPACE_ROOT" | sed -e 's/[]\\^$.|?*+(){}[]/\\&/g')
escaped_bin=$(echo "$push_bin" | sed -e 's/[]\\^$.|?*+(){}[]/\\&/g')This will ensure the pattern matches the intended literal paths.
| for target in "${EXPECTED_TARGETS[@]}"; do | ||
| if ! grep -q "target $target" "$OUTPUT_FILE"; then | ||
| echo "ERROR: Expected gitops target '$target' not found in output" >&2 | ||
| exit 1 | ||
| fi | ||
| done | ||
|
|
||
| echo "Verifying push binaries..." | ||
| for push_bin in "${EXPECTED_PUSH_BINARIES[@]}"; do | ||
| escaped_workspace=$(echo "$WORKSPACE_ROOT" | sed 's/\./\\./g') | ||
| escaped_bin=$(echo "$push_bin" | sed 's/\./\\./g') | ||
| pattern="Skipping execution of $escaped_workspace/bazel-out/[^/]+/bin/$escaped_bin in " | ||
| if ! grep -Eq "$pattern" "$OUTPUT_FILE"; then | ||
| echo "ERROR: Expected push binary '$push_bin' was not executed/skipped in output" >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Early Exit on First Failure Reduces Test Utility
Both verification loops (lines 56-61 and 64-71) exit immediately upon the first missing target or binary. This prevents the script from reporting all missing items in a single run, making debugging less efficient.
Recommendation:
Accumulate errors in a variable and report all missing targets and binaries at the end of the script, exiting with a non-zero status if any were missing. For example:
errors=0
# ... inside each loop ...
errors=1
# ... after loops ...
if [ "$errors" -ne 0 ]; then
exit 1
fiThis approach provides a comprehensive error report for faster troubleshooting.
… provider in bazel cquery
| releaseTrain, bin, found := strings.Cut(rb, ":") | ||
| if !found { | ||
| log.Fatalf("resolved_binaries: invalid resolved_binary format: %s", rb) | ||
| } | ||
| releaseTrains[releaseTrain] = append(releaseTrains[releaseTrain], bin) | ||
| releaseTrains[releaseTrain] = append(releaseTrains[releaseTrain], GitopsTarget{Target: bin, Binary: bin}) |
There was a problem hiding this comment.
Potential Issue: Incorrect Parsing of Resolved Binaries
The parsing logic for resolvedBinaries uses strings.Cut(rb, ":"), which only splits on the first colon. If the binary path contains additional colons (e.g., in Windows paths or container image references), this may lead to incorrect parsing and assignment.
Recommendation:
- Use a more robust parsing method, such as splitting only on the first colon and treating the remainder as the binary path, or validating the input format more strictly.
Example:
releaseTrain, bin, found := strings.Cut(rb, ":")
if !found || releaseTrain == "" || bin == "" {
log.Fatalf("resolved_binaries: invalid resolved_binary format: %s", rb)
}Or use regex for stricter validation.
| msg := workdir.GetLastCommitMessage() | ||
| targetset := make(map[string]bool) | ||
| for _, t := range targets { | ||
| targetset[t] = true | ||
| targetset[t.Target] = true | ||
| } | ||
| oldtargets := commitmsg.ExtractTargets(msg) | ||
| for _, t := range oldtargets { |
There was a problem hiding this comment.
Logic Issue: Branch Recreation May Not Handle All Deleted Targets
The branch recreation logic only recreates the branch if the first missing target is found, then breaks. If multiple targets are deleted, only the first is considered, and the branch is recreated once. This may not fully synchronize the branch state with the current targets, potentially leaving stale commits or metadata.
Recommendation:
- After recreating the branch, ensure that all deleted targets are handled and the branch state is fully synchronized. Consider re-checking the target set after recreation or performing a full reset.
Example:
for _, t := range oldtargets {
if !targetset[t] {
workdir.RecreateBranch(branch, *prInto)
// Optionally, re-check or reset branch state here
break
}
}Ensure that the branch reflects the current set of targets after recreation.
We were using heuristics to guess a binary for a target.
It is possible to use cquery to get the exact path instead of guessing.
As a side effect of this change it does not depend on protobuf anymore so we did a cleanup