fix: rewrite GNU testsuite harness to run upstream test scripts directly by kevinburkesegment · Pull Request #343 · uutils/sed

kevinburkesegment · 2026-03-17T19:45:59Z

The previous harness tried to extract sed commands from GNU test scripts via regex pattern matching, which produced false negatives (comparing against empty expected output) and false positives. This led to inflated test counts and unreliable pass/fail signals.

The new approach:

- Provides a lightweight shim for the gnulib test framework (init.sh) with implementations of compare_, returns_, skip_, framework_failure_, and all require_* functions
- Executes each .sh test script from the GNU testsuite directly, injecting our Rust sed binary via PATH
- Uses a clean srcdir with symlinks to real test data files
- Adds 30s timeout per test to catch infinite loops
- Properly propagates exit codes (0=pass, 77=skip, 99=framework failure)
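
As a rough illustration of the shim idea, here is a minimal sketch of what such stand-in functions could look like. The function names ending in `_` come from the list above; their bodies and the `classify_exit` helper are assumptions for illustration, not the code in this PR.

```shell
# Hypothetical stand-ins for a few gnulib init.sh helpers (bodies are illustrative).

# compare_ A B: succeed only if the two files are identical; print a diff otherwise.
compare_() { diff -u "$1" "$2"; }

# returns_ N CMD...: succeed only if CMD exits with status N.
returns_() {
  expected=$1; shift
  actual=0
  "$@" || actual=$?
  test "$actual" -eq "$expected"
}

# skip_ / framework_failure_: report a reason and exit with the conventional status.
skip_() { echo "skip: $*" >&2; exit 77; }
framework_failure_() { echo "framework failure: $*" >&2; exit 99; }

# classify_exit STATUS: hypothetical helper mapping a test's exit status to a label.
classify_exit() {
  case $1 in
    0)  echo PASS ;;
    77) echo SKIP ;;
    99) echo "FAIL (framework failure)" ;;
    *)  echo FAIL ;;
  esac
}
```

A test script sourcing such a shim would call `skip_` when a `require_*` check fails, and the runner would label the outcome with `classify_exit $?`.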

Results are now consistent with CI: 65 tests, ~12% pass rate, with clear PASS/FAIL/SKIP/timeout categorization.

kevinburkesegment · 2026-03-17T19:46:16Z

Here's the example output from a test run on my MacBook Pro:

[INFO] Building Rust sed implementation...
[INFO] Using Rust sed binary: /Users/kburke/src/github.com/uutils/sed/target/release/sed
[INFO] Test working directory: /var/folders/mx/g324c_717ms4v6_1gglpyztm0000gn/T/tmp.pdnL3yZmCP
[INFO] Starting test execution...

[INFO] Running GNU testsuite shell script tests...
[FAIL] 8bit
[FAIL] 8to7
[FAIL] badenc
[FAIL] binary (timeout)
[PASS] bsd
[FAIL] bsd-wrapper
[SKIP] bug32082
[PASS] bug32271-1
[SKIP] bug32271-2
[FAIL] cmd-0r
[FAIL] cmd-l
[FAIL] cmd-R
[FAIL] colon-with-no-label
[FAIL] command-endings
[FAIL] comment-n
[FAIL] compile-errors
[FAIL] compile-tests
[FAIL] convert-number
[FAIL] dc (timeout)
[FAIL] distrib
[FAIL] eval
[FAIL] execute-tests
[FAIL] follow-symlinks (framework failure)
[FAIL] follow-symlinks-stdin
[FAIL] help
[FAIL] inplace-hold
[FAIL] in-place-hyphen
[SKIP] inplace-selinux
[FAIL] in-place-suffix-backup
[SKIP] invalid-mb-seq-UMR
[FAIL] mac-mf
[PASS] madding
[FAIL] mb-bad-delim
[SKIP] mb-charclass-non-utf8
[FAIL] mb-match-slash
[FAIL] mb-y-translate
[FAIL] missing-filename
[PASS] newjis
[SKIP] newline-dfa-bug
[FAIL] normalize-text
[FAIL] nulldata
[SKIP] obinary
[FAIL] posix-char-class
[FAIL] posix-mode-addr
[FAIL] posix-mode-bad-ref
[FAIL] posix-mode-ERE
[FAIL] posix-mode-N
[FAIL] posix-mode-s
[PASS] range-overlap
[FAIL] recursive-escape-c
[FAIL] regex-errors
[SKIP] regex-max-int
[FAIL] sandbox
[FAIL] stdin-prog
[PASS] stdin
[FAIL] subst-mb-incomplete
[FAIL] subst-options
[FAIL] subst-replacement
[FAIL] temp-file-cleanup
[SKIP] title-case
[FAIL] unbuffered
[PASS] uniq
[FAIL] utf8-ru
[FAIL] word-delim
[FAIL] xemacs
[INFO] Ran 65 shell script tests

=========================================
TEST RESULTS SUMMARY
=========================================
Total tests:   65
Passed:        7
Failed:        49
Skipped:       9
Duration:      87s
Pass rate:     12%
Result:        SOME TESTS FAILED
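
The 12% in the summary is not 7 out of 65 (about 10.8%). It matches integer division over the tests that actually ran, 7 of 56. That the harness computes the rate this way is an assumption inferred from the numbers above; the variable names below are illustrative.

```shell
# Counts taken from the summary above.
passed=7 failed=49 skipped=9
total=$((passed + failed + skipped))

# Shell arithmetic truncates toward zero.
rate_all=$((passed * 100 / total))                # over all 65 tests
rate_run=$((passed * 100 / (total - skipped)))    # over the 56 tests that were not skipped

echo "total=$total rate_all=${rate_all}% rate_run=${rate_run}%"
# prints: total=65 rate_all=10% rate_run=12%
```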

github-actions · 2026-03-17T19:49:48Z

GNU sed testsuite comparison:

Test results comparison:
  Current:   TOTAL: 0 / PASSED: 0 / FAILED: 0 / SKIPPED: 0
  Reference: TOTAL: 60 / PASSED: 17 / FAILED: 43 / SKIPPED: 0

Changes from main branch:
  TOTAL: -60
  PASSED: -17
  FAILED: -43

Test improvements (43):
  + 8to7_dash_e_0
  + bsd_echo_0
  + bsd_echo_1
  + bug32082_dash_e_0
  + bug32271-1_dash_e_0
  + bug32271-2_dash_e_0
  + cmd-0r_cmd_0
  + cmd-0r_file_1
  + cmd-0r_file_2
  + follow-symlinks_cmd_0
  + follow-symlinks_file_1
  + follow-symlinks_n_2
  + help_cmd_0
  + invalid-mb-seq-UMR_subst_0
  + mac-mf_triplet
  + missing-filename_subst_0
  + newline-dfa-bug_cmd_0
  + newline-dfa-bug_file_1
  + normalize-text_cmd_0
  + nulldata_subst_0
  + posix-char-class_subst_0
  + posix-mode-ERE_subst_0
  + posix-mode-ERE_subst_1
  + posix-mode-addr_n_0
  + posix-mode-addr_n_1
  + posix-mode-bad-ref_subst_0
  + posix-mode-s_file_2
  + posix-mode-s_file_3
  + posix-mode-s_subst_0
  + posix-mode-s_subst_1
  + recursive-escape-c_cmd_0
  + regex-errors_cmd_0
  + sandbox_cmd_0
  + sandbox_file_1
  + sandbox_file_2
  + sandbox_file_3
  + subst-options_cmd_0
  + subst-options_file_1
  + subst-options_file_2
  + subst-options_file_3
  + unbuffered_subst_0
  + utf8-ru_subst_0
  + word-delim_subst_0

codecov · 2026-03-17T20:06:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.07%. Comparing base (25efbc0) to head (fbf8f5f).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #343   +/-   ##
=======================================
  Coverage   82.07%   82.07%           
=======================================
  Files          13       13           
  Lines        5445     5445           
  Branches      293      293           
=======================================
  Hits         4469     4469           
  Misses        974      974           
  Partials        2        2

Flag	Coverage Δ
macos_latest	`82.51% <ø> (ø)`
ubuntu_latest	`82.63% <ø> (ø)`
windows_latest	`0.00% <ø> (ø)`

sylvestre · 2026-03-17T20:11:28Z

small problem here :)

The previous harness tried to extract sed commands from GNU test scripts via regex pattern matching, which produced false negatives (comparing against empty expected output) and false positives. This led to inflated test counts and unreliable pass/fail signals.

The new approach:

- Provides a lightweight shim for the gnulib test framework (init.sh) with implementations of compare_, returns_, skip_, framework_failure_, and all require_* functions
- Executes each .sh test script from the GNU testsuite directly, injecting our Rust sed binary via PATH
- Uses a clean srcdir with symlinks to real test data files
- Adds 30s timeout per test to catch infinite loops
- Properly propagates exit codes (0=pass, 77=skip, 99=framework failure)

Results are now consistent with CI: 65 tests, ~12% pass rate, with clear PASS/FAIL/SKIP/timeout categorization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kevinburkesegment · 2026-03-17T20:38:49Z

Sorry - give it another try

sylvestre · 2026-03-17T20:42:31Z

it is fine, don't be sorry :)

When a test hangs and the watchdog kills the shell, orphaned child processes (e.g. our sed binary in an infinite loop) can keep a pipe's write end open, causing the $() subshell capture to block indefinitely.

Switch to writing test output to a temp file instead of capturing via pipe. This ensures the parent script always returns promptly after the watchdog fires, regardless of orphaned processes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
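
The difference can be sketched as follows. `run_to_file` is a hypothetical helper name, not the function used in this PR. The point is only that a command substitution such as `output=$(sh "$script" 2>&1)` reads until every process holding the pipe's write end has closed it, while a redirect to a regular file does not wait on anyone but the direct child.

```shell
# run_to_file SCRIPT: run a test script with its output sent to a temp file
# instead of a pipe, then print the output and return the script's exit status.
run_to_file() {
  script=$1
  out=$(mktemp)
  status=0
  # A regular file has no reader waiting for EOF, so a stray descendant that
  # still holds the file open cannot keep the caller blocked.
  sh "$script" >"$out" 2>&1 || status=$?
  cat "$out"
  rm -f "$out"
  return "$status"
}
```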

kevinburkesegment · 2026-03-17T21:54:12Z

one more try please!

sylvestre · 2026-03-17T21:56:29Z

done!

The previous approach (background process + manual watchdog) left orphaned sed processes that could block the script or cause the CI runner to kill the entire step (exit code 143).

Switch to running `timeout --kill-after=5 30` in the foreground with output redirected to a file. This lets timeout properly manage the child process tree and avoids pipe-blocking issues entirely.

Also detect exit code 125 (uutils timeout) in addition to 124 (GNU coreutils timeout) and 137 (SIGKILL).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
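
A minimal sketch of that foreground invocation and the exit-code handling might look like this. The function name `run_with_timeout` and its optional limit arguments are assumptions added for illustration; the `timeout` flags and the 124/125/137 codes are the ones named in the commit message. Treating 125 as a timeout follows that message: GNU `timeout` uses 125 for its own failures, so this mapping could also label those as timeouts.

```shell
# run_with_timeout SCRIPT OUTFILE [LIMIT] [GRACE]
# Runs SCRIPT under timeout(1), writes its output to OUTFILE, prints a label.
run_with_timeout() {
  script=$1
  out=$2
  limit=${3:-30}   # seconds before timeout sends TERM
  grace=${4:-5}    # extra seconds before timeout sends KILL
  status=0
  timeout --kill-after="$grace" "$limit" sh "$script" >"$out" 2>&1 || status=$?
  case $status in
    0)           echo PASS ;;
    77)          echo SKIP ;;
    99)          echo "FAIL (framework failure)" ;;
    124|125|137) echo "FAIL (timeout)" ;;
    *)           echo FAIL ;;
  esac
}
```

For example, `run_with_timeout testsuite/uniq.sh /tmp/uniq.log` would print one label for that test and leave its output in the log file (both paths are illustrative).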

kevinburkesegment force-pushed the fix/gnu-testsuite-harness branch from 7a0ac5c to f3ed0cd on March 17, 2026 20:30

kevinburkesegment force-pushed the fix/gnu-testsuite-harness branch from 8fe8ac8 to fbf8f5f on March 17, 2026 22:16

kevinburkesegment wants to merge 3 commits into uutils:main from kevinburkesegment:fix/gnu-testsuite-harness

2 participants
