[SPARK-55756][SQL][TESTS] Add --DEBUG directive for golden file test framework #54554
cloud-fan wants to merge 1 commit into apache:master from
Conversation
Add a `--DEBUG` marker that can be placed before any query in a SQL test input file to enable selective debugging. When any `--DEBUG` marker is present, the test enters debug mode:

- Commands (CREATE TABLE, INSERT, SET, etc.) are always executed for setup.
- Only `--DEBUG`-marked non-command queries are executed; others are skipped.
- Failed queries print the full error stacktrace to the console.
- Results are still compared against the golden file.
- The test always fails at the end as a safety net to prevent accidental commits with `--DEBUG` markers.

This significantly improves the debugging experience for golden file tests by allowing developers to focus on specific failing queries without running the entire test file, while getting full error details in the console output.
yaooqinn left a comment
Looks like a good supplement to the scalatest `-z` option
I think it's nice, debugging golden files is a bit painful. Having a mode to skip all non-commands except the failing query is a good idea.
I put a few questions below about whether we can make some of these improvements available even in non-debug mode.
We may also add a suggestion in AGENTS.md to have the agent try it as well when debugging failing tests (or maybe just add it to the README for debugging golden files).
* - Query results are still compared against the golden file.
* - The test always fails at the end with a reminder to remove --DEBUG markers.
* To inspect the DataFrame interactively, set a breakpoint in `runDebugQueries` at the
* line where `localSparkSession.sql(sql)` is called, then evaluate the DataFrame in the
is it possible to have this available in all cases, not just `--DEBUG` mode?
it's less useful in non-debug mode as all queries run, so I didn't put it in the classdoc. People can find the DataFrame in runQueries similarly.
localSparkSession.sql(sql).collect()
} else if (isDebug) {
  // Capture exception stacktrace if the query fails.
  var exceptionTrace: Option[String] = None
can we always print the stack, not just in debug mode (when we hit an unexpected exception)?
Same reason as above: when we run all queries, this is not useful.
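As a rough illustration of the pattern discussed in this thread, a failed query's stacktrace can be captured as an `Option[String]` and printed later. This is a hypothetical sketch, not the PR's actual code; the helper name `captureTrace` and the thunk signature are assumptions.

```scala
import java.io.{PrintWriter, StringWriter}

// Hypothetical helper (not from the PR): run a query thunk and, if it
// throws, render its full stacktrace to a String for console output.
def captureTrace(runQuery: () => Unit): Option[String] = {
  try {
    runQuery()
    None // query succeeded, nothing to report
  } catch {
    case e: Throwable =>
      val sw = new StringWriter()
      e.printStackTrace(new PrintWriter(sw, true))
      Some(sw.toString)
  }
}
```

In debug mode the `Some` branch would be printed to the console; in normal mode the trace could simply be discarded, which matches the author's point that it is only useful when a single marked query runs.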
Merged to master, thank you @cloud-fan and all
What changes were proposed in this pull request?
Add a `--DEBUG` marker directive for the SQL golden file test framework (SQLQueryTestSuite). When placed on its own line before a query in an input `.sql` file, it enables a focused debug mode:

- `--DEBUG`-marked queries and setup commands (CREATE TABLE, INSERT, SET, etc.) are executed; all other queries are skipped.
- The test always fails at the end as a reminder to remove `--DEBUG` markers before committing.
- A breakpoint can be set in `runDebugQueries` to inspect the `DataFrame` instance for ad-hoc plan analysis.

Example usage in an input file:
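The example block from the PR description did not survive this page capture. Based on the behavior described above, a minimal hypothetical input-file sketch (the table name `t` and the queries are assumptions, not from the PR) could look like:

```sql
-- Setup commands are always executed, even in debug mode.
CREATE TABLE t(a INT) USING parquet;
INSERT INTO t VALUES (1), (2);

-- Only the query after this marker runs; other non-command queries are skipped.
--DEBUG
SELECT a, a + 1 FROM t ORDER BY a;

-- This query would be skipped while the --DEBUG marker above is present.
SELECT count(*) FROM t;
```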
Example console output when running the test:
When the debug query fails:
Why are the changes needed?
Debugging golden file test failures is currently painful:

- The entire test file must run even when only one query is failing.
- Failed queries do not print their full error stacktrace to the console.
- There is no easy way to get hold of the `DataFrame` instance for plan inspection.

This change addresses all three issues with a simple, zero-config marker that can be temporarily added during development.
Does this PR introduce any user-facing change?
No. This is a test infrastructure improvement only.
How was this patch tested?
- Manually tested with `--DEBUG` markers on both passing and failing queries in `inline-table.sql`.
- Verified the test fails at the end when `--DEBUG` markers are present.

Was this patch authored or co-authored using generative AI tooling?
Yes. Cursor.