
[SPARK-55756][SQL][TESTS] Add --DEBUG directive for golden file test framework #54554

Closed
cloud-fan wants to merge 1 commit into apache:master from cloud-fan:golden

@cloud-fan cloud-fan commented Feb 28, 2026

What changes were proposed in this pull request?

Add a --DEBUG marker directive for the SQL golden file test framework (SQLQueryTestSuite). When placed on its own line before a query in an input .sql file, it enables a focused debug mode:

  • Selective execution: Only --DEBUG-marked queries and setup commands (CREATE TABLE, INSERT, SET, etc.) are executed; all other queries are skipped.
  • Full error details: Failed queries print the complete stacktrace to the console.
  • Golden comparison: Results are still compared against the golden file so you can verify correctness.
  • Safety net: The test always fails at the end with a reminder to remove --DEBUG markers before committing.
  • DataFrame access: Documentation guides users to set a breakpoint in runDebugQueries to inspect the DataFrame instance for ad-hoc plan analysis.

Example usage in an input file:

CREATE TABLE t (id INT, val INT) USING parquet;
INSERT INTO t VALUES (1, 10), (2, 20);
-- this query is skipped in debug mode
SELECT count(*) FROM t;
-- this is the query I'm debugging
--DEBUG
SELECT sum(val) OVER (ORDER BY id) FROM t;
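
The selective-execution rule can be illustrated with a small standalone sketch. This is not the code from SQLQueryTestSuite: the `Query` case class, the `isCommand` prefix list, and the split-on-trailing-semicolon parsing are all assumptions made for illustration, and the real framework may classify and split statements differently.

```scala
// Hypothetical sketch; none of these names come from SQLQueryTestSuite.
object DebugDirectiveSketch {
  // One parsed statement plus whether a --DEBUG line preceded it.
  final case class Query(sql: String, debug: Boolean)

  // Assumed prefixes for setup commands; the real classification may differ.
  private val commandPrefixes = Seq("CREATE", "INSERT", "SET", "DROP", "USE")

  def isCommand(sql: String): Boolean = {
    val upper = sql.trim.toUpperCase
    commandPrefixes.exists(p => upper.startsWith(p + " "))
  }

  // Splits the input into statements ending in ';' and attaches a pending
  // --DEBUG marker to the next statement. Other "--" lines are comments.
  def parse(input: String): Seq[Query] = {
    val queries = Seq.newBuilder[Query]
    val current = new StringBuilder
    var pendingDebug = false
    input.split("\n").map(_.trim).foreach { line =>
      if (line == "--DEBUG") {
        pendingDebug = true
      } else if (line.nonEmpty && !line.startsWith("--")) {
        if (current.nonEmpty) current.append(' ')
        current.append(line)
        if (line.endsWith(";")) {
          queries += Query(current.toString.stripSuffix(";"), pendingDebug)
          current.clear()
          pendingDebug = false
        }
      }
    }
    queries.result()
  }

  // With no marker present, everything runs. Otherwise only setup commands
  // and --DEBUG-marked queries are kept.
  def selectForRun(queries: Seq[Query]): Seq[Query] =
    if (queries.exists(_.debug)) queries.filter(q => q.debug || isCommand(q.sql))
    else queries
}
```

Applied to the input file above, `selectForRun(parse(input))` would keep the CREATE TABLE, the INSERT, and the marked window query, and drop `SELECT count(*) FROM t`.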

Example console output when running the test:

=== DEBUG: Query #3 ===
SQL: SELECT sum(val) OVER (ORDER BY id) FROM t
Golden answer: matches

When the debug query fails:

=== DEBUG: Query #3 ===
SQL: SELECT sum(val) OVER (ORDER BY id) FROM t
org.apache.spark.sql.AnalysisException: [ERROR_CLASS] ...
	at org.apache.spark.sql.catalyst.analysis...
	at ...
Golden answer: matches
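
For readers unfamiliar with how a full stacktrace ends up as console text, one common JVM approach is to render the exception through a `StringWriter`. The sketch below only illustrates that general technique; `stackTraceOf` and `runCapturing` are hypothetical names, and the PR's actual capture code may be structured differently.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.control.NonFatal

// Hypothetical helper names; this is not the code added by the PR.
object StackTraceSketch {
  // Renders the throwable exactly as printStackTrace would, but into a String.
  def stackTraceOf(t: Throwable): String = {
    val buffer = new StringWriter()
    val writer = new PrintWriter(buffer)
    t.printStackTrace(writer)
    writer.flush()
    buffer.toString
  }

  // Runs `body`; returns Right(result) on success, or Left(full stacktrace)
  // when a non-fatal exception is thrown.
  def runCapturing[A](body: => A): Either[String, A] =
    try Right(body)
    catch { case NonFatal(e) => Left(stackTraceOf(e)) }
}
```

The `Left` value starts with the exception class and message, followed by the `at ...` frames, which is the shape of the failure output shown above.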

Why are the changes needed?

Debugging golden file test failures is currently painful:

  1. You must run all queries even if only one needs debugging.
  2. Error output is minimal (just the error class/message), with no stacktrace.
  3. There is no way to access the DataFrame instance for plan inspection.

This change addresses all three issues with a simple, zero-config marker that can be temporarily added during development.

Does this PR introduce any user-facing change?

No. This is a test infrastructure improvement only.

How was this patch tested?

  • Manually tested with --DEBUG markers on both passing and failing queries in inline-table.sql.
  • Verified backward compatibility: tests pass normally when no --DEBUG markers are present.
  • Verified debug mode output includes full stacktraces, golden answer comparison, and the safety-fail message.

Was this patch authored or co-authored using generative AI tooling?

Yes, using Cursor.

…framework

Add a `--DEBUG` marker that can be placed before any query in a SQL test
input file to enable selective debugging. When any `--DEBUG` marker is
present, the test enters debug mode:

- Commands (CREATE TABLE, INSERT, SET, etc.) are always executed for setup.
- Only `--DEBUG`-marked non-command queries are executed; others are skipped.
- Failed queries print the full error stacktrace to the console.
- Results are still compared against the golden file.
- The test always fails at the end as a safety net to prevent accidental commits
  with `--DEBUG` markers.

This significantly improves the debugging experience for golden file tests
by allowing developers to focus on specific failing queries without running
the entire test file, while getting full error details in the console output.
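
The last two behaviors in the list above (golden comparison and the unconditional failure) could look roughly like the following. This is a sketch under stated assumptions: `Outcome`, `compare`, `finishDebugRun`, the trimmed-string comparison, and the failure message wording are all invented for illustration and do not reflect the suite's real implementation.

```scala
// Hypothetical sketch; names and message text are assumptions.
object DebugModeSketch {
  final case class Outcome(sql: String, matchesGolden: Boolean)

  // Assumed comparison: actual and golden output are equal after trimming.
  def compare(sql: String, actual: String, golden: String): Outcome =
    Outcome(sql, actual.trim == golden.trim)

  // Always throws, even when every debug query matched, so that a run with
  // --DEBUG markers can never pass.
  def finishDebugRun(outcomes: Seq[Outcome]): Nothing = {
    val mismatches = outcomes.count(o => !o.matchesGolden)
    throw new IllegalStateException(
      s"$mismatches of ${outcomes.size} debug queries differed from the golden file. " +
        "Remove --DEBUG markers before committing.")
  }
}
```

Returning `Nothing` makes the intent explicit in the type: the function has no successful exit path.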
@cloud-fan cloud-fan changed the title from "[SPARK-XXXXX][SQL][TESTS] Add --DEBUG directive for golden file test framework" to "[SPARK-55756][SQL][TESTS] Add --DEBUG directive for golden file test framework" Feb 28, 2026

@yaooqinn yaooqinn left a comment

Looks like a good supplement to the ScalaTest -z option.

@szehon-ho szehon-ho left a comment

I think it's nice; debugging golden files is a bit painful. Having a mode that skips all non-commands except the failing query is a good idea.

I left a few questions about whether some of these improvements could also apply in non-debug mode.

We could also add a suggestion to AGENTS.md so that agents try this when debugging a failing test (or just add it to the README section on debugging golden files).

* - Query results are still compared against the golden file.
* - The test always fails at the end with a reminder to remove --DEBUG markers.
* To inspect the DataFrame interactively, set a breakpoint in `runDebugQueries` at the
* line where `localSparkSession.sql(sql)` is called, then evaluate the DataFrame in the
Member

Is it possible to make this available in all cases, not just in --DEBUG mode?

Contributor Author

It's less useful in non-debug mode because all queries run, so I didn't put it in the classdoc. People can find the DataFrame in runQueries in a similar way.

localSparkSession.sql(sql).collect()
} else if (isDebug) {
// Capture exception stacktrace if the query fails.
var exceptionTrace: Option[String] = None
Member

Can we always print the stacktrace when we hit an unexpected exception, not just in debug mode?

Contributor Author

Same reason as above: when all queries run, this is not useful.

@yaooqinn

Merged to master. Thank you, @cloud-fan and all.

@yaooqinn yaooqinn closed this in 730d375 Feb 28, 2026