Conversation
| self.cache = | ||
| Self::compute_properties(&self.input, Arc::clone(&self.input.schema())); |
There was a problem hiding this comment.
Needs to be recomputed since the output changes.
| if auto_explain { | ||
| if duration.as_millis() >= auto_explain_min_duration as u128 { | ||
| export_auto_explain(out, &auto_explain_output)?; | ||
| } | ||
| concat_batches(&inner_schema, &batches).map_err(DataFusionError::from) | ||
| } else { | ||
| Ok(out) | ||
| } |
There was a problem hiding this comment.
The auto_explain mode will return the input's batches instead of the analyze.
| let fd: &mut dyn Write = match output { | ||
| "stdout" => &mut io::stdout(), | ||
| "stderr" => &mut io::stderr(), | ||
| _ => &mut OpenOptions::new().create(true).append(true).open(output)?, |
There was a problem hiding this comment.
Does this need any kind of validation of the file location ?
Or it is left to the developer/admin to make sure it is a safe place ?
There was a problem hiding this comment.
Does this need some kind of synchronisation when a file path is used for the output ? Two or more DF sessions using the same config may try to write to the same file simultaneously.
There was a problem hiding this comment.
Does this need any kind of validation of the file location ?
Or it is left to the developer/admin to make sure it is a safe place ?
I think it's better to leave this to the user (either way, an error is returned).
There was a problem hiding this comment.
Does this need some kind of synchronisation when a file path is used for the output ? Two or more DF sessions using the same config may try to write to the same file simultaneously.
I think again the responsibility of this falls on the user. Is it common to use multiple sessions over the same config?
| # test auto_explain | ||
|
|
||
| statement ok | ||
| set datafusion.explain.auto_explain_output = 'test_files/scratch/auto_explain.txt'; |
There was a problem hiding this comment.
Does something assert the contents of this output file ?
Does something remove this file at the end ?
There was a problem hiding this comment.
I originally tried to load the file to a table as CSV, as I think it is the only feasible way to check the contents, but since the file cannot be removed the result would always change. I mainly added these sqllogictests just to check the "set ..." commands.
As for removing the file, I'm not sure it is possible. With that said, I don't think it is necessary since it's written to the sqllogictest temporary dir.
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
alamb
left a comment
There was a problem hiding this comment.
Thanks @nuno-faria and @martin-g -- sorry for the delay in reviewing this PR
Form my perspective, this is a valuable feature but having the output to stdout / stderr in the core library is not idea.
Instead, it seems to me that auto explain belongs in the client applications themselves (e.g. datafusion-cli)
I tried to explain my thinking more here
|
@alamb thanks for the review. I also get the concerns of polluting I will have to look at the Observer suggestion once I get the time. |
What about using |
I added I was also thinking of adding the query to the output, similar to Postgre's |
I think if you wanted to make it easier to implement downstream maybe we would add some sort of API / callback thing (trait object?) that could get the info for auto_explains. Then the default implementation could log with |
Sounds good. I'll try to tackle this soon. |
Which issue does this PR close?
auto_explainmode #19215.Rationale for this change
Allowing users to check the execution plans without needing to change the existing application.
The
auto_explainmode can be enabled with thedatafusion.explain.auto_explainconfig. In addition, there are two other configs:datafusion.explain.auto_explain_output: sets the output location of the plans. Supportsstdout,stderr, and a file path.datafusion.explain.auto_explain_min_duration: only outputs plans whose duration is greater than this value (similar to Postgres'auto_explain.log_min_duration).Example in
datafusion-cli:What changes are included in this PR?
AnalyzeExecoperator to support theauto_explainmode.AnalyzeExecoperator whenauto_explainis enabled.Are these changes tested?
Yes.
Are there any user-facing changes?
New feature, but it's completely optional.