use full column set on empty post-filter projection list in duckdb#6831
use full column set on empty post-filter projection list in duckdb#6831
Conversation
Signed-off-by: Mikhail Kot <mikhail@spiraldb.com>
bcd00d7 to
d764aed
Compare
Merging this PR will improve performance by 23.63%
Performance Changes
Comparing Footnotes
|
| * Example usage: | ||
| * https://github.com/duckdb/duckdb/blob/dc11eadd8f0a7c600f0034810706605ebe10d5b9/src/include/duckdb/function/table_function.hpp#L147 | ||
| */ | ||
| const idx_t *post_filter_projection_ids; |
There was a problem hiding this comment.
i really think we should keep the naming here and explain the semantics.
It nice to keep the duckdb name
| let filter_expr_str = filter_expr | ||
| .as_ref() | ||
| .map_or_else(|| "true".to_string(), |f| f.to_string()); | ||
| debug!("Global init Vortex scan SELECT {projection_expr} WHERE {filter_expr_str}"); |
There was a problem hiding this comment.
will this not alloc the str on even if debug ignores it?
| table_filter_set: Option<&TableFilterSetRef>, | ||
| column_ids: &[u64], | ||
| column_names: &[String], | ||
| projection_ids: &[u64], |
There was a problem hiding this comment.
i find this naming more confusing, column seems to make more sense here. i think
| /// | ||
| /// Returns `true` if a chunk was exported, `false` if all rows have been exported. | ||
| pub fn export(&mut self, chunk: &mut DataChunkRef) -> VortexResult<bool> { | ||
| chunk.reset(); |
There was a problem hiding this comment.
are you sure we need this?
why if we estimate card
| ctx: ExecutionCtx, | ||
| /// Columns DuckDB requested to read from file. If empty, it's a zero-column | ||
| /// projection and should be handled accordingly, see ArrayExporter::export. | ||
| fields: Vec<Box<dyn ColumnExporter>>, |
There was a problem hiding this comment.
Could Model this with an enum? It might be easier to understand?
There was a problem hiding this comment.
I've thought about this, but then we need something like NonzeroVec, and I believe it's an overkill.
| // Post filter projection ids may be empty, in which case | ||
| // you need to use projection_ids | ||
| // https://github.com/duckdb/duckdb/blob/6e211da91657a94803c465fd0ce585f4c6754b54/src/planner/operator/logical_get.cpp#L168 | ||
| let (post_filter_projection_ids, has_projection_ids) = match post_filter_projection_ids { |
There was a problem hiding this comment.
so this is a projection_id then?
requested by DuckDB. Before we used the empty list which resulted in
uninitialized read (columns requested but not populated).
Not needed right now, but must be done once we start estimating cardinalities
projection_ids -> post_filter_projection_ids to better reflect what they are