Support commit ts in TiFlash#10716
Support commit ts in TiFlash#10716ti-chi-bot[bot] merged 7 commits intopingcap:feature/release-8.5-materialized-viewfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds support for a hidden commit timestamp column by introducing Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Query Request
participant Coproc as Coprocessor
participant Storage as Storage/Filter
participant Exec as Executor
Client->>Coproc: Request TableScan with ExtraCommitTSColumnID (-5)
Coproc->>Coproc: Map -5 → VersionColumnID / version column name
Coproc->>Coproc: Emit schema entry for version column
Coproc->>Coproc: DAGExpressionAnalyzer adds cast to computing-layer type if needed
Coproc->>Storage: Send schema and filters (version column represented)
Storage->>Storage: PushDownFilter / RSOperator alias -5 → VersionColumnID
Storage->>Exec: Return resolved column definitions
Exec->>Exec: PhysicalTableScan applies projection (uses sample_block)
Exec-->>Client: Return rows including commit_ts column
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp`:
- Line 205: Replace the use of the member sample_block when constructing the
DAGExpressionAnalyzer with the local header from the group_builder so the
analyzer uses storage-side column types; specifically, in PhysicalTableScan.cpp
change the DAGExpressionAnalyzer creation to pass header (the
group_builder/header variable) instead of sample_block, mirroring the
DAGPipeline path that uses header and ensuring correct cast generation for the
version column; after the change, rebuild and run relevant planner/execution
tests to verify behavior.
In `@dbms/src/Storages/KVStore/Types.h`:
- Line 63: Rename the inconsistent constant ExtraCommitTSID to
ExtraCommitTSColumnID to match the existing naming convention (e.g.,
TiDBPkColumnID, VersionColumnID); update the declaration in Types.h and replace
all references/usages of ExtraCommitTSID throughout the codebase (tests,
headers, sources) to the new symbol ExtraCommitTSColumnID to avoid build breaks
and keep naming consistent.
🧹 Nitpick comments (2)
dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp (2)
175-184: Note:sample_blockvariable shadowing on line 182.Line 167 declares
auto sample_block = ...(shadows member), then line 182 declaresconst auto & sample_block = ...(shadows the line-167 local). While the shadowing is intentional here — you need the post-cast sample block to extract the new column name — this nested shadowing of the same name makes the code fragile and hard to reason about.Consider using a distinct name like
cast_sample_blockfor the inner declaration.Proposed rename
- const auto & sample_block = schema_project->getSampleBlock(); - schema_project_cols[version_col_idx].first = sample_block.getByPosition(sample_block.columns() - 1).name; + const auto & cast_sample_block = schema_project->getSampleBlock(); + schema_project_cols[version_col_idx].first = cast_sample_block.getByPosition(cast_sample_block.columns() - 1).name;
207-208: Same shadowing issue here — use a distinct name for the post-cast sample block.Same concern as in the DAGPipeline overload:
sample_blockon line 207 shadows the member variable. Using a distinct name improves clarity, especially given the bug on line 205 wheresample_block(member) was used unintentionally.Proposed rename
- const auto & sample_block = schema_actions->getSampleBlock(); - schema_project_cols[version_col_idx].first = sample_block.getByPosition(sample_block.columns() - 1).name; + const auto & cast_sample_block = schema_actions->getSampleBlock(); + schema_project_cols[version_col_idx].first = cast_sample_block.getByPosition(cast_sample_block.columns() - 1).name;
| // Convert version column to nullable when needed | ||
| if (version_col_idx != -1) | ||
| { | ||
| DAGExpressionAnalyzer analyzer(sample_block, context); |
There was a problem hiding this comment.
Bug: sample_block (member) used instead of header (current group_builder header) for DAGExpressionAnalyzer.
On line 205, DAGExpressionAnalyzer analyzer(sample_block, context) references the class member sample_block (constructed from the schema types at plan-build time). It should use the local header variable (line 194), which reflects the actual storage column types from the group builder — the same pattern as the DAGPipeline path on line 177.
The member sample_block has the TiDB-side types for the version column, but the analyzer needs the storage-side types to generate the correct cast expression.
Proposed fix
- DAGExpressionAnalyzer analyzer(sample_block, context);
+ DAGExpressionAnalyzer analyzer(header, context);🤖 Prompt for AI Agents
In `@dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp` at line 205, Replace the
use of the member sample_block when constructing the DAGExpressionAnalyzer with
the local header from the group_builder so the analyzer uses storage-side column
types; specifically, in PhysicalTableScan.cpp change the DAGExpressionAnalyzer
creation to pass header (the group_builder/header variable) instead of
sample_block, mirroring the DAGPipeline path that uses header and ensuring
correct cast generation for the version column; after the change, rebuild and
run relevant planner/execution tests to verify behavior.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@dbms/src/Flash/Coprocessor/GenSchemaAndColumn.cpp`:
- Line 21: Replace the local-quote include for the KV types with the
project-style angle-bracket path: change the include statement referencing
"Storages/KVStore/Types.h" to use the angle-bracket form with the relative path
from dbms/src (i.e., use <Storages/KVStore/Types.h>), ensuring the include in
GenSchemaAndColumn.cpp follows the project's include convention.
- Around line 126-133: genNamesAndTypes currently falls through to the default
for ExtraCommitTSColumnID causing inconsistent schema entries; add a case for
ExtraCommitTSColumnID in genNamesAndTypes that mirrors
genColumnDefinesForDisaggregatedRead by mapping to VersionColumnID and using
MutableSupport::version_column_name and MutableSupport::version_column_type
instead of the generic column_info.name and
getDataTypeByColumnInfoForComputingLayer so the produced name and type match the
disaggregated read path.
🧹 Nitpick comments (1)
dbms/src/Flash/Planner/Plans/PhysicalTableScan.cpp (1)
158-167: Localsample_blockshadows the member — consider renaming for clarity.Line 160 introduces a local
sample_blockthat shadows the class member of the same name. While functionally correct (the local correctly uses the storage-side header), shadowing can confuse future readers.Suggested rename
- auto sample_block = pipeline.firstStream()->getHeader(); - auto schema_project_cols = buildTableScanProjectionCols(tidb_table_scan.getLogicalTableID(), schema, sample_block); + auto storage_header = pipeline.firstStream()->getHeader(); + auto schema_project_cols = buildTableScanProjectionCols(tidb_table_scan.getLogicalTableID(), schema, storage_header);
| #include <TiDB/Decode/TypeMapping.h> | ||
| #include <TiDB/Schema/TiDB.h> | ||
|
|
||
| #include "Storages/KVStore/Types.h" |
There was a problem hiding this comment.
Use angle-bracket include path per project convention.
As per coding guidelines, includes should use relative paths from dbms/src with angle brackets.
Proposed fix
-#include "Storages/KVStore/Types.h"
+#include <Storages/KVStore/Types.h>🤖 Prompt for AI Agents
In `@dbms/src/Flash/Coprocessor/GenSchemaAndColumn.cpp` at line 21, Replace the
local-quote include for the KV types with the project-style angle-bracket path:
change the include statement referencing "Storages/KVStore/Types.h" to use the
angle-bracket form with the relative path from dbms/src (i.e., use
<Storages/KVStore/Types.h>), ensuring the include in GenSchemaAndColumn.cpp
follows the project's include convention.
| case ExtraCommitTSColumnID: | ||
| { | ||
| column_defines->emplace_back(DM::ColumnDefine{ | ||
| VersionColumnID, | ||
| output_name, // MutableSupport::version_column_name | ||
| MutableSupport::version_column_type}); | ||
| break; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for all callers of genNamesAndTypes and genNamesAndTypesForTableScan
rg -n 'genNamesAndTypes' dbms/src/Flash/Coprocessor/ --type=cpp -B2 -A5Repository: pingcap/tiflash
Length of output: 5903
🏁 Script executed:
#!/bin/bash
# Read the full implementation of both functions
sed -n '66,145p' dbms/src/Flash/Coprocessor/GenSchemaAndColumn.cppRepository: pingcap/tiflash
Length of output: 3252
🏁 Script executed:
#!/bin/bash
# Search for where ExtraCommitTSColumnID is used/defined
rg -n 'ExtraCommitTSColumnID' dbms/src/Flash/Coprocessor/ --type=cpp -B2 -A2Repository: pingcap/tiflash
Length of output: 2026
🏁 Script executed:
#!/bin/bash
# Search for where ExtraCommitTSColumnID gets added to the column list
rg -n 'ExtraCommitTSColumnID' dbms/src --type=cpp | head -20
echo "---"
# Check if there's any filtering in DAGStorageInterpreter before calling genNamesAndTypesForTableScan
rg -n 'genNamesAndTypesForTableScan\|ExtraCommitTSColumnID' dbms/src/Flash/Coprocessor/DAGStorageInterpreter.cpp -B5 -A5Repository: pingcap/tiflash
Length of output: 962
🏁 Script executed:
#!/bin/bash
# Check DAGStorageInterpreter.cpp context around genNamesAndTypesForTableScan calls
sed -n '415,430p' dbms/src/Flash/Coprocessor/DAGStorageInterpreter.cpp
echo "---"
sed -n '520,540p' dbms/src/Flash/Coprocessor/DAGStorageInterpreter.cpp
echo "---"
# Check if table_scan.getColumns() ever includes ExtraCommitTSColumnID by looking at where it's populated
rg -n 'getColumns\|ExtraCommitTSColumnID' dbms/src/Flash/Coprocessor/DAGExpressionAnalyzer.cpp -B3 -A3 | head -40Repository: pingcap/tiflash
Length of output: 1628
🏁 Script executed:
#!/bin/bash
# Check if ExtraCommitTSColumnID is ever added to table_scan columns
# Look at how TiDBTableScan gets its columns
rg -n 'class TiDBTableScan' dbms/src/Flash --type=cpp -A20 | head -40
echo "---"
# Check DAGExpressionAnalyzer context where ExtraCommitTSColumnID is checked
sed -n '1190,1210p' dbms/src/Flash/Coprocessor/DAGExpressionAnalyzer.cppRepository: pingcap/tiflash
Length of output: 3122
🏁 Script executed:
#!/bin/bash
# Check TiDBTableScan constructor to see how columns are populated
rg -n 'TiDBTableScan::TiDBTableScan' dbms/src/Flash/Coprocessor/ --type=cpp -A50 | head -80
echo "---"
# Check if there's any filtering of ExtraCommitTSColumnID before calling genNamesAndTypes
rg -n 'ExtraCommitTSColumnID' dbms/src/Flash/Coprocessor/DAGExpressionAnalyzer.cpp -B10 -A10Repository: pingcap/tiflash
Length of output: 6820
Add case for ExtraCommitTSColumnID in genNamesAndTypes.
genNamesAndTypes (line 66) does not handle ExtraCommitTSColumnID, whereas genColumnDefinesForDisaggregatedRead (line 126) explicitly maps it to VersionColumnID with the correct type. When ExtraCommitTSColumnID appears in the column list (which it can, as evidenced by explicit checks in DAGExpressionAnalyzer.cpp:1194 and RemoteRequest.cpp:76), it falls through to the default case in genNamesAndTypes, using the generic column_info.name and getDataTypeByColumnInfoForComputingLayer. This produces an inconsistent schema entry. Add a matching case for ExtraCommitTSColumnID to use the correct name (MutableSupport::version_column_name) and type (MutableSupport::version_column_type).
🤖 Prompt for AI Agents
In `@dbms/src/Flash/Coprocessor/GenSchemaAndColumn.cpp` around lines 126 - 133,
genNamesAndTypes currently falls through to the default for
ExtraCommitTSColumnID causing inconsistent schema entries; add a case for
ExtraCommitTSColumnID in genNamesAndTypes that mirrors
genColumnDefinesForDisaggregatedRead by mapping to VersionColumnID and using
MutableSupport::version_column_name and MutableSupport::version_column_type
instead of the generic column_info.name and
getDataTypeByColumnInfoForComputingLayer so the produced name and type match the
disaggregated read path.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dbms/src/Storages/tests/gtest_hidden_commit_ts_column.cpp`:
- Line 33: Typo in the inline comment: change "registed" to "registered" in the
comment "// Maybe another test has already registed, ignore exception here." so
it reads "// Maybe another test has already registered, ignore exception
here."—locate this comment in gtest_hidden_commit_ts_column.cpp and update the
spelling.
| } | ||
| catch (DB::Exception &) | ||
| { | ||
| // Maybe another test has already registed, ignore exception here. |
There was a problem hiding this comment.
Typo: "registed" → "registered".
- // Maybe another test has already registed, ignore exception here.
+ // Maybe another test has already registered, ignore exception here.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Maybe another test has already registed, ignore exception here. | |
| // Maybe another test has already registered, ignore exception here. |
🤖 Prompt for AI Agents
In `@dbms/src/Storages/tests/gtest_hidden_commit_ts_column.cpp` at line 33, Typo
in the inline comment: change "registed" to "registered" in the comment "//
Maybe another test has already registed, ignore exception here." so it reads "//
Maybe another test has already registered, ignore exception here."—locate this
comment in gtest_hidden_commit_ts_column.cpp and update the spelling.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gengliqi, windtalker The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
f666956
into
pingcap:feature/release-8.5-materialized-view
What problem does this PR solve?
Issue Number: close #10715
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit