Skip to content

columnar: Refactor for later pipeline evolution#10907

Open
JaySon-Huang wants to merge 7 commits into
pingcap:masterfrom
JaySon-Huang:refactor_for_pipeline
Open

columnar: Refactor for later pipeline evolution#10907
JaySon-Huang wants to merge 7 commits into
pingcap:masterfrom
JaySon-Huang:refactor_for_pipeline

Conversation

@JaySon-Huang

@JaySon-Huang JaySon-Huang commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features

    • Added next-generation columnar reader implementation with enhanced integration into query execution pipelines and disaggregated storage support.
  • Refactor

    • Reorganized columnar storage components into dedicated namespace for improved code structure.

…irectory

Move Columnar-specific types out of Flash/Coprocessor and
StorageDisaggregatedColumnar into a dedicated Storages/Columnar/ directory
for better code organization.

Changes:
- Move ColumnarScanContext from Flash/Coprocessor/ to Storages/Columnar/
- Extract RNColumnarReader* types (Plan/Work/SharedContext/ReadTask) into
  Storages/Columnar/ColumnarReader.h/.cpp
- Extract RNColumnarInputStream and RNColumnarSourceOp into
  Storages/Columnar/ColumnarStreams.h/.cpp
- Simplify StorageDisaggregatedColumnar.h/.cpp to only contain
  Storage::readThroughColumnar() and related integration glue code
- Update all include paths across Flash/Coprocessor and Flash/Statistics
- Add Storages/Columnar/ to CMake build glob
- Keep ENABLE_NEXT_GEN_COLUMNAR guards on all Columnar code paths
…mbrella header

- Rename all Columnar types to drop the 'RN' prefix:
  RNColumnarReaderMaterializeState -> ColumnarReaderMaterializeState
  RNColumnarReaderSharedContext   -> ColumnarReaderSharedContext
  RNColumnarReaderPlan            -> ColumnarReaderPlan
  RNColumnarReaderWork            -> ColumnarReaderWork
  RNColumnarReadTask              -> ColumnarReadTask
  RNColumnarInputStream           -> ColumnarInputStream
  RNColumnarSourceOp              -> ColumnarSourceOp
  getRNColumnarSourceNum          -> getColumnarSourceNum

- Delete StorageDisaggregatedColumnar.h, now redundant since
  StorageDisaggregatedColumnar.cpp already directly includes
  the Columnar/ headers
…SourceOp

Separate the pull-based and push-based Columnar execution models
into independent files for better maintainability and future pipeline
optimization:

- ColumnarInputStream.h/.cpp: pull-based IProfilingBlockInputStream model (legacy)
- ColumnarSourceOp.h/.cpp: push-based SourceOp pipeline model (future)

This allows pipeline-specific optimizations (async IO, adaptive
concurrency, backpressure) to be scoped to ColumnarSourceOp without
touching the legacy InputStream path.
- Rename ColumnarReadTask -> ColumnarReadTaskPool to clarify it is
  a pool/broker managing multiple ColumnarReaderWork items, not a
  single task
- Shorten static factory methods: buildColumnarReadTaskPool ->
  build / buildWithBackoff (class name already provides context)
- Inline getColumnarSourceNum at its single call site
- Inline 8 trivial getter methods into ColumnarReader.h
- Add class-level comments for ColumnarReadTaskPool,
  ColumnarReaderWork, and ColumnarReaderSharedContext
- Add member comments for reader_count, source_num,
  pending_reader_works
- Rename local variable read_columnar_tasks -> columnar_task_pools
  to match the new type name
Move the four behavioral descriptions (backoff/retry, bucket-level
parallelism, work-stealing, prefetching) from the ColumnarReadTaskPool
class comment to the specific functions that implement each behavior:

- buildWithBackoff: backoff/retry on transient errors
- build: bucket-level parallelism
- tryAcquireReaderWork: work-stealing from the front of the queue
- prefetchPendingWork: async prefetch of next work item
Signed-off-by: JaySon-Huang <tshent@qq.com>
After rebasing the refactoring onto upstream/master, apply the
makeLockForDisaggResolve change (from pingcap#10900) to createColumnarReader()
which was moved from StorageDisaggregatedColumnar.cpp to
ColumnarReader.cpp during the refactoring.
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Jun 22, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xuhuaiyu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 22, 2026
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds next-gen columnar reader infrastructure (ColumnarReader, ColumnarInputStream, ColumnarSourceOp) under Storages/Columnar, all guarded by ENABLE_NEXT_GEN_COLUMNAR. Rewires StorageDisaggregatedColumnar to use the new ColumnarReadTaskPool and ColumnarSourceOp instead of the previous RNColumnar* classes. Relocates ColumnarScanContext headers from Flash/Coprocessor to Storages/Columnar.

Changes

Next-gen columnar reader pipeline

Layer / File(s) Summary
ColumnarScanContext header relocation
dbms/CMakeLists.txt, dbms/src/Flash/Coprocessor/DAGContext.h, dbms/src/Flash/Coprocessor/ExecutionSummary.*, dbms/src/Flash/Statistics/Execution*, dbms/src/Flash/Statistics/TableScanImpl.cpp
Adds src/Storages/Columnar to the dbms CMake target and replaces all Flash/Coprocessor/ColumnarScanContext*.h includes with the new Storages/Columnar paths across coprocessor and statistics files.
ColumnarReader data contracts
dbms/src/Storages/Columnar/ColumnarReader.h
Declares ColumnarReaderMaterializeState, ColumnarReaderSharedContext (with thread-safe StartTsClearRegistry RAII), ColumnarReaderPlan, ColumnarReaderWork, ColumnarReadTaskPool interface, and the createColumnarReader free function.
ColumnarReader planning and task-pool execution
dbms/src/Storages/Columnar/ColumnarReader.cpp
Implements bucket-boundary range splitting, region plan construction with epoch validation, shared context serialization (table scan PB, filters, UTC normalization), FFI-backed createColumnarReader with structured error/cache handling, ColumnarReadTaskPool retry/backoff state machine, async prefetch scheduling, and the main build() planning logic including bucket parallelism.
ColumnarInputStream block streaming
dbms/src/Storages/Columnar/ColumnarInputStream.h, dbms/src/Storages/Columnar/ColumnarInputStream.cpp
Declares and implements ColumnarInputStream (IProfilingBlockInputStream) with reader lifecycle helpers, a main readImpl loop that deserializes Rust column buffers into Blocks, a stats-logging destructor, and scan context merging.
ColumnarSourceOp pipeline operator
dbms/src/Storages/Columnar/ColumnarSourceOp.h, dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
Declares and implements ColumnarSourceOp (SourceOp) with readImpl/awaitImpl/executeIOImpl state machine that lazily acquires input streams from the task pool and forwards blocks downstream, plus prefix/suffix lifecycle hooks for throughput logging.
StorageDisaggregatedColumnar rewiring
dbms/src/Storages/StorageDisaggregatedColumnar.cpp
Replaces RNColumnarReadTask/RNColumnarSourceOp usage in both readThroughColumnar overloads with ColumnarReadTaskPool::buildWithBackoff and ColumnarSourceOp::create; removes self-include of StorageDisaggregatedColumnar.h.

Sequence Diagram(s)

sequenceDiagram
  participant StorageDisaggregated
  participant ColumnarReadTaskPool
  participant ColumnarReaderWork
  participant DisaggregatedProxy

  rect rgba(70, 130, 180, 0.5)
    Note over StorageDisaggregated,DisaggregatedProxy: Planning phase
    StorageDisaggregated->>ColumnarReadTaskPool: buildWithBackoff(context, table_ranges, ...)
    ColumnarReadTaskPool->>ColumnarReadTaskPool: build() — region split + bucket parallelism
    ColumnarReadTaskPool->>ColumnarReaderWork: enqueue per-region/bucket work items
    ColumnarReadTaskPool->>ColumnarReadTaskPool: prefetchPendingWork()
    ColumnarReadTaskPool->>DisaggregatedProxy: createColumnarReader FFI (serialized region/filter/table)
    DisaggregatedProxy-->>ColumnarReaderWork: ColumnarReaderPtr (Ready) or error (Failed)
  end

  rect rgba(60, 179, 113, 0.5)
    Note over StorageDisaggregated,DisaggregatedProxy: Execution phase
    StorageDisaggregated->>ColumnarReadTaskPool: getInputStreams() / ColumnarSourceOp::create()
    loop each pipeline tick
      ColumnarReadTaskPool->>ColumnarReaderWork: tryAcquireReaderWork()
      ColumnarReaderWork-->>ColumnarReadTaskPool: ColumnarReaderPtr
      ColumnarReadTaskPool->>DisaggregatedProxy: read row batch via proxy
      DisaggregatedProxy-->>ColumnarReadTaskPool: column buffers
      ColumnarReadTaskPool-->>StorageDisaggregated: Block (deserialized columns)
    end
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • pingcap/tiflash#10842: Directly related — introduced ENABLE_NEXT_GEN_COLUMNAR feature wiring in StorageDisaggregatedColumnar and proxy-columnar read-path additions that this PR replaces with ColumnarReadTaskPool/ColumnarSourceOp.
  • pingcap/tiflash#10871: This PR calls fn_get_region_bucket_keys and fn_clear_shared_snap_access_by_start_ts FFI hooks whose Rust/FFI implementations were added in that PR.
  • pingcap/tiflash#10905: Both PRs share ColumnarScanStats/fn_columnar_scan_stats — that PR added the FFI callback and struct while this PR wires the results into columnar_scan_context_map during stream execution.

Suggested labels

size/XXL, approved, lgtm

Suggested reviewers

  • CalvinNeo
  • JinheLin

Poem

🐇 Hoppity-hop through the columnar rows,
New readers and streams, how the pipeline grows!
From Rust FFI buffers to Blocks we compile,
Epoch checks, backoff, bucket splits in style.
The RN-prefixed past is now gently replaced —
Next-gen columnar lands, with no data misplaced! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is almost entirely a blank template with no substantive details filled in, leaving the template placeholders empty or with minimal content. Fill in the actual problem statement, describe the refactoring changes made, select applicable checkboxes, and provide meaningful commit message and release note content instead of leaving template blanks.
Docstring Coverage ⚠️ Warning Docstring coverage is 4.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'columnar: Refactor for later pipeline evolution' accurately reflects the main objective of the PR, which is a systematic reorganization of columnar storage types and code structure.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot

ti-chi-bot Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)

70-1642: 🧹 Nitpick | 🔵 Trivial

Consider removing the old RN implementation code in a follow-up refactoring.*

The old RN* classes and helper functions (lines 70-1642) are dead code. External verification confirms no usages of RNColumnarReadTask, RNColumnarSourceOp, RNColumnarInputStream, or RNColumnarReaderSharedContext exist outside this file. These ~1,500 lines were superseded by the migration to ColumnarReadTaskPool and ColumnarSourceOp APIs. Removing this dead code would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 70 - 1642,
Remove the dead code blocks containing the old RN* implementation that has been
superseded by ColumnarReadTaskPool and ColumnarSourceOp APIs. Delete the entire
RNColumnarReaderSharedContext struct definition (including the nested
StartTsClearRegistry class and all member functions), all helper functions that
build and manage RN reader plans (buildColumnarReaderSharedContext,
buildRegionReaderPlansFromPhysicalTableRanges,
buildReaderPlansFromRegionReaderPlans, createColumnarReader, and related
functions), and the complete implementations of RNColumnarReaderWork destructor,
RNColumnarReadTask class with all its methods, RNColumnarInputStream class with
all its methods, and RNColumnarSourceOp class with all its methods. Before
deletion, verify no other code in the repository references these removed
classes and functions by searching for RNColumnarReadTask,
RNColumnarInputStream, RNColumnarSourceOp, RNColumnarReaderSharedContext, and
related identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Flash/Statistics/TableScanImpl.cpp`:
- Line 20: In the file dbms/src/Storages/StorageDisaggregatedColumnar.cpp at
line 27, replace the stale include path from the old header location
<Flash/Coprocessor/ColumnarScanContext.h> with the new correct path
<Storages/Columnar/ColumnarScanContext.h>. This aligns with the updated include
path being used in other files like TableScanImpl.cpp and resolves the
compilation failure due to the deleted header.

In `@dbms/src/Storages/Columnar/ColumnarReader.cpp`:
- Around line 600-613: The ParseFromString call on lock_info is not checking for
parsing failures when deserializing the error message data. In the LockedError
handling block within the ColumnarReader code, verify that
lock_info.ParseFromString(error_msg) returns successfully before proceeding with
lock resolution. If parsing fails, log an appropriate error and throw an
exception or handle the malformed data case to prevent using an invalid
lock_info object in the subsequent resolveLocks call.
- Around line 540-599: The code calls region_error.ParseFromString(error_msg)
but does not check its return value before accessing fields like
has_epoch_not_match() and current_regions(), which causes undefined behavior if
the protobuf data is malformed. Capture the boolean return value of the
ParseFromString call in the RegionError parsing section and add a check to
verify the parse was successful before proceeding to access any fields on the
region_error object. If parsing fails, log an appropriate error and throw a
RegionException instead of continuing with undefined state.

In `@dbms/src/Storages/Columnar/ColumnarReader.h`:
- Around line 136-143: The catch block in the
clear_shared_snap_access_by_start_ts function call is using a basic LOG_WARNING
call instead of the recommended tryLogCurrentException function to handle
exceptions. Replace the LOG_WARNING statement with a tryLogCurrentException call
that includes a descriptive context message about the failure to clear the
shared snapshot access cache. This will ensure proper exception details are
captured for debugging according to the coding guidelines.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 632-642: The method ColumnarReadTaskPool::buildWithBackoff() is
being called but the required header file is missing from the includes section.
Add the include directive for Storages/Columnar/ColumnarReader.h to the includes
section of StorageDisaggregatedColumnar.cpp (around lines 15-61) to resolve the
compilation error for the ColumnarReadTaskPool class.

---

Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 70-1642: Remove the dead code blocks containing the old RN*
implementation that has been superseded by ColumnarReadTaskPool and
ColumnarSourceOp APIs. Delete the entire RNColumnarReaderSharedContext struct
definition (including the nested StartTsClearRegistry class and all member
functions), all helper functions that build and manage RN reader plans
(buildColumnarReaderSharedContext,
buildRegionReaderPlansFromPhysicalTableRanges,
buildReaderPlansFromRegionReaderPlans, createColumnarReader, and related
functions), and the complete implementations of RNColumnarReaderWork destructor,
RNColumnarReadTask class with all its methods, RNColumnarInputStream class with
all its methods, and RNColumnarSourceOp class with all its methods. Before
deletion, verify no other code in the repository references these removed
classes and functions by searching for RNColumnarReadTask,
RNColumnarInputStream, RNColumnarSourceOp, RNColumnarReaderSharedContext, and
related identifiers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e16d20b2-dd50-4133-b713-3be6abb7e89d

📥 Commits

Reviewing files that changed from the base of the PR and between cc5a473 and 6798c2f.

📒 Files selected for processing (17)
  • dbms/CMakeLists.txt
  • dbms/src/Flash/Coprocessor/DAGContext.h
  • dbms/src/Flash/Coprocessor/ExecutionSummary.cpp
  • dbms/src/Flash/Coprocessor/ExecutionSummary.h
  • dbms/src/Flash/Statistics/ExecutionSummaryHelper.cpp
  • dbms/src/Flash/Statistics/ExecutorStatisticsCollector.cpp
  • dbms/src/Flash/Statistics/TableScanImpl.cpp
  • dbms/src/Storages/Columnar/ColumnarInputStream.cpp
  • dbms/src/Storages/Columnar/ColumnarInputStream.h
  • dbms/src/Storages/Columnar/ColumnarReader.cpp
  • dbms/src/Storages/Columnar/ColumnarReader.h
  • dbms/src/Storages/Columnar/ColumnarScanContext.h
  • dbms/src/Storages/Columnar/ColumnarScanContext_fwd.h
  • dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
  • dbms/src/Storages/Columnar/ColumnarSourceOp.h
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.h
💤 Files with no reviewable changes (1)
  • dbms/src/Storages/StorageDisaggregatedColumnar.h

#include <Flash/Statistics/ConnectionProfileInfo.h>
#include <Flash/Statistics/TableScanImpl.h>
#include <Interpreters/Join.h>
#include <Storages/Columnar/ColumnarScanContext.h>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check whether old/new ColumnarScanContext headers exist =="
fd -i 'ColumnarScanContext(_fwd)?\.h$' dbms/src

echo
echo "== Find remaining includes of old path =="
rg -nP '`#include`\s*<Flash/Coprocessor/ColumnarScanContext(_fwd)?\.h>' dbms/src

echo
echo "== Find includes of new path =="
rg -nP '`#include`\s*<Storages/Columnar/ColumnarScanContext(_fwd)?\.h>' dbms/src

Repository: pingcap/tiflash

Length of output: 1230


Fix stale include in dbms/src/Storages/StorageDisaggregatedColumnar.cpp.

The old header <Flash/Coprocessor/ColumnarScanContext.h> no longer exists. Update line 27 of StorageDisaggregatedColumnar.cpp to use <Storages/Columnar/ColumnarScanContext.h> to avoid compilation failure. The file under review correctly uses the new path; one downstream file still references the deleted header and must be corrected.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Flash/Statistics/TableScanImpl.cpp` at line 20, In the file
dbms/src/Storages/StorageDisaggregatedColumnar.cpp at line 27, replace the stale
include path from the old header location
<Flash/Coprocessor/ColumnarScanContext.h> with the new correct path
<Storages/Columnar/ColumnarScanContext.h>. This aligns with the updated include
path being used in other files like TableScanImpl.cpp and resolves the
compilation failure due to the deleted header.

Comment on lines +540 to +599
if (columnar_reader.error_type == ColumnarReaderErrorType::RegionError)
{
auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
errorpb::Error region_error;
region_error.ParseFromString(error_msg);
auto region_ver_id
= pingcap::kv::RegionVerID(reader_plan.region_id, reader_plan.region_conf_ver, reader_plan.region_ver);
// Refresh region cache and throw an exception for retrying.
if (region_error.has_epoch_not_match())
{
RegionException::UnavailableRegions unavailable_regions;
String region_id_ver; // region_id:region_ver:conf_ver
for (const auto & region : region_error.epoch_not_match().current_regions())
{
unavailable_regions.insert(region.id());
region_id_ver = std::to_string(region.id()) + ":" + std::to_string(reader_plan.region_ver) + ":"
+ std::to_string(region.region_epoch().conf_ver());
}
auto guard = std::lock_guard(*shared_context.output_lock);
cluster->region_cache->dropRegion(region_ver_id);
LOG_WARNING(
log,
"create columnar reader failed region_id={}, epoch not match {}",
std::to_string(reader_plan.region_id),
region_ver_id.toString());
throw RegionException(
std::move(unavailable_regions),
RegionException::RegionReadStatus::EPOCH_NOT_MATCH,
region_id_ver.c_str());
}
else
{
RegionException::UnavailableRegions unavailable_regions;
auto err_region_id = 0;
if (region_error.has_region_not_found())
{
err_region_id = region_error.region_not_found().region_id();
unavailable_regions.insert(err_region_id);
LOG_WARNING(
log,
"create columnar reader failed region_id={}, region not found {}",
std::to_string(reader_plan.region_id),
std::to_string(err_region_id));
}
else
{
LOG_WARNING(
log,
"create columnar reader failed region_id={}, {}",
std::to_string(reader_plan.region_id),
region_error.ShortDebugString());
}
auto guard = std::lock_guard(*shared_context.output_lock);
cluster->region_cache->dropRegion(region_ver_id);
throw RegionException(
std::move(unavailable_regions),
RegionException::RegionReadStatus::NOT_FOUND,
std::to_string(reader_plan.region_id).c_str());
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check ParseFromString return value to avoid undefined behavior on malformed data.

If the FFI layer returns malformed protobuf data, ParseFromString returns false and leaves the message object in an undefined state. Accessing fields like has_epoch_not_match() or current_regions() would then produce undefined behavior.

Proposed fix
     if (columnar_reader.error_type == ColumnarReaderErrorType::RegionError)
     {
         auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
         errorpb::Error region_error;
-        region_error.ParseFromString(error_msg);
+        if (!region_error.ParseFromString(error_msg))
+        {
+            LOG_WARNING(log, "failed to parse region error protobuf, region_id={}", reader_plan.region_id);
+            throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "failed to parse region error for region {}", reader_plan.region_id);
+        }
         auto region_ver_id
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/ColumnarReader.cpp` around lines 540 - 599, The
code calls region_error.ParseFromString(error_msg) but does not check its return
value before accessing fields like has_epoch_not_match() and current_regions(),
which causes undefined behavior if the protobuf data is malformed. Capture the
boolean return value of the ParseFromString call in the RegionError parsing
section and add a check to verify the parse was successful before proceeding to
access any fields on the region_error object. If parsing fails, log an
appropriate error and throw a RegionException instead of continuing with
undefined state.

Comment on lines +600 to +613
else if (columnar_reader.error_type == ColumnarReaderErrorType::LockedError)
{
auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
kvrpcpb::LockInfo lock_info;
lock_info.ParseFromString(error_msg);
// Try to resolve locks.
pingcap::kv::Backoffer bo(pingcap::kv::copNextMaxBackoff);
std::vector<uint64_t> pushed;
std::vector<pingcap::kv::LockPtr> locks{makeLockForDisaggResolve(lock_info)};
auto guard = std::lock_guard(*shared_context.output_lock);
auto before_expired = cluster->lock_resolver->resolveLocks(bo, shared_context.start_ts, locks, pushed);
LOG_WARNING(log, "Finished resolve locks, before_expired={}", before_expired);
throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check ParseFromString return value for lock info.

Same issue as the region error case - parsing could fail on malformed data.

Proposed fix
     else if (columnar_reader.error_type == ColumnarReaderErrorType::LockedError)
     {
         auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
         kvrpcpb::LockInfo lock_info;
-        lock_info.ParseFromString(error_msg);
+        if (!lock_info.ParseFromString(error_msg))
+        {
+            LOG_WARNING(log, "failed to parse lock info protobuf, region_id={}", reader_plan.region_id);
+            throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "failed to parse lock info for region {}", reader_plan.region_id);
+        }
         // Try to resolve locks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/ColumnarReader.cpp` around lines 600 - 613, The
ParseFromString call on lock_info is not checking for parsing failures when
deserializing the error message data. In the LockedError handling block within
the ColumnarReader code, verify that lock_info.ParseFromString(error_msg)
returns successfully before proceeding with lock resolution. If parsing fails,
log an appropriate error and throw an exception or handle the malformed data
case to prevent using an invalid lock_info object in the subsequent resolveLocks
call.

Comment on lines +136 to +143
try
{
clear_shared_snap_access_by_start_ts(start_ts, proxy_ptr);
}
catch (...)
{
LOG_WARNING(log, "clear shared snapaccess cache failed, start_ts={}", start_ts);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use tryLogCurrentException in catch block per coding guidelines.

The broad catch (...) block should use tryLogCurrentException to capture exception details for debugging, rather than a simple log message.

Proposed fix
         try
         {
             clear_shared_snap_access_by_start_ts(start_ts, proxy_ptr);
         }
         catch (...)
         {
-            LOG_WARNING(log, "clear shared snapaccess cache failed, start_ts={}", start_ts);
+            tryLogCurrentException(log, fmt::format("clear shared snapaccess cache failed, start_ts={}", start_ts));
         }

Based on coding guidelines: "Use tryLogCurrentException(log, "context") in broad catch (...) paths to avoid duplicated exception-formatting code"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/ColumnarReader.h` around lines 136 - 143, The
catch block in the clear_shared_snap_access_by_start_ts function call is using a
basic LOG_WARNING call instead of the recommended tryLogCurrentException
function to handle exceptions. Replace the LOG_WARNING statement with a
tryLogCurrentException call that includes a descriptive context message about
the failure to clear the shared snapshot access cache. This will ensure proper
exception details are captured for debugging according to the coding guidelines.

Source: Coding guidelines

Comment on lines +632 to +642
auto columnar_task_pools = ColumnarReadTaskPool::buildWithBackoff(
log,
context,
start_ts,
table_scan,
filter_conditions,
remote_table_ranges,
num_streams);
for (auto & task : read_columnar_tasks)
for (auto & task_pool : columnar_task_pools)
{
auto streams = task->getInputStreams();
auto streams = task_pool->getInputStreams();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that the new columnar headers are included

# Check for ColumnarReader.h and ColumnarSourceOp.h includes
rg -n '`#include`.*Columnar(Reader|SourceOp|InputStream)\.h' dbms/src/Storages/StorageDisaggregatedColumnar.cpp

# Also check what headers are currently included
echo "=== All includes in the file ==="
rg -n '^`#include`' dbms/src/Storages/StorageDisaggregatedColumnar.cpp | head -20

Repository: pingcap/tiflash

Length of output: 1002


🏁 Script executed:

#!/bin/bash
# Get ALL includes in the file
echo "=== COMPLETE INCLUDE LIST ==="
rg -n '^`#include`' dbms/src/Storages/StorageDisaggregatedColumnar.cpp

# Find where ColumnarReadTaskPool is defined
echo ""
echo "=== LOCATING ColumnarReadTaskPool DEFINITION ==="
rg -r '$MATCH' 'class ColumnarReadTaskPool' --type cpp --type h | head -5

# Search for ColumnarReadTaskPool in header files
echo ""
echo "=== FILES DECLARING ColumnarReadTaskPool ==="
fd --type f '\.(h|hpp)$' | xargs rg -l 'class ColumnarReadTaskPool|ColumnarReadTaskPool\s*::'

Repository: pingcap/tiflash

Length of output: 2216


Add missing header for ColumnarReadTaskPool.

The code uses ColumnarReadTaskPool::buildWithBackoff() at line 632, but the required header #include <Storages/Columnar/ColumnarReader.h> is not present in the includes section (lines 15-61). Add this include to enable compilation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 632 - 642,
The method ColumnarReadTaskPool::buildWithBackoff() is being called but the
required header file is missing from the includes section. Add the include
directive for Storages/Columnar/ColumnarReader.h to the includes section of
StorageDisaggregatedColumnar.cpp (around lines 15-61) to resolve the compilation
error for the ColumnarReadTaskPool class.

@ti-chi-bot

ti-chi-bot Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-integration-next-gen-columnar 6798c2f link true /test pull-integration-next-gen-columnar
pull-integration-test 6798c2f link true /test pull-integration-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant