Skip to content

MDEV-15621 Auto add RANGE COLUMNS partitions by interval#5111

Open
mariadb-YuchenPei wants to merge 3 commits into
mainfrom
bb-main-mdev-15621
Open

MDEV-15621 Auto add RANGE COLUMNS partitions by interval#5111
mariadb-YuchenPei wants to merge 3 commits into
mainfrom
bb-main-mdev-15621

Conversation

@mariadb-YuchenPei
Copy link
Copy Markdown
Contributor

/gemini review

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@mariadb-YuchenPei mariadb-YuchenPei marked this pull request as draft May 22, 2026 05:49
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Range Interval Partitioning for RANGE COLUMNS tables, allowing for the automatic creation of partitions based on a specified temporal interval during DML operations. The implementation involves new logic for calculating and adding partitions during the table opening process, along with parser updates to support the INTERVAL syntax. Critical feedback was provided regarding a potential infinite loop if the interval is zero, a null pointer dereference risk when encountering MAXVALUE partitions, and the omission of error checking for partition count calculations. Furthermore, the reviewer suggested using session timezones for temporal calculations and identified risky bitmap manipulation that should be replaced with a safer approach.

Comment thread sql/partition_info.cc
Comment thread sql/partition_info.cc
Comment thread sql/sql_base.cc Outdated
Comment thread sql/partition_info.cc Outdated
Comment thread sql/sql_partition.cc Outdated
@mariadb-YuchenPei
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces range interval partitioning for RANGE COLUMNS tables, enabling the automatic creation of new partitions based on a specified time interval during DML operations such as INSERT, UPDATE, and LOAD DATA. The implementation includes parser support for the INTERVAL syntax, logic to calculate and add new partitions dynamically, and comprehensive test cases. Review feedback identifies a critical issue where newly added partitions are not correctly marked with the PART_TO_BE_ADDED state, which would cause them to be skipped during constant initialization. Additionally, several improvements were suggested regarding type safety—specifically avoiding unsafe pointer casts for warning variables—and ensuring the consistent use of boolean literals instead of integers for boolean types.

Comment thread sql/partition_info.cc
Comment thread sql/sql_partition.cc Outdated
Comment thread sql/sql_partition.cc Outdated
Comment thread sql/sql_partition.cc Outdated
Comment thread sql/sql_partition.cc Outdated
@mariadb-YuchenPei mariadb-YuchenPei force-pushed the bb-main-mdev-15621 branch 2 times, most recently from 0afea08 to 0ee340c Compare May 25, 2026 06:17
@mariadb-YuchenPei
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces automatic partition creation for RANGE COLUMNS partitioning using an INTERVAL clause. The implementation includes parser updates to support the new syntax, logic within the table opening process to trigger partition creation during DML operations, and internal ALTER TABLE mechanics to append new partitions. The review feedback identifies a critical bug where partitioning fields are not correctly initialized during the auto-creation process, a design limitation where partitions are only created up to the current system time, and the need to support legacy DATE and DATETIME types for broader compatibility.

Comment thread sql/partition_info.cc
Comment thread sql/partition_info.cc
Comment thread sql/partition_info.cc
Table_specification_st create_info;
Alter_info alter_info;
TABLE *table= tl->table;
/* TODO: this may still trigger MSAN unitialised? */
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This TODO should be addressed to ensure that the thread state is correctly initialized and doesn't trigger MSAN warnings during the internal partition creation process.

Comment thread sql/sql_partition.cc Outdated
@mariadb-YuchenPei mariadb-YuchenPei force-pushed the bb-main-mdev-15621 branch 9 times, most recently from 80720d0 to 60a4e27 Compare May 28, 2026 05:04
@mariadb-YuchenPei
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements automatic range interval partitioning (similar to Oracle's INTERVAL partitioning) for DATE, DATETIME, and TIMESTAMP columns. It introduces new syntax parsing for INTERVAL with functions like NUMTODSINTERVAL and NUMTOYMINTERVAL, and adds support for automatically creating new partitions during DML operations (such as INSERT, LOAD DATA, and UPDATE) when data falls outside existing partition ranges. The reviewer identified several critical and high-severity issues in the implementation. These include an early return bug in check_range_interval_constants that leaves the table's write_set corrupted, missing properties when copying part_info during auto-partitioning, memory leaks from standard new allocations instead of using mem_root, and unrestored thread state (thd->work_part_info) on function exit. Additionally, the reviewer suggested adding defensive checks for empty partition lists, guarding against infinite loops, ensuring exact string length matches for interval units, checking the return value of date_add_interval, preventing potential null pointer dereferences in the parser, and optimizing partition suffix naming to avoid O(N^2) scans.

Comment thread sql/sql_partition.cc
Comment thread sql/partition_info.cc
Comment thread sql/sql_partition.cc Outdated
Comment thread sql/partition_info.cc
Comment thread sql/partition_info.cc
Comment thread sql/partition_info.cc
Comment thread sql/partition_info.cc Outdated
Comment thread sql/sql_partition.cc Outdated
Comment thread sql/sql_yacc.yy Outdated
Comment thread sql/partition_info.h
Comment on lines +571 to +572
uint32 suffix= part_type == VERSIONING_PARTITION ? num_parts - 1 : 0;
DBUG_ASSERT(suffix > 0 || part_type != VERSIONING_PARTITION);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Efficiency optimization: suffix is initialized to 0 for range interval partitions, which causes an $O(N^2)$ scan of all existing partitions when adding multiple partitions. Initializing it to table->part_info->partitions.elements is much more efficient.

  uint32 suffix= part_type == VERSIONING_PARTITION ? num_parts - 1 : table->part_info->partitions.elements;
  DBUG_ASSERT(suffix > 0 || part_type != VERSIONING_PARTITION);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialise to table->part_info->partitions.elements does not help because we don't know what are the used suffixes yet. Added a TODO to consider using a hash

@mariadb-YuchenPei mariadb-YuchenPei force-pushed the bb-main-mdev-15621 branch 3 times, most recently from 7e01b8f to fb91d51 Compare May 28, 2026 06:23
@mariadb-YuchenPei
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements automatic range interval partitioning for DATE, DATETIME, and TIMESTAMP columns, allowing partitions to be dynamically created during DML operations. It also adds support for Oracle-compatible interval functions like NUMTODSINTERVAL and NUMTOYMINTERVAL. The review feedback highlights critical safety improvements, including preventing potential null pointer dereferences when resolving partition expressions or field arrays, avoiding unsigned underflow in partition count calculations, and refactoring an empty conditional statement to improve code readability and maintainability.

Comment thread sql/partition_info.cc
Comment thread sql/sql_partition.cc Outdated
Comment thread sql/partition_info.cc Outdated
Comment thread sql/sql_partition.cc Outdated
@mariadb-YuchenPei
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements range interval partitioning for RANGE COLUMNS (supporting DATE, DATETIME, and TIMESTAMP columns), enabling automatic partition creation during DML operations when inserted values exceed existing ranges. It also adds Oracle-style interval syntax support. The review feedback identifies several critical issues, including a missing interval copy during partition auto-creation that would result in incorrect ranges, missing error reporting on date overflow failures, and unsafe usage of atoi on non-null-terminated LEX_CSTRING tokens in the parser. Additionally, minor improvements are suggested to optimize redundant checks and correct misleading comments.

Comment thread sql/partition_info.cc
Comment thread sql/partition_info.cc
Comment on lines +915 to +916
if (date_add_interval(thd, &cur_time, int_type, interval))
return true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If date_add_interval fails (e.g., due to date overflow), the function returns true but does not set any error in the diagnostics area. This can lead to a silent failure or an assertion failure in open_table because an error is expected when returning failure. We should explicitly report ER_DATA_OUT_OF_RANGE when date_add_interval fails.

    if (date_add_interval(thd, &cur_time, int_type, interval))
    {
      my_error(ER_DATA_OUT_OF_RANGE, MYF(0),
               part_field_array[0]->type_handler()->name().ptr(),
               "INTERVAL");
      return true;
    }

Comment thread sql/sql_base.cc
Comment thread sql/sql_partition.cc
Comment thread sql/sql_yacc.yy Outdated
Comment thread sql/sql_yacc.yy Outdated
@mariadb-YuchenPei mariadb-YuchenPei force-pushed the bb-main-mdev-15621 branch 2 times, most recently from b4486bd to 5c53970 Compare May 29, 2026 01:44
change p_column_list_val::fixed to a bool
remove redundant end label in partition_info::fix_column_value_functions
@mariadb-YuchenPei mariadb-YuchenPei force-pushed the bb-main-mdev-15621 branch 2 times, most recently from 32f4609 to 02a7d65 Compare May 29, 2026 04:50
@mariadb-YuchenPei mariadb-YuchenPei changed the title WIP Bb main mdev 15621 MDEV-15621 Auto add RANGE COLUMNS partitions by interval May 29, 2026
@mariadb-YuchenPei mariadb-YuchenPei marked this pull request as ready for review May 29, 2026 06:37
Allow auto partitioning by interval in PARTITION BY RANGE COLUMNS

PARTITION BY RANGE COLUMNS (col_name)
INTERVAL interval [AUTO]
(
  PARTITION partition_name VALUES LESS THAN (value)
  [, PARTITION partition_name VALUES LESS THAN (value) ... ]
)

where

- col_name is the name of one column of type DATE or DATETIME or
  TIMESTAMP

- at least one partition is supplied, and the highest partition cannot
  have MAXVALUE range

- INTERVAL interval is a positive time interval. it can be mariadb
  format or oracle NUMTODSINTERVAL/NUMTOYMINTERVAL format. Like
  versioning, the smallest unit is second, i.e. no subsecond like
  microsecond.

- DATE column cannot have interval with values less than a day

When performing DML on such a table, it will first add partitions
by the specified interval until the partition covers the current time.

Partition addition will not cause an implicit commit like DDL normally
does.

The partitions are named pN.

Otherwise the table behaves exactly the same as a normal RANGE COLUMNS
partitioned table.

Note that TIMESTAMP is not allowed as a type for PARTITION BY RANGE
COLUMNS otherwise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants