MDEV-34228: Support underscores in numeric literals (SQL:2023 T662)#4624

Open
abhishek593 wants to merge 4 commits into MariaDB:main from abhishek593:MDEV-34228

Conversation

@abhishek593

The main contributions of this PR are:

  • Added function get_numeric_token(), a wrapper around get_token() that checks whether the current token contains an underscore.
  • Added helper function strip_underscores(), which just erases underscores from the current token.
  • Updated sql_lex.cc states (MY_LEX_NUMBER_IDENT, MY_LEX_INT_OR_REAL, MY_LEX_REAL) to permit single underscores between digits in decimal, hexadecimal, and binary literals.
  • New Test Suite: Added mysql-test/main/numeric_underscores.test covering all possible cases.
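
To illustrate the stripping step described above, here is a minimal standalone sketch. It is not the PR's code: the name strip_underscores_sketch() and the use of std::string are assumptions made for illustration, whereas the actual helper works on lexer buffers allocated through the THD.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for the PR's strip_underscores(): copies the token
// while dropping every '_' character. It assumes the lexer has already
// validated the underscore placement.
std::string strip_underscores_sketch(std::string_view token)
{
  std::string out;
  out.reserve(token.size());   // upper bound: the result is never longer
  for (char c : token)
  {
    if (c != '_')
      out.push_back(c);
  }
  return out;
}
```

With this sketch, a token such as 1_000_000 would be handed to the numeric conversion code as 1000000.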

New validation rules:

  • Only a single underscore is allowed between digits.
  • Consecutive underscores are prohibited (e.g., 1__000 is an error).
  • An underscore cannot appear at the end of a numeric literal.
  • An underscore cannot appear adjacent to a decimal point . or exponent marker e/E (e.g., 1_.0 and 1_e10 are invalid).
  • Underscore is allowed immediately after the base prefix (e.g., 0x_FF, 0b_10).
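
The rules above can be sketched as a standalone check. This is an illustration only, not the state-machine logic in sql_lex.cc: the function names are hypothetical, and the sketch checks underscore placement only, not whether the literal is otherwise well formed.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Returns true if c is a digit in the given base (2, 10 or 16).
static bool is_digit_for_base(char c, int base)
{
  unsigned char uc = static_cast<unsigned char>(c);
  if (base == 16)
    return std::isxdigit(uc) != 0;
  if (base == 2)
    return c == '0' || c == '1';
  return std::isdigit(uc) != 0;
}

// Hypothetical checker: every '_' must be followed by a digit and preceded
// by either a digit or the 0x/0b prefix. This rejects "1__000", "1_",
// "1_.0", "1._0", "1_e10" and "1e_10", and accepts "0x_FF" and "0b_10".
bool underscores_well_placed(std::string_view lit)
{
  int base = 10;
  std::size_t start = 0;                 // index just past the base prefix
  if (lit.size() >= 2 && lit[0] == '0')
  {
    if (lit[1] == 'x' || lit[1] == 'X') { base = 16; start = 2; }
    else if (lit[1] == 'b' || lit[1] == 'B') { base = 2; start = 2; }
  }
  for (std::size_t i = start; i < lit.size(); i++)
  {
    if (lit[i] != '_')
      continue;
    // The next character must exist and be a digit (no trailing or
    // doubled underscore, nothing before '.' or an exponent marker).
    if (i + 1 >= lit.size() || !is_digit_for_base(lit[i + 1], base))
      return false;
    // The previous character must be a digit, unless the underscore
    // directly follows the base prefix.
    bool after_prefix = (start != 0 && i == start);
    if (!after_prefix && (i == 0 || !is_digit_for_base(lit[i - 1], base)))
      return false;
  }
  return true;
}
```

Note that in a hexadecimal literal e and E are ordinary digits, so the exponent rule only applies to decimal literals in this sketch.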

@gkodinov added the External Contribution label Feb 9, 2026
@gkodinov self-assigned this Feb 13, 2026
Member

@gkodinov gkodinov left a comment

A good start! Thank you for your contribution!
This is a preliminary review. The important part is to measure and document the performance impact.

Please also quote in the worklog the actual version, clause, and ideally the definition of the extension itself, rather than some random blog.

I'd be especially interested in how the standard defines the conversion of such literals to strings. E.g. is cast (1_2_3 AS string) supposed to preserve these?
And vice versa, is cast("123" as INT) supposed to preserve these?

You also need to cover casts to/from strings, e.g.: CAST('1_2.3_4e1_0' AS …)

LEX_CSTRING Lex_input_stream::get_numeric_token(uint skip, uint length)
{
const char *str= m_tok_start + skip;
for (uint i= 0; i < length; i++)

I'd use strchr: maybe the compiler will optimize it better.
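
A possible shape for this suggestion, as a sketch only: strchr() expects a NUL-terminated string and would scan past the end of the token, so the length-bounded std::memchr() is used here instead. The function name token_has_underscore() is hypothetical and not part of the PR.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: reports whether the first `length` bytes of `str`
// contain an underscore. std::memchr is bounded by `length`, so it never
// reads beyond the token even when the buffer continues.
bool token_has_underscore(const char *str, std::size_t length)
{
  return length != 0 && std::memchr(str, '_', length) != nullptr;
}
```

Whether this is actually faster than the hand-written loop would still need to be measured.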

underscore will be removed. We can iterate over str to count
exact size of the new string, but that may be slower.
*/
if (!(to= m_thd->alloc<char>(length)))

I would like you to profile and record the following performance comparisons:

  1. processing a SQL SELECT statement with 1M integer literals (no underscores): prior to your fix compared to after. Run that 10M times to get a good sample.
  2. same as above, but with underscores.
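
One way to produce the statements for both runs is to generate them programmatically. The sketch below only builds the SQL text; the function names, the starting value 1000000, and the grouping by three digits are assumptions for illustration, and the timing harness itself is not shown.

```cpp
#include <cstddef>
#include <string>

// Formats n in decimal, optionally with '_' between groups of three digits
// counted from the right (e.g. 1000000 becomes 1_000_000).
std::string format_literal(unsigned long long n, bool with_underscores)
{
  std::string digits = std::to_string(n);
  if (!with_underscores)
    return digits;
  std::string out;
  for (std::size_t i = 0; i < digits.size(); i++)
  {
    // Insert a separator before each group of three, except the first.
    if (i != 0 && (digits.size() - i) % 3 == 0)
      out.push_back('_');
    out.push_back(digits[i]);
  }
  return out;
}

// Builds "SELECT lit,lit,..." with `count` integer literals.
std::string build_select(std::size_t count, bool with_underscores)
{
  std::string sql = "SELECT ";
  for (std::size_t i = 0; i < count; i++)
  {
    if (i != 0)
      sql += ',';
    sql += format_literal(1000000ULL + i, with_underscores);
  }
  return sql;
}
```

Calling build_select(1000000, false) and build_select(1000000, true) would give the two statement variants to feed to the server before and after the patch.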


SELECT 0xABCD_EF01;
0xABCD_EF01
���

this is probably garbled by some editor!
