MDEV-39261 MariaDB crash on startup in presence of indexed virtual columns#4914
MDEV-39261 MariaDB crash on startup in presence of indexed virtual columns#4914Thirunarayanan wants to merge 1 commit into10.6from
Conversation
|
|
dr-m
left a comment
There was a problem hiding this comment.
Thank you, great work in reproducing the error. I found some ways to simplify this logic further. I think that we should invoke open_purge_table() for each dict_table_t::id only once per purge batch, in trx_purge_table_acquire(), provided that indexed virtual columns exist. In that way, when we are purging individual undo log records, we are guaranteed to have a TABLE* handle available.
| TABLE *get_purge_table(THD *thd); | ||
| TABLE *get_purge_table(THD *thd, const char *db, size_t db_len, | ||
| const char *tbl, size_t tblen); |
| char db_buf[NAME_LEN + 1]; | ||
| char tbl_buf[NAME_LEN + 1]; | ||
| ulint db_buf_len, tbl_buf_len; | ||
|
|
||
| if (!table->parse_name(db_buf, tbl_buf, &db_buf_len, &tbl_buf_len)) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Virtual columns can be defined on partitioned tables, and table can include the name of a partition or a subpartition. However, I double-checked that this function indeed returns the MDL name of a table, which would fit in NAME_LEN.
I did not quite understand why dict_table_t::parse_name is invoking filename_to_tablename() with a larger to_length than NAME_LEN or NAME_LEN + 1. That seems to lead to a potential buffer overflow. Can you fix that?
Can we use size_t for sizes in new code?
| if (bg_thread) { | ||
| return open_purge_table(thd, db_buf, db_buf_len, | ||
| tbl_buf, tbl_buf_len); |
There was a problem hiding this comment.
Move this code to the preceding if (bg_thread) block. There is no need for it to be separate.
| @@ -20073,9 +20074,17 @@ static TABLE* innodb_find_table_for_vc(THD* thd, dict_table_t* table) | |||
| { | |||
| TABLE *mysql_table; | |||
| const bool bg_thread = THDVAR(thd, background_thread); | |||
There was a problem hiding this comment.
Do we really need the background_thread member, or could we pass a parameter that distinguishes the purge worker thread, to ib_vcol_row::record()? git grep 'vc\.record' to find each call. The purge specific call is row_vers_build_clust_v_col() when it is being invoked from row_purge_is_unsafe().
Would it be possible to open the TABLE handles in the purge coordinator thread, in trx_purge_table_acquire()? In that way, ib_vcol_row::record() in purge would always be guaranteed to have an open TABLE* handle, and this function would never be called there. But what about data races between purge worker threads that would try to access the TABLE::record buffer for evaluating the virtual column values for different base column values? That concern is mostly invalid, because table_id_map guarantees that each table_id is specific to a one purge_node_t per purge batch. However, if a table is partitioned, then we can have multiple table_id for the same TABLE_SHARE and table name. If there is a problem in this case, it can be solved by caching the TABLE* in a new data member of purge_node_t.
Consolidating the open_purge_table() in one place (trx_purge_table_acquire()) should allow us to get rid of get_purge_table() altogether.
…lumns Problem: ======== A single InnoDB purge worker thread can process undo logs from different tables within the same batch. But get_purge_table(), open_purge_table() incorrectly assumes that a 1:1 relationship between a purge worker thread and a table within a single batch. Based on this wrong assumtion, InnoDB attempts to reuse TABLE objects cached in thd->open_tables for virtual column computation. 1) Purge worker opens Table A and caches the TABLE pointer in thd->open_tables. 2) Same purge worker moves to Table B in the same batch, get_purge_table() retrieves the cached pointer for Table A instead of opening Table B. 3) Because innobase::open() is ignored for Table B, the virtual column template is never initialized. 4) virtual column computation for Table B aborts the server Solution: ======== get_purge_table(): Accept the specific db_name and table_name associated with the current undo log record. Compare the db_name and table_name against the existing cached TABLE objects. If it is match then return the cached table object.
b27e5cd to
f4af9a8
Compare
Problem:
A single InnoDB purge worker thread can process undo logs from different tables within the same batch. But get_purge_table(), open_purge_table() incorrectly assumes that a 1:1 relationship between a purge worker thread and a table within a single batch. Based on this wrong assumtion, InnoDB attempts to reuse TABLE objects cached in thd->open_tables for virtual column computation.
Solution:
get_purge_table(): Accept the specific db_name and table_name associated with the current undo log record. Compare the db_name and table_name against the existing cached TABLE objects. If it is match then return the cached table object.