Skip to content

MDEV-37294 segv in flst::remove_complete(buf_block_t*, unsigned short, unsigned char*, mtr_t*)#4905

Open
Thirunarayanan wants to merge 1 commit into11.4from
MDEV-37294
Open

MDEV-37294 segv in flst::remove_complete(buf_block_t*, unsigned short, unsigned char*, mtr_t*)#4905
Thirunarayanan wants to merge 1 commit into11.4from
MDEV-37294

Conversation

@Thirunarayanan
Copy link
Copy Markdown
Member

Problem:

During system tablespace defragmentation, extent movement occurs in two phases: prepare and complete.

  1. prepare phase validates involved pages and acquires necessary resources.
  2. complete phase performs the actual data copy.

Prepare phase fails to check whether allocating a page will make the extent FULL. When an extent has exactly (extent_size - 1) pages used, the prepare phase returns early without latching the prev/next extent descriptors needed for list manipulation.

Complete phase then allocates the final page, making the extent full, and attempts to move it from
FSEG_NOT_FULL/FSP_FREE_FRAG to FSEG_FULL/FSP_FULL_FRAG list. This fails with an assertion because the required blocks were never latched, causing a crash in flst::remove_complete().

Solution:

alloc_from_fseg_prepare(), alloc_from_free_frag_prepare(): Check if the extent will become full after allocation. This makes the prepare phase to acquire the necessary pages for FSP list manipulation

find_new_extents(): Print more revised information about moving of extent data and destination extent also.

defragment_level(): Move get_child_pages(new_block) before committing changes to enable proper rollback on failure. Well, this failure is theoretically impossible (new block is exact copy of validated old block).

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines 4614 to 4619
if (n_used < 1 || n_used >= m_extent_size)
return DB_CORRUPTION;

if (n_used < m_extent_size)
/* check whether the extent will be FULL after allocation */
if (n_used + 1 < m_extent_size)
return DB_SUCCESS;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return DB_CORRUPTION is avoided if n_used < m_extent_size. Therefore, before this fix, it looks like the code after the return DB_SUCCESS must have been unreachable; the branch must always have been taken.

Can we put the most distinctive condition first? Something like the following:

if (n_used + 1 != m_extent_size)
  return n_used < 1 || n_used >= m_extent_size ? DB_CORRUPTION : DB_SUCCESS;

…, unsigned char*, mtr_t*)

Problem:
=======
During system tablespace defragmentation, extent movement occurs
in two phases: prepare and complete.
1) prepare phase validates involved pages and acquires necessary
resources.
2) complete phase performs the actual data copy.

Prepare phase fails to check whether allocating a page will
make the extent FULL. When an extent has exactly (extent_size - 1)
pages used, the prepare phase returns early without latching
the prev/next extent descriptors needed for list manipulation.

Complete phase then allocates the final page, making the
extent full, and attempts to move it from
FSEG_NOT_FULL/FSP_FREE_FRAG to FSEG_FULL/FSP_FULL_FRAG list.
This fails with an assertion because the required blocks were
never latched, causing a crash in flst::remove_complete().

Solution:
========
alloc_from_fseg_prepare(), alloc_from_free_frag_prepare():
call these function only if the extent will be full after
allocation. This makes the
prepare phase to acquire the necessary pages for FSP list manipulation

find_new_extents(): Print more revised information about moving
of extent data and destination extent also.

defragment_level(): Move get_child_pages(new_block) before committing
changes to enable proper rollback on failure. Well, this failure
is theoretically impossible (new block is exact copy of
validated old block).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants