fix: support MySQL user-level lock functions#24909
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
XuPeng-SH
left a comment
There was a problem hiding this comment.
I found multiple blocking issues in the current implementation.
-
The named-lock state is only a package-global in-memory map inside one CN process. In MatrixOne this means two sessions routed to different CNs can both acquire the same lock name at the same time, so the core exclusivity guarantee is broken for real cross-session coordination.
-
Locks are never released when the owning session disconnects. The only mutations of the new lock map are explicit acquire/release calls, and this PR does not add any session-teardown cleanup path. So a client that exits without
RELEASE_LOCK()can leave the lock stuck forever until the CN restarts, which is not compatible with MySQL named-lock semantics. -
GET_LOCK(name, timeout)handles timeout semantics incorrectly:timeout <= 0returns0immediately, but MySQL compatibility requirestimeout = 0to mean no-wait and negative timeout to mean wait indefinitely (subject to session cancellation). -
The unhappy-path coverage is still thin around the new blocking behavior. There are no tests for disconnect cleanup, successful wait-then-acquire, timeout expiry, context cancellation while waiting, or the reentrant refcount path.
Because the first two are real behavior bugs and the third is a compatibility bug in the exposed SQL contract, I am requesting changes.
c6ab3b7 to
5ca0b17
Compare
|
Updated the PR to address the review request:
Tests run:
|
XuPeng-SH
left a comment
There was a problem hiding this comment.
I re-reviewed the latest head 9f3f04ff0baa3047317559d27e95ed83388ac777. The previous distributed-lock/session-cleanup issues are mostly addressed, but I still found blocking MySQL-compatibility issues in the exposed SQL contract.
-
RELEASE_LOCK()cannot distinguish “not owned by this session” from “lock does not exist”. InreleaseUserLevelLock, the local-refcount miss path returns0, nilimmediately (pkg/sql/plan/function/func_unary.go:6960-6963). MySQL requires different results here:0only when the named lock exists but was not established by this thread, andNULLwhen the named lock does not exist / was already released (see the official MySQL locking-functions doc: https://dev.mysql.com/doc/refman/8.4/en/locking-functions.html). I verified this locally with a temporary test expectingRELEASE_LOCK('missing_lock')to return NULL; current code returns non-NULL 0 (the 1th row expected NULL, but get not null). This will make compatibility checks and application logic unable to tell “someone else owns it” from “there is no lock”. Please make release return nullable state, e.g. probe the lockservice with FastFail when the local refcount is absent so free/nonexistent returns NULL while conflict returns 0, and add tests for never-created and already-released locks. -
Lock names are not validated against MySQL's 64-character limit. The current paths (
getUserLevelLock,releaseUserLevelLock, andisUserLevelLockFree) passnamedirectly into the row/txn keys without any length check (pkg/sql/plan/function/func_unary.go:6924,6954,6975). MySQL enforces a maximum lock-name length of 64 characters. With this PR, overlong names are accepted and can acquire/release real locks, which is a visible behavior divergence from the stated MySQL compatibility goal. Please enforce the same limit, define the intended NULL/error behavior for incorrect arguments, and cover GET_LOCK/RELEASE_LOCK/IS_FREE_LOCK overlong-name cases.
Tests run locally with the PR worktree and main-worktree thirdparty CGO flags:
go test ./pkg/sql/plan/function -run 'TestUserLevelLock|TestReleaseUserLevelLocksCleanup|TestUserLevelLockBuiltinRegistration|Test_funids' -count=1go test ./pkg/frontend -run TestNonExistent -count=1git diff --check upstream/4.0-dev...HEAD
I also noted the latest GitHub CI still has failures in Ubuntu UT and target upgrade; I could not get the failed log segment through gh run view --log-failed, so I am not using that as the primary blocker here.
1. RELEASE_LOCK now returns NULL (via isNull flag) when the named lock does not exist or was already released, matching MySQL behavior. When local refcount is 0, the code probes the lockservice with FastFail to distinguish 'lock does not exist' (NULL) from 'lock exists but held by another session' (0). 2. Enforce MySQL-compatible 64-character limit on lock names in getUserLevelLock, releaseUserLevelLock, and isUserLevelLockFree. 3. Add tests: - RELEASE_LOCK on never-created lock returns NULL - RELEASE_LOCK on already-released lock returns NULL - RELEASE_LOCK on lock held by another session returns 0 - Overlong lock names are rejected - Exactly-max-length lock names work Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
XuPeng-SH
left a comment
There was a problem hiding this comment.
Re-reviewed latest head 5b3f328768fdf241787f6ca7f21b4030bbba4b8f. The previous RELEASE_LOCK() NULL semantics issue is fixed, but there are still correctness gaps in lock-name handling that can break MySQL compatibility and mutual exclusion.
Blocking issues:
-
pkg/sql/plan/function/func_unary.go:6824:validateUserLevelLockNameuseslen(name), so the 64 limit is enforced in bytes, not characters. MySQL documents the limit as 64 characters, and the MySQL WL for this feature says lock names are converted to UTF-8 before applying the new 64-character semantics: https://dev.mysql.com/doc/refman/8.4/en/locking-functions.html and https://dev.mysql.com/worklog/task/?id=1159. A name of 64 Chinese characters currently returnsinternal error: user-level lock name exceeds maximum length of 64 characters, even though it is exactly 64 characters. MatrixOne already has UTF-8 length helpers in this package; this should validate character/code-point length, and add a multibyte boundary test. -
pkg/sql/plan/function/func_unary.go:6838/pkg/sql/plan/function/func_unary.go:6938: lock names are used as raw strings for the distributed row key and the local refcount key, making the implementation case-sensitive. MySQL's user-lock implementation performs case-insensitive comparison and uses a lowercased lock name as the MDL key (see the same WL above). Today two sessions can both acquire what MySQL treats as the same lock: session AGET_LOCK('case_lock', 0)returns 1, then session BGET_LOCK('CASE_LOCK', 0)also returns 1. That breaks the advisory-lock mutual exclusion guarantee. Please normalize the lock name consistently before validating/tracking/locking and add contention tests across case variants. -
pkg/sql/plan/function/func_unary.go:6824: empty lock names are accepted because validation only checks> 64. MySQL's WL calls NULL, empty string, and longer-than-64 names wrong-name inputs. CurrentGET_LOCK('', 0)creates a real distributed lock, which leaves MatrixOne with non-MySQL state and release semantics for an invalid name. Please reject''consistently inGET_LOCK,RELEASE_LOCK, andIS_FREE_LOCK, with tests.
Local verification:
- PR tests pass:
go test ./pkg/sql/plan/function -run 'TestUserLevelLock|TestReleaseUserLevelLocksCleanup|TestUserLevelLockBuiltinRegistration|Test_funids' -count=1 - Frontend compile check passes:
go test ./pkg/frontend -run TestNonExistent -count=1 git diff --check upstream/4.0-dev...HEADpasses- Added temporary review-only tests for the three cases above; all failed as described, then removed them before submitting this review.
- Replace validateUserLevelLockName with normalizeUserLevelLockName that lowercases names for case-insensitive matching (MySQL requirement). - Enforce 64-character limit using utf8.RuneCountInString (character count), not len() (byte count), matching MySQL's documented behavior. - Reject empty lock names in GET_LOCK, RELEASE_LOCK, and IS_FREE_LOCK. - Add tests for empty-name rejection, case-insensitive contention, and multibyte character boundary (64/65 Chinese characters). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What type of PR is this?
Which issue(s) does this PR fix or relate to?
Fixes #24728
What this PR does / why we need it:
What
Why
Fixes #24728. Prisma Migrate calls GET_LOCK before applying migrations.
Tests