Skip to content

Conversation

@sumitagrawl
Copy link
Contributor

What changes were proposed in this pull request?

Exit of safemode status for follower when all rules are satisfied. This is as:

  • For leader applyTransaction completion is implicit before being choosen as leader.
  • For follower, exit safe mode rule as transaction sync and apply transaction keeps happening from leader node.

This change in reference to below point in JIRA HDDS-5263:
But once the SCM Ratis server started it will replay logs from Transactioninfo last applied Index, so after that I see all pipelines are removed. (might be due to close pipeline)

So this have no issue for leader node once choosen during leader election. And for follower node, no need follow same restriction as dynamically the pipeline can be closed on runtime and same can be created by leader continuously.
This check is present during startup to ensure SCM is writable and allow time to create pipeline as best effort and its not true always for the case of pipeline closure later on due to system unhealthy or other reason (System never moves back to Safemode later on as current behavior for any dynamic change later on).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14070

How was this patch tested?

  • impact of existing test case

@sumitagrawl sumitagrawl requested a review from szetszwo January 26, 2026 20:02
@sumitagrawl sumitagrawl marked this pull request as ready for review January 27, 2026 17:40
Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumitagrawl , thanks for working on this! Please see the comments inlined.

Comment on lines +288 to +293
if (!refreshedAfterLeaderReady.get()) {
// refresh and validate safe mode rules if it can exit safe mode
// if being leader, all previous term transactions have been applied
// if other states, just refresh safe mode rules, and transaction keeps flushing from leader
// and does not depend on pending transactions.
refreshedAfterLeaderReady.set(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not atomic. Use

    if (refreshedAfterLeaderReady.compareAndSet(false, true)) {

Comment on lines 366 to 368
if (currentLeaderTerm.get() == term) {
// Means all transactions before this term have been applied.
// This means after a restart, all pending transactions have been applied.
// Perform
// 1. Refresh Safemode rules state.
// 2. Start DN Rpc server.
if (!refreshedAfterLeaderReady.get()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, it needs compareAndSet.

BTW, why refreshedAfterLeaderReady won't be set to false after set to true?

Comment on lines +185 to +193
StorageContainerManager mockScmManager = mock(StorageContainerManager.class);
SCMHAManager mockScmhaManager = mock(SCMHAManager.class);
when(mockScmManager.getScmHAManager()).thenReturn(mockScmhaManager);
SCMRatisServer mockScmRatisServer = mock(SCMRatisServer.class);
when(mockScmhaManager.getRatisServer()).thenReturn(mockScmRatisServer);
SCMStateMachine mockScmStateMachine = mock(SCMStateMachine.class);
when(mockScmRatisServer.getSCMStateMachine()).thenReturn(mockScmStateMachine);
when((mockScmStateMachine.isRefreshedAfterLeaderReady())).thenReturn(true);
scmContext = new SCMContext.Builder().setSCM(mockScmManager).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use mock. We need to a real cluster test which test SCM changing leader multiple times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants