Skip to content

Conversation

@wangchdo
Copy link
Contributor

@wangchdo wangchdo commented Dec 20, 2025

Summary

This PR introduces high-resolution timer (hrtimer) support as a fully independent and optional module for support of the scheduler, without affecting existing scheduler behavior.

Hrtimer is strictly isolated from the current scheduling logic:

  1. All hrtimer code is enabled only when CONFIG_HRTIMER is set.
  2. If CONFIG_HRTIMER is disabled, the scheduler continues to use the existing tick-based or tickless mechanisms with no changes.
  3. When enabled, hrtimer reuses the existing scheduler logic to drive OS ticks via a dedicated hrtimer instance.

The module does not modify any scheduler data structures or timing paths.
Hrtimer acts solely as an alternative time source. Core scheduler functions (nxsched_process_tick(), nxsched_tick_expiration(), etc.) remain unchanged and are reused as-is.

Additional safeguards:

  1. The scheduler hrtimer is initialized lazily and does not impact system startup.
  2. Existing timer infrastructure is not replaced or bypassed unless explicitly configured.
  3. Any failure or imperfection in hrtimer is confined to the module itself and cannot affect legacy scheduling paths.

Integration benefit

This design enables incremental development and review of hrtimer while ensuring that existing NuttX scheduling behavior remains stable even if the hrtimer feature is explicitly enabled.

Development benefit
With this design, developers interested in optimizing the scheduler and those focused on optimizing hrtimer can work independently on their respective improvements.

One other key update
This PR also includes an improvement(also in a seperate PR #17570) to hrtimer by refining its state machine, this is to fix some issues in SMP mode found by @Fix-Point. The refined state-machine is as shown below, and the corresponding diagram is also added in the hrtimer documentation.

hrtimer_state_machine

Impact

Add hrtimer support to nuttx scheduelr, without altering the existing scheduler behavior.

Testing

Test 1 passed (integrated in ostest):

- test implementation:

/****************************************************************************
 * apps/testing/ostest/hrtimer.c
 *
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.  The
 * ASF licenses this file to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance with the
 * License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
 * License for the specific language governing permissions and limitations
 * under the License.
 *
 ****************************************************************************/

/****************************************************************************
 * Included Files
 ****************************************************************************/

#include <nuttx/config.h>
#include <nuttx/hrtimer.h>

#include <stdio.h>
#include <sched.h>

#include "ostest.h"

/****************************************************************************
 * Pre-processor Definitions
 ****************************************************************************/

#define NSEC_PER_500MS (500 * NSEC_PER_MSEC)

/* Set a 1ms margin to allow hrtimertest to pass in QEMU.
 *
 * QEMU is a virtual platform, and its timer resolution and scheduling
 * latency may be less precise than on real hardware. Using a larger
 * margin ensures that tests do not fail due to timing inaccuracies.
 *
 * On real hardware (verified on the a2g-tc397-5v-tft board), this
 * margin can be reduced to less than 5 ns because timers are precise
 * and deterministic.
 */

#define NSEC_MARGIN    (NSEC_PER_MSEC)

/* Simple assertion macro for HRTimer test cases */
#define HRTIMER_TEST(expr, value)                                   \
  do                                                                \
    {                                                               \
      ret = (expr);                                                 \
      if (ret != (value))                                           \
        {                                                           \
          printf("ERROR: HRTimer test failed, line=%d ret=%d\n",   \
                 __LINE__, ret);                                    \
          ASSERT(false);                                            \
        }                                                           \
    }                                                               \
  while (0)

/****************************************************************************
 * Private Types
 ****************************************************************************/

/* Structure for HRTimer test tracking */

struct hrtimer_test_s
{
  hrtimer_t timer;    /* HRTimer instance */
  uint64_t  previous; /* Previous timestamp in nanoseconds */
  uint32_t  count;    /* Number of timer expirations */
  uint32_t  period;   /* Expected period between expirations */
  bool      active;   /* True while the test is still running */
};

/****************************************************************************
 * Private Functions
 ****************************************************************************/

/****************************************************************************
 * Name: hrtimer_test_init
 *
 * Description:
 *   Initialize a hrtimer_test_s structure for a new test.
 *
 * Input Parameters:
 *   test_hrtimer - Pointer to the test structure to initialize
 *   period       - Expected timer period in nanoseconds
 *
 * Returned Value:
 *   None
 *
 ****************************************************************************/

static void hrtimer_test_init(FAR struct hrtimer_test_s *test_hrtimer,
                              uint32_t period)
{
  test_hrtimer->previous = 0;
  test_hrtimer->count    = 0;
  test_hrtimer->active   = true;
  test_hrtimer->period   = period;
}

/****************************************************************************
 * Name: test_hrtimer_callback
 *
 * Description:
 *   HRTimer callback function for test.
 *
 *   - Verifies the timer interval is exactly 500ms (nanosecond precision)
 *   - Stops the test after 15 expirations
 *   - Re-arms the timer in absolute mode
 *
 * Input Parameters:
 *   hrtimer - Pointer to the expired HRTimer instance
 *
 * Returned Value:
 *   Timer period in nanoseconds (NSEC_PER_500MS)
 *
 ****************************************************************************/

static uint32_t test_hrtimer_callback(FAR hrtimer_t *hrtimer)
{
  struct timespec ts;
  uint32_t diff;
  uint64_t now;
  int ret;

  FAR struct hrtimer_test_s *test =
    (FAR struct hrtimer_test_s *)hrtimer;

  /* Increment expiration count */

  test->count++;

  /* Get current system time */

  clock_systime_timespec(&ts);
  now = clock_time2nsec(&ts);

  /* Skip comparison for first two invocations */

  if (test->count > 2)
    {
      /* Verify the timer interval is exactly
       * 500ms with nsec resolution
       */

      diff = (uint32_t)(now - test->previous);

      HRTIMER_TEST(NSEC_PER_500MS < diff + NSEC_MARGIN, true);
      HRTIMER_TEST(NSEC_PER_500MS > diff - NSEC_MARGIN, true);
    }

  test->previous = now;

  /* Stop the test after 15 expirations */

  if (test->count  >= 15)
    {
      ret = hrtimer_cancel(hrtimer);
      HRTIMER_TEST(ret, 0);

      test->active = false;
    }

  return test->period;
}

/****************************************************************************
 * Public Functions
 ****************************************************************************/

/****************************************************************************
 * Name: hrtimer_test
 *
 * Description:
 *   Entry point for high-resolution timer functional test.
 *
 *   - Initializes a HRTimer
 *   - Starts it with a 500ms relative timeout
 *   - Verifies subsequent expirations occur at 500ms intervals
 *
 * Input Parameters:
 *   None
 *
 * Returned Value:
 *   None
 *
 ****************************************************************************/

void hrtimer_test(void)
{
  int ret;
  struct hrtimer_test_s test_hrtimer_500ms;

  /* Initialize test structure */

  hrtimer_test_init(&test_hrtimer_500ms, NSEC_PER_500MS);

  /* Initialize the high-resolution timer */

  hrtimer_init(&test_hrtimer_500ms.timer,
               test_hrtimer_callback,
               NULL);

  /* Start the timer with 500ms relative timeout */

  ret = hrtimer_start(&test_hrtimer_500ms.timer,
                      test_hrtimer_500ms.period,
                      HRTIMER_MODE_REL);

  HRTIMER_TEST(ret, OK);

  /* Wait until the test completes */

  while (test_hrtimer_500ms.active)
    {
      usleep(500 * USEC_PER_MSEC);
    }
}

test log on rv-virt:smp64:

NuttShell (NSH)
nsh> 
nsh> uname -a
NuttX 0.0.0 6847a0cc95-dirty Dec 20 2025 12:26:39 risc-v rv-virt
nsh> ostest

(...)

user_main: hrtimer test

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena     1fbdec0  1fbdec0
ordblks         9        9
mxordblk  1f73880  1f73880
uordblks    109b0     ff18
fordblks  1fad510  1fadfa8

Final memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena     1fbdec0  1fbdec0
ordblks         1        9
mxordblk  1fb2cf8  1f73880
uordblks     b1c8     ff18
fordblks  1fb2cf8  1fadfa8
user_main: Exiting
ostest_main: Exiting with status 0

test 2 passed (provided by @Fix-Point )

test implementation

/****************************************************************************
 * apps/examples/hello/hello_main.c
 *
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.  The
 * ASF licenses this file to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance with the
 * License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
 * License for the specific language governing permissions and limitations
 * under the License.
 *
 ****************************************************************************/

/****************************************************************************
 * Included Files
 ****************************************************************************/

#include <nuttx/config.h>
#include <stdio.h>
#include <unistd.h>

#include <nuttx/hrtimer.h>

#define HRTIMER_TEST_THREAD_NR (1)
#define HRTIMER_TEST_NR        (1000000)

/****************************************************************************
 * Public Functions
 ****************************************************************************/

static int volatile tcount = 0;
static volatile uint32_t next = 0;

static uint32_t test_callback(FAR struct hrtimer_s *hrtimer)
{

  tcount++;
  up_ndelay(hrtimer->expired % (10 * NSEC_PER_USEC));

  return 0;
}

static uint32_t test_callback_background(FAR struct hrtimer_s *hrtimer)
{
  up_ndelay(hrtimer->expired % NSEC_PER_USEC);
  return 0;
}


static void test1(int tid)
{
  hrtimer_t   timer;
  int         count = 0;
  irqstate_t flags;
  spinlock_t lock;

  if (tid == 0)
    {
      hrtimer_init(&timer, test_callback, NULL);
    }
  else
    {
      hrtimer_init(&timer, test_callback_background, NULL);
    }

  while (count++ < HRTIMER_TEST_NR)
    {
      int ret;
      if (tid == 0)
        {
          uint64_t delay = rand() % (10 * NSEC_PER_MSEC);

          /* Simulate the periodical hrtimer.. */

          flags = spin_lock_irqsave(&lock);

          /* Use as periodical timer */

          ret = hrtimer_cancel(&timer);
          ret = hrtimer_start(&timer, 1000, HRTIMER_MODE_REL);

          spin_unlock_irqrestore(&lock, flags);

          up_ndelay(NSEC_PER_MSEC);

          flags = spin_lock_irqsave(&lock);

          ret = hrtimer_cancel_sync(&timer);
          ret = hrtimer_start(&timer, 1000, HRTIMER_MODE_REL);
          spin_unlock_irqrestore(&lock, flags);

          up_ndelay(NSEC_PER_MSEC);

          hrtimer_cancel_sync(&timer); // stucked here????
          printf("???\n");
        }
      else
        {
          /* Simulate the background hrtimer.. */

          uint64_t delay = rand() % (10 * NSEC_PER_MSEC);

          ret = hrtimer_cancel(&timer);
          ret = hrtimer_start(&timer, delay, HRTIMER_MODE_REL);
        }

      UNUSED(ret);
    }
}

static void* test_thread(void *arg)
{
  while (1)
    {
      test1((int)arg);
    }
  return NULL;
}
/****************************************************************************
 * hello_main
 ****************************************************************************/

int main(int argc, FAR char *argv[])
{
  unsigned int   thread_id;
  pthread_attr_t attr;
  pthread_t      pthreads[HRTIMER_TEST_THREAD_NR];

  printf("hrtimer_test start...\n");

  ASSERT(pthread_attr_init(&attr) == 0);

  /* Create wdog test thread */

  for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
    {
      ASSERT(pthread_create(&pthreads[thread_id], &attr,
                            test_thread, (void *)thread_id) == 0);
    }

  for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
    {
      pthread_join(pthreads[thread_id], NULL);
    }

  ASSERT(pthread_attr_destroy(&attr) == 0);

  printf("hrtimer_test end...\n");
  return 0;
}

test passed log on rv-virt:smp64

nsh> uname -a
NuttX 0.0.0 6847a0cc95-dirty Dec 20 2025 12:26:39 risc-v rv-virt
nsh> 
nsh> hello
???
???
???
(...)

test 3 passed (provided by @Fix-Point )

test implementation

/****************************************************************************
 * apps/examples/hello/hello_main.c
 *
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.  The
 * ASF licenses this file to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance with the
 * License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
 * License for the specific language governing permissions and limitations
 * under the License.
 *
 ****************************************************************************/

/****************************************************************************
 * Included Files
 ****************************************************************************/

#include <nuttx/config.h>
#include <stdio.h>

#include <nuttx/hrtimer.h>

#define HRTIMER_TEST_THREAD_NR (CONFIG_SMP_NCPUS * 8)

/****************************************************************************
 * Public Functions
 ****************************************************************************/

static uint32_t test_callback(FAR struct hrtimer_s *hrtimer)
{
  printf("test callback ...\n");
  return 0;
}

static void* test_thread(void *arg)
{
  irqstate_t  flags;
  hrtimer_t   timer;
  spinlock_t  lock = SP_UNLOCKED;
  hrtimer_init(&timer, test_callback, NULL);
  while (1)
    {
      uint64_t delay = rand() % NSEC_PER_MSEC;
      int ret;

      /* Simulate the usage of driver->wait_dog. */

      flags = spin_lock_irqsave(&lock);

      /* The driver lock acquired */

      /* First try, failed. Because hrtimer_start can not ensure the timer being started. */

      ret = hrtimer_cancel(&timer);
      // ret = hrtimer_start(&timer, 10 * NSEC_PER_USEC, HRTIMER_MODE_REL); /* May fail */

      /* This try-loop start should be OK. But it failed again.
       * Besides, we can not sleep or spin in the critical sections.
       */

      while (hrtimer_start(&timer, 10 * NSEC_PER_USEC, HRTIMER_MODE_REL) != OK);
      ret = OK;

      /* Second try, Success, but we can not sleep or spin in the critical section. */

      // ret = hrtimer_cancel_sync(&timer); /* Sleep in critical sections */
      // ret = hrtimer_start(&timer, delay, HRTIMER_MODE_REL);


      spin_unlock_irqrestore(&lock, flags);

      if (ret != OK)
        {
          printf("hrtimer_start failed\n");
        }
      up_ndelay(delay);
    }
  return NULL;
}

/****************************************************************************
 * hello_main
 ****************************************************************************/

int main(int argc, FAR char *argv[])
{
  unsigned int   thread_id;
  pthread_attr_t attr;
  pthread_t      pthreads[HRTIMER_TEST_THREAD_NR];

  printf("hrtimer_test start...\n");

  ASSERT(pthread_attr_init(&attr) == 0);

  /* Create wdog test thread */

  for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
    {
      ASSERT(pthread_create(&pthreads[thread_id], &attr,
                            test_thread, NULL) == 0);
    }

  for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
    {
      pthread_join(pthreads[thread_id], NULL);
    }

  ASSERT(pthread_attr_destroy(&attr) == 0);

  printf("hrtimer_test end...\n");
  return 0;
}

test passed log on rv-virt:smp64

NuttShell (NSH)
nsh> 
nsh> 
nsh> uname -a
NuttX 0.0.0 6847a0cc95-dirty Dec 20 2025 12:33:59 risc-v rv-virt
nsh> hello
test callback ...
test callback ...
test callback ...
test callback ...
test callback ...
test callback ...
test callback ...
test callback ...
test callback ...
test callback ...
test callback ...

(....)

@github-actions github-actions bot added Area: Documentation Improvements or additions to documentation Area: Drivers Drivers issues Area: OS Components OS Components issues Size: M The size of the change in this PR is medium labels Dec 20, 2025
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from 6847a0c to c073a95 Compare December 20, 2025 04:54
@wangchdo wangchdo changed the title sched: add hrtimer support to os scheduler sched/hrtimer: Part2: add hrtimer support to os scheduler Dec 20, 2025
@wangchdo wangchdo changed the title sched/hrtimer: Part2: add hrtimer support to os scheduler sched/hrtimer: Part2: introduce hrtimer support to os scheduler Dec 20, 2025
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from c073a95 to f1e9f9a Compare December 20, 2025 05:52
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from f1e9f9a to e0f1573 Compare December 20, 2025 05:55
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from e0f1573 to 424634c Compare December 20, 2025 06:04
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from 424634c to ba4a61f Compare December 20, 2025 06:11
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from ba4a61f to 9bcc6ae Compare December 20, 2025 06:53
@wangchdo wangchdo changed the title sched/hrtimer: Part2: introduce hrtimer support to os scheduler sched/hrtimer: Part2: introduce hrtimer support for os scheduler Dec 20, 2025
@wangchdo wangchdo changed the title sched/hrtimer: Part2: introduce hrtimer support for os scheduler sched/hrtimer: Part2: introduce hrtimer support for scheduler Dec 20, 2025
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from 9bcc6ae to 2c0c6b0 Compare December 20, 2025 11:21
@wangchdo
Copy link
Contributor Author

wangchdo commented Jan 2, 2026

@Fix-Point @xiaoxiang781216 @GUIDINGLI

As I pointed out in the comments on your PR (#17675), I don’t think the new hrtimer implementation is a better choice.

From my perspective, the way it addresses concurrency issues is not efficient, and it requires updating the entire hrtimer API surface, which I believe makes it less user-friendly compared to my approach.

My API design is as follows:

  1. hrtimer_init() — initializes the hrtimer object and assigns the callback and argument.
  2. hrtimer_start() — sets the expiration time for an hrtimer.
  3. hrtimer_set() — allows the user to update the callback and argument of an hrtimer.
  4. hrtimer_cancel() — cancels an hrtimer asynchronously.
  5. hrtimer_cancel_sync() — cancels an hrtimer synchronously.

My implementation focuses on resolving the concurrency concerns without forcing API-wide changes, keeping the usage model simple while still ensuring correctness under SMP.


/* Re-arm periodic timer if not canceled or re-armed concurrently */

if (period > 0 && hrtimer->expired == expired)
Copy link
Contributor

@Fix-Point Fix-Point Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your latest attempt is almost same the versioning idea I tried before. Sadly, I found it still violated the ownership invariant. In fact, the expired field can not be used as version, since the expired is not monotonic (the newer timer may has same or older expire than the older timer), which is a fundamental assumption regarding the correctness of epoch-based memory reclamation. I believe it is very easy for you to make a test case to trigger the ownership invariant violation.

In my early implmentation, I added another monotonic version field for the hrtimer to do correct versioning (or Epoch-based memory reclamation). However, it will increase the memory footprint of the hrtimer. That's why I eventually gave up on the idea.

Copy link
Contributor

@Fix-Point Fix-Point Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still insist what I stated before:

Designing correct concurrent algorithms requires systematic consideration. As I stated in #17570, after carefully reviewing your implementation, I could not really find a good solution that preserves version information without introducing significant performance or memory overhead.

Copy link
Contributor Author

@wangchdo wangchdo Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your latest attempt is almost same the versioning idea I tried before. Sadly, I found it still violated the ownership invariant. In fact, the expired field can not be used as version, since the expired is not monotonic (the newer timer may has same or older expire than the older timer), which is a fundamental assumption regarding the correctness of epoch-based memory reclamation. I believe it is very easy for you to make a test case to trigger the ownership invariant violation.

In my early implmentation, I added another monotonic version field for the hrtimer to do correct versioning (or Epoch-based memory reclamation). However, it will increase the memory footprint of the hrtimer. That's why I eventually gave up on the idea.

Can you tell the incorrect scenario using the APIs I provided? The new timer has the same expired value? I think the only case is that the user changes the callback concurrently using hrtimer_set, but keep the expired value the same without calling hrtimer_start.

I already thought about this, But I think this is a design choice:

  1. if you think this is a violation of ownership invariant, it can be easily fixed by adding a check whether the func is changed here.

  2. If you think it is not violation of ownership, since the new timer will eventually be executed and the period is updated, the check is not needed here as the current implementation. And if the user do not want this happen, he can call hrtimer_start to update the expired value immediately after calling hrtimer_set updating the callback

At last, If users don't change both the callback and the expired value, we should consider the timer not changed

Copy link
Contributor Author

@wangchdo wangchdo Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still insist what I stated before:

Designing correct concurrent algorithms requires systematic consideration. As I stated in #17570, after carefully reviewing your implementation, I could not really find a good solution that preserves version information without introducing significant performance or memory overhead.

In my opinion, I think your new implementation is not proper due to three reasons:

  1. Your way of protecting timer from concurrency update is not efficient

  2. You totally refactored the existing more user-friendly API design, this is not correct

  3. Your queue abstraction design should be based on the existing hrtimer design.in fact, if you do not provide your way of queue abstraction, i already planned to add mine.And, In my opinion, maybe I am wrong,Nuttx is a RTOS, not a OS as linux, but I found your queue abstraction has a Linux style.

At the last, i also want to make this concurrency systematic consideration illustrated in a more easy to understand way in my opinion:

If you want to fix concurrency issue, you only need to figure out all the data that may be wrongly used in concurrency case, and then find a way to protect them, this is what i think of as systematic thinking for concurrency issues

@cederom
Copy link
Contributor

cederom commented Jan 4, 2026

Thank you @wangchdo! This change builds fine now. @Fix-Point provides some important feedback as he seems to work on this already. I like approach of @wangchdo to keep things compatible and aligned with existing API. Maybe additional checks / protections may be implemented to avoid situations described by @Fix-Point? :-)

@Fix-Point can you please provide exact test scenario steps to verify problems you mentioned on @wangchdo solution? This should confirm if the implementation requires some additional protections? :-)

@xiaoxiang781216
Copy link
Contributor

@wangchdo let's focus #17642, the subset of this pr.

@wangchdo
Copy link
Contributor Author

wangchdo commented Jan 4, 2026

@wangchdo let's focus #17642, the subset of this pr.

OK

@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch 8 times, most recently from 63bb54a to ffbfb11 Compare January 5, 2026 12:00
@Fix-Point
Copy link
Contributor

Fix-Point commented Jan 5, 2026

Thank you @wangchdo! This change builds fine now. @Fix-Point provides some important feedback as he seems to work on this already. I like approach of @wangchdo to keep things compatible and aligned with existing API. Maybe additional checks / protections may be implemented to avoid situations described by @Fix-Point? :-)

@Fix-Point can you please provide exact test scenario steps to verify problems you mentioned on @wangchdo solution? This should confirm if the implementation requires some additional protections? :-)

I have provided @wangchdo with 4-5 test cases and believe I have been very patient in explaining why this approach will not work. However, he still refuses to accept my reasoning.

Here is the detailed explanation:

First, to ensure concurrency correctness, we must adhere to resource invariants, such as the ownership invariant. That is, only one user can update a shared object at any given time.

In the hrtimer design, there may be situations where a new hrtimer is already being reset while the callback function of the old hrtimer is still executing. For such scenarios, we must ensure in the hrtimer design that "a cancelled timer cannot modify the hrtimer object again." In other words, the execution core of the old timer callback loses ownership of the hrtimer object. Otherwise, it could lead to the old timer overwriting the new timer, which is an error. This is why I refer to his implementation as "functionally incorrect" (i.e., it violates the specification).

The key problem of its implementation is the violation of the ownership invariant, which is impossible to fix without introducing extra memory overhead. Let me elaborate on why his latest design still violates the ownership invariant:
After callback execution completes, a periodic timer requeues itself. In his latest implementation, the requeue condition is:

// executing callback...
if (hrtimer->expired != UINT64_MAX && hrtimer->expired != expired)

It violates the ownership invariant when:

  1. If the callback function for the current hrtimer->expired = t1 is executing and returns 0.

  2. While the callback is executing, the timer is cancelled and a new timer is started with hrtimer->expired = t2.

  3. When the callback finishes, it finds hrtimer->expired != t1 and mistakenly requeue the one-shot timer as a periodic timer, causing a system crash.

  4. Even reverting to the original implementation still violates the invariant because expired is not monotonically increasing and cannot be used as a version. It is possible for the old and new timers to have the same expired time, allowing a cancelled timer to modify the new timer.

// executing callback...
if (period > 0 && hrtimer->expired == expired)

It violates the ownership invariant when:

  1. If a periodic timer hrtimer->expired = t1, hrtimer->func = period_func is executing.
  2. While the callback is executing, the timer is cancleled and a new timer is started with hrtimer->expired = t1, hrtimer->func = oneshot_func.
  3. When the cancelled period_func finishes, while oneshot_func is still executing, the cancelled period_func overwrites the newly set oneshot timer, causing an error.

Even if we add a check for func as suggested by @wangchdo , problems remain:

// executing callback...
if (period > 0 && hrtimer->expired == expired && hrtimer->func == func)

It violates the ownership invariant when:

  1. If a periodic timer hrtimer->expired = t1, hrtimer->func = period_func is executing, and period_func returns the next delay p1.
  2. At time t1, while the hrtimer callback is executing, the periodic timer is cancelled, and a new periodic timer is set to trigger immediately with hrtimer->expired = t1, hrtimer->func = period_func, which returns the next delay p2.
  3. When the cancelled previous callback finishes, while the new callback is still executing, all version checks pass, and the next timer is set to hrtimer->expired = t1 + p1, overwriting the new timer (the expected next expired time should be hrtimer->expired = t1 + p2).

The correct solution is to add a monotonically increasing version field, version, to the hrtimer. When an hrtimer is cancelled, version increments by 1, ensuring that an cancelled old timer with old version cannot overwrite a newly set timer with newer version. However, this would inevitably increase the memory footprint of the hrtimer. I considered this approach early in the design phase but ultimately opted for hazard pointers to avoid the memory overhead.

BTW, I feel that @wangchdo has shown little respect for me:

  1. I and @wangchdo each implemented our hrtimer independently, with almost no communication during the process. These implementations are completely different. In mid-December, @xiaoxiang781216 suggested that I share my design with @wangchdo . During our exchange, I pointed out his errors regarding SMP, but he refused to acknowledge them and accused me of stealing his ideas. He also asked me to submit my incomplete internal hrtimer implementation to the community.
  2. Out of respect for his opinion, I submitted my design and demonstrated that it outperforms his in both functional correctness and performance. After I submitted my code implementation to the community, to my surprise, without any discussion, @anchao forcibly merged @wangchdo's functional incorrect implementation into the upstream, which made me feel very very uncomfortable.
  3. Yet, @wangchdo consistently refused to replace his implementation with mine. I cannot understand why he remains so stubborn.
  4. He rarely tests his own code; even pulling his PR and compiling it reveals compilation issues. He repeatedly pressured me to provide test cases for him but never offered any in return, which I find insulting.
  5. I have consistently tried to reason with him, presenting performance test results, interface comparisons, and memory usage analyses. However, he refuses to acknowledge them and has not provided any convincing evidence to support his stance.

I recommend that @wangchdo refer to my design and adopt hazard pointers, which are an optimal solution for memory efficiency. I have no issue with him using my design consideration (in fact, hazard pointers is memory reclamation technique proposed by others). Similar design goals inevitably lead to convergent designs, and this is entirely natural.

Finally, Could you please help me review my implementation #17675? I welcome any feedback and believe that, at the very least, my implementation is superior to his in terms of functional correctness, design completeness, documentation, performance, scalability, and code reusability. =)

Regarding the APIs, I have made every effort to align it with the original design. However, some aspects are inherently tied to the design and difficult to change. For example, my implementation encodes the state in hrtimer->func, which means restarting the timer requires passing the func. Besides, according to test results, this does not impact performance.

@wangchdo
Copy link
Contributor Author

wangchdo commented Jan 5, 2026

Thank you @wangchdo! This change builds fine now. @Fix-Point provides some important feedback as he seems to work on this already. I like approach of @wangchdo to keep things compatible and aligned with existing API. Maybe additional checks / protections may be implemented to avoid situations described by @Fix-Point? :-)

@Fix-Point can you please provide exact test scenario steps to verify problems you mentioned on @wangchdo solution? This should confirm if the implementation requires some additional protections? :-)

I have provided @wangchdo with 4-5 test cases and believe I have been very patient in explaining why this approach will not work. However, he still refuses to accept my reasoning.

Here is the detailed explanation:

First, to ensure concurrency correctness, we must adhere to resource invariants, such as the ownership invariant. That is, only one user can update a shared object at any given time.

In the hrtimer design, there may be situations where a new hrtimer is already being reset while the callback function of the old hrtimer is still executing. For such scenarios, we must ensure in the hrtimer design that "a cancelled timer cannot modify the hrtimer object again." In other words, the execution core of the old timer callback loses ownership of the hrtimer object. Otherwise, it could lead to the old timer overwriting the new timer, which is an error. This is why I refer to his implementation as "functionally incorrect" (i.e., it violates the specification).

The key problem of its implementation is the violation of the ownership invariant, which is impossible to fix without introducing extra memory overhead. Let me elaborate on why his latest design still violates the ownership invariant:

After callback execution completes, a periodic timer requeues itself. In his latest implementation, the requeue condition is:

// executing callback...

if (hrtimer->expired != UINT64_MAX && hrtimer->expired != expired)

It violates the ownership invariant when:

  1. If the callback function for the current hrtimer->expired = t1 is executing and returns 0.

  2. While the callback is executing, the timer is cancelled and a new timer is started with hrtimer->expired = t2.

  3. When the callback finishes, it finds hrtimer->expired != t1 and mistakenly requeue the one-shot timer as a periodic timer, causing a system crash.

  4. Even reverting to the original implementation still violates the invariant because expired is not monotonically increasing and cannot be used as a version. It is possible for the old and new timers to have the same expired time, allowing a cancelled timer to modify the new timer.

// executing callback...

if (period > 0 && hrtimer->expired == expired)

It violates the ownership invariant when:

  1. If a periodic timer hrtimer->expired = t1, hrtimer->func = period_func is executing.

  2. While the callback is executing, the timer is cancleled and a new timer is started with hrtimer->expired = t1, hrtimer->func = oneshot_func.

  3. When the cancelled period_func finishes, while oneshot_func is still executing, the cancelled period_func overwrites the newly set oneshot timer, causing an error.

Even if we add a check for func as suggested by @wangchdo , problems remain:

// executing callback...

if (period > 0 && hrtimer->expired == expired && hrtimer->func == func)

It violates the ownership invariant when:

  1. If a periodic timer hrtimer->expired = t1, hrtimer->func = period_func is executing, and period_func returns the next delay p1.

  2. At time t1, while the hrtimer callback is executing, the periodic timer is cancelled, and a new periodic timer is set to trigger immediately with hrtimer->expired = t1, hrtimer->func = period_func, which returns the next delay p2.

  3. When the cancelled previous callback finishes, while the new callback is still executing, all version checks pass, and the next timer is set to hrtimer->expired = t1 + p1, overwriting the new timer (the expected next expired time should be hrtimer->expired = t1 + p2).

The correct solution is to add a monotonically increasing version field, version, to the hrtimer. When an hrtimer is cancelled, version increments by 1, ensuring that an cancelled old timer with old version cannot overwrite a newly set timer with newer version. However, this would inevitably increase the memory footprint of the hrtimer. I considered this approach early in the design phase but ultimately opted for hazard pointers to avoid the memory overhead.

BTW, I feel that @wangchdo has shown little respect for me:

  1. I and @wangchdo each implemented our hrtimer independently, with almost no communication during the process. These implementations are completely different. In mid-December, @xiaoxiang781216 suggested that I share my design with @wangchdo . During our exchange, I pointed out his errors regarding SMP, but he refused to acknowledge them and accused me of stealing his ideas. He also asked me to submit my incomplete internal hrtimer implementation to the community.

  2. Out of respect for his opinion, I submitted my design and demonstrated that it outperforms his in both functional correctness and performance. After I submitted my code implementation to the community, to my surprise, without any discussion, @anchao forcibly merged @wangchdo's functional incorrect implementation into the upstream, which made me feel very very uncomfortable.

  3. Yet, @wangchdo consistently refused to replace his implementation with mine. I cannot understand why he remains so stubborn.

  4. He rarely tests his own code; even pulling his PR and compiling it reveals compilation issues. He repeatedly pressured me to provide test cases for him but never offered any in return, which I find insulting.

  5. I have consistently tried to reason with him, presenting performance test results, interface comparisons, and memory usage analyses. However, he refuses to acknowledge them and has not provided any convincing evidence to support his stance.

I recommend that @wangchdo refer to my design and adopt hazard pointers, which are an optimal solution for memory efficiency. I have no issue with him using my design consideration (in fact, hazard pointers is memory reclamation technique proposed by others). Similar design goals inevitably lead to convergent designs, and this is entirely natural.

Finally, Could you please help me review my implementation #17675? I welcome any feedback and believe that, at the very least, my implementation is superior to his in terms of functional correctness, design completeness, documentation, performance, scalability, and code reusability. =)

Regarding the APIs, I have made every effort to align it with the original design. However, some aspects are inherently tied to the design and difficult to change. For example, my implementation encodes the state in hrtimer->func, which means restarting the timer requires passing the func. Besides, according to test results, this does not impact performance.

@Fix-Point This is my last response to you:

It is just about design choice
or why we provide hrtimer module when we decide how to handle concurrency issues. What is most important before you make a decision of whether the implementation is right is that you need to consider if the behavior is what we want or what is the requirement?

Why do we provide hrtimer, what is the correct use cases? What do we expect users to get?

In my opinion, there is no wrong implementation, there is only implementation that meets requirements and others no.

At the last, I don't want to talk about anything else not related to technical stuff. And I very much don't want judging anyone personally happens in our community.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Jan 5, 2026

  1. Even reverting to the original implementation still violates the invariant because expired is not monotonically increasing and cannot be used as a version. It is possible for the old and new timers to have the same expired time, allowing a cancelled timer to modify the new timer.
// executing callback...
if (period > 0 && hrtimer->expired == expired)

It violates the ownership invariant when:

  1. If a periodic timer hrtimer->expired = t1, hrtimer->func = period_func is executing.
  2. While the callback is executing, the timer is cancleled and a new timer is started with hrtimer->expired = t1, hrtimer->func = oneshot_func.
  3. When the cancelled period_func finishes, while oneshot_func is still executing, the cancelled period_func overwrites the newly set oneshot timer, causing an error.

but we just provide hrtimer_init to change hrtimer->func, which mean the func can't be changed after initialization.

Even if we add a check for func as suggested by @wangchdo , problems remain:

// executing callback...
if (period > 0 && hrtimer->expired == expired && hrtimer->func == func)

It violates the ownership invariant when:

  1. If a periodic timer hrtimer->expired = t1, hrtimer->func = period_func is executing, and period_func returns the next delay p1.
  2. At time t1, while the hrtimer callback is executing, the periodic timer is cancelled, and a new periodic timer is set to trigger immediately with hrtimer->expired = t1, hrtimer->func = period_func, which returns the next delay p2.
  3. When the cancelled previous callback finishes, while the new callback is still executing, all version checks pass, and the next timer is set to hrtimer->expired = t1 + p1, overwriting the new timer (the expected next expired time should be hrtimer->expired = t1 + p2).

We have two methods to set the next timeout:

  1. call hrtimer_start
  2. return non zero value from hrtimer->func

I think the right behaviour is that the last timeout win the competition regardless the new timeout come from method 1 or 2.

The correct solution is to add a monotonically increasing version field, version, to the hrtimer. When an hrtimer is cancelled, version increments by 1, ensuring that an cancelled old timer with old version cannot overwrite a newly set timer with newer version. However, this would inevitably increase the memory footprint of the hrtimer. I considered this approach early in the design phase but ultimately opted for hazard pointers to avoid the memory overhead.

why not set expired to special value(e.g. UINT64_MAX) to indicate the cancel.

@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from ffbfb11 to b5a27f8 Compare January 5, 2026 14:51
@xiaoxiang781216
Copy link
Contributor

BTW, I feel that @wangchdo has shown little respect for me:

  1. I and @wangchdo each implemented our hrtimer independently, with almost no communication during the process. These implementations are completely different. In mid-December, @xiaoxiang781216 suggested that I share my design with @wangchdo . During our exchange, I pointed out his errors regarding SMP, but he refused to acknowledge them and accused me of stealing his ideas. He also asked me to submit my incomplete internal hrtimer implementation to the community.
  2. Out of respect for his opinion, I submitted my design and demonstrated that it outperforms his in both functional correctness and performance. After I submitted my code implementation to the community, to my surprise, without any discussion, @anchao forcibly merged @wangchdo's functional incorrect implementation into the upstream, which made me feel very very uncomfortable.
  3. Yet, @wangchdo consistently refused to replace his implementation with mine. I cannot understand why he remains so stubborn.
  4. He rarely tests his own code; even pulling his PR and compiling it reveals compilation issues. He repeatedly pressured me to provide test cases for him but never offered any in return, which I find insulting.
  5. I have consistently tried to reason with him, presenting performance test results, interface comparisons, and memory usage analyses. However, he refuses to acknowledge them and has not provided any convincing evidence to support his stance.

I recommend that @wangchdo refer to my design and adopt hazard pointers, which are an optimal solution for memory efficiency. I have no issue with him using my design consideration (in fact, hazard pointers is memory reclamation technique proposed by others). Similar design goals inevitably lead to convergent designs, and this is entirely natural.
Finally, Could you please help me review my implementation #17675? I welcome any feedback and believe that, at the very least, my implementation is superior to his in terms of functional correctness, design completeness, documentation, performance, scalability, and code reusability. =)
Regarding the APIs, I have made every effort to align it with the original design. However, some aspects are inherently tied to the design and difficult to change. For example, my implementation encodes the state in hrtimer->func, which means restarting the timer requires passing the func. Besides, according to test results, this does not impact performance.

@Fix-Point This is my last response to you:

It is just about design choice or why we provide hrtimer module when we decide how to handle concurrency issues. What is most important before you make a decision of whether the implementation is right is that you need to consider if the behavior is what we want or what is the requirement?

Why do we provide hrtimer, what is the correct use cases? What do we expect users to get?

In my opinion, there is no wrong implementation, there is only implementation that meets requirements and others no.

At the last, I don't want to talk about anything else not related to technical stuff. And I very much don't want judging anyone personally happens in our community.

@wangchdo I think all change in this pr come from the discussion with @Fix-Point , so I suggest you add @Fix-Point as coauthor to acknowledge his contribution.

@wangchdo
Copy link
Contributor Author

wangchdo commented Jan 5, 2026

BTW, I feel that @wangchdo has shown little respect for me:

  1. I and @wangchdo each implemented our hrtimer independently, with almost no communication during the process. These implementations are completely different. In mid-December, @xiaoxiang781216 suggested that I share my design with @wangchdo . During our exchange, I pointed out his errors regarding SMP, but he refused to acknowledge them and accused me of stealing his ideas. He also asked me to submit my incomplete internal hrtimer implementation to the community.
  1. Out of respect for his opinion, I submitted my design and demonstrated that it outperforms his in both functional correctness and performance. After I submitted my code implementation to the community, to my surprise, without any discussion, @anchao forcibly merged @wangchdo's functional incorrect implementation into the upstream, which made me feel very very uncomfortable.
  1. Yet, @wangchdo consistently refused to replace his implementation with mine. I cannot understand why he remains so stubborn.
  1. He rarely tests his own code; even pulling his PR and compiling it reveals compilation issues. He repeatedly pressured me to provide test cases for him but never offered any in return, which I find insulting.
  1. I have consistently tried to reason with him, presenting performance test results, interface comparisons, and memory usage analyses. However, he refuses to acknowledge them and has not provided any convincing evidence to support his stance.

I recommend that @wangchdo refer to my design and adopt hazard pointers, which are an optimal solution for memory efficiency. I have no issue with him using my design consideration (in fact, hazard pointers is memory reclamation technique proposed by others). Similar design goals inevitably lead to convergent designs, and this is entirely natural.

Finally, Could you please help me review my implementation #17675? I welcome any feedback and believe that, at the very least, my implementation is superior to his in terms of functional correctness, design completeness, documentation, performance, scalability, and code reusability. =)

Regarding the APIs, I have made every effort to align it with the original design. However, some aspects are inherently tied to the design and difficult to change. For example, my implementation encodes the state in hrtimer->func, which means restarting the timer requires passing the func. Besides, according to test results, this does not impact performance.

@Fix-Point This is my last response to you:

It is just about design choice or why we provide hrtimer module when we decide how to handle concurrency issues. What is most important before you make a decision of whether the implementation is right is that you need to consider if the behavior is what we want or what is the requirement?

Why do we provide hrtimer, what is the correct use cases? What do we expect users to get?

In my opinion, there is no wrong implementation, there is only implementation that meets requirements and others no.

At the last, I don't want to talk about anything else not related to technical stuff. And I very much don't want judging anyone personally happens in our community.

@wangchdo I think all change in this pr come from the discussion with @Fix-Point , so I suggest you add @Fix-Point as coauthor to acknowledge his contribution.

OK, I will add @Fix-Point as coauthor to acknowledge the discussion back and forth with him. However, to be honest, I couldn't say that I enjoyed this kind of discussion, I hope next time we could discuss in a more enjoyable way:)

@cederom
Copy link
Contributor

cederom commented Jan 5, 2026

Thank you guys! Really amazing and complex work! Good to hear there is a consensus and best possible solution is the outcome for NuttX :-)

I did write to a dev@ ml for other folks to take a look and review / comment / test :-)

@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch 2 times, most recently from ea8416a to a93f825 Compare January 6, 2026 03:56
wangchdo and others added 4 commits January 6, 2026 13:12
  Allow running/armed hrtimer to be restarted to
  fix hrtimer bug: apache#17567

Co-authored-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
    Update the hrtimer documentation to describe the hrtimer state machine,
    which is introduced to handle safe cancellation and execution in SMP
    environments.

Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
Enable the timer start functions when hrtimer is enabled. This allows
hrtimer to set timer expirations with nanosecond resolution.

Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
    This commit add hrtimer support to scheduler
    tick without altering the existing scheduler behavior.

Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from a93f825 to bce410f Compare January 6, 2026 05:14
  When hrtimer is enabled, the tickless scheduler should call
  nxsched_hrtimer_start to start the timer, this is because
  the tick system is support by hrtimer

Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
@wangchdo wangchdo force-pushed the add_hrtimer_sched_support branch from bce410f to 867a408 Compare January 6, 2026 05:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: Documentation Improvements or additions to documentation Area: Drivers Drivers issues Area: OS Components OS Components issues Size: L The size of the change in this PR is large Size: M The size of the change in this PR is medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants