[EXPERIMENTAL]sched/hrtimer: Part 2: refine hrtimer state machine and introduce scheduler support with hrtimer #17573
@Fix-Point @xiaoxiang781216 @GUIDINGLI As I pointed out in my comments on your PR (#17675), I don't think the new hrtimer implementation is the better choice. From my perspective, the way it addresses concurrency issues is inefficient, and it requires changing the entire hrtimer API surface, which I believe makes it less user-friendly than my approach. My API design is as follows:
My implementation focuses on resolving the concurrency concerns without forcing API-wide changes, keeping the usage model simple while still ensuring correctness under SMP.
sched/hrtimer/hrtimer_process.c (outdated)
    /* Re-arm periodic timer if not canceled or re-armed concurrently */
    if (period > 0 && hrtimer->expired == expired)
Your latest attempt is almost the same as the versioning idea I tried before. Sadly, I found that it still violates the ownership invariant. In fact, the expired field cannot be used as a version, because expired is not monotonic (a newer timer may have the same or an earlier expiry than the older timer), and monotonicity is a fundamental assumption behind the correctness of epoch-based memory reclamation. I believe it is very easy for you to write a test case that triggers the ownership invariant violation.
In my early implementation, I added a separate monotonic version field to the hrtimer to do correct versioning (or epoch-based memory reclamation). However, it increases the memory footprint of the hrtimer. That's why I eventually gave up on the idea.
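To make the non-monotonic-expiry concern concrete, here is a minimal single-threaded sketch of the interleaving being described. It is not the PR's code: `struct demo_timer`, `demo_rearm_by_expiry` and `demo_stale_rearm` are hypothetical names, the `UINT64_MAX` cancel marker is borrowed from a condition quoted later in this thread, and the two CPUs are simulated by ordering plain statements.

```c
#include <stdint.h>

/* Hypothetical, simplified timer; not the real NuttX struct hrtimer_s. */

struct demo_timer
{
  uint64_t expired;  /* Absolute expiry time in nanoseconds */
};

/* Re-arm path run after a callback returns `period`.  `snap` is the expiry
 * sampled before the callback started.  The only guard is that the expiry
 * value still matches, mirroring the check under review.
 */

static void demo_rearm_by_expiry(struct demo_timer *t, uint64_t snap,
                                 uint64_t period)
{
  if (period > 0 && t->expired == snap)
    {
      t->expired += period;
    }
}

/* Simulated interleaving of two CPUs; returns the final expiry. */

static uint64_t demo_stale_rearm(void)
{
  struct demo_timer t =
  {
    1000
  };

  uint64_t snap = t.expired;  /* CPU0: old callback starts running */

  t.expired = UINT64_MAX;     /* CPU1: cancels the old timer (assumed marker) */
  t.expired = 1000;           /* CPU1: starts a new timer, same expiry */

  /* CPU0: old callback returns period 500; the guard passes by accident */

  demo_rearm_by_expiry(&t, snap, 500);
  return t.expired;
}
```

In this sketch `demo_stale_rearm()` ends with the expiry moved from 1000 to 1500, so the cancelled timer has modified the timer that replaced it. Whether this interleaving is reachable through the PR's public API is exactly what the two authors disagree about.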
I still insist what I stated before:
Designing correct concurrent algorithms requires systematic consideration. As I stated in #17570, after carefully reviewing your implementation, I could not really find a good solution that preserves version information without introducing significant performance or memory overhead.
Your latest attempt is almost same the versioning idea I tried before. Sadly, I found it still violated the ownership invariant. In fact, the expired field can not be used as version, since the expired is not monotonic (the newer timer may has same or older expire than the older timer), which is a fundamental assumption regarding the correctness of epoch-based memory reclamation. I believe it is very easy for you to make a test case to trigger the ownership invariant violation.
In my early implmentation, I added another monotonic version field for the hrtimer to do correct versioning (or Epoch-based memory reclamation). However, it will increase the memory footprint of the hrtimer. That's why I eventually gave up on the idea.
Can you describe the incorrect scenario using the APIs I provided? Is it that the new timer has the same expired value? I think the only such case is when the user changes the callback concurrently using hrtimer_set but keeps the expired value the same without calling hrtimer_start.
I have already thought about this, but I think it is a design choice:
- If you consider this a violation of the ownership invariant, it can easily be fixed by adding a check here for whether func has changed.
- If you do not consider it a violation of ownership, since the new timer will eventually be executed and the period is updated, then the check is not needed here, as in the current implementation. And if the user does not want this to happen, they can call hrtimer_start to update the expired value immediately after calling hrtimer_set to update the callback.
Finally, if users change neither the callback nor the expired value, we should consider the timer unchanged.
I still insist what I stated before:
Designing correct concurrent algorithms requires systematic consideration. As I stated in #17570, after carefully reviewing your implementation, I could not really find a good solution that preserves version information without introducing significant performance or memory overhead.
In my opinion, your new implementation is not appropriate, for three reasons:
- Your way of protecting the timer from concurrent updates is not efficient.
- You completely refactored the existing, more user-friendly API design, which is not correct.
- Your queue abstraction should be based on the existing hrtimer design. In fact, if you had not provided your queue abstraction, I had already planned to add mine. Also, in my opinion (maybe I am wrong), NuttX is an RTOS, not an OS like Linux, but I found that your queue abstraction has a Linux style.
Lastly, I also want to describe what I mean by systematic consideration of concurrency in a way that I find easier to understand:
If you want to fix a concurrency issue, you only need to identify all the data that may be used incorrectly under concurrency and then find a way to protect it. This is what I think of as systematic thinking about concurrency issues.
Thank you @wangchdo! This change builds fine now. @Fix-Point provides some important feedback, as he seems to be working on this already. I like @wangchdo's approach of keeping things compatible and aligned with the existing API. Maybe additional checks / protections could be implemented to avoid the situations described by @Fix-Point? :-) @Fix-Point, can you please provide the exact test scenario steps to verify the problems you mentioned in @wangchdo's solution? This should confirm whether the implementation requires additional protections. :-)
I have provided @wangchdo with 4-5 test cases and believe I have been very patient in explaining why this approach will not work. However, he still refuses to accept my reasoning. Here is the detailed explanation:

First, to ensure concurrency correctness, we must adhere to resource invariants, such as the ownership invariant: only one user can update a shared object at any given time. In the hrtimer design, there may be situations where a new hrtimer is already being set while the callback function of the old hrtimer is still executing. For such scenarios, the hrtimer design must ensure that "a cancelled timer cannot modify the hrtimer object again." In other words, the core executing the old timer callback loses ownership of the hrtimer object. Otherwise, the old timer could overwrite the new timer, which is an error. This is why I refer to his implementation as "functionally incorrect" (i.e., it violates the specification).

The key problem of his implementation is the violation of the ownership invariant, which is impossible to fix without introducing extra memory overhead. Let me elaborate on why his latest design still violates the ownership invariant:

    // executing callback...
    if (hrtimer->expired != UINT64_MAX && hrtimer->expired != expired)

It violates the ownership invariant when:

    // executing callback...
    if (period > 0 && hrtimer->expired == expired)

It violates the ownership invariant when:

Even if we add a check for

    // executing callback...
    if (period > 0 && hrtimer->expired == expired && hrtimer->func == func)

It violates the ownership invariant when:

The correct solution is to add a monotonically increasing version field,

BTW, I feel that @wangchdo has shown little respect for me:

I recommend that @wangchdo refer to my design and adopt hazard pointers, which are an optimal solution for memory efficiency. I have no issue with him using my design considerations (in fact, hazard pointers are a memory reclamation technique proposed by others). Similar design goals inevitably lead to convergent designs, and this is entirely natural.

Finally, could you please help me review my implementation, #17675? I welcome any feedback and believe that, at the very least, my implementation is superior to his in terms of functional correctness, design completeness, documentation, performance, scalability, and code reusability. =)

Regarding the APIs, I have made every effort to align them with the original design. However, some aspects are inherently tied to the design and difficult to change. For example, my implementation encodes the state in
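The monotonic version field mentioned above could look roughly like the following. This is only an illustrative single-threaded sketch, not code from either PR: `struct demo_vtimer` and the `demo_v*` helpers are hypothetical, a real implementation would perform these steps under the timer lock, and a 32-bit counter can in principle wrap around.

```c
#include <stdint.h>

/* Hypothetical timer carrying an extra version counter; this extra field
 * is the memory overhead discussed in the thread.
 */

struct demo_vtimer
{
  uint64_t expired;  /* Absolute expiry time in nanoseconds */
  uint32_t version;  /* Bumped on every start and cancel */
};

static void demo_vstart(struct demo_vtimer *t, uint64_t expired)
{
  t->expired = expired;
  t->version++;
}

static void demo_vcancel(struct demo_vtimer *t)
{
  t->expired = UINT64_MAX;
  t->version++;
}

/* Re-arm only if nobody started or cancelled the timer since `snap` was
 * taken.  Returns 1 if the timer was re-armed, 0 otherwise.
 */

static int demo_vrearm(struct demo_vtimer *t, uint32_t snap,
                       uint64_t period)
{
  if (period == 0 || t->version != snap)
    {
      return 0;  /* The old callback no longer owns the timer */
    }

  t->expired += period;
  return 1;
}

/* Same interleaving as a stale periodic callback racing with a restart. */

static uint64_t demo_versioned_rearm(void)
{
  struct demo_vtimer t =
  {
    0, 0
  };

  demo_vstart(&t, 1000);
  uint32_t snap = t.version;         /* CPU0: old callback starts */

  demo_vcancel(&t);                  /* CPU1: cancels the old timer */
  demo_vstart(&t, 1000);             /* CPU1: new timer, same expiry */

  (void)demo_vrearm(&t, snap, 500);  /* CPU0: stale re-arm is rejected */
  return t.expired;
}
```

Here the new timer keeps its expiry of 1000 even though it matches the old one, because the comparison is on the counter rather than on the expiry value. The cost is the additional field in every timer object.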
@Fix-Point This is my last response to you: this is just a matter of design choice. Why do we provide hrtimer, and what are the correct use cases? What do we expect users to get? In my opinion, there is no wrong implementation; there are only implementations that meet the requirements and ones that do not. Lastly, I don't want to discuss anything unrelated to the technical matter, and I very much don't want personal judgments of anyone to happen in our community.
but we just provide hrtimer_init to change
We have two methods to set the next timeout:
I think the right behaviour is that the last timeout wins the competition, regardless of whether the new timeout comes from method 1 or 2.
why not set expired to special value(e.g. |
@wangchdo I think all the changes in this PR come from the discussion with @Fix-Point, so I suggest you add @Fix-Point as a co-author to acknowledge his contribution.
OK, I will add @Fix-Point as a co-author to acknowledge the back-and-forth discussion with him. However, to be honest, I can't say that I enjoyed this kind of discussion. I hope next time we can discuss in a more enjoyable way :)
Thank you guys! Really amazing and complex work! Good to hear there is a consensus and that the best possible solution is the outcome for NuttX :-) I wrote to the dev@ mailing list so that other folks can take a look and review / comment / test :-)
Allow running/armed hrtimer to be restarted to fix hrtimer bug: apache#17567
Co-authored-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
Update the hrtimer documentation to describe the hrtimer state machine,
which is introduced to handle safe cancellation and execution in SMP
environments.
Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
Enable the timer start functions when hrtimer is enabled. This allows hrtimer to set timer expirations with nanosecond resolution.
Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
This commit adds hrtimer support to the scheduler
tick without altering the existing scheduler behavior.
Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
When hrtimer is enabled, the tickless scheduler should call nxsched_hrtimer_start to start the timer, because the tick system is supported by hrtimer.
Signed-off-by: Chengdong Wang <wangchengdong@lixiang.com>
Summary
This PR introduces high-resolution timer (hrtimer) support as a fully independent, optional module that supports the scheduler without affecting existing scheduler behavior.
Hrtimer is strictly isolated from the current scheduling logic:
The module does not modify any scheduler data structures or timing paths.
Hrtimer acts solely as an alternative time source. Core scheduler functions (nxsched_process_tick(), nxsched_tick_expiration(), etc.) remain unchanged and are reused as-is.
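As a rough illustration of what "alternative time source" can mean here, the sketch below drives an unchanged tick handler from a periodic timer callback. Everything in it is an assumption made for illustration: `demo_process_tick` merely stands in for a core scheduler function such as nxsched_process_tick(), the 10 ms tick length is arbitrary, and the convention that a callback returns its next period is inferred from the `period > 0` check quoted earlier in this conversation, not taken from the actual NuttX API.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_TICK 10000000u  /* Assumed 10 ms system tick */

struct demo_hrtimer
{
  uint64_t expired;  /* Absolute expiry time in nanoseconds */
};

static unsigned int g_demo_ticks;

/* Stand-in for the existing scheduler tick handler, reused as-is. */

static void demo_process_tick(void)
{
  g_demo_ticks++;
}

/* Timer callback: forward the expiry to the scheduler and ask to be
 * re-armed one tick later by returning a non-zero period.
 */

static uint64_t demo_tick_callback(struct demo_hrtimer *timer)
{
  (void)timer;
  demo_process_tick();
  return DEMO_NSEC_PER_TICK;
}

/* Simulated expiry processing: fire every expiry that is due at `now`
 * and return how many times the callback ran.
 */

static unsigned int demo_run_until(struct demo_hrtimer *timer, uint64_t now)
{
  unsigned int fired = 0;

  while (timer->expired <= now)
    {
      uint64_t period = demo_tick_callback(timer);

      fired++;
      if (period == 0)
        {
          break;  /* One-shot: do not re-arm */
        }

      timer->expired += period;
    }

  return fired;
}
```

The point of the sketch is the direction of the dependency: the timer layer calls into the scheduler's tick handler, while the handler itself needs no knowledge of where the tick came from.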
Additional safeguards:
Integration benefit
This design enables incremental development and review of hrtimer while ensuring that existing NuttX scheduling behavior remains stable even if the hrtimer feature is explicitly enabled.
Development benefit
With this design, developers interested in optimizing the scheduler and those focused on optimizing hrtimer can work independently on their respective improvements.
One other key update
This PR also includes an improvement to hrtimer (also submitted as a separate PR, #17570) that refines its state machine in order to fix some issues in SMP mode found by @Fix-Point. The refined state machine is shown below, and the corresponding diagram has also been added to the hrtimer documentation.
Impact
Adds hrtimer support to the NuttX scheduler without altering the existing scheduler behavior.
Testing
Test 1 passed (integrated in ostest):
- test implementation:
- test log on rv-virt:smp64:
Test 2 passed (provided by @Fix-Point):
- test implementation:
- test log on rv-virt:smp64:
Test 3 passed (provided by @Fix-Point):
- test implementation:
- test log on rv-virt:smp64: