@f3sch f3sch commented Aug 5, 2025

Instead of the full atan2, which is rather expensive, one can use a faster approximate method.
On CPU the speedup is around 10% for the trackleting phases (these account for about 5% of the total time with 20 threads, so the overall gain is around 0.5%).
On GPU the speedup is much smaller, around 0.5% (probably within fluctuations); this phase is around 26% of the total time with <<<30,256>>>, so the overall gain is only about 0.13%.
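
For illustration, here is a minimal sketch of what an approximate atan2 can look like. It is not necessarily the approximation used in this PR: the function names fastAtan and fastAtan2, the polynomial, and its constants are assumptions picked for the example. It uses a well-known low-order polynomial for atan on [-1, 1], whose absolute error is roughly 0.0015 rad, and reduces the general case to that range by swapping the ratio and adding the quadrant offset.

```cpp
#include <cmath>

// Hypothetical helper (not taken from the PR): polynomial approximation of
// atan(z), valid only for |z| <= 1.
inline float fastAtan(float z)
{
  constexpr float kPiOver4 = 0.78539816f;
  const float az = std::fabs(z);
  return kPiOver4 * z - z * (az - 1.0f) * (0.2447f + 0.0663f * az);
}

// Hypothetical approximate atan2(y, x), returning an angle in [-pi, pi].
inline float fastAtan2(float y, float x)
{
  constexpr float kPi = 3.14159265f;
  constexpr float kPiOver2 = 1.57079633f;
  if (x == 0.0f && y == 0.0f) {
    return 0.0f; // convention chosen here for the undefined case
  }
  if (std::fabs(x) >= std::fabs(y)) {
    // |y/x| <= 1: evaluate directly, then shift by +-pi in the left half-plane.
    float r = fastAtan(y / x);
    if (x < 0.0f) {
      r += (y >= 0.0f) ? kPi : -kPi;
    }
    return r;
  }
  // |x/y| < 1: use atan(y/x) = +-pi/2 - atan(x/y), the sign following y.
  return (y > 0.0f ? kPiOver2 : -kPiOver2) - fastAtan(x / y);
}
```

The sketch uses one division and a few multiply-adds per call, which is where a gain over the library atan2 would come from. Whether that gain is visible depends on how much of the phase is actually spent in atan2, as the CPU and GPU numbers above suggest.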

The difference in output is minimal (below one per mille); the comparison is shown below. The bottom-right plot shows that, after the switch, the GPU part in deterministic mode still reproduces the CPU output 1:1.

[Image: ptRatioCN]

f3sch added 3 commits August 5, 2025 14:14
Signed-off-by: Felix Schlepper <felix.schlepper@cern.ch>
Signed-off-by: Felix Schlepper <felix.schlepper@cern.ch>
Signed-off-by: Felix Schlepper <felix.schlepper@cern.ch>
github-actions bot commented Aug 5, 2025

REQUEST FOR PRODUCTION RELEASES:
To request that your PR be included in production software, please add the corresponding labels, prefixed with "async-", to your PR. Add the labels directly (if you have the permissions) or add a comment of the form below (note that labels are separated by a ","):

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available:
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@f3sch f3sch marked this pull request as ready for review August 5, 2025 12:26
@f3sch f3sch closed this Aug 5, 2025
@f3sch f3sch deleted the its/trklt_fastAtan2 branch August 5, 2025 13:06