Feature or enhancement
Proposal:
:)
The idea here is to expose JIT internals a bit more in the Tachyon profiler.
The questions that could be answered trivially:
- Is JIT actually used? How often?
- Which executor is hot?
- What change made the largest difference (diff_flamegraph) in terms of % of time spent in JIT? Per line?
A bit harder:
- Which UOPs actually eat time?
- Which stencils are good?
- What guards or deopts are bad?
- What is the trace shape?
The very minimal change that I'm thinking about here is just adding a --jit flag (active for --jsonl and maybe for --live, but it's a catnip), and adding JitInfo to ThreadInfo.
That's how it could look:
JitInfo(
    executor_id,
    flags,
)
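To make the "trivial" questions concrete, here is a rough sketch of how a consumer could use such a record. Everything in it is an assumption for illustration: the field types, the `ThreadSample` stand-in (not the real ThreadInfo), and the `jit_summary` helper are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JitInfo:
    # Hypothetical fields, mirroring the proposal's sketch.
    executor_id: int  # assumed: some stable identifier for the executor
    flags: int        # assumed: opaque bit flags


@dataclass(frozen=True)
class ThreadSample:
    # Hypothetical stand-in for one ThreadInfo sample; not the real class.
    thread_id: int
    jit: Optional[JitInfo]  # None when the thread was not in JIT code


def jit_summary(samples):
    """Return (fraction of samples in JIT, executors ordered by hit count)."""
    hits = Counter(s.jit.executor_id for s in samples if s.jit is not None)
    total = len(samples)
    in_jit = sum(hits.values())
    fraction = in_jit / total if total else 0.0
    return fraction, hits.most_common()


samples = [
    ThreadSample(1, JitInfo(executor_id=0xA, flags=0)),
    ThreadSample(1, JitInfo(executor_id=0xA, flags=0)),
    ThreadSample(1, JitInfo(executor_id=0xB, flags=0)),
    ThreadSample(1, None),
]
fraction, hot = jit_summary(samples)
print(f"{fraction:.0%} of samples in JIT; hottest executors: {hot}")
```

With only executor_id and flags per sample, "is JIT used, how often, and which executor is hot" reduces to counting, which is why this looks like a reasonable first step.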
This will require extending the debug offsets, obviously.
We could potentially expose native_pc, native_offset, uop_index, exit_index, etc. The question would be how to expose metadata for it without ballooning JitInfo, and how far we can get without blocking and native unwinding. :)
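One possible direction for the metadata question, purely as a sketch: keep the per-sample record down to a few integers and emit per-executor metadata once, in a side table keyed by executor_id. The names below (`ExecutorMeta`, `ExecutorTable`, `uop_names`, and the placeholder uop names) are hypothetical and not part of any existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JitInfo:
    # Per-sample record: kept to a few integers (all fields hypothetical).
    executor_id: int
    flags: int
    uop_index: int


@dataclass(frozen=True)
class ExecutorMeta:
    # Per-executor record: emitted once, not per sample (hypothetical).
    executor_id: int
    uop_names: tuple


class ExecutorTable:
    """Stores metadata once per executor and reports first sightings."""

    def __init__(self):
        self._meta = {}

    def register(self, meta):
        """Return True if this executor is new and should be emitted."""
        if meta.executor_id in self._meta:
            return False
        self._meta[meta.executor_id] = meta
        return True

    def uop_name(self, info):
        """Resolve a sample's uop_index against the stored metadata."""
        return self._meta[info.executor_id].uop_names[info.uop_index]


table = ExecutorTable()
meta = ExecutorMeta(executor_id=1, uop_names=("UOP_A", "UOP_B"))
first = table.register(meta)   # new executor: metadata would be emitted
second = table.register(meta)  # already known: nothing to emit
name = table.uop_name(JitInfo(executor_id=1, flags=0, uop_index=1))
```

The trade-off is that a consumer has to join samples against the table, but the per-sample cost stays constant no matter how much metadata an executor carries.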
I'm pretty much open to any other ideas.
There will definitely be some performance impact (more VM reads), but hidden behind the --enable-experimental-jit flag it should be bearable, and we could be more playful, going back and forth.
If you think it makes sense, I'd be willing to take a shot, but this will take multiple PRs.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response