ETDump adds runtime overhead by serializing the flatbuffer during inference #20467

Description

@rascani

🚀 The feature, motivation and pitch

Problem

ETDump builds its output flatbuffer inline, while inference is running. Every traced event (op profiling, intermediate-output logging, allocations) drives the flatbuffer builder synchronously: building tables, pushing size/stride vectors, interning strings, and so on. That serialization work is paid per event, on the critical path, so enabling ETDump adds significant latency to the run. For per-op profiling, this overhead can be subtracted out in the inspector post-processing, but it still gives a misleading picture of the "framework tax", i.e., how much latency the framework itself adds to a run. Some users wind up profiling a model twice, once with ETDump enabled and once without, just to get both E2E and per-op numbers.

Proposed idea

Decouple data collection from serialization. During inference, record events into in-memory objects (cheap appends, no flatbuffer work). Then, after inference completes (or rather, when the user requests the data), walk the collected objects once and serialize the flatbuffer in a single pass. This keeps the flatbuffer format and downstream tooling unchanged. The downside is increased memory usage, since every event has to be held in memory until it is serialized, but that may be worth the trade-off for some users. I'd recommend that this be an alternative implementation of ETDumpGen rather than a replacement.
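
To make the idea concrete, here is a minimal sketch of the record-now, serialize-later pattern. It is written in Python for brevity and is not the ExecuTorch API: `ProfileEvent`, `DeferredRecorder`, `record`, `get_data`, and `run_inference` are all hypothetical names, and JSON stands in for the flatbuffer builder. A real implementation would presumably live behind the existing ETDumpGen interface in C++.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class ProfileEvent:
    # Hypothetical record; the real ETDump schema carries more fields.
    name: str
    start_ns: int
    end_ns: int


@dataclass
class DeferredRecorder:
    """Collects events cheaply and serializes only when asked."""

    events: list = field(default_factory=list)
    serialize_calls: int = 0

    def record(self, name, start_ns, end_ns):
        # Hot path: one small object and one list append, no encoding work.
        self.events.append(ProfileEvent(name, start_ns, end_ns))

    def get_data(self):
        # Cold path: walk the collected events once and encode them.
        # JSON is only a stand-in for the flatbuffer builder here.
        self.serialize_calls += 1
        payload = [
            {"name": e.name, "start_ns": e.start_ns, "end_ns": e.end_ns}
            for e in self.events
        ]
        return json.dumps(payload).encode("utf-8")


def run_inference(recorder, op_names):
    # Stand-in for a model run: each "op" is timed and recorded.
    for name in op_names:
        start = time.perf_counter_ns()
        end = time.perf_counter_ns()
        recorder.record(name, start, end)


recorder = DeferredRecorder()
run_inference(recorder, ["conv2d", "relu", "linear"])
blob = recorder.get_data()  # serialization happens here, off the critical path
```

The memory trade-off mentioned above shows up directly in this sketch: `events` grows with every traced event until `get_data` is called, so a bounded or pre-allocated buffer would likely be needed on constrained targets.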
