Merged
Conversation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
New WAMR public API to copy runtime call stack frames.
CAUTION: this APIs is not thread safe, that's why it's hidden behind feature flag for now. If you need to call it from another thread ensure the passed exec_env is suspended.
Our use case
Sometimes WAMR runtime gets stuck in production and we have no data where in the code compiled to WASM it happens. We currently only track such situations in a separate native thread. To increase visibility into the problem we developed internal solution that requires presence of this API in WAMR. If a separate thread finds that the WASM VM thread has stuck, it interrupts it with a user defined signal and calls this API to collect callstack. The main complexity is maintaining async-signal-safety and avoiding segfaults. For that we're maintaining atomic copies of
exec_env,exec_env->module_inst,exec_env->module_inst->module. Those copies are always set to NULL before the referenced memory is freed. Before a call to this API those copies are always checked for validity. In our use case scenario we guarantee ourselves only absence of crashes but we realize that the frame data that we collect might be invalidated due to a signal interruption. However it's highly unlikely and is not a concern for us.Have we tried existing WAMR APIs for our usecase?
Yes, we've tried suggested by maintainers
wasm_cluster_suspend_threadandwasm_runtime_terminate.wasm_cluster_suspend_threaddoesn't suit us either. Even if it did we'd still need API to iterate over stackframes.