Skip to content

Android: build/runtime fixes for modern Clang + ETDump path override#19293

Open
jgibson2 wants to merge 4 commits intopytorch:mainfrom
PolyCam:jgibson/android-etdump-buildfix
Open

Android: build/runtime fixes for modern Clang + ETDump path override#19293
jgibson2 wants to merge 4 commits intopytorch:mainfrom
PolyCam:jgibson/android-etdump-buildfix

Conversation

@jgibson2
Copy link
Copy Markdown
Contributor

@jgibson2 jgibson2 commented May 5, 2026

Summary

Five small fixes needed to build and run ExecuTorch Android on hosts with Apple Clang 21, on devices where /data/local/tmp isn't app-writable, and to make the Android build script behave correctly on hosts with multiple Python interpreters:

  1. flatcc -Werror on modern Clang. flatcc's vendored CMakeLists.txt hits -Wimplicit-int-conversion-on-negation and -Wunterminated-string-initialization on new Clang. Pass FLATCC_ALLOW_WERROR=OFF into the flatcc_ep ExternalProject so the third-party build doesn't fail.

  2. EXECUTORCH_ANDROID_PROFILING was never forwarded. scripts/build_android_library.sh set EXECUTORCH_ENABLE_EVENT_TRACER from $EXECUTORCH_ANDROID_PROFILING but never forwarded the separate -DEXECUTORCH_ANDROID_PROFILING define that gates ETDumpGen allocation in extension/android/jni/jni_layer.cpp. As a result the event tracer was compiled in but the JNI never attached it, and Module.etdump() always returned false.

  3. Module.etdump(String outputPath) overload. Add a JNI method (etdumpToNative) and Java wrapper so apps can write the ETDump into their own filesDir. The existing etdump() hardcodes /data/local/tmp/result.etdump, which is not app-writable on modern Android (Pixel 9 / API 36 returns EACCES). The new overload factors the write path through a shared etdump_to_path() helper; etdump() now delegates to it with the legacy path.

  4. define_overridable_option was expanding ${DESCRIPTION} unquoted. The set(... CACHE TYPE DOCSTRING [FORCE]) form expects the docstring as a single argument, but ${DESCRIPTION} was unquoted, so any STRING option with a multi-word description that the user overrides on the cmake command line had its description text spill into subsequent set() arguments. One-character fix: quote "${DESCRIPTION}" in both set() calls.

  5. scripts/build_android_library.sh never forwarded PYTHON_EXECUTABLE to cmake. The script defaults PYTHON_EXECUTABLE=python3 and honors a caller override, but only used it for which. cmake's find_package(Python3) then picked up whatever interpreter came first on PATH (e.g. Homebrew's python3 on macOS) instead of the project's torch-equipped venv, failing at find_package_torch_headers with 'NoneType' object has no attribute 'submodule_search_locations'. Pass -DPYTHON_EXECUTABLE="${PYTHON_EXECUTABLE}" so the script's documented behavior actually flows through.

Test plan

  • Apple Clang 21 host build: scripts/build_android_library.sh now succeeds (flatcc no longer fails -Werror).
  • With EXECUTORCH_ANDROID_PROFILING=1: -DEXECUTORCH_ANDROID_PROFILING reaches the JNI compile, ETDumpGen is attached, Module.etdump() returns true and writes a valid .etdump.
  • On Pixel 9 / API 36: Module.etdump(context.getFilesDir() + "/result.etdump") succeeds; legacy Module.etdump() still works on devices where /data/local/tmp is writable.
  • Default build (EXECUTORCH_ANDROID_PROFILING unset): unchanged — etdump() returns false as before.
  • cmake -DEXECUTORCH_VULKAN_FP16_PRECISION=mediump … (multi-word description STRING option) now configures cleanly.
  • Hosts with both Homebrew python3 and a torch-equipped venv: PYTHON_EXECUTABLE=$VENV/bin/python scripts/build_android_library.sh configures correctly.

🤖 Authored with Claude Code

cc @kirklandsign @cbilgin

Three small fixes needed to build and run ExecuTorch Android on hosts with
Apple Clang 21 and on devices where /data/local/tmp isn't app-writable:

1. Workaround flatcc's vendored CMakeLists.txt hitting -Werror on new Clang
   warnings (-Wimplicit-int-conversion-on-negation,
   -Wunterminated-string-initialization). Passes FLATCC_ALLOW_WERROR=OFF
   into both the top-level flatcc CMake build and the flatcc_ep
   ExternalProject.

2. scripts/build_android_library.sh was setting EXECUTORCH_ENABLE_EVENT_TRACER
   from $EXECUTORCH_ANDROID_PROFILING but never forwarding the separate
   -DEXECUTORCH_ANDROID_PROFILING define that gates ETDumpGen allocation in
   extension/android/jni/jni_layer.cpp. As a result, the event tracer was
   compiled in but the JNI never attached it, and Module.etdump() always
   returned false.

3. Add Module.etdump(String outputPath) / etdumpTo() JNI method so apps can
   write the ETDump into their own filesDir. The existing etdump()
   hardcodes /data/local/tmp/result.etdump, which is not app-writable on
   modern Android (Pixel 9 / API 36 returns EACCES). The new overload
   factors the write path through a shared etdump_to_path() helper;
   etdump() now delegates to it with the legacy path.
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented May 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19293

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 2 New Failures, 1 Cancelled Job, 4 Unrelated Failures

As of commit dc1bcb3 with merge base 8a397b4 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 5, 2026
@jgibson2
Copy link
Copy Markdown
Contributor Author

jgibson2 commented May 5, 2026

@pytorchbot label "release notes: android"

@pytorch-bot pytorch-bot Bot added the release notes: android Android Java and JNI code label May 5, 2026
@SS-JIA
Copy link
Copy Markdown
Contributor

SS-JIA commented May 5, 2026

@claude Review these changes

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Claude finished @SS-JIA's task in 2m 8s —— View job


Code Review for PR #19293

  • Read changed files and gather context
  • Review third-party/CMakeLists.txt (flatcc -Werror fix)
  • Review scripts/build_android_library.sh (profiling flag forwarding)
  • Review extension/android/jni/jni_layer.cpp (JNI etdump changes)
  • Review extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java (Java wrapper)
  • Post review summary

Summary

Overall this is a solid, well-scoped PR that fixes real issues. The three changes are logically independent and each solves a clear problem. A few items to consider:


1. third-party/CMakeLists.txt — LGTM

Adding -DFLATCC_ALLOW_WERROR=OFF to the flatcc_ep ExternalProject is the correct approach. This is a third-party build; we shouldn't be blocked by new warnings in vendored code.

Note: I wasn't able to verify that FLATCC_ALLOW_WERROR is an actual option in flatcc's CMakeLists (the submodule isn't checked out in CI), but given the PR's test plan confirms the build succeeds, this looks correct.


2. scripts/build_android_library.sh — LGTM

Forwarding -DEXECUTORCH_ANDROID_PROFILING is necessary because extension/android/CMakeLists.txt:94-98 uses it to both link etdump/flatccrt and define the compile definition. Without this, the event tracer headers are compiled in but the JNI code is #ifdef'd out.


3. extension/android/jni/jni_layer.cpp — Mostly good, a few observations

Positive changes:

  • Adding O_TRUNC to open() — correct fix, prevents stale data if the file already exists.
  • Resource leak fixes: close(etdump_file) and free(etdump_data.buf) on write failure — good catch on the original code's missing cleanup.
  • %zd format for ssize_t — correct.

Potential issues:

  1. Null check on event_tracer() (line 479-480): If module_->event_tracer() returns null for any reason (e.g., tracing was enabled at compile time but the module was created without an ETDumpGen in some code path), the cast + get_etdump_data() call would segfault. Consider a null guard:

    auto* tracer = module_->event_tracer();
    if (!tracer) {
      ET_LOG(Error, "No event tracer attached");
      return false;
    }
    executorch::etdump::ETDumpGen* etdumpgen =
        static_cast<executorch::etdump::ETDumpGen*>(tracer);

    Fix this →

  2. private: / public: visibility toggle (lines 476, 512): The helper is inserted between existing public methods, toggling visibility mid-class. This works but is a bit unusual — consider moving etdump_to_path near other private members or at least removing the extra blank line after public: on line 512 for style consistency.

  3. outputPath null safety (line 473): If Java passes a null String, outputPath->toStdString() will crash via JNI. The Java wrapper currently doesn't null-check outputPath before calling etdumpToNative. Consider adding a null check in either Java or C++.
    Fix this →


4. Module.java — Good

The new etdump(String) overload correctly follows the existing pattern (lock, validity check, delegate to native). The Javadoc is clear and appropriate.

The difference in style from the no-arg etdump() (which is a bare native method without lock/validity check) is worth noting — it means the no-arg version doesn't have the same safety guarantees. But that's pre-existing behavior and not something this PR needs to fix.


Overall Assessment

Approve with minor suggestions. The fixes are correct and well-motivated. The two suggestions above (null guards) are defensive improvements that would prevent potential crashes in edge cases but aren't blockers.


jgibson2 added 2 commits May 5, 2026 10:29
The set(..CACHE TYPE DOCSTRING [FORCE]) form treats the docstring as a
single argument, but `${DESCRIPTION}` was being expanded unquoted, so
multi-word descriptions were spilling their trailing words into
subsequent set() arguments. This is latent for the common case because
most existing STRING options are driven by the else-branch (not overridden
via -D), but any STRING option overridden on the cmake command line with
a multi-word description hits it — for example:

  cmake -DEXECUTORCH_VULKAN_FP16_PRECISION=mediump ...

fails with the description text being fed to the downstream shader
generator as CLI args.

One-char fix: quote "${DESCRIPTION}" in both set() calls. Correct for
both single- and multi-word descriptions.
The script defaults `PYTHON_EXECUTABLE=python3` (and honors any value the
caller sets in the env), but never passed it to the `cmake` invocation. As
a result, cmake's `find_package(Python3)` picked up whatever interpreter
came first on PATH, which on macOS is typically Homebrew's `python3` rather
than the project's torch-equipped venv. The build then failed at
`find_package_torch_headers` (`tools/cmake/Utils.cmake:118`) with:

  AttributeError: 'NoneType' object has no attribute 'submodule_search_locations'

(i.e. `import torch` in the wrong interpreter, returning None.)

One-line fix: pass `-DPYTHON_EXECUTABLE="${PYTHON_EXECUTABLE}"` so the
script's documented behavior — that the caller can pin the interpreter —
actually flows through to the cmake configure step.
* Null-guard module_->event_tracer() before casting to ETDumpGen*. If the
  tracer is unset (e.g. ANDROID_PROFILING was compiled in but the module
  was constructed without an ETDumpGen on some path), we'd previously
  segfault on get_etdump_data(). Switch the C-style cast to a static_cast
  while we're here.
* Move etdump_to_path() into a private: section at the end of the class,
  rather than splicing private:/public: visibility blocks into the middle
  of the public method group.
* Reject null outputPath in etdumpTo() before calling toStdString(); a
  null jstring would otherwise crash.
@jgibson2
Copy link
Copy Markdown
Contributor Author

jgibson2 commented May 5, 2026

Addressed review

Copy link
Copy Markdown
Contributor

@SS-JIA SS-JIA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, but will defer to @kirklandsign or @psiddh for the final stamp.

@psiddh psiddh self-requested a review May 5, 2026 15:29
@nil-is-all nil-is-all added the module: android Issues related to Android code, build, and execution label May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: android Issues related to Android code, build, and execution release notes: android Android Java and JNI code

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

3 participants