[SPARK-56832][INFRA] Surface fatal javadoc errors in unidoc log summary and CI annotations #55814

Open
cloud-fan wants to merge 1 commit into apache:master from cloud-fan:unidoc-fatal-summary

@cloud-fan
Contributor

What changes were proposed in this pull request?

After the noise filters from #55605, the Documentation generation CI log is around 4K lines on a failing run. The two-line, per-file `error: reference not found` diagnostics are still buried in the middle of the log, and the GitHub Actions check panel for a failed doc-gen job surfaces only "Process completed with exit code 1". Reviewers end up scrolling the raw log to find what actually broke.

This PR is purely additive in docs/_plugins/build_api_docs.rb -- no existing log lines are dropped. After the unidoc pipe closes:

  1. A trailing Fatal javadoc errors (N): block is printed, listing each captured diagnostic with file, line, and message.
  2. One ::error file=<path>,line=<line>,title=javadoc::<msg> GitHub Actions workflow command is emitted per diagnostic, so they appear as inline annotations on the PR check panel instead of as a single opaque exit code 1.
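The two outputs above can be sketched roughly as follows. This is a minimal illustration, not the code in `build_api_docs.rb`: the `Diagnostic` struct, the helper names, and the rule width are assumptions. The percent-encoding follows the escaping that GitHub Actions workflow commands expect for multi-line messages and property values.

```ruby
# Hypothetical record for one captured diagnostic (field names are assumptions).
Diagnostic = Struct.new(:file, :line, :message)

# Workflow-command messages must percent-encode '%', CR, and LF.
def escape_data(text)
  text.to_s.gsub("%", "%25").gsub("\r", "%0D").gsub("\n", "%0A")
end

# Property values (file=, title=, ...) additionally encode ':' and ','.
def escape_property(text)
  escape_data(text).gsub(":", "%3A").gsub(",", "%2C")
end

# Build the trailing summary block as a single string.
def summary_block(diagnostics)
  rule = "=" * 72 # rule width is an arbitrary choice for this sketch
  lines = [rule, "Fatal javadoc errors (#{diagnostics.size}):", rule]
  diagnostics.each_with_index do |d, i|
    lines << "  #{i + 1}. #{d.file}:#{d.line}: #{d.message}"
  end
  lines << rule
  lines.join("\n")
end

# Build one ::error workflow command for a diagnostic.
def error_annotation(d)
  "::error file=#{escape_property(d.file)},line=#{d.line}," \
    "title=javadoc::#{escape_data(d.message)}"
end
```

Printing `summary_block(diags)` followed by one `error_annotation(d)` per diagnostic to stdout would be enough for the Actions runner to pick up the annotations, since workflow commands are read from the job's standard output.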

Diagnostics are captured strictly within the Standard Doclet phase bracketed by Building tree for all the packages and classes... and Building index for all classes..., which is where doclint emits the build-failing diagnostics that count toward javadoc's exit code. Source-loading error: chatter outside that window is excluded -- it's already non-fatal and matches what javadoc's own N errors summary line counts.
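The phase-bracketed capture described above can be pictured as a small state machine. The sketch below is illustrative only: the two marker strings come from the description, while the `ERROR_LINE` pattern (`<path>:<line>: error: <message>`), the method name, and the hash layout are assumptions, and the source snippet and caret lines that follow each header are ignored for brevity.

```ruby
PHASE_START = "Building tree for all the packages and classes..."
PHASE_END   = "Building index for all classes..."

# Assumed shape of a diagnostic header line: "<path>:<line>: error: <message>".
ERROR_LINE = /\A(?<file>.+?):(?<line>\d+): error: (?<message>.*)\z/

# Return the diagnostics that appear between the two phase markers.
def capture_fatal(lines)
  in_phase = false
  captured = []
  lines.each do |raw|
    line = raw.chomp
    if line.include?(PHASE_START)
      in_phase = true          # Standard Doclet phase begins
    elsif line.include?(PHASE_END)
      in_phase = false         # phase over; later "error:" lines are ignored
    elsif in_phase && (m = ERROR_LINE.match(line))
      captured << { file: m[:file], line: m[:line].to_i, message: m[:message] }
    end
  end
  captured
end
```

Keying the window on the marker lines rather than on the `error:` text alone is what keeps source-loading chatter out of the captured set.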

As a self-check, the captured count is compared against javadoc's own N errors summary line. If they diverge -- e.g. because a future JDK changes the Standard Doclet phase wording -- a ::warning:: workflow command is emitted so the drift is surfaced without silently masking real failures.
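A rough sketch of that self-check, under stated assumptions: the `SUMMARY_LINE` pattern guesses that javadoc's summary appears on its own line as `N errors` (or `1 error`), and the helper names and warning text are hypothetical rather than taken from the patch.

```ruby
# Assumed wording of javadoc's closing summary line, e.g. "3 errors" or "1 error".
SUMMARY_LINE = /\A\s*(\d+) errors?\s*\z/

# Find the last summary line in the log and return its count, or nil if absent.
def reported_error_count(lines)
  lines.reverse_each do |line|
    m = SUMMARY_LINE.match(line.chomp)
    return m[1].to_i if m
  end
  nil
end

# Return a ::warning:: workflow command when the counts diverge, else nil.
def drift_warning(captured_count, lines)
  reported = reported_error_count(lines)
  return nil if reported.nil? || reported == captured_count

  "::warning::Captured #{captured_count} fatal javadoc diagnostics " \
    "but javadoc reported #{reported} errors"
end
```

Returning `nil` when no summary line is found is a choice made for this sketch; a stricter variant could warn in that case too.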

Why are the changes needed?

PR #55605 made the doc-gen log small enough to read, but the failure path is still discoverable only via grep. The per-file diagnostics emitted by doclint are the actionable content; promoting them to the PR check panel and a clearly delimited summary block makes a doc-gen failure self-explanatory without leaving the PR.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

End-to-end on this branch with deliberately broken references planted in two code paths (mirroring the test pattern from PR #55605):

  • ColumnarMap.java (real Java source): {@link org.apache.spark.deliberately.NoSuchClass} and {@link ColumnVector#nonExistentMethod()}.
  • Partition.scala (Scala source via genjavadoc): [[Partition.index]] -- the .-separator case that javadoc treats as inner-class lookup.

The Documentation generation job will fail with the expected Fatal javadoc errors summary block in the log and per-file inline annotations on this PR's check panel. The plant commit will be dropped before this PR is taken out of draft.

The state machine was also exercised locally against a captured log from a prior failing doc-gen run; the captured fatal count matches javadoc's N errors summary line.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

@cloud-fan cloud-fan force-pushed the unidoc-fatal-summary branch from 34bc67f to f1377b0 Compare May 12, 2026 07:36
…ry and CI annotations

After the noise filters from apache#55605, the Documentation generation CI log is
about 4K lines. The two-line per-file fatal diagnostics (`error: reference
not found`) are still buried in the middle of the log and the GitHub Actions
check panel only shows "Process completed with exit code 1", which leaves
reviewers grepping through the raw log to find the actual problem.

This change is purely additive -- it drops no existing log lines. After the
unidoc pipe closes, `build_api_docs.rb` prints a trailing `Fatal javadoc
errors (N):` block listing each captured diagnostic, then emits a
`::error file=<path>,line=<line>::<msg>` GitHub Actions workflow command per diagnostic so
they appear as inline annotations on the PR check panel.

Diagnostics are captured strictly within the Standard Doclet phase bracketed
by `Building tree for all the packages and classes...` and
`Building index for all classes...`, which is where doclint emits the
build-failing diagnostics that count toward javadoc's exit code. Source-
loading "error:" chatter outside that window is excluded.

The captured count is cross-checked against javadoc's own `N errors`
summary line. If they diverge -- e.g. because a future JDK changes the
Standard Doclet phase wording -- a `::warning::` workflow command is
emitted so the drift is surfaced without silently masking real failures.

Co-authored-by: Isaac
@cloud-fan cloud-fan force-pushed the unidoc-fatal-summary branch 2 times, most recently from 4bd5ab7 to db6d059 Compare May 12, 2026 12:59
@cloud-fan cloud-fan marked this pull request as ready for review May 12, 2026 12:59
@cloud-fan
Contributor Author

End-to-end validation result

Validated on the prior branch state with the VALIDATION (DO NOT MERGE) plant commit, which has since been dropped and force-pushed away; the branch is now clean at db6d05943a9. The plant carried 3 deliberately broken references; the Documentation generation job picked them up as follows.

Summary block in the job log (post-pipe, additive, drops no existing lines):

========================================================================
Fatal javadoc errors (3):
========================================================================
  1. /__w/spark/spark/core/target/java/org/apache/spark/Partition.java:6: reference not found
        * referencing the index field with the wrong separator: {@link Partition.index} --
                                                                       ^
  2. /__w/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarMap.java:26: reference not found
        * a class that does not exist: {@link org.apache.spark.deliberately.NoSuchClass}
                                              ^
  3. /__w/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarMap.java:27: reference not found
        * and a method that does not exist: {@link ColumnVector#nonExistentMethod()}.
                                                   ^
========================================================================

GitHub Actions check panel annotations (one per fatal diagnostic, title=javadoc, multi-line message rendering the error + source snippet + caret):

| File | Line | Message |
| --- | --- | --- |
| sql/catalyst/.../ColumnarMap.java | 26 | reference not found + * a class that does not exist: {@link org.apache.spark.deliberately.NoSuchClass} + ^ |
| sql/catalyst/.../ColumnarMap.java | 27 | reference not found + * and a method that does not exist: {@link ColumnVector#nonExistentMethod()}. + ^ |
| core/target/java/.../Partition.java | 6 | reference not found + * referencing the index field with the wrong separator: {@link Partition.index} -- + ^ |

This replaces the previous user experience where the PR check panel surfaced only Process completed with exit code 1 and the actionable diagnostics required scrolling the raw job log.

Self-check: the captured count (3) matched javadoc's own 3 errors summary line, so no ::warning:: drift signal was emitted. If a future JDK changes the Standard Doclet phase wording, the mismatch would surface that as a workflow warning.

@cloud-fan
Contributor Author

cc @HyukjinKwon
