NEW @W-21102140@ - Add PMD AST dump API for programmatic AST generation #434

Merged: aruntyagiTutu merged 11 commits into dev from feature/pmd-ast-dump on Mar 11, 2026

aruntyagiTutu commented on Mar 10, 2026

@W-21102140@

Summary

Adds AST (Abstract Syntax Tree) dump functionality to the PMD engine, enabling programmatic generation of XML AST representations for Apex and Visualforce code without requiring the PMD CLI.

What's New

  • New Public API: PmdEngine.generateAst() method for on-demand AST generation
  • Java Implementation: New ast-dump command in PmdWrapper using PMD's TreeExporter API
  • Type Safety: Full TypeScript type definitions (PmdAstDumpResults, GenerateAstOptions)

Key Features

  • Supports Apex and Visualforce languages
  • Configurable file encoding (defaults to UTF-8)
  • Comprehensive error handling with structured error objects
  • Temporary working folder management with automatic cleanup
  • Memory-efficient validation (no unnecessary file loading)

Changes Made

  1. Java layer (pmd-cpd-wrappers):
    - PmdAstDumper.java - Core AST generation logic
    - PmdAstDumpInputData.java - Input structure
    - PmdAstDumpResults.java - Output structure
    - Updated PmdWrapper.java - Added "ast-dump" command handler
  2. TypeScript layer:
    - Added generateAst() method to PmdEngine class
    - Exported PmdEngine for direct instantiation (needed for MCP provider)
    - Type exports for public API consumption
  3. Tests:
    - 18 Java unit tests (100% coverage)
    - 13 TypeScript integration tests
    - Edge cases: validation, encodings, error handling, empty files

API Example

const pmdEngine = new PmdEngine(config);
const result = await pmdEngine.generateAst('apex', '/path/to/file.cls', {
    encoding: 'UTF-8',
    workingFolder: '/tmp/ast'
});
// result.ast contains the XML AST, or result.error contains error details

aruntyagiTutu and others added 8 commits March 2, 2026 11:58
- Add 11 new tests covering all validation logic and edge cases
- Test null/empty language and fileToDump validation
- Test encoding defaults to UTF-8 when null/empty
- Test invalid encoding, directory instead of file
- Test empty files and different encodings (ISO-8859-1)
- All 18 tests passing with 100% code coverage
- Remove implementation plan document as feature is complete

PROBLEM:
- PmdAstDumper.readFileContent() loaded entire file into memory
- Return value was discarded - only used for validation
- Could cause OutOfMemoryError with large files (>100MB)
- PMD's TreeExporter already reads the file internally

SOLUTION:
- Replace readFileContent() with validateFilePath()
- Only check file exists, is regular file, and encoding is valid
- No longer loads file content into memory
- Lightweight validation without memory overhead
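
For illustration, the lightweight validation described above could look roughly like the sketch below. This is not the code from PmdAstDumper.java: the class name, method signatures, and error messages are assumptions made for the example, and only the checks themselves (file exists, is a regular file, encoding is valid, UTF-8 default for a null/empty encoding) come from the commit messages in this PR.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; names and messages are illustrative, not the PR's actual code.
final class FilePathValidationSketch {

    /** Resolves the encoding name, defaulting to UTF-8 when it is null or blank. */
    public static Charset resolveEncoding(String encoding) {
        if (encoding == null || encoding.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(encoding);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new IllegalArgumentException("Invalid encoding: " + encoding, e);
        }
    }

    /**
     * Checks the path and encoding using file-system metadata only.
     * The file content is never read, so file size does not affect memory use.
     */
    public static Path validateFilePath(String fileToDump, String encoding) {
        if (fileToDump == null || fileToDump.trim().isEmpty()) {
            throw new IllegalArgumentException("fileToDump must not be null or empty");
        }
        Path path = Paths.get(fileToDump);
        if (!Files.exists(path)) {
            throw new IllegalArgumentException("File does not exist: " + fileToDump);
        }
        if (!Files.isRegularFile(path)) {
            throw new IllegalArgumentException("Not a regular file: " + fileToDump);
        }
        resolveEncoding(encoding); // throws if the encoding name is invalid
        return path;
    }
}
```

In this sketch every failure surfaces as an IllegalArgumentException; a caller could catch it and map it onto the structured error object mentioned in the PR summary.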

IMPACT:
- Prevents OOM errors with large Apex/Visualforce files
- Same validation behavior, better performance
- All 69 tests pass (18 AST dump + 51 other tests)

PROBLEM:
- PmdEngine and CpdEngine were exported as public API in index.ts
- These are internal implementation classes that should not be directly accessed
- Exposing them creates unwanted API surface and support burden
- Users could bypass the plugin system by directly instantiating engines
- Future internal changes would become breaking changes

SOLUTION:
- Remove class exports from index.ts (lines 14-15)
- Keep only type exports: PmdAstDumpResults, GenerateAstOptions, PmdProcessingError
- Users access engines through PmdCpdEnginesPlugin (correct pattern)
- Engines remain accessible internally for testing

IMPACT:
- Cleaner public API with minimal surface area
- Users must use plugin system (intended design pattern)
- Internal implementation can evolve without breaking changes
- All 127 TypeScript tests pass with 98.84% coverage
- Type exports still available for consumers of AST dump functionality

CONTEXT:
- Previous commit removed PmdEngine export to keep it internal
- MCP provider (internal Salesforce tool) needs direct access to PmdEngine
- They use it to call generateAst() API for AST XML generation

PROBLEM:
- MCP provider code broke: "Module has no exported member 'PmdEngine'"
- They instantiate PmdEngine directly: new PmdEngine(config)
- Then call generateAst() method for on-demand AST generation

SOLUTION:
- Re-export PmdEngine from index.ts
- Add clear documentation: use for AST generation, prefer plugin for normal usage
- Keep CpdEngine internal (not needed by consumers)
- Document that direct instantiation is for specialized use cases

RATIONALE:
- generateAst() is a valid public API use case
- MCP provider is internal Salesforce code, not third-party
- Alternative (factory function) adds unnecessary complexity
- Clear documentation guides proper usage

IMPACT:
- MCP provider builds successfully
- All 127 tests pass with 98.84% coverage
- API surface: minimal (only PmdEngine + types, not CpdEngine)

aruntyagiTutu changed the title from "NEW @W-21102140@ - PMD ast dump java librarry" to "NEW @W-21102140@ - Add PMD AST dump API for programmatic AST generation" on Mar 10, 2026

results.file = inputData.fileToDump;

try {
System.out.println("Generating AST for file '" + inputData.fileToDump + "' with language '" + inputData.language + "'");
Contributor

Do we need to log this instead?

Contributor Author

We are intentionally using System.out.println here to maintain consistency
with the rest of the codebase (see PmdRuleDescriber.java:99 and
CpdRunner.java:55). The project uses slf4j-nop and relies on stdout for
inter-process communication with the TypeScript side.

Am I missing something, @namrata111f?

Contributor

Understood; I was not aware that inter-process communication relies on stdout. Just curious: for debugging, do we log this stdout output on the TypeScript side?

aruntyagiTutu merged commit 3cbb9b3 into dev on Mar 11, 2026
7 checks passed