[CORE?;KOTLIN]Pre-escaping vs. Template-targeted Escaping in openapi-generator #23962

@Picazsoo

The Problem

The codegen Java layer currently bakes language-specific syntax and escaping directly into data model fields (defaultValue, value in enumVars, etc.) before they reach the Mustache templates. I think this is the wrong layer for that responsibility: it makes context-specific escaping unnecessarily complex, unpredictable, and in some cases impossible.


Concrete Examples

1. defaultValue — pre-escaped for the wrong context

AbstractKotlinCodegen.toDefaultValue() and AbstractJavaCodegen.toDefaultValue() return strings like:

  • "\"hello\"" — raw value already wrapped in language string-literal quotes
  • "42l" — Java long literal suffix baked in
  • "new BigDecimal(\"3.14\")" — full constructor expression
  • "URI.create(\"...\")" — static factory call

The template then receives a code-ready expression, not a value. This makes it impossible for the template to render the same value in a different context (e.g. annotation attribute vs. field initializer vs. comment) without getting double-escaped or incorrectly escaped output.
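To make the failure mode concrete, here is a minimal Java sketch. The class and method names are hypothetical stand-ins, not the generator's actual code: preEscapedDefault only imitates what a toDefaultValue() override does for strings, and the three render methods imitate three template fragments that all receive the same field.

```java
final class PreEscapedDefault {
    // Hypothetical stand-in for a toDefaultValue() override:
    // returns a code-ready string literal instead of the raw value.
    static String preEscapedDefault(String raw) {
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Fragment 1: field initializer. The pre-quoted form happens to fit.
    static String fieldInitializer(String value) {
        return "val greeting: String = " + value;
    }

    // Fragment 2: annotation attribute where the template supplies the quotes.
    static String annotation(String value) {
        return "@DefaultValue(\"" + value + "\")";
    }

    // Fragment 3: comment. It wants the plain text, but gets the literal.
    static String comment(String value) {
        return "// default: " + value;
    }

    public static void main(String[] args) {
        String value = preEscapedDefault("hello");
        System.out.println(fieldInitializer(value)); // val greeting: String = "hello"
        System.out.println(annotation(value));       // @DefaultValue(""hello"")
        System.out.println(comment(value));          // // default: "hello"
    }
}
```

Only the first fragment produces what the author intended; the other two have no way to recover the raw text from the field they were given.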

2. value in enumVars — same pattern, fragile workaround

AbstractKotlinCodegen.toEnumValue() returns "\"available\"" for string types — i.e. the value already includes the surrounding Kotlin string-literal quotes. In kotlin-client/enum_class.mustache, {{{value}}} is used in two contexts with conflicting needs:

  • Enum constructor (line 93): {{name}}({{{value}}}) → AVAILABLE("available"). The pre-quoted form works here by accident.
  • Annotations (lines 65, 68, 71, 75, 81): @SerializedName(value = {{#lambda.doublequote}}{{{value}}}{{/lambda.doublequote}}) → @SerializedName(value = "available"). This works, but only because DoubleQuoteLambda detects that the value is already quoted and passes it through unchanged (i.e. it is a no-op here).

The annotation lines only produce correct output because the lambda happens to be idempotent for already-quoted input. If the template ever needs the raw value (e.g. in a Javadoc comment or a non-string context), there is no way to get it — there is no unescapedValue counterpart for enumVars.

3. DoubleQuoteLambda — a symptom, not a solution

Because some codegens pre-quote defaultValue (for string types) and others don't (for numeric types), the {{#lambda.doublequote}} lambda was introduced to normalize "add quotes unless already present." This is fundamentally a state-detection workaround, not a principled design: the template has to guess whether the Java layer already applied quoting.
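The ambiguity of "add quotes unless already present" can be shown with a small sketch. This is a hypothetical re-implementation of the idea as described above, not the actual DoubleQuoteLambda source, whose exact check may differ.

```java
final class QuoteGuess {
    // Hypothetical heuristic: wrap in double quotes unless the input
    // already starts and ends with one.
    static String quoteUnlessQuoted(String s) {
        boolean alreadyQuoted = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        return alreadyQuoted ? s : "\"" + s + "\"";
    }

    public static void main(String[] args) {
        // Unquoted input gets quotes added.
        System.out.println(quoteUnlessQuoted("available"));
        // Pre-quoted input passes through unchanged.
        System.out.println(quoteUnlessQuoted("\"available\""));
        // A raw spec value whose text really is "yes" (quotes included)
        // is indistinguishable from a pre-quoted yes.
        System.out.println(quoteUnlessQuoted("yes").equals(quoteUnlessQuoted("\"yes\"")));
    }
}
```

Two different raw values collapse to the same output, so the heuristic cannot be correct for both. With raw values and an unconditional quoting lambda, that guess is never needed.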

4. unescapedDefaultValue — acknowledgment of the problem

The existence of a parallel unescapedDefaultValue field (set from schema.getDefault() directly) shows the codebase already recognizes that defaultValue is "too processed" for some uses — but this is a workaround, not a fix.


Why Pre-escaping Is Wrong

Escaping is context-dependent:

Context                      String hello's needs to be written as
Kotlin/Java string literal   "hello's"
Single-quoted annotation     'hello\'s'
Kotlin multiline string      """hello's"""
JSON value                   "hello's"
XML attribute                hello&apos;s
Single-line comment          hello's (no escaping)
URL                          hello%27s

When a value is pre-escaped in Java for one assumed context, it:

  • Cannot be reused for other contexts without double-escaping
  • Requires detection hacks (like DoubleQuoteLambda) to "un-guess" whether quoting was applied
  • Breaks cross-cutting uses (same field in a comment, a string literal, and an annotation in the same template)
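As an illustration of the contexts listed above, the following sketch applies four independent escapers to the same raw value. The class and method names are made up for this example; only java.net.URLEncoder is a real library call.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ContextEscapers {
    // Kotlin/Java double-quoted string literal.
    static String doubleQuoted(String raw) {
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Single-quoted literal: the apostrophe is the character that needs escaping.
    static String singleQuoted(String raw) {
        return "'" + raw.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // XML attribute value: '&' must be replaced first to avoid re-escaping entities.
    static String xmlAttribute(String raw) {
        return raw.replace("&", "&amp;").replace("<", "&lt;")
                  .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // URL query component (form encoding, so a space would become '+').
    static String urlComponent(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "hello's";
        System.out.println(doubleQuoted(raw));  // "hello's"
        System.out.println(singleQuoted(raw));  // 'hello\'s'
        System.out.println(xmlAttribute(raw));  // hello&apos;s
        System.out.println(urlComponent(raw));  // hello%27s
        // Feeding an already double-quoted value into another context
        // escapes the quotes that the first layer added:
        System.out.println(xmlAttribute(doubleQuoted(raw))); // &quot;hello&apos;s&quot;
    }
}
```

Each escaper is correct only when its input is the raw value, which is why the last call produces quotes that were never part of the data.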

Security: Pre-escaping Creates Injection Vulnerabilities

Pre-escaping for one assumed context actively undermines safe handling in other contexts. Because the template author cannot tell what escaping has already been applied, they face a dilemma: apply a sanitizing lambda and risk double-escaping, or skip it and risk an injection. This creates a class of vulnerabilities in generated code:

Kotlin string template injection ($)

Kotlin string literals treat $ as the start of a string template ($variable, ${expression}). The escapeText method used during pre-escaping does not escape $; that is handled separately by lambda.escapeDollar. A raw value such as hello $world, pre-escaped to "\"hello $world\"", is therefore emitted as the Kotlin literal "hello $world", which compiles to an interpolated string referencing the variable world (or fails to compile if no such variable is in scope) rather than producing the literal text $world. A template author using the value in a different context (e.g. a multiline string or a comment) may assume escaping was already handled and skip lambda.escapeDollar, leaving the interpolation active.
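A minimal sketch of the difference, assuming two hypothetical helpers: quoteOnly imitates escaping that ignores $, and kotlinString is what a context-targeted lambda for regular Kotlin string literals could do. Neither is the generator's real escapeText or escapeDollar implementation.

```java
final class KotlinStringLiteral {
    // Escapes backslash and double quote only; '$' is left alone.
    static String quoteOnly(String raw) {
        return "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Also escapes '$' so Kotlin treats it as literal text.
    // String.replace is a literal (non-regex) replacement.
    static String kotlinString(String raw) {
        return "\"" + raw.replace("\\", "\\\\")
                         .replace("\"", "\\\"")
                         .replace("$", "\\$") + "\"";
    }

    public static void main(String[] args) {
        String raw = "hello $world";
        System.out.println(quoteOnly(raw));    // "hello $world"   (interpolates world)
        System.out.println(kotlinString(raw)); // "hello \$world"  (literal text)
    }
}
```

Because kotlinString starts from the raw value, it can apply all three replacements in one place without having to know whether an earlier layer already handled some of them.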

Premature termination of triple-quoted strings (""")

Kotlin multiline strings are delimited by """. A value containing """ (e.g. a description or default value from a spec) would prematurely close the string, injecting arbitrary content outside it. Pre-escaping with escapeText targets regular string literals (\") — it does not produce the ${"\"\"\""} construct required to safely embed triple-quotes inside a multiline string. A template author reusing a pre-escaped value in a """...""" context has no safe path: the escaping that was applied is wrong for this context, and applying the right escaping on top would double-escape everything else.
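A sketch of what a lambda targeted at the """...""" context could look like, operating on the raw value. The helper name kotlinRawString is hypothetical, and edge cases (for example values ending in a quote) should be checked against the Kotlin grammar before relying on it.

```java
final class KotlinRawStringLiteral {
    // Wraps a raw value in a Kotlin multiline string. A single pass is used
    // so that the '$' inside an emitted ${...} construct is never re-escaped.
    static String kotlinRawString(String raw) {
        StringBuilder out = new StringBuilder("\"\"\"");
        int i = 0;
        while (i < raw.length()) {
            if (raw.startsWith("\"\"\"", i)) {
                // Embed a triple quote via an interpolated regular string.
                out.append("${\"\\\"\\\"\\\"\"}");
                i += 3;
            } else if (raw.charAt(i) == '$') {
                // Raw strings have no backslash escapes, so '$' is embedded
                // as an interpolated character literal.
                out.append("${'$'}");
                i++;
            } else {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.append("\"\"\"").toString();
    }

    public static void main(String[] args) {
        System.out.println(kotlinRawString("a\"\"\"b"));  // """a${"\"\"\""}b"""
        System.out.println(kotlinRawString("cost: $5"));  // """cost: ${'$'}5"""
    }
}
```

Applying this to a value that was already escaped for a regular string literal would leave stray backslashes in the output, which is the double-escaping problem described above.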

General principle

The root issue is that a template author cannot reason safely about a value whose escaping state is unknown. With raw values and explicit lambdas, the contract is clear: the value is always unescaped, and the template applies exactly the lambdas required for the target context — no guessing, no double-escaping, no missed injection vectors.


The Correct Contract

The Java codegen layer stores raw semantic values. Mustache templates are solely responsible for context-appropriate escaping via lambdas.

// Java layer — raw value only:
enumVar.put("value", "available");        // not "\"available\""
property.defaultValue = "hello world";   // not "\"hello world\""

// Template layer — escaping is explicit and context-targeted:
{{name}}({{#lambda.kotlinString}}{{{value}}}{{/lambda.kotlinString}})   // → AVAILABLE("available")
@SerializedName(value = "{{value}}")                                    // → @SerializedName(value = "available")
// default: {{defaultValue}}                                            // → // default: hello world
@DefaultValue("{{#lambda.escapeInNormalString}}{{{defaultValue}}}{{/lambda.escapeInNormalString}}")
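The contract can be modelled outside Mustache with plain Java functions, which may help when reasoning about it. In this sketch the registry, its keys, and the render methods are all hypothetical; they only mirror the idea of named lambdas applied to raw values and do not use the generator's lambda API.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

final class RawValueContract {
    // Hypothetical lambda registry: every entry expects a raw value.
    static final Map<String, UnaryOperator<String>> LAMBDAS = Map.of(
        "kotlinString", raw -> "\"" + raw.replace("\\", "\\\\")
                                         .replace("\"", "\\\"")
                                         .replace("$", "\\$") + "\"",
        // Keep a single-line comment on one line.
        "comment", raw -> raw.replace("\r", " ").replace("\n", " ")
    );

    // Imitates: {{name}}({{#lambda.kotlinString}}{{{value}}}{{/lambda.kotlinString}})
    static String enumConstant(String name, String rawValue) {
        return name + "(" + LAMBDAS.get("kotlinString").apply(rawValue) + ")";
    }

    // Imitates a comment line that shows the default value.
    static String defaultComment(String rawDefault) {
        return "// default: " + LAMBDAS.get("comment").apply(rawDefault);
    }

    public static void main(String[] args) {
        System.out.println(enumConstant("AVAILABLE", "available")); // AVAILABLE("available")
        System.out.println(defaultComment("hello world"));          // // default: hello world
    }
}
```

The data passed in is the same raw string in both calls; only the call site decides which escaping applies.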

Benefits of the Change

  1. Correctness — eliminates the ""available"" class of bugs where two layers both add quotes
  2. No more DoubleQuoteLambda — it becomes unnecessary; the template always knows whether it's adding quotes
  3. No more unescapedDefaultValue: defaultValue is already raw; the parallel field disappears
  4. Predictability — any contributor reading a template knows exactly what they're getting: raw values, and explicit lambdas for escaping
  5. Security — template authors can apply exactly the right escaping for each context without ambiguity
  6. Extensibility — adding a new target language/context just means adding a new lambda, not forking toDefaultValue() overrides across dozens of codegen subclasses

Migration Considerations

This is a breaking change for custom templates. The migration path would be:

  • Deprecate the pre-escaped behavior with a flag
  • Add a rawValue / rawDefaultValue field alongside the existing ones as a transition bridge
  • Update all bundled templates to use explicit lambdas
  • Remove the pre-escaped fields in a future major version
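One way to picture the transition bridge is an enumVars entry that carries both forms during the deprecation window. The sketch below is an assumption about how that could look: the key rawValue comes from the list above, while the class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EnumVarBridge {
    // Builds one enumVars-style entry with the legacy pre-quoted "value"
    // and the proposed untouched "rawValue" side by side.
    static Map<String, Object> enumVar(String name, String raw) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("name", name);
        // Legacy field: kept so existing custom templates continue to work.
        entry.put("value", "\"" + raw.replace("\\", "\\\\").replace("\"", "\\\"") + "\"");
        // Bridge field: what updated templates would read and escape themselves.
        entry.put("rawValue", raw);
        return entry;
    }

    public static void main(String[] args) {
        System.out.println(enumVar("AVAILABLE", "available"));
    }
}
```

Bundled templates could switch to the raw field one by one, and the legacy field could then be dropped in the major version that completes the migration.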

I am willing to try tackling this in the Kotlin codegens.
