Audit and unify usage of alias vs validation_alias in Pydantic models #807

@vdusek

Description

Problem

The codebase uses alias= and validation_alias= in Pydantic Field() definitions inconsistently. These have different semantics:

  • alias — affects both serialization (model_dump(by_alias=True)) and validation (input parsing). The field serializes under the alias name, not the Python field name.
  • validation_alias — affects only validation (input parsing). The field still serializes under its Python name.
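A minimal sketch of the difference, assuming Pydantic v2. The two model classes are hypothetical and use a plain BaseModel rather than the real Configuration class; only the field name and alias string are borrowed from this issue.

```python
from pydantic import BaseModel, Field


class WithAlias(BaseModel):
    # Hypothetical model: `alias` applies to both validation and serialization.
    is_at_home: bool = Field(default=False, alias='apify_is_at_home')


class WithValidationAlias(BaseModel):
    # Hypothetical model: `validation_alias` applies to validation only.
    is_at_home: bool = Field(default=False, validation_alias='apify_is_at_home')


# Both models accept the aliased key on input.
a = WithAlias.model_validate({'apify_is_at_home': True})
v = WithValidationAlias.model_validate({'apify_is_at_home': True})

# Only the `alias` variant emits the alias when dumping with by_alias=True.
alias_dump = a.model_dump(by_alias=True)
validation_alias_dump = v.model_dump(by_alias=True)

print(alias_dump)
print(validation_alias_dump)
```

Without by_alias=True, both models dump under the Python field name, so the difference only shows up in by_alias serialization.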

Currently the choice between them appears accidental rather than intentional.

Where

src/apify/_configuration.py

This is the most problematic file. It mixes both patterns:

Fields using validation_alias=AliasChoices(...) (correct for env var parsing — multiple legacy names, no serialization impact):

  • actor_id, actor_run_id, default_dataset_id, default_key_value_store_id, default_request_queue_id, input_key, started_at, timeout_at, token, api_base_url, etc.

Fields using alias='...' (single env var, but also changes serialization name):

  • fact, is_at_home, proxy_hostname, proxy_password, proxy_port, proxy_status_url, max_paid_dataset_items, max_total_charge_usd, test_pay_per_event, meta_origin, metamorph_after_sleep, log_format, disable_outdated_warning, input_secrets_private_key_file, input_secrets_private_key_passphrase, charged_event_counts, actor_pricing_info, etc.

The consequence: config.model_dump(by_alias=True) would serialize is_at_home as "apify_is_at_home" but started_at as "started_at", so the output mixes env-var-style keys with Python field names. This inconsistency also affects get_env() in _actor.py (lines 816-827), which has to handle both paths with branching logic.
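The mixed output can be reproduced with a reduced model. This is a sketch assuming Pydantic v2; MixedConfig is a hypothetical stand-in for Configuration, and the names inside AliasChoices are illustrative rather than copied from _configuration.py.

```python
from typing import Optional

from pydantic import AliasChoices, BaseModel, Field


class MixedConfig(BaseModel):
    # Hypothetical stand-in for Configuration, mixing both patterns.
    started_at: Optional[str] = Field(
        default=None,
        # Illustrative alias names; the real list may differ.
        validation_alias=AliasChoices('actor_started_at', 'apify_started_at'),
    )
    is_at_home: bool = Field(default=False, alias='apify_is_at_home')


config = MixedConfig.model_validate(
    {'apify_started_at': '2024-01-01T00:00:00Z', 'apify_is_at_home': True}
)

# One key comes out as an alias, the other as the Python field name.
dumped = config.model_dump(by_alias=True)
print(sorted(dumped))
```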

src/apify/_models.py, src/apify/events/_types.py, src/apify/storage_clients/_apify/_models.py

These consistently use alias= for camelCase mapping (e.g., Field(alias='memAvgBytes')). This is correct for API response/request models that need round-trip serialization with camelCase keys.
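For contrast, here is a sketch of why alias= fits the API models, assuming Pydantic v2. RunStats is a hypothetical model; only the memAvgBytes alias comes from the example above.

```python
from pydantic import BaseModel, Field


class RunStats(BaseModel):
    # Hypothetical API model: camelCase on the wire, snake_case in Python.
    mem_avg_bytes: float = Field(alias='memAvgBytes')


payload = {'memAvgBytes': 1024.0}

stats = RunStats.model_validate(payload)

# by_alias=True restores the camelCase key, so the payload round-trips.
round_tripped = stats.model_dump(by_alias=True)
print(round_tripped)
```

With validation_alias= instead, the model would still parse the payload but dump it as mem_avg_bytes, which breaks the round trip.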

Suggested approach

  1. Configuration fields — For fields that only need env-var-based input parsing, switch from alias= to validation_alias=. For single-alias fields, validation_alias='env_var_name' is sufficient (no AliasChoices needed when there's only one name).

  2. API models — Keep alias= as-is. These models deserialize from and serialize to JSON with camelCase keys, so alias (affecting both directions) is the right choice.

  3. Document the convention — Add a brief comment or note in the codebase (e.g., in CLAUDE.md or as a module-level comment) stating:

    • Use validation_alias for Configuration fields (env var parsing only).
    • Use alias for API/event models (camelCase round-trip serialization).

  4. Review get_env() — After unifying, the branching logic in _actor.py:820-827 can potentially be simplified.
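A sketch of what step 1 would look like on a reduced model, assuming Pydantic v2. UnifiedConfig is hypothetical and the AliasChoices names are illustrative. It does not attempt to show the get_env() change, since that depends on the current code in _actor.py.

```python
from typing import Optional

from pydantic import AliasChoices, BaseModel, Field


class UnifiedConfig(BaseModel):
    # Hypothetical stand-in for Configuration after the proposed change.
    started_at: Optional[str] = Field(
        default=None,
        # Several accepted names: AliasChoices (illustrative names).
        validation_alias=AliasChoices('actor_started_at', 'apify_started_at'),
    )
    # A single accepted name: a plain string is enough.
    is_at_home: bool = Field(default=False, validation_alias='apify_is_at_home')


config = UnifiedConfig.model_validate(
    {'actor_started_at': '2024-01-01T00:00:00Z', 'apify_is_at_home': True}
)

# Every field now serializes under its Python name, with or without by_alias.
dumped = config.model_dump(by_alias=True)
print(sorted(dumped))
```

With this, model_dump() and model_dump(by_alias=True) return the same keys, which is the property that should let callers stop special-casing the two alias kinds.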

Context

Noticed during review of #797, which adds a new actor_storages field using alias='actor_storages_json'.

Metadata

Assignees

No one assigned

    Labels

    t-tooling: Issues with this label are in the ownership of the tooling team.
