From dc498d23790d8998338ba282579addd4d5e044d4 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Fri, 29 May 2026 22:44:36 +0200 Subject: [PATCH] Add draft project security threat-model document MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a draft project-level security threat-model document (draft-THREAT-MODEL.md) at repo root, improving discoverability for automated security scanners running against this repository. The file follows the rubric format used by several other ASF projects piloting security-model discoverability. The "draft-" prefix signals this is a proposal for the PMC to review, correct, or reject — not a finalised maintainer-blessed model. Every claim carries a provenance tag (documented / inferred / maintainer) so reviewers can see where each claim originates; §14 collects open questions for the maintainers. Co-Authored-By: Claude Opus 4.7 (1M context) --- draft-THREAT-MODEL.md | 884 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 884 insertions(+) create mode 100644 draft-THREAT-MODEL.md diff --git a/draft-THREAT-MODEL.md b/draft-THREAT-MODEL.md new file mode 100644 index 0000000000..684ba03968 --- /dev/null +++ b/draft-THREAT-MODEL.md @@ -0,0 +1,884 @@ + + +# Apache Impala Security Threat Model + +## §1 Header + +- **Project**: Apache Impala — distributed massively-parallel C++ SQL query engine for + data in HDFS, Apache Iceberg, Apache Kudu, Apache HBase, Amazon S3, Azure Data Lake + Storage, Apache Ozone, and other Hadoop-compatible storage *(documented: `README.md`)*. +- **Version / commit**: this model is drafted against the default branch (`master`), + most recently HEAD `b8be513` ("IMPALA-13033: Parse WebUI thrift profile downloads"). + A report against project release *N* should be triaged against the model as it stood + at *N*, not at HEAD. +- **Date**: 2026-05-29. +- **Authors**: ASF Security team draft, awaiting Impala PMC review. 
+- **Status**: draft — under maintainer review. +- **Reporting**: vulnerabilities that fall under §8 (claimed properties) should be + reported per the Apache Security Team disclosure channel + (); reports that fall under §3 (out of scope), §9 + (properties not provided), or §11a (known non-findings) will be closed by Impala + triagers citing this document. +- **Provenance legend** — + *(documented)* = drawn from in-repo docs or website docs, with citation; + *(maintainer)* = stated by an Impala maintainer in response to this draft; + *(inferred)* = synthesized by the producer from code structure or domain + knowledge, awaiting PMC ratification (every *(inferred)* tag has a matching + §14 question). +- **Draft confidence**: 47 documented / 0 maintainer / 28 inferred. + +Impala is an MPP SQL engine: clients submit SQL over the HiveServer2 (HS2) Thrift +protocol or HS2-over-HTTP; the coordinator `impalad` parses, plans, and distributes +query fragments to worker `impalad` instances; metadata is served by a central +`catalogd` and propagated to workers via `statestored`; data is read and written +directly from/to the underlying storage (HDFS, S3, ADLS, Ozone, Kudu, HBase) using +the credentials of the `impala` process. Authentication is via Kerberos, LDAP, +SAML, JWT, or OAuth bearer token; authorization is delegated to Apache Ranger. + +## §2 Scope and intended use + +### Intended use + +- Production analytic SQL queries against tabular data residing in distributed + storage, served to authenticated end users via JDBC/ODBC clients, + `impala-shell`, BI tools, or Apache Hue *(documented: `README.md`, + `docs/topics/impala_security.xml`)*. +- Multi-tenant analytic clusters where authorization is enforced by Apache Ranger + and authentication by Kerberos and/or LDAP *(documented: + `docs/topics/impala_security.xml` lines 38–62)*. + +### Deployment shape + +Impala is **not** an in-process library and is **not** a single-binary daemon. 
It +is a cluster of cooperating processes, deployed by an operator inside a network +perimeter the operator controls. The threat model is therefore that of a +distributed service, not a library *(inferred — §14 Q1)*. + +### Caller roles + +Following §2 of the output-structure rubric (network service split): + +| Role | Trust level | Notes | +| --- | --- | --- | +| **End-user client** | untrusted but authenticated | Connects via HS2 / HS2-HTTP / Beeswax; identity verified by Kerberos, LDAP, SAML, JWT, or OAuth *(documented: `docs/topics/impala_security.xml`, `docs/topics/impala_ldap.xml`, `be/src/rpc/authentication.cc`)*. | +| **Operator / cluster admin** | trusted | Sets startup flags, manages keytabs, configures Ranger, owns the Web UI `.htpasswd` *(documented: `docs/topics/impala_security_guidelines.xml`)*. | +| **Internal Impala peer** | trusted (mutually-authenticated) | `impalad`↔`statestored`↔`catalogd` RPC; KRPC + Thrift RPC; auth is Kerberos-only between internal components *(documented: `docs/topics/impala_ldap.xml`, "Consideration for Connections Between Impala Components")*. | +| **Hive Metastore / Ranger** | trusted control plane | Source of metadata + policy decisions; assumed honest *(inferred — §14 Q2)*. | +| **Underlying storage** | trusted by virtue of operator-granted credentials | HDFS / S3 / ADLS / Ozone / Kudu / HBase; Impala holds delegation tokens or static credentials and reads/writes as the `impala` Unix user (or impersonated user when Ranger is enabled) *(documented: `docs/topics/impala_security_files.xml`)*. | +| **Delegated proxy user** | conditionally trusted | When `--authorized_proxy_user_config` is set, an authenticated front-end (Hue, BI tool) may forward queries as a different end user *(documented: `docs/topics/impala_delegation.xml`)*. | + +### Component-family table + +| Family | Representative entry point | Touches outside the process? | In-model? 
| +| --- | --- | --- | --- | +| HS2 / Beeswax / HS2-HTTP server | `:21000` (Beeswax), `:21050` (HS2 binary), `:28000` (HS2-HTTP) *(documented: `docs/topics/impala_ports.xml`)* | network (TCP, optionally TLS) | **yes** | +| Internal KRPC + Thrift RPC | `:27000` (KRPC), `:23000`/`:24000`/`:26000` (statestore/catalog) | network within the cluster | **yes** (peer trust depends on Kerberos / mTLS) | +| Web UI / metrics / `/admin/` endpoints | `:25000`/`:25010`/`:25020` | network (TCP, optionally TLS + SPNEGO + `.htpasswd`) | **yes** | +| Query frontend (Java, `fe/`) — parser, analyzer, planner, Ranger checker | invoked from coordinator `impalad` via JNI | none directly | **yes** | +| Query backend (C++, `be/`) — exec engine, codegen (LLVM), expression eval, scanners | invoked from coordinator `impalad` and worker `impalad` | reads/writes storage with process credentials | **yes** | +| Catalog server (`catalogd`) | invoked by coordinators; reads HMS + storage | reads HMS, lists HDFS, reads object stores | **yes** | +| Storage scanners (Parquet, ORC, Avro, text, Iceberg, Kudu, HBase, JDBC external table) | reads operator-configured locations | reads object stores / HDFS / external JDBC | **yes** (data trust = §6) | +| User-defined functions (UDFs) | `CREATE FUNCTION … LOCATION …` (C++ native or Java) | runs operator-/user-permitted binaries in-process | **out of model** for UDF code itself *(§3)*; in-model for the privilege check that admits the UDF | +| External Data Sources / JDBC external tables | `CREATE DATA SOURCE` / Iceberg REST catalog | outbound JDBC / HTTPS *(documented: `docs/topics/impala_jdbc_external_table.xml`, `docs/topics/impala_iceberg_rest_catalog.xml`)* | in-model for credential handling; out-of-model for the remote endpoint | +| `ai_generate_text` LLM connector | SQL function calling external LLM endpoint *(documented: `docs/topics/impala_ai_functions.xml`)* | outbound HTTPS | in-model for credential / prompt handling; out-of-model for the LLM 
provider | +| `shell/` Python impala-shell | client-side, not server | n/a | **out of model** for server claims *(§3)*; in-model for credential handling of the shell binary itself | +| `docker/`, `testdata/`, `infra/`, `tests/` | tooling | n/a | **out of model** *(§3)* | +| Vendored Kudu security code under `be/src/kudu/security/` and `security/tls_socket-test.cc` | TLS/SASL primitives shared with the Apache Kudu codebase | n/a | in-model only insofar as Impala calls into it *(inferred — §14 Q3)* | + +## §3 Out of scope (explicit non-goals) + +Impala is not, and does not aim to be, the following — reports requiring any of +these will be closed with the cited disposition: + +1. **The root authority for storage-level authorization.** HDFS POSIX + permissions, S3 IAM, ADLS RBAC, Ozone ACLs, etc. are enforced by the storage + provider and the credentials the operator hands to Impala. Reports that depend + primarily on over-broad bucket / IAM permissions are deployment-sensitive, not + Impala-side *(documented: `docs/topics/impala_security_files.xml`)*. → + `OUT-OF-MODEL: adversary-not-in-scope`. +2. **A defender against a malicious Hive Metastore, Ranger Admin, or other + trusted control-plane component.** If the report requires the HMS or Ranger + to be hostile to Impala, it is out of model *(inferred from §2 trust table — + §14 Q2)*. → `OUT-OF-MODEL: trusted-input`. +3. **A defender against the operator.** Anyone with `root`, `sudo`, the + `impala` Unix account, the keytab file, the cookie-secret file, or the Web + UI `.htpasswd` already has unbounded power; "the operator misconfigured X" is + not a vulnerability *(documented: + `docs/topics/impala_security_guidelines.xml`)*. → `OUT-OF-MODEL: + adversary-not-in-scope`. +4. 
**An isolation boundary between an authorized user's SQL and the + `impalad` process.** SQL is interpreted by a trusted engine running as the + `impala` Unix user; an authenticated user with appropriate SQL privileges + can already cause arbitrary reads, writes, and resource consumption within + the scope Ranger grants. A new way for an authorized user to do something + they are already authorized to do is not a vulnerability *(inferred — + §14 Q4)*. → `OUT-OF-MODEL: equivalent-harm`. +5. **A sandbox for user-defined functions.** Native (C++) UDFs and Hive Java + UDFs run in-process with the privileges of the `impalad` daemon. UDF + sandboxing is not provided; admission of a `CREATE FUNCTION` is gated by + Ranger and that is the entire enforcement *(inferred from `docs/topics/impala_udf.xml` + — §14 Q5)*. → `BY-DESIGN: property-disclaimed` (§9). +6. **A defender against malformed-but-parseable user data in scanned files.** + Decoders (Parquet, ORC, Avro, text, Iceberg manifests) must not corrupt + process memory, but raw runtime exceptions, slow paths on adversarial + inputs, and OOM on pathological files are robustness work, not security + issues, unless they cross a trust boundary *(inferred — §14 Q6)*. → + `OUT-OF-MODEL: equivalent-harm` for writer-controlled files, + `VALID-HARDENING` for reader-controlled files. +7. **Code that ships but is not part of the supported product:** + `tests/`, `testdata/`, `infra/`, `docker/`, `package/`, `ssh_keys/`, + `cmake_modules/`, `experiments/`, `udf_samples/`. State the policy + explicitly so integrators do not extend core guarantees to them + *(inferred — §14 Q7)*. → `OUT-OF-MODEL: unsupported-component`. +8. **Apache Kudu, Apache Iceberg, Apache Ranger, Apache Hive client libraries, + Hadoop libraries, OpenSSL, Apache Thrift, and other upstream dependencies.** + Where Impala vendors source (e.g. 
`be/src/kudu/`), the vendored code is + modeled at the wrapper boundary; vulnerabilities intrinsic to the upstream + project should be reported upstream *(inferred — §14 Q3)*. → + `OUT-OF-MODEL: unsupported-component` (with an upstream pointer). +9. **The Impala documentation site, asf-site branch, downloads page, gem/npm + packages with similar names, and other non-product surfaces.** Out of scope. + +## §4 Trust boundaries and data flow + +Impala has at least eight distinct trust transitions; a finding is in-model +only when it cleanly maps to one of them. + +| # | Transition | Authentication | Authorization | +| --- | --- | --- | --- | +| B1 | End-user client → HS2 / Beeswax / HS2-HTTP | Kerberos / LDAP / SAML / JWT / OAuth / trusted-domain header *(documented: `be/src/rpc/authentication.cc`)* | Ranger on submitted SQL *(documented: `docs/topics/impala_authorization.xml`)* | +| B2 | End-user client → Web UI (`:25000` and siblings) | `.htpasswd` + SPNEGO, optional TLS *(documented: `docs/topics/impala_security_webui.xml`)* | none beyond authentication; the Web UI exposes operator-grade endpoints *(inferred — §14 Q8)* | +| B3 | `impalad` ↔ `statestored` ↔ `catalogd` internal RPC (KRPC + Thrift) | Kerberos (mandatory for prod) + optional TLS *(documented: `docs/topics/impala_ssl.xml`, `docs/topics/impala_ldap.xml`)* | "internal_principals_whitelist" of allowed principals *(documented: `be/src/rpc/authentication.cc` line 121)* | +| B4 | Coordinator `impalad` → worker `impalad` (query fragments over KRPC) | same as B3 | same as B3 | +| B5 | Coordinator / catalogd → Hive Metastore | Kerberos / delegation token | HMS-side; Impala assumes truthful responses | +| B6 | Coordinator → Ranger Admin (policy fetch) | service principal | Ranger-side | +| B7 | Worker `impalad` → underlying storage (HDFS, S3, ADLS, Ozone, Kudu, HBase) | Kerberos / IAM / service-account keys / delegation tokens | storage-side ACLs | +| B8 | Operator → impalad startup flags + configuration files | 
filesystem permissions on the host | OS-level | + +### Reachability preconditions per family + +For each family in §2, a finding is in-model only if it is reachable as +follows: + +- **HS2 / Beeswax / HS2-HTTP server**: reachable from an *unauthenticated* network + peer who can reach the listening port. Findings that require an already- + authenticated peer collapse to "authenticated user with SQL privileges", and + must additionally clear B7 (storage ACL) or B6 (Ranger policy) to be + security-relevant. +- **Internal KRPC**: reachable from a network peer who has compromised the + Kerberos trust (B3) — i.e., has stolen a service keytab or impersonated a + principal on `internal_principals_whitelist`. A flat "internal RPC has no + auth" finding is `OUT-OF-MODEL: adversary-not-in-scope` because the model + *requires* Kerberos between components in production *(documented: `docs/topics/impala_ldap.xml`)*. +- **Web UI**: reachable from a network peer with an `.htpasswd` credential + (per §10) or who can reach an unprotected port. A finding that + needs `.htpasswd` to be absent is `OUT-OF-MODEL: trusted-input` against a + guideline-violating operator (§3 item 3) *(inferred — §14 Q8)*. +- **Query frontend / backend**: reachable from SQL submitted by an authenticated + user with sufficient Ranger privileges. Findings here matter only if they + break out of the user's Ranger-granted privilege set. +- **Scanners**: reachable from bytes in operator-configured storage locations. + Bytes are *partially* attacker-controlled when an authorized writer has + `INSERT` privilege on a table that other users read (B7). Compromise of the + storage layer itself is out of model (§3 item 1). +- **UDFs**: reachable only via `CREATE FUNCTION`, which is Ranger-gated. + Anything past the privilege check is out of model (§3 item 5). 
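
The reachability preconditions above amount to a routing rule from a report's required precondition to a disposition. A minimal sketch of that rule follows, assuming a hypothetical `Finding` record and `route` function; neither exists in Impala or in any ASF triage tooling, the family keys and boolean fields are illustrative, and the `IN-MODEL` return value is a placeholder label, not a disposition defined by this document. The other disposition strings are the ones cited in §3 and in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical summary of what a report needs in order to be reachable."""
    family: str                              # "hs2", "krpc", "webui", "query", "scanner", "udf"
    needs_kerberos_disabled: bool = False    # internal RPC reachable only without Kerberos
    needs_htpasswd_absent: bool = False      # Web UI reachable only when left unprotected
    past_privilege_check: bool = False       # UDF code running after CREATE FUNCTION was admitted
    needs_storage_compromise: bool = False   # requires compromise of the storage layer itself
    escapes_ranger_privileges: bool = False  # breaks out of the user's Ranger-granted set


def route(f: Finding) -> str:
    """Map a finding to a disposition, following the per-family preconditions."""
    if f.family == "krpc" and f.needs_kerberos_disabled:
        return "OUT-OF-MODEL: adversary-not-in-scope"   # model requires Kerberos in production
    if f.family == "webui" and f.needs_htpasswd_absent:
        return "OUT-OF-MODEL: trusted-input"            # guideline-violating operator
    if f.family == "udf" and f.past_privilege_check:
        return "BY-DESIGN: property-disclaimed"         # §3 item 5
    if f.family == "scanner" and f.needs_storage_compromise:
        return "OUT-OF-MODEL: adversary-not-in-scope"   # §3 item 1
    if f.family == "query" and not f.escapes_ranger_privileges:
        return "OUT-OF-MODEL: equivalent-harm"          # §3 item 4
    return "IN-MODEL"  # placeholder: continue triage against the claimed properties


print(route(Finding("krpc", needs_kerberos_disabled=True)))
print(route(Finding("query", escapes_ranger_privileges=True)))
```

The rule order is a design choice of the sketch: disqualifying preconditions are checked first, so a report is treated as in-model only when none of them applies.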
+ +## §5 Assumptions about the environment + +- **Operating system**: Linux (Ubuntu 16.04/18.04, CentOS/RHEL 7/8 are the + declared supported set; others "may also be supported but are not tested by + the community") *(documented: `README.md` — Supported Platforms)*. x86_64 + primary, arm64 experimental. +- **Process model**: at least three long-lived daemons (`impalad`, + `statestored`, `catalogd`); operator runs them as the `impala` Unix user + *(documented: `docs/topics/impala_security_files.xml`)*. +- **Network**: operator-controlled L2/L3; no NAT or middlebox assumed to + inspect KRPC payloads; ports per `docs/topics/impala_ports.xml`. Cluster + members are assumed to be mutually reachable. +- **Time**: Kerberos requires loosely-synchronized clocks across the realm + (KDC tolerance, default 5 min) — operator's responsibility, not Impala's + *(inferred — §14 Q9)*. +- **Filesystem**: keytab and `.htpasswd` files have OS-level permissions + restricted to the `impala` user and admins *(documented: + `docs/topics/impala_security_webui.xml`)*. +- **Cryptography**: the OpenSSL library shipped with the OS provides TLS, + symmetric/asymmetric primitives, and RNG *(documented: `EXPORT_CONTROL.md`)*. +- **Kerberos**: assumes a working MIT KDC with renewable-ticket support + configured per `docs/topics/impala_kerberos.xml`. +- **What Impala does to its host** (claims awaiting maintainer ratification): + - **does** open listening sockets on the documented ports; + - **does not** spawn child processes; codegen-compiled native code runs + in-process via LLVM *(inferred — §14 Q10)*; + - **does not** install signal handlers besides those required for crash + reporting (breakpad) *(inferred — §14 Q10)*; + - **does** read a documented set of environment variables (e.g. 
+ `IMPALA_HOME`, `JAVA_HOME`) but does not consume arbitrary `LD_*` for + security-sensitive behavior *(inferred — §14 Q10)*; + - **does** write logs to operator-configured locations; redacted query text + if log redaction is enabled *(documented: + `docs/topics/impala_logging.xml`)*. + +## §5a Build-time and configuration variants + +Impala ships as a single product but a sizable number of runtime flags +materially change the security envelope. The maintainer-confirmed list is at +`be/src/rpc/authentication.cc` and equivalent files; the security-relevant +subset: + +| Flag | Default | Maintainer stance | Effect | +| --- | --- | --- | --- | +| `--enable_ldap_auth` | `false` *(documented)* | dev/test, operator must enable per §10 *(inferred — §14 Q11)* | enables LDAP auth on HS2 client port | +| `--ssl_server_certificate` / `--ssl_private_key` | unset *(documented)* | dev/test, operator must enable per §10 *(inferred — §14 Q11)* | enables TLS on all listening sockets | +| `--ssl_minimum_version` | `tlsv1.2` *(documented: `docs/topics/impala_ssl.xml`)* | hardened in Impala 4.0 from `tlsv1` *(documented)* | rejects pre-1.2 handshakes | +| `--webserver_password_file` | unset *(documented: `docs/topics/impala_security_webui.xml`)* | **maintainer ruling required**: is an unprotected Web UI a `VALID` report or `OUT-OF-MODEL: non-default-build`? 
*(inferred — §14 Q12)* | Web UI authenticates against this `.htpasswd` | +| `--webserver_certificate_file` | unset *(documented)* | dev/test, operator must enable per §10 *(inferred — §14 Q11)* | enables HTTPS on Web UI | +| `--principal`, `--keytab_file` | unset | dev/test, operator must enable per §10 *(inferred — §14 Q11)* | enables Kerberos auth | +| `--authorization_provider=ranger` | unset *(documented: `docs/topics/impala_authorization.xml`)* | dev/test, operator must enable per §10 *(inferred — §14 Q11)* | enables Ranger authz; absent → all queries run as `impala` user (no enforcement) | +| `--jwt_token_auth` / `--oauth_token_auth` | `false` | optional alternative auth *(documented: `be/src/rpc/authentication.cc`)* | enables bearer-token auth | +| `--jwt_validate_signature`, `--oauth_jwt_validate_signature` | `true` | hardened default; flipping to `false` voids §8 P1 for bearer-token clients *(inferred — §14 Q13)* | when `false`, turns off the JWT/OAuth signature check | +| `--jwt_allow_without_tls`, `--oauth_allow_without_tls`, `--saml2_allow_without_tls_debug_only` | `false`, marked `_hidden` | "debug only" per name *(inferred — §14 Q13)* | permits bearer / SAML auth over unencrypted transport | +| `--trusted_domain`, `--trusted_auth_header` | unset *(documented)* | when set, Impala accepts identity assertions from named peer without re-auth | reachability for `OUT-OF-MODEL: trusted-input` reports | +| `--trusted_domain_use_xff_header` | `false` | when `true`, parses `X-Forwarded-For` to identify the originating client *(documented: `be/src/rpc/authentication.cc` line 132)* | exposes a path where a misconfigured proxy can let a client claim any source address *(inferred — §14 Q14)* | +| `--internal_principals_whitelist` | `hdfs` *(documented: `be/src/rpc/authentication.cc` line 121)* | governs which Kerberos principals are accepted on internal RPC ports | misconfiguration permits external service to speak as a peer | +| `--authorized_proxy_user_config` / `--authorized_proxy_group_config` | unset 
*(documented: `docs/topics/impala_delegation.xml`)* | required for Hue-style impersonation; whitelists which authenticated principals may `doas` to which users | breaks B1 if mis-scoped | +| `--cookie_secret_file` | empty *(documented: `be/src/rpc/authentication.cc` line 98)* | when unset, HS2-HTTP cookies fall back to per-process random — sessions do not survive cluster restarts but are not forgeable *(inferred — §14 Q15)* | shared cluster-wide secret for cookie HMAC | +| `--abort_on_config_error` | `true` *(inferred — §14 Q11)* | when off, security misconfigurations may not prevent startup | | + +**The insecure-default case.** A number of these flags ship in the "off, must +be turned on for production" posture. The maintainer ruling on whether the +*default* is a supported production posture is captured in §14 Q11; the +text of §3 item 3 and §10 assume the answer is **"dev/test default, +operator must flip per §10 for production"**. + +## §6 Assumptions about inputs + +### Per-endpoint trust table (network surfaces) + +| Surface / route | Parameter | Attacker-controllable? 
| Caller must enforce | +| --- | --- | --- | --- | +| HS2 binary `:21050`, HS2-HTTP `:28000`, Beeswax `:21000` | SQL text | **yes** | nothing — Impala parses, plans, and applies Ranger | +| HS2-HTTP `:28000` | `X-Forwarded-For` header | **yes** if `--trusted_domain_use_xff_header` is on; **never trust** otherwise *(inferred — §14 Q14)* | per §10, only enable behind a load balancer that strips and resets XFF | +| HS2-HTTP `:28000` | session cookie | signed with `--cookie_secret_file` HMAC; not attacker-forgeable when secret is unguessable *(inferred — §14 Q15)* | per §10, rotate the cookie-secret file if compromised | +| HS2-HTTP `:28000` | JWT / OAuth bearer | **yes**; signature checked when `--jwt_validate_signature=true` (default) *(documented: `be/src/rpc/authentication.cc`)* | per §10, leave signature checking on, set `--jwt_allow_without_tls=false` | +| HS2-HTTP `:28000` | `--trusted_auth_header` value | **yes**; treated as the authenticated identity | **never** expose the port directly to untrusted peers when this flag is set *(inferred — §14 Q14)* | +| Web UI `:25000`/`:25010`/`:25020` | `.htpasswd` credential | **yes** if `--webserver_password_file` is set | per §10, set the flag; per §10, set `--webserver_certificate_file` for HTTPS | +| Web UI `:25000` — query-profile and admin endpoints | profile ID / GET parameters | **yes** | Web UI auth is the only gate; sensitive query bytes appear unless log redaction is enabled | +| KRPC `:27000` and statestore/catalog ports `:23000`/`:24000`/`:26000` | Thrift / KRPC payload | **only by a peer that has cleared B3** | Kerberos + `internal_principals_whitelist` are the gate | +| Scanned table files (Parquet, ORC, Avro, text, Iceberg manifests) | file bytes | **yes** if an authorized writer can land bytes the reader will scan | Ranger separates writers from readers; B7 enforces who can land bytes | +| `ai_generate_text` LLM endpoint | LLM response | trusted only as far as the LLM is trusted | per §10, treat LLM output 
as untrusted text (do not pipe to executable contexts) *(inferred — §14 Q16)* | +| JDBC external table endpoint | rows returned by remote JDBC | trusted only as far as the remote endpoint is trusted | per §10, model JDBC external tables as data crossing a trust boundary | + +### Size / shape / rate + +- Impala accepts arbitrary-length SQL but the analyzer rejects queries above + implementation limits *(inferred — §14 Q17)*. +- Scanned files may be terabytes; row groups are streamed. Pathological + encodings (e.g. enormous string lengths in Parquet headers) are robustness + concerns *(inferred — §14 Q6)*. +- The HS2 / Beeswax surfaces have **no built-in rate limiting**; admission + control via the `--default_pool_max_requests` family of flags bounds + in-flight queries but not connection or auth-attempt rate *(inferred — + §14 Q18)*. + +## §7 Adversary model + +### Actors + +| Actor | In scope? | Capabilities granted | +| --- | --- | --- | +| Unauthenticated network peer reaching HS2 / Beeswax / HS2-HTTP | **yes** | TCP to the listening ports; may attempt authentication; may attempt to violate the protocol pre-auth | +| Unauthenticated peer reaching Web UI | **yes**, *if* the deployment exposes the Web UI publicly | as above for Web UI | +| Authenticated end user with limited Ranger privileges | **yes** | execute SQL, read tables the user has `SELECT` on, write tables the user has `INSERT` on | +| Authenticated end user with broad Ranger privileges | partial | only escapes from their Ranger envelope are in scope | +| Co-tenant on the same cluster | **yes** | same as authenticated end user; cross-tenant leakage is in scope | +| Authorized table writer producing data read by another user | **yes** for scanner robustness across the B7 boundary, but bounded — `VALID-HARDENING`, not `VALID`, unless memory corruption is reachable *(inferred — §14 Q6)* | +| Authenticated proxy front-end (Hue) using `doas` | **yes** only when `--authorized_proxy_user_config` is mis-scoped | 
+| Hostile peer impalad / statestored / catalogd | **out of scope** — see §3 item 2 and the Byzantine-peer subsection below | +| Hostile HMS / Ranger | **out of scope** — see §3 item 2 | +| Operator | **out of scope** — see §3 item 3 | +| Local process on the same host as `impalad` running as a different user | **partial** *(inferred — §14 Q19)*: same-host attackers with non-`impala` UID can connect to the Web UI / HS2 ports unless host firewalling forbids; Impala does not defend against same-host UID-0 attackers | +| Side-channel observer (cache timing, network timing) | **out of scope** *(inferred — §14 Q20)* | +| Quantum adversary | **out of scope** | + +### Authenticated-but-Byzantine peer (distributed-systems threshold) + +Impala is **not** a Byzantine-fault-tolerant system. A compromised +`impalad`/`catalogd`/`statestored` peer with a valid Kerberos identity can +cause unbounded damage (read any data the cluster can read, produce wrong +results, leak intermediate state). The cluster trusts its own membership +*(inferred — §14 Q21)*. → reports requiring a Byzantine internal peer are +`OUT-OF-MODEL: adversary-not-in-scope`. + +## §8 Security properties the project provides + +For each property: condition, violation symptom, severity tier, provenance. + +### P1 — Authentication of HS2 / Beeswax / HS2-HTTP clients + +- **Condition**: an authentication mode is enabled + (`--enable_ldap_auth`, `--principal`+`--keytab_file`, `--jwt_token_auth`, + `--oauth_token_auth`, or a SAML configuration). With none of these set, + Impala accepts unauthenticated SQL *(documented: `be/src/rpc/authentication.cc`)*. +- **Violation symptom**: a network peer holding no valid credential successfully + executes SQL. +- **Severity**: **security-critical**, `VALID` per §13. +- *(documented)* + +### P2 — Authorization of SQL operations via Apache Ranger + +- **Condition**: `--authorization_provider=ranger` is set and Ranger is + reachable *(documented: `docs/topics/impala_authorization.xml`)*. 
With this + flag unset, no authorization is enforced and all queries run as the + `impala` user *(documented: `docs/topics/impala_security.xml`, + `docs/topics/impala_authorization.xml`)*. +- **Violation symptom**: a query reads or modifies data not permitted by the + authenticated principal's Ranger policy. Failure modes include both the + authorization-bypass case (Impala fails to apply a policy) and the + authorization-confusion case (Impala applies the wrong policy). +- **Severity**: **security-critical**, `VALID` per §13. +- *(documented)* + +### P3 — TLS confidentiality and integrity on the wire, when configured + +- **Condition**: `--ssl_server_certificate` + `--ssl_private_key` set on the + relevant daemon; minimum version per `--ssl_minimum_version` (default + `tlsv1.2` since Impala 4.0) *(documented: `docs/topics/impala_ssl.xml`)*. +- **Violation symptom**: cleartext on the wire after TLS is configured, or a + TLS handshake completing at a protocol version below the configured + `--ssl_minimum_version` (for example, TLS 1.0 or 1.1 under `tlsv1.2`). +- **Severity**: **security-critical**, `VALID` per §13. +- *(documented)* + +### P4 — Kerberos authentication on internal RPCs in production + +- **Condition**: `--principal` and `--keytab_file` are set on all three + daemons *(documented: `docs/topics/impala_kerberos.xml`, + `docs/topics/impala_ldap.xml`)*. Without this, internal RPCs are + unauthenticated and an unauthenticated-internal-RPC report is *not* a §8 break — + the operator violated §10. +- **Violation symptom**: internal RPC completing successfully from a principal + not on `--internal_principals_whitelist`. +- **Severity**: **security-critical**, `VALID` per §13. +- *(documented)* + +### P5 — Bounded scope of authenticated impersonation (`doas`) + +- **Condition**: `--authorized_proxy_user_config` / + `--authorized_proxy_group_config` set; the authenticated front-end principal + appears as a key *(documented: `docs/topics/impala_delegation.xml`)*. 
+- **Violation symptom**: an authenticated principal successfully runs a query + as a delegated user not in their allow-list. +- **Severity**: **security-critical**, `VALID` per §13. +- *(documented)* + +### P6 — Log redaction, when configured + +- **Condition**: redaction rules configured per + `docs/topics/impala_logging.xml#redaction`. +- **Violation symptom**: literal values matching configured redaction patterns + appearing un-redacted in logs or Web UI query profiles. +- **Severity**: **security-critical** for data-protection-regulated deployments; + `VALID` per §13. +- *(documented)* + +### P7 — Web UI authentication, when configured + +- **Condition**: `--webserver_password_file` set; optionally Kerberos SPNEGO. +- **Violation symptom**: an unauthenticated peer accesses an authenticated + Web UI endpoint, or an authenticated peer accesses an endpoint above their + Web UI auth tier. +- **Severity**: **security-critical**, `VALID` per §13. +- *(documented: `docs/topics/impala_security_webui.xml`)* + +### P8 — Memory safety on well-formed inputs across documented surfaces + +- **Condition**: input matches the documented protocol (HS2 / Beeswax / Thrift + / KRPC / Parquet / ORC / Avro / Iceberg manifest / etc.); the host + conformant to §5; no `_hidden` debug flag is in use *(inferred — + §14 Q22)*. +- **Violation symptom**: heap or stack corruption, out-of-bounds read/write, + use-after-free, double-free reachable from a §6 input. +- **Severity**: **security-critical** when reachable from network input or from + table data crossing B7; **`VALID-HARDENING`** when reachable only by a writer + who already controls the bytes (§3 item 6). +- *(inferred — §14 Q22)* + +### P9 — No SQL injection from end-user-supplied parameters into back-end queries against other systems + +- **Condition**: applies only to flows where Impala emits SQL to a remote + system on behalf of an Impala client (JDBC external tables; `ai_generate_text` + with prompt-as-SQL patterns). 
+- **Violation symptom**: end-user SQL text appearing un-escaped in a remote- + query string. +- **Severity**: case-dependent; `VALID-HARDENING` if the remote system is also + Impala-trusted, `VALID` if the remote system is a tenant boundary. +- *(inferred — §14 Q23)* + +## §9 Security properties the project does *not* provide + +State each plainly so a triager can route an inbound report to the matching +disclaimer. + +- **No isolation between authenticated user SQL and the `impalad` process.** A + user with Ranger privilege to `CREATE FUNCTION` and to `SELECT` from the + resulting function can run arbitrary native or JVM code inside `impalad`. UDFs + are **not** sandboxed. See §3 item 5 *(inferred — §14 Q5)*. +- **No defense against decompression / decoding bombs in scanned files.** A + malicious or buggy table writer can land Parquet / ORC / Avro / text files + designed to maximize CPU and memory; the reader has no built-in cap on + per-file resource use *(inferred — §14 Q6)*. +- **No quotas on per-query or per-user resource consumption beyond what + admission control provides.** A user with `SELECT` on a large table can + cause arbitrary wall-clock and memory burn. Operator must configure + `--default_pool_*` admission-control flags *(inferred — §14 Q18)*. +- **No defense against intra-cluster Byzantine failure.** A compromised peer + with a valid Kerberos identity can read any data the cluster can read; see + §7 *(inferred — §14 Q21)*. +- **No protection against the operator.** Anyone with the keytab, the + cookie-secret file, the `.htpasswd`, the impala Unix account, or root on + any impala host wins. See §3 item 3. +- **No protection against a malicious HMS / Ranger.** See §3 item 2. +- **No data-at-rest encryption.** Impala writes file bytes through the storage + layer's existing protections (HDFS Transparent Data Encryption, S3 SSE, + etc.). Impala does not encrypt at the table format level *(inferred — + §14 Q24)*. 
+- **No defense against side-channel observation** (cache, timing, branch + prediction) of query plans or data *(inferred — §14 Q20)*. +- **No constant-time comparison of authentication secrets** beyond what the + underlying SASL/Kerberos libraries provide *(inferred — §14 Q25)*. +- **No defense against an attacker on the same Linux host running as + a non-`impala` UID.** Impala defends only across the network surface; + same-host attackers with shell access on the impala host already have many + paths to win *(inferred — §14 Q19)*. + +### False-friend properties (call out separately) + +- **`SHOW TABLES` / `SHOW DATABASES` filtering is an authorization view, not + a confidentiality boundary.** Object names a user is not authorized to + see are hidden, but error messages, query-profile timing, and Web UI traces + may reveal existence indirectly *(inferred — §14 Q26)*. +- **Log redaction is a *display* feature, not a confidentiality boundary.** It + obfuscates literals in *new* log entries when patterns match; it cannot + retroactively cleanse leaked log files, and a regex miss leaks the literal. +- **Kerberos authenticates the *principal*, not the *host* the principal + connects from.** A stolen keytab is a stolen identity. +- **TLS encrypts but does not authenticate the application-layer identity.** + Authentication is layered on (LDAP, JWT, etc.); TLS by itself does not + authorize. +- **`.htpasswd` Web-UI authentication does not provide per-user authorization + on the Web UI.** Any authenticated `.htpasswd` user sees all Web UI + contents, including query bytes and profiles *(inferred — §14 Q8)*. +- **`--trusted_domain` / `--trusted_auth_header` is an explicit bypass of + client authentication.** Setting it without controlling the load balancer + hands an attacker the keys. 
+- **Ranger column-masking and row-filter policies operate at the planner + level, not the storage level.** Anyone bypassing the planner (a hostile + peer reading the file directly via HDFS, a UDF reading raw bytes) is not + constrained by them. + +### Well-known attack classes the project does not defend against + +- **SQL-engine-amplified DoS** ("malicious analytic query"): a user with + `SELECT` privilege issuing a Cartesian product across petabyte tables. The + fix surface is admission control, not the engine. +- **Decompression / decoding bombs** in supported file formats (see above). +- **Adversarial table-writer collusion**: a writer landing files that crash + a downstream reader is `VALID-HARDENING` at most, because the writer could + simply have written wrong data. +- **Confused-deputy via `doas`** when the proxy list is mis-scoped. +- **Time-of-check-to-time-of-use** between Ranger policy fetch and query + execution: policy changes mid-query are not retroactively enforced + *(inferred — §14 Q27)*. + +## §10 Downstream responsibilities + +The operator deploying Impala in production **must**: + +1. Set `--principal` and `--keytab_file` on `impalad`, `statestored`, and + `catalogd`. Without these, internal RPC has no authentication + *(documented: `docs/topics/impala_kerberos.xml`, + `docs/topics/impala_ldap.xml`)*. +2. Set `--authorization_provider=ranger` and configure Ranger. Without this, + no authorization is enforced *(documented: `docs/topics/impala_authorization.xml`)*. +3. Enable TLS: set `--ssl_server_certificate` and `--ssl_private_key` on all + daemons, and `--ssl_client_ca_certificate` to authenticate the peer of + internal RPC *(documented: `docs/topics/impala_ssl.xml`)*. +4. Set `--webserver_password_file` and `--webserver_certificate_file` so the + Web UI is authenticated and TLS-served *(documented: + `docs/topics/impala_security_webui.xml`, + `docs/topics/impala_security_guidelines.xml`)*. +5. 
Restrict Web UI ports (`:25000`/`:25010`/`:25020`) at the network layer + to a trusted operator subnet *(documented: + `docs/topics/impala_security_guidelines.xml`)*. +6. Restrict membership in `--internal_principals_whitelist` to the actual + Kerberos principals of cluster members *(documented: `be/src/rpc/authentication.cc`)*. +7. **Never** set `--jwt_allow_without_tls=true`, + `--oauth_allow_without_tls=true`, or `--saml2_allow_without_tls_debug_only=true` + in production *(inferred — §14 Q13)*. +8. **Never** set `--trusted_domain` / `--trusted_auth_header` / + `--trusted_domain_use_xff_header` unless the listening port is exposed + only to a load balancer that strips and resets the relevant header + *(inferred — §14 Q14)*. +9. Set `--cookie_secret_file` to a long, random, cluster-wide secret with + filesystem permissions restricted to the `impala` user *(documented: + `be/src/rpc/authentication.cc`)*. +10. Set `--authorized_proxy_user_config` / `--authorized_proxy_group_config` + to the smallest set of front-end principals that need `doas`, with the + smallest set of impersonated users *(documented: + `docs/topics/impala_delegation.xml`)*. +11. Configure log redaction patterns for any sensitive literal that may + appear in WHERE-clause queries *(documented: + `docs/topics/impala_logging.xml#redaction`)*. +12. Secure the OS-level `impala` Unix user and the `root` / `sudoers` set on + every Impala host *(documented: + `docs/topics/impala_security_guidelines.xml`)*. +13. Configure admission control (`--default_pool_max_requests`, + `--default_pool_max_queued`, `--default_pool_mem_limit`) to bound + per-query and per-pool resource use; Impala does not enforce DoS + protection by itself *(inferred — §14 Q18)*. +14. Treat `ai_generate_text` results and JDBC external-table rows as + crossing a trust boundary; do not assume the remote system is honest + *(inferred — §14 Q16)*. +15. 
Secure the underlying storage (HDFS, S3, ADLS, Ozone) with native ACLs; + Impala enforces only what it can see *(documented: + `docs/topics/impala_security_files.xml`)*. + +## §11 Known misuse patterns + +- **Exposing port 25000 (Web UI) directly to the public Internet without + `--webserver_password_file`.** Anyone reaching the port reads in-flight + query bytes, server flags, table names. → operator hardening per §10. +- **Running Impala with `--authorization_provider` unset in a multi-tenant + cluster.** All queries succeed as the `impala` Unix user — there is no + authorization at all *(documented: + `docs/topics/impala_authorization.xml`)*. +- **Setting `--trusted_domain` without ensuring the listening port is only + reachable from the trusted reverse proxy.** The flag is a deliberate + client-auth bypass for proxy deployments; the operator owns the network + fence. +- **Using `CREATE FUNCTION` to load a UDF binary supplied by an end user.** + UDFs run in-process. → Ranger-gate `CREATE FUNCTION` to administrators. +- **Treating Impala's `SHOW TABLES` view as a confidentiality boundary.** + Existence of a hidden object may leak through error messages or query + profiles *(inferred — §14 Q26)*. +- **Re-using `--cookie_secret_file` across clusters of different trust + levels.** A leak in cluster A becomes a forgery primitive in cluster B + *(inferred — §14 Q15)*. +- **Disabling TLS internally between `impalad`/`statestored`/`catalogd` in + production.** Cleartext internal RPC + Kerberos `auth-int` is the documented + minimum; many deployments leave it at `auth` *(inferred — §14 Q28)*. +- **Mixing authenticated and unauthenticated coordinator daemons in the same + cluster.** Impala 2.0+ accepts both Kerberos and LDAP on the same port; an + operator who *also* leaves a single coordinator unauthenticated produces a + bypass *(documented: `docs/topics/impala_mixed_security.xml`)*. 
+ +## §11a Known non-findings (recurring false positives) + +This section is the highest-leverage input for automated agentic security +scans. Each entry: tool symptom, why it is safe under the model, the § +that licenses the call. + +- **"Internal RPC accepts plaintext / no auth" report against `:23000`, + `:24000`, `:26000`, `:27000`.** In a model-conforming deployment the + operator has set `--principal` + Kerberos per §10; cleartext is a §10 + violation by the operator, not an Impala bug. → `OUT-OF-MODEL: + non-default-build` per §5a. +- **"Web UI on `:25000` reachable without authentication" against an + un-`.htpasswd`-protected cluster.** Same shape as above; operator + responsibility per §10. → `OUT-OF-MODEL: non-default-build`. +- **"`--jwt_allow_without_tls=true` permits credentials over plaintext" in a + config file.** The flag is `_hidden` and named "debug only"; setting it + voids §8 P3. → `OUT-OF-MODEL: non-default-build` *(inferred — §14 Q13)*. +- **"Path traversal in `gzopen`-style filename" against scanners.** All + scanner paths are Ranger-checked URIs, not OS paths; the URI namespace is + rooted at the operator-configured warehouse. → `OUT-OF-MODEL: + trusted-input` *(inferred — §14 Q29)*. +- **"Hardcoded test password / keytab in `tests/`, `testdata/`, + `ssh_keys/`."** `tests/`, `testdata/`, `ssh_keys/` are unsupported + components. → `OUT-OF-MODEL: unsupported-component`. +- **"SQL injection via end-user SQL text."** End-user SQL **is the input**; + the engine is designed to interpret it. The Ranger envelope is the + authorization boundary, not SQL-text sanitization. → `BY-DESIGN: + property-disclaimed`. +- **"User-defined function executes arbitrary code in the impalad process."** + Documented and intentional; admission is Ranger-gated. → `BY-DESIGN: + property-disclaimed` per §9. +- **"DoS via expensive analytic query on a large table."** Admission control + is the fix surface, not the engine. → `BY-DESIGN: property-disclaimed` + per §9. 
+- **"Decompression bomb in a Parquet/ORC file landed in a warehouse table."** + Writers must have `INSERT`; the harm is reachable from an already- + privileged actor. → `VALID-HARDENING` at most, unless it reaches §8 P8 + memory safety. +- **"Unchecked return from `malloc` in `udf_samples/`."** `udf_samples/` is + unsupported sample code. → `OUT-OF-MODEL: unsupported-component`. +- **"Vendored Apache Kudu code under `be/src/kudu/security/` has CVE-X."** + Report upstream to Apache Kudu; Impala will pick up the fix on the next + vendored sync. → `OUT-OF-MODEL: unsupported-component` (upstream pointer) + *(inferred — §14 Q3)*. + +## §12 Conditions that would change this model + +Revise this document when any of the following lands: + +- A new authentication mechanism on a client-facing surface (e.g. mTLS-as- + auth on HS2-HTTP, OIDC, U2F). +- A new authorization provider beyond Ranger (e.g. native Impala policy + store, OPA integration). +- A new data-at-rest encryption story at the Impala layer (currently + delegated; see §9). +- A new external-data surface (a new JDBC external-table connector, a new + REST catalog beyond Iceberg, a new LLM connector beyond + `ai_generate_text`). +- A UDF sandboxing story (changes §9 and §3 item 5). +- A change in the default value of any §5a flag, especially flags + controlling auth (`--ssl_minimum_version`, `--jwt_validate_signature`). +- A vulnerability report that cannot be cleanly routed to one of the §13 + dispositions: that is evidence the model is incomplete. + +## §13 Triage dispositions + +A report against Impala receives exactly one of the following: + +| Disposition | Meaning | Licensed by | +| --- | --- | --- | +| `VALID` | Violates a §8 property via an in-scope §7 adversary using an in-scope §6 input. | §8, §6, §7 | +| `VALID-HARDENING` | No §8 property violated, but a §11 misuse pattern can be made harder to fall into by code change. Fixed at maintainer discretion, typically no CVE. 
| §11 | +| `OUT-OF-MODEL: trusted-input` | Requires attacker control of a §6 parameter the model marks trusted (e.g. HMS-supplied metadata, Ranger-supplied policy, operator-supplied config flag). | §6 | +| `OUT-OF-MODEL: adversary-not-in-scope` | Requires a §7 actor the model excludes (operator, malicious HMS/Ranger, Byzantine peer, side-channel observer, same-host non-`impala` UID). | §7 | +| `OUT-OF-MODEL: unsupported-component` | Lands in `tests/`, `testdata/`, `infra/`, `ssh_keys/`, `udf_samples/`, vendored upstream code under `be/src/kudu/`, etc. | §3 item 7, §3 item 8 | +| `OUT-OF-MODEL: non-default-build` | Only manifests under a §5a flag the maintainer has ruled is dev/test (e.g. `--jwt_allow_without_tls=true`, no `--principal`). | §5a | +| `OUT-OF-MODEL: equivalent-harm` | An actor already authorized under the model can cause the same harm via a documented path (writer landing arbitrary file bytes, SQL-privileged user submitting expensive queries). | §3 item 4, §3 item 6 | +| `BY-DESIGN: property-disclaimed` | Concerns a §9 property the project explicitly does not provide (UDF sandboxing, DoS protection, side channels, etc.). | §9 | +| `KNOWN-NON-FINDING` | Matches a §11a recurring false positive. | §11a | +| `MODEL-GAP` | Cannot be cleanly routed to any of the above; triggers §12 model revision. | §12 | + +## §14 Open questions for the maintainers + +Every *(inferred)* tag in the body maps to one of these. Proposed answers are +inline; please confirm, correct, or strike. + +### Wave 1 — scope, intended use, insecure defaults + +**Q1.** The model assumes Impala is "a cluster of cooperating processes +deployed inside an operator-controlled network perimeter" and is *not* a +single-host library. Confirm? Will this answer change with the upcoming +Iceberg-REST-only `impalad` mode? 
*(maps to §2)* + +**Q2.** Are the Hive Metastore, Ranger Admin, and other catalog/policy +control-plane services modeled as trusted (we propose **yes**), or in-scope +adversaries? If trusted, that licenses §3 item 2 and §11a's +trusted-input dispositions. *(maps to §2, §3, §11a)* + +**Q3.** Code vendored from Apache Kudu under `be/src/kudu/security/`, +`be/src/kudu/util/`, etc.: is the policy "report upstream to Apache Kudu; +we pick up fixes via vendored sync" (proposed)? Same question for Apache +Hive code under `fe/src/main/java/org/apache/impala/authentication/saml/Hive*.java`. *(maps to §3 item 8)* + +**Q4.** Is "an authorized user with Ranger privilege X causes harm Y that +the same Ranger privilege already permits through a documented path" out of +scope (proposed: **yes**, `OUT-OF-MODEL: equivalent-harm`)? *(maps to §3 item 4)* + +**Q5.** UDF sandboxing: confirm that there is none, and that this is a +deliberate `BY-DESIGN` disclaim (proposed)? *(maps to §3 item 5, §9)* + +**Q6.** What is the cut-line on malformed-input behavior in scanners +(Parquet/ORC/Avro/Iceberg)? Proposed: memory corruption is `VALID`; crash / +exception / slow path / OOM on a malformed file *landed by a writer with +INSERT* is `OUT-OF-MODEL: equivalent-harm`; the same on a writer-controlled +file in a *read-only-to-writer* deployment is `VALID-HARDENING`. *(maps to +§3 item 6, §8 P8, §9, §11a)* + +**Q7.** Confirm the unsupported-component list: `tests/`, `testdata/`, +`infra/`, `docker/`, `package/`, `ssh_keys/`, `cmake_modules/`, +`experiments/`, `udf_samples/`. Anything to add or remove? *(maps to §3 item 7)* + +### Wave 2 — Web UI, internal RPC, insecure defaults + +**Q8.** Is the Web UI a per-user authorization surface or a flat-admin +surface? Proposed answer: flat-admin (any `.htpasswd` user sees all query +bytes); flag if per-user authorization on Web UI is intended. 
*(maps to +§4 B2, §9 false-friend)* + +**Q9.** Clock-skew assumption for Kerberos: do you make any Impala-side +claim about tolerance, or is that entirely the operator's responsibility +(proposed)? *(maps to §5)* + +**Q10.** Confirm the "what Impala does not do to its host" inventory +in §5: no child processes besides codegen; no signal handlers besides +breakpad; no `LD_*` consumption for security-sensitive decisions. Any +exceptions? *(maps to §5)* + +**Q11.** The big "insecure defaults" question, per the §5a "insecure-default +case" rule. Which of these is the supported production posture (a +`VALID` report when violated), and which is dev/test (an +`OUT-OF-MODEL: non-default-build`)? + +- `--principal`/`--keytab_file` unset +- `--authorization_provider` unset (no Ranger) +- `--ssl_server_certificate`/`--ssl_private_key` unset +- `--webserver_password_file` unset +- `--webserver_certificate_file` unset +- `--enable_ldap_auth=false` (proposed: this one is fine; Kerberos may be the chosen auth) + +Proposed across the board: **dev/test, operator must flip per §10**, except +the last. *(maps to §5a, §10, §13)* + +**Q12.** Web UI specifically: the docs strongly recommend +`--webserver_password_file`. Is a Web UI port reachable without auth a +`VALID` report on the operator's behalf, or `OUT-OF-MODEL: +non-default-build`? *(maps to §5a, §11a)* + +**Q13.** The `_hidden` debug-only flags: `--jwt_allow_without_tls`, +`--oauth_allow_without_tls`, `--saml2_allow_without_tls_debug_only`. Confirm +that setting these is `OUT-OF-MODEL: non-default-build` and §10 mandates +"never in production"? *(maps to §5a, §10, §11a)* + +**Q14.** `--trusted_domain` + `--trusted_domain_use_xff_header` + +`--trusted_auth_header`. These are explicit auth bypasses for proxy +deployments. Confirm that operator-supplied misuse is `OUT-OF-MODEL: +trusted-input` *unless* Impala's parser of these headers itself has a +defect? 
*(maps to §5a, §10)* + +**Q15.** `--cookie_secret_file`: when unset, do HS2-HTTP cookies fall back +to a per-process random secret (proposed)? Is the cookie HMAC algorithm +documented and considered a §8 property? *(maps to §5a, §6, §10)* + +### Wave 3 — externally-facing surfaces, distributed model + +**Q16.** Treat `ai_generate_text` responses and JDBC-external-table rows as +data crossing a trust boundary (proposed)? Does Impala itself attempt any +sanitization of LLM responses or external-JDBC rows? *(maps to §6, §10)* + +**Q17.** SQL-text size limit: is there an enforced maximum (proposed: yes, +in the analyzer) or is "submit a 1 GB query" an unlimited-input concern? +*(maps to §6, §9)* + +**Q18.** Per-query / per-user DoS protection: is admission control (the +`--default_pool_*` family) the entire enforcement, with no engine-level +guard? Should this be in §9 as a flat disclaim, or in §8 with a +conditional guarantee? *(maps to §8, §9, §10)* + +**Q19.** Same-host non-`impala` UID: do we make any defense claim there, or +is "shell access on an impala host" effectively game-over (proposed: +game-over)? *(maps to §7, §9)* + +**Q20.** Side-channel observers (cache timing, branch prediction): out of +scope (proposed)? *(maps to §7, §9)* + +**Q21.** Byzantine-internal-peer threshold: confirm Impala makes no BFT +claim, so any compromised peer with a valid Kerberos identity is unbounded +(proposed)? *(maps to §7, §9)* + +**Q22.** §8 P8 (memory safety) — the property is in the model on +inference. Is the reachability boundary correctly "in-model for network and +B7-writer-landed bytes; `VALID-HARDENING` for B7-writer-controlled bytes in +read-only-to-writer deployments; out-of-model for `_hidden` debug +configurations"? *(maps to §8 P8, §9)* + +**Q23.** §8 P9 (no SQL injection into back-end queries) — applies only to +JDBC external tables and `ai_generate_text` (proposed). Confirm whether +remote-query construction escapes end-user SQL parameters? 
*(maps to §8 P9)* + +### Wave 4 — false-friends and edge cases + +**Q24.** Is data-at-rest encryption an in-scope responsibility (proposed: +**no** — delegated to storage layer)? *(maps to §9)* + +**Q25.** Constant-time comparison of authentication secrets — is anything +relevant here a §8 property, or all delegated to SASL/Kerberos/JWT +libraries? *(maps to §9)* + +**Q26.** Is `SHOW TABLES` filtering a confidentiality boundary (proposed: +**no**, an authorization-view feature with known existence-leak side +channels through error messages and query profiles)? *(maps to §9 +false-friend, §11)* + +**Q27.** Time-of-check-to-time-of-use between Ranger policy fetch and query +execution: is mid-query policy revocation enforced (proposed: **no**, the +plan is finalized at start)? *(maps to §9)* + +**Q28.** Internal RPC integrity vs confidentiality: is the documented +production minimum Kerberos `auth-int` (integrity, no encryption) or +`auth-conf` (encrypted)? Many operators leave it at `auth`. Is plaintext- +but-Kerberos-authed internal RPC in or out of model? *(maps to §11)* + +**Q29.** Scanner path-traversal reports: confirm that all scanner paths are +URIs resolved against an operator-configured warehouse root, not arbitrary +OS paths, so traversal is a Ranger / configuration concern rather than a +filesystem concern? *(maps to §11a)* + +### Wave 5 — meta + +**Q30.** Should this document live at `docs/threat-model.md` (proposed, +under the existing docs tree) or as part of the DITA topics under +`docs/topics/`? *(meta)* + +**Q31.** Is there an existing Impala threat-model document (Confluence, +internal) that this should reconcile against rather than supersede? +*(meta — §3.1a of the rubric)* + +**Q32.** What kind of change to Impala should trigger a revision (proposed +list in §12 — confirm or correct)? *(meta, §12)* + +**Q33.** §11a known-non-findings is thin in this draft (~11 patterns, +all doc-reasoned). 
Could the PMC populate §11a from recurring patterns in +your IMPALA-* JIRA closures, specifically the "we ruled this out as not +a vulnerability" category? Public Impala security artifacts (website, +`announce@apache.org`, `[SECURITY]`-tagged threads on Impala lists) carry +essentially no project-direct CVE history to mine, so §11a, the +highest-leverage section the scan agent uses for suppression, is +currently the weakest by content. Concrete ask: 3–5 patterns the PMC sees +recur in inbound reports (e.g. "exception-not-vuln from a malformed query +plan", "OOM from a too-large IN list", etc.). *(meta — §11a)* + +--- + +## Appendix: SECURITY.md → §x back-map + +Impala does not currently ship an in-repo `SECURITY.md`, and the website +does not publish a project-level security policy page +(`https://impala.apache.org/security.html` returns 404 at draft time; the +landing page only links the generic ASF security URL). The de facto +SECURITY-policy artifacts are the in-repo DITA docs under +`docs/topics/impala_security*.xml`, which are the source for the published +documentation at `https://impala.apache.org/docs/build/html/topics/impala_security.html`. +The back-map below covers the in-repo source. 
+ +| Source | Claim | Lands in | +| --- | --- | --- | +| `docs/topics/impala_security.xml` | "Impala includes a fine-grained authorization framework … based on Apache Ranger" | §8 P2 | +| `docs/topics/impala_security.xml` | "Impala relies on the Kerberos subsystem for authentication" | §8 P1, §8 P4 | +| `docs/topics/impala_security.xml` | "auditing capability … Impala generates the audit data which can be consumed … by cluster-management components focused on governance" | §10 item 11 (operator picks the audit sink) | +| `docs/topics/impala_security_guidelines.xml` | "Secure the root account", restrict `sudoers` | §3 item 3, §10 item 12 | +| `docs/topics/impala_security_guidelines.xml` | "Ensure that the Impala web UI … is password-protected" | §10 item 4, §11 first bullet | +| `docs/topics/impala_security_files.xml` | "All Impala read and write operations are performed under the filesystem privileges of the impala user" | §4 B7, §10 item 15 | +| `docs/topics/impala_authentication.xml` | "Impala supports authentication using either Kerberos or LDAP. You can also make proxy connections through Apache Knox." 
| §8 P1, §11 (Knox proxy is `--trusted_domain`-shaped) | +| `docs/topics/impala_authorization.xml` | "By default … Impala does all read and write operations with the privileges of the impala user" | §5a default-row "authorization_provider unset", §8 P2 violation symptom | +| `docs/topics/impala_ldap.xml` | "You must use the Kerberos authentication mechanism for connections between internal Impala components" | §4 B3, §8 P4 | +| `docs/topics/impala_ssl.xml` | "Impala supports TLS/SSL network encryption … default version was changed from 'tlsv1' to 'tlsv1.2' starting in Impala 4.0" | §8 P3, §5a row | +| `docs/topics/impala_security_webui.xml` | "This file should only be readable by the Impala process and machine administrators" | §10 item 9 | +| `docs/topics/impala_mixed_security.xml` | "Impala 2.0 and later automatically handles both Kerberos and LDAP authentication" | §11 (mixed-mode misconfig) | +| `docs/topics/impala_delegation.xml` | "Impala supports delegation where users whose names you specify can delegate the execution of a query to another user" | §8 P5, §10 item 10 | +| `docs/topics/impala_logging.xml#redaction` | "log redaction is a security feature that prevents sensitive information from being displayed in locations used by administrators for monitoring and troubleshooting" | §8 P6, §9 false-friend | +| `docs/topics/impala_ports.xml` | port inventory | §2 component table, §4 boundary table | +| `EXPORT_CONTROL.md` | "This software uses OpenSSL to enable TLS-encrypted connections, generate keys for asymmetric cryptography, and generate and verify signatures" | §5 cryptography assumption | +| `be/src/rpc/authentication.cc` lines 98–283 | flag inventory (cookie_secret_file, jwt_*, oauth_*, trusted_*, internal_principals_whitelist) | §5a, §6, §10 |