Managed domains#4
Conversation
use_ravion_managed_domains toggle: ravion_certificate (shared_wildcard) issues *.<name>-<hash>.<apex>; this module owns the public ALB HTTPS listener with that cert as default (alb submodule skips its HTTPS listener + cert ARNs); opens SG 443. Outputs the wildcard fqdn + listener arn + cert arn + aws account/region for ecs_service to nest under. Provider pinned ravion.com/ravion/ravion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cluster_parent_fqdn enables Ravion domains. Mode A (no domains): ravion_domain auto-FQDN <name>.<cluster-apex> rides the cluster wildcard via a listener rule. Mode B (domains): per-service ravion_certificate (<=10 SANs) attached to the cluster listener + ravion_domain custom routing records; auto-FQDN retires once customs are healthy (ravion_auto_domain_status). Skips caller listener rules in Ravion mode. Outputs auto fqdn/url + custom cert arn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
use_ravion_managed_domains: ravion_certificate (instance, target_arn = the CloudFront distribution ARN, region us-east-1) covering custom domains or a generated auto-FQDN; ravion_domain custom routing records (ALIAS to the distribution, CloudFront zone Z2FDTNDATAQYW2). Configure var.distributions without aliases/cert in this mode (Ravion sets them server-side). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cluster only wired the Ravion wildcard cert + 443 HTTPS listener on the public ALB, so private services (web-private / private-network-server) had no Ravion-owned listener to attach to. Mirror the public wiring onto the private ALB — one wildcard cert backs both listeners (an ACM ARN can default many). ecs_cluster: - enable_ravion_domain now triggers on public OR private ALB (was public-only); precondition requires at least one ALB instead of mandating the public one. - Add aws_lb_listener.ravion_https_private on the private ALB (same cluster cert) + a private-ALB 443 ingress rule; private alb submodule now skips its own HTTPS listener/cert in Ravion mode, same as the public submodule. - public/private_alb_https_listener_arn outputs surface the Ravion-owned listener when present (length()-guarded so a private-only cluster is valid). ecs_service: - Generalize cluster_https_listener_arn / cluster_alb_dns_name / cluster_alb_zone_id descriptions: pipe the public OR private ALB outputs per the service's visibility. The resources were already visibility-agnostic; only the docs hardcoded "public". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iven id The cluster wildcard FQDN used var.name (the project-environment slug, e.g. testttsss-prod-modules), not the user-facing instance given id (elysia-ecs-cluster). Declare module_instance_given_id (injected by the runner as TF_VAR_module_instance_given_id) and default the cert leaf to it, so the wildcard becomes <given-id>-<hash>.<ravion-apex>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the ecs_cluster fix: the service auto-domain (<leaf>.<cluster-wildcard>) used var.name (project-env slug) instead of the user-facing instance given id. Declare module_instance_given_id (runner-injected) and default the leaf to it, so the auto-FQDN becomes <given-id>.<cluster-apex>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ravion_certificate.cluster now passes target_dns_name/target_zone_id (public ALB if present, else private) so Ravion publishes a *.<apex> ALIAS to the cluster ALB. Service auto-FQDNs riding the wildcard then resolve with no per-service DNS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ollide Fixed name + create_before_destroy = true means any attribute change that forces TG replacement (e.g. container_port change) fails apply because the new TG can't share the same name as the existing one. Switching to name_prefix lets AWS allocate a unique suffix on each create. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Existing 'spa' and 'filesystem' modes append /index.html to extensionless paths, which is wrong for object sites whose URLs ARE the S3 keys — namely a Terraform provider registry, where the viewer requests `/v1/providers/<ns>/<type>/versions` and must get back the literal JSON file at that key, not `/<...>/versions/index.html`. `raw` is a 1:1 viewer-URI → /<version>/<URI> mapping. KVS-driven versioning still applies. Use for terraform registries, S3-like content APIs, or any case where viewer URLs must equal S3 keys 1:1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move all required_providers source references from `providers.siddharthsuresh.dev/ravion/ravion` (cloudflared → local registry on a laptop) to `provider-cf.siddharthsuresh.dev/ravion/ravion` (CloudFront → S3, multi-version, KMS-signed). The local-cloudflared path stays alive during the cutover so existing stacks pinning the old hostname keep working until they're migrated or destroyed. Migration note: existing stacks have provider addresses recorded in their cloud-backend state under the old hostname. A subsequent apply will need a state-replace-provider pass, OR be done as part of a destroy+recreate. New stacks pick up the new hostname automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runner's terraformrc network-mirror config is hardcoded for the canonical hostname (providers.siddharthsuresh.dev). Pointing modules at provider-cf.siddharthsuresh.dev triggers terraform's "requires authentication credentials" error since the runner doesn't emit a credentials/mirror block for that host. The proper migration is stage B — keep the source unchanged and point the canonical hostname's DNS at CloudFront — not a per-module source rewrite. Reverting `2f78c63`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…esh.dev" This reverts commit 638ab14.
This reverts commit 291cac2.
…ains state ecs_cluster always owns the public/private HTTPS listener at a stable TF address so toggling use_ravion_managed_domains is an in-place certificate swap, not a destroy+create across two addresses. Only the default cert (Ravion wildcard vs the customer's first ARN), the SNI cert set, and ravion_certificate.cluster change on toggle. - alb submodule: additive force_http_to_https_redirect keeps the HTTP->HTTPS redirect when a parent owns port 443; redirect deduped into locals - new ravion_managed_domains_enabled output for service-level show/hide - moved blocks for the in-root renames + submodule->root migration - focused tests/listeners.tftest.hcl (8/8 pass) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sification Each domains entry is classified per-entry instead of all-or-nothing: <leaf>.<apex> (one label under the cluster apex) rides the cluster wildcard cert via SNI (no per-service cert, no DNS record); everything else gets one per-service instance cert + a customer routing record. Empty list falls back to the auto-FQDN <given-id>.<apex>. Removes the ravion_auto_domain_status retirement flow — the domains list is the single source of truth. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lowercase + strip trailing dot/whitespace + drop empties, and require a non-empty leaf for the wildcard bucket, so mixed-case / trailing-dot / empty-leaf entries classify correctly and never yield an invalid ALB host header. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Domains under the cluster apex that aren't a single-label <leaf>.<apex> (the bare apex, or names more than one label deep) can't ride the *.<apex> wildcard cert and can't be satisfied with a customer record (the record would live in the Ravion-managed zone). Add an invalid_apex_domains local + a lifecycle.precondition on ravion_certificate.svc that fails the plan with the offending entries and the fix, instead of silently mis-routing them into a per-service cert + an unwritable routing record. Pairs with the server-side RejectCustomDomainUnderApex backstop for direct-API callers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- B3: mock the ravion provider in the pre-existing ecs_cluster + ecs_service basic.tftest.hcl so the suites stop aborting on "Missing Ravion API key" (declaring the provider configures it even when all ravion resources are count=0 on the BYO path). - M13: split the ravion listener rule's host headers into chunks of <=5 values (AWS ALB's per-rule condition-value quota), each chunk with its own priority. - M14: keep the rolling target group's pre-branch stable name substr(var.name,0,28)+"-tg" instead of name_prefix — avoids the one-time ForceNew that deadlocks against the listener rule's ignore_changes=[action], and stays within ALB's 32-char TG-name limit. - #39: gate the ravion listener rule (and cert/domain) on enable_load_balancer so it can't be created with a null target_group_arn. - #40: widen the auto-derived rule-priority hash entropy to cut collisions on the shared cluster listener. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…in-place rotation Rotating the shared_wildcard cert (any RequiresReplace change, e.g. a renamed apex) destroyed the old cert before swapping the listener, hitting ACM ResourceInUse and deadlocking. create_before_destroy issues the new cert and swaps it onto the listener in-place before deleting the old one. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ision_check A second cluster claiming an apex another cluster already owns silently hijacked the *.<apex> routing ALIAS. Add a plan-time precondition backed by the ravion_dns_collision_check data source (the backend resolves the name leaf to *.<name>.<apex> and reports a collision only when a DIFFERENT module instance owns it, so a self re-apply passes). The allocator enforces the same rule server-side as an apply-time backstop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ider >= 1.0.0 Add a ravion_parent_apex_check data source + a precondition on the service's nested wildcard domains: the plan fails if cluster_parent_fqdn points at an apex the service isn't entitled to (e.g. another cluster's apex). The control plane enforces the same rule at apply (Dns:PARENT_APEX_UNAUTHORIZED), so this only surfaces the failure earlier — users never write the guard. The empty-domains auto-FQDN fallback (effective_domains) is unchanged: a service with no custom domains still gets <given-id>.<apex> and stays accessible. Bump the ravion provider pin to >= 1.0.0 across ecs_service, ecs_cluster, and static_site (the build that ships the parent-apex data source + the server-side apex guards). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ependents Wire the second new data source: read the live service domains nested under the cluster wildcard apex and expose them as the ravion_cluster_dependent_domains output for the UI / safe-teardown orchestration. Deliberately an output, not a precondition: a cluster legitimately has dependents during normal operation and Terraform can't scope a precondition to destroy-time, so a dependent_count == 0 gate would block every apply. The control plane already refuses a wildcard-cert teardown while dependents exist (Dns:CERT_APEX_IN_USE). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… rule The control plane now authorizes parent-apex nesting from a signed token claim (the clusters the run references), not same-environment. Update the guard comment + precondition error message to say "references that cluster" instead of "in this environment." Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| moved { | ||
| from = aws_vpc_security_group_ingress_rule.ravion_https_private_ipv4 | ||
| to = aws_vpc_security_group_ingress_rule.private_https_ipv4 | ||
| } |
There was a problem hiding this comment.
The private IPv6 SG rule rename is missing a
moved block. Three of the four symmetrical SG rule renames have moved blocks (ravion_https_ipv4, ravion_https_ipv6, ravion_https_private_ipv4), but ravion_https_private_ipv6 → private_https_ipv6 is absent. For any cluster already in Ravion mode with a private ALB, Terraform will destroy the old rule and create a new one, creating a brief window where IPv6 traffic on port 443 to the private ALB is not covered.
| moved { | |
| from = aws_vpc_security_group_ingress_rule.ravion_https_private_ipv4 | |
| to = aws_vpc_security_group_ingress_rule.private_https_ipv4 | |
| } | |
| moved { | |
| from = aws_vpc_security_group_ingress_rule.ravion_https_private_ipv4 | |
| to = aws_vpc_security_group_ingress_rule.private_https_ipv4 | |
| } | |
| moved { | |
| from = aws_vpc_security_group_ingress_rule.ravion_https_private_ipv6 | |
| to = aws_vpc_security_group_ingress_rule.private_https_ipv6 | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: compute/ecs_cluster/listeners.tf
Line: 185-188
Comment:
The private IPv6 SG rule rename is missing a `moved` block. Three of the four symmetrical SG rule renames have `moved` blocks (`ravion_https_ipv4`, `ravion_https_ipv6`, `ravion_https_private_ipv4`), but `ravion_https_private_ipv6` → `private_https_ipv6` is absent. For any cluster already in Ravion mode with a private ALB, Terraform will destroy the old rule and create a new one, creating a brief window where IPv6 traffic on port 443 to the private ALB is not covered.
```suggestion
moved {
from = aws_vpc_security_group_ingress_rule.ravion_https_private_ipv4
to = aws_vpc_security_group_ingress_rule.private_https_ipv4
}
moved {
from = aws_vpc_security_group_ingress_rule.ravion_https_private_ipv6
to = aws_vpc_security_group_ingress_rule.private_https_ipv6
}
```
How can I resolve this? If you propose a fix, please make it concise.| precondition { | ||
| condition = length(local.custom_domains) == 0 || (var.cluster_https_listener_arn != null && var.cluster_https_listener_arn != "") | ||
| error_message = "cluster_https_listener_arn is required when the domains list includes a custom (non-wildcard) domain." | ||
| } |
There was a problem hiding this comment.
ravion_domain.custom uses var.cluster_alb_dns_name and var.cluster_alb_zone_id as target_dns_name/target_zone_id, but there are no preconditions validating that these are non-null when custom domains are present. The sibling ravion_certificate.svc resource guards ravion_aws_account_id and cluster_https_listener_arn, but the two ALB variables are left unchecked. A user who provides custom domains and forgets these variables will get a cryptic provider-side null error instead of a clear Terraform message.
| precondition { | |
| condition = length(local.custom_domains) == 0 || (var.cluster_https_listener_arn != null && var.cluster_https_listener_arn != "") | |
| error_message = "cluster_https_listener_arn is required when the domains list includes a custom (non-wildcard) domain." | |
| } | |
| precondition { | |
| condition = length(local.custom_domains) == 0 || (var.cluster_https_listener_arn != null && var.cluster_https_listener_arn != "") | |
| error_message = "cluster_https_listener_arn is required when the domains list includes a custom (non-wildcard) domain." | |
| } | |
| precondition { | |
| condition = length(local.custom_domains) == 0 || (var.cluster_alb_dns_name != null && var.cluster_alb_dns_name != "") | |
| error_message = "cluster_alb_dns_name is required when the domains list includes a custom (non-wildcard) domain." | |
| } | |
| precondition { | |
| condition = length(local.custom_domains) == 0 || (var.cluster_alb_zone_id != null && var.cluster_alb_zone_id != "") | |
| error_message = "cluster_alb_zone_id is required when the domains list includes a custom (non-wildcard) domain." | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: compute/ecs_service/ravion_domains.tf
Line: 136-139
Comment:
`ravion_domain.custom` uses `var.cluster_alb_dns_name` and `var.cluster_alb_zone_id` as `target_dns_name`/`target_zone_id`, but there are no preconditions validating that these are non-null when custom domains are present. The sibling `ravion_certificate.svc` resource guards `ravion_aws_account_id` and `cluster_https_listener_arn`, but the two ALB variables are left unchecked. A user who provides custom domains and forgets these variables will get a cryptic provider-side null error instead of a clear Terraform message.
```suggestion
precondition {
condition = length(local.custom_domains) == 0 || (var.cluster_https_listener_arn != null && var.cluster_https_listener_arn != "")
error_message = "cluster_https_listener_arn is required when the domains list includes a custom (non-wildcard) domain."
}
precondition {
condition = length(local.custom_domains) == 0 || (var.cluster_alb_dns_name != null && var.cluster_alb_dns_name != "")
error_message = "cluster_alb_dns_name is required when the domains list includes a custom (non-wildcard) domain."
}
precondition {
condition = length(local.custom_domains) == 0 || (var.cluster_alb_zone_id != null && var.cluster_alb_zone_id != "")
error_message = "cluster_alb_zone_id is required when the domains list includes a custom (non-wildcard) domain."
}
```
How can I resolve this? If you propose a fix, please make it concise.| lifecycle { | ||
| precondition { | ||
| condition = !var.use_ravion_managed_domains || (var.ravion_aws_account_id != null && var.ravion_aws_account_id != "") | ||
| error_message = "ravion_aws_account_id (aws_*) is required when use_ravion_managed_domains = true." | ||
| } | ||
| precondition { | ||
| condition = length(var.domains) <= 10 | ||
| error_message = "A static site may declare at most 10 custom domains." | ||
| } | ||
| } |
There was a problem hiding this comment.
ravion_distribution_arn and ravion_distribution_domain both use try(...[0], null), which silently returns null if the module.cdn distribution maps are empty (i.e., no distributions are configured). These null values are then passed as target_arn to ravion_certificate.site and as target_dns_name to ravion_domain.custom with no guard, producing a cryptic provider-side error instead of a clear Terraform message. A precondition on ravion_certificate.site would surface this early.
| lifecycle { | |
| precondition { | |
| condition = !var.use_ravion_managed_domains || (var.ravion_aws_account_id != null && var.ravion_aws_account_id != "") | |
| error_message = "ravion_aws_account_id (aws_*) is required when use_ravion_managed_domains = true." | |
| } | |
| precondition { | |
| condition = length(var.domains) <= 10 | |
| error_message = "A static site may declare at most 10 custom domains." | |
| } | |
| } | |
| lifecycle { | |
| precondition { | |
| condition = !var.use_ravion_managed_domains || (var.ravion_aws_account_id != null && var.ravion_aws_account_id != "") | |
| error_message = "ravion_aws_account_id (aws_*) is required when use_ravion_managed_domains = true." | |
| } | |
| precondition { | |
| condition = length(var.domains) <= 10 | |
| error_message = "A static site may declare at most 10 custom domains." | |
| } | |
| precondition { | |
| condition = !var.use_ravion_managed_domains || local.ravion_distribution_arn != null | |
| error_message = "use_ravion_managed_domains = true requires at least one CloudFront distribution in var.distributions." | |
| } | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: hosting/static_site/ravion_domains.tf
Line: 34-43
Comment:
`ravion_distribution_arn` and `ravion_distribution_domain` both use `try(...[0], null)`, which silently returns `null` if the `module.cdn` distribution maps are empty (i.e., no distributions are configured). These null values are then passed as `target_arn` to `ravion_certificate.site` and as `target_dns_name` to `ravion_domain.custom` with no guard, producing a cryptic provider-side error instead of a clear Terraform message. A precondition on `ravion_certificate.site` would surface this early.
```suggestion
lifecycle {
precondition {
condition = !var.use_ravion_managed_domains || (var.ravion_aws_account_id != null && var.ravion_aws_account_id != "")
error_message = "ravion_aws_account_id (aws_*) is required when use_ravion_managed_domains = true."
}
precondition {
condition = length(var.domains) <= 10
error_message = "A static site may declare at most 10 custom domains."
}
precondition {
condition = !var.use_ravion_managed_domains || local.ravion_distribution_arn != null
error_message = "use_ravion_managed_domains = true requires at least one CloudFront distribution in var.distributions."
}
}
```
How can I resolve this? If you propose a fix, please make it concise.ravion_certificate/ravion_domain now require module_instance_id so the resources work outside a Ravion stack run (service-account API key) as well as inside it. Add a module_instance_id variable to ecs_cluster + ecs_service and forward it to every ravion_certificate/ravion_domain block (cluster wildcard cert; service wildcard/custom domains + instance cert). Inside a stack run the runner injects TF_VAR_module_instance_id, so this is populated automatically; external/API-key runs set it explicitly. The control plane still prefers the signed token's instance when a stack-run JWT is present. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile Summary
This PR introduces opt-in Ravion-managed domains across three modules (
ecs_cluster,ecs_service,static_site). The core architectural change is moving HTTPS listener ownership from thealbsubmodule intoecs_clusterso that togglinguse_ravion_managed_domainsbecomes an in-place certificate swap rather than a destroy+create across different Terraform addresses.ecs_cluster: Newravion_domains.tfprovisions a cluster-scoped wildcard ACM cert; newlisteners.tfowns both public/private HTTPS listeners withmovedblocks for BYO and Ravion-mode state migration.networking/albgains aforce_http_to_https_redirectvariable so the submodule can still redirect HTTP without owning the 443 listener.ecs_service: Newravion_domains.tfclassifies service domains as wildcard-covered (ride the cluster cert) or custom (get a per-service cert), derives sha256-based listener-rule priorities, and chunks host-header conditions to stay within ALB's 5-value per-condition limit.static_site: Newravion_domains.tfattaches a Ravion instance cert to the CloudFront distribution ARN (alwaysus-east-1per CloudFront requirement).Confidence Score: 3/5
Safe to merge after fixing the missing moved block and the two validation gaps; the core architectural change is well-reasoned with good test coverage.
The missing
movedblock for the private ALB IPv6 SG rule will cause a destroy+recreate on every existing Ravion-mode cluster that has a private ALB, creating a brief window without IPv6 coverage on port 443. The missing preconditions inecs_serviceforcluster_alb_dns_name/cluster_alb_zone_idmean users providing custom domains without those variables will receive a cryptic provider error. Both issues affect the primary apply path for the features being introduced.compute/ecs_cluster/listeners.tf (missing moved block) and compute/ecs_service/ravion_domains.tf (incomplete validation on ravion_certificate.svc)
Important Files Changed
movedblock forravion_https_private_ipv6→private_https_ipv6, which will cause a destroy+recreate of that SG rule on existing Ravion-mode clusters.ravion_certificate.svcforcluster_alb_dns_nameandcluster_alb_zone_idwhen custom domains are present, leading to null values passed toravion_domain.customwithout a helpful error.ravion_distribution_arn/ravion_distribution_domaincan silently be null if no CDN distribution is configured; no precondition guards this before it propagates totarget_arnandtarget_dns_name.force_http_to_https_redirect; cert_arns set to []. Clean change consistent with the architectural intent.force_http_to_https_redirectto allow parent modules to own the 443 listener while still redirecting HTTP. Change is minimal and correct.Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "docs(ecs_service): parent-apex guard wor..." | Re-trigger Greptile