[OSDOCS-19926]: Private hosted clusters on Azure #113083
🤖 Fri Jun 12 20:10:27 - Prow CI generated the docs preview: |
| |Private endpoint creation failed or the connection was not approved. Check the Private Link auto-approval list and the Control Plane Operator logs. | ||
|
|
||
| |`AzurePrivateDNSAvailable` = `False` | ||
| |The DNS zone or record creation failed. Check the Control Plane Operator identity permissions in the guest subscription. |
@bryan-cox / @Nirshal - I'm not sure what "guest subscription" is referring to here. I got this wording from the upstream docs. Can you clarify?
"Guest subscription" refers to the Azure subscription where the hosted cluster's infrastructure resources (VNet, subnet, VMs) are deployed. In many setups this is the same subscription as the management cluster, but it can be different. The CPO identity needs permissions in that subscription to create and manage private DNS zones and records. @bryan-cox can you confirm?
Thanks for the explanation, @Nirshal. One reason I asked is that I sometimes see "guest cluster" used to mean "hosted cluster" in the upstream docs, and in the official docs, we try to use "hosted cluster" consistently.
I wonder if this revision would be accurate?
The DNS zone or record creation failed. In the {azure-short} subscription that stores the infrastructure resources for the hosted cluster, check the Control Plane Operator identity permissions.
That revision looks accurate and clearer. It avoids the "guest" terminology ambiguity while keeping the meaning. Looks good to me.
bryan-cox left a comment
SME review (Bryan) — technical accuracy review against the HyperShift codebase. 3 blocking issues and 4 suggestions below.
|
|
||
| * xref:../../hosted_control_planes/hcp-deploy/hcp-deploy-azure.adoc#hcp-azure-private-iam_hcp-deploy-azure[Configuring IAM resources for a private hosted cluster] | ||
|
|
||
| * xref:../../hosted_control_planes/hcp-deploy/hcp-deploy-azure.adoc#hcp-azure-infra_hcp-deploy-azure[Creating infrastructure for a private hosted cluster] |
[blocking] Broken cross-reference — this links to #hcp-azure-infra_hcp-deploy-azure, which is the public infrastructure module. The private infrastructure module has the ID hcp-azure-private-infra_{context}, so this should be:
* xref:../../hosted_control_planes/hcp-deploy/hcp-deploy-azure.adoc#hcp-azure-private-infra_hcp-deploy-azure[Creating infrastructure for a private hosted cluster]
As written, clicking this link takes users to the public infra section instead of the private one.
Good catch! Fixing.
| |`AzurePLSCreated` = `False` | ||
| |Private Link creation failed. Check the NAT subnet policies, credentials, and the HyperShift Operator logs. | ||
|
|
||
| |`AzurePrivateEndPointAvailable` = `False` |
[blocking] Condition name typo — the codebase defines this as AzurePrivateEndpointAvailable (lowercase 'p' in "point"), but the doc writes AzurePrivateEndPointAvailable (capital 'P'). Users who grep their YAML output for this condition will get no matches.
Ref: api/hypershift/v1beta1/azureprivatelinkservice_types.go
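To illustrate why the capitalization matters, here is a minimal sketch. The sample line is a made-up stand-in for one condition in YAML output, not real cluster output; only the two spellings come from the comment above.

```shell
# Hypothetical one-line stand-in for a status condition in YAML output.
sample='    type: AzurePrivateEndpointAvailable'

# grep is case-sensitive by default, so the spelling used in the doc finds nothing.
# "|| true" keeps the script going because grep exits non-zero on zero matches.
typo_hits=$(printf '%s\n' "$sample" | grep -c 'AzurePrivateEndPointAvailable' || true)

# The spelling defined in the codebase matches the sample line.
good_hits=$(printf '%s\n' "$sample" | grep -c 'AzurePrivateEndpointAvailable' || true)

echo "doc spelling matches:  $typo_hits"
echo "code spelling matches: $good_hits"
```

Running `grep -i` would hide the mismatch, but users copying the condition name into scripts or alerts would still hit it, so fixing the doc is the safer option.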
| $ hcp destroy cluster azure \ | ||
| --name ${CLUSTER_NAME} \ | ||
| --azure-creds ${AZURE_CREDS} \ | ||
| --resource-group-name ${MANAGED_RG_NAME} |
[blocking] Missing \ line continuation — this line needs a trailing backslash after ${MANAGED_RG_NAME}. Without it, --dns-zone-rg-name on the next line is not passed to the command, and users will get a "dns-zone-rg-name is required" error.
| --resource-group-name ${MANAGED_RG_NAME} | |
| --resource-group-name ${MANAGED_RG_NAME} \ |
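The effect of the missing backslash can be reproduced without the real CLI. In this rough sketch, `count_args` is a hypothetical stand-in that only reports how many arguments it received, and `rg` and `dns-rg` are placeholder values.

```shell
# Hypothetical stand-in for the real command: prints its argument count.
count_args() { echo "$#"; }

# With the trailing backslash, both flags reach the same command.
with_backslash=$(count_args --resource-group-name rg \
  --dns-zone-rg-name dns-rg)

# Without it, the command ends at the newline and sees only the first flag.
# The shell then tries to run the second line as its own command, which fails;
# the error is silenced here so the sketch stays quiet.
without_backslash=$(count_args --resource-group-name rg
  --dns-zone-rg-name dns-rg 2>/dev/null || true)

echo "with backslash:    $with_backslash arguments"
echo "without backslash: $without_backslash arguments"
```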
| --output-file "${WORKLOAD_IDENTITIES_FILE}" | ||
| ---- | ||
| + | ||
| The command creates 8 Workload Identities, including the Control Plane Operator identity, which allows the Operator to create and manage private endpoints, private DNS zones, {azure-short} Virtual Network (VNet) links, and DNS A records. The Control Plane Identity is assigned the `Contributor` role by default. To use a more restrictive role, use the `--assign-custom-hcp-roles` flag. |
[suggestion] The "8 Workload Identities" count is accurate, but the phrasing implies these are specific to private clusters. The same hcp create iam azure command creates the same 8 identities regardless of topology — the CPO identity is simply unused for public clusters. Consider clarifying, e.g.:
The command creates 8 Workload Identities. For private and PublicAndPrivate clusters, the Control Plane Operator identity allows the Operator to create and manage private endpoints, private DNS zones, VNet links, and DNS A records.
Nice revision. I'll update the text.
|
|
||
| [role="_abstract"] | ||
| The first step to set up a private hosted cluster on {azure-short} is to prepare the subnet. {azure-short} Private Link requires a dedicated subnet for network address translator (NAT) IP address allocation. | ||
|
|
[suggestion] The NAT subnet creation is actually optional. If --endpoint-access-private-nat-subnet-id is omitted from the cluster creation command, the HyperShift Operator controller auto-creates a NAT subnet named {infraID}-pls-nat. Consider adding a note explaining this tradeoff: manually creating the subnet gives control over CIDR allocation and naming, while omitting it lets the controller handle it automatically.
Didn't know that! Will add a note.
| [role="_abstract"] | ||
| If you are no longer using a private hosted cluster on {azure-short}, you can delete it. | ||
|
|
||
| The deletion process automatically cleans up Private Link resources in the following order: |
[suggestion] The deletion flow described is accurate for Private Link resource cleanup, but it's worth noting that hcp destroy cluster azure also cleans up RBAC role assignments before deleting infrastructure. This is why the --azure-creds flag is required for the destroy command.
Added a statement about that.
| --external-dns-domain-filter ${DNS_ZONE} | ||
| ---- | ||
| + | ||
| * `--private-platform Azure` specifies that {azure-short} Private Link management is to be enabled in the Operator. |
[suggestion] The --azure-private-secret flag has a companion flag --azure-private-secret-key that defaults to credentials but can be customized for secrets with non-standard key names. Consider mentioning it here for completeness.
| * The {oc-first} is installed. | ||
|
|
||
| * If you are using external DNS, the `jq` command-line JSON processor is installed. | ||
|
|
The procedure uses yq to parse the infrastructure output (step 3), but yq is not listed in the prerequisites — jq is listed instead but doesn't appear to be used in this module.
@Nirshal I can change the instances of yq in the steps to be jq. Is that okay? Or should I remove the jq prereq and add a yq prereq?
The infrastructure output from hcp create infra azure is YAML, so yq is the right tool in the steps. I'd suggest removing the jq prerequisite and adding yq instead.
| ---- | ||
| + | ||
| * `--private-platform Azure` specifies that {azure-short} Private Link management is to be enabled in the Operator. | ||
| * `--azure-private-creds` specifies the path to the {azure-short} credentials file that is used for Private Link operations. |
Nit: trailing space after the period on this line.
| --workload-identities-file ${WORKLOAD_IDENTITIES_FILE} \ | ||
| --diagnostics-storage-account-type Managed \ | ||
| --external-dns-domain ${DNS_ZONE_NAME} \ | ||
| --endpoint-access Private \ |
When choosing the --external-dns-domain value, the user must be aware that if it matches {cluster-name}.{base-domain}, the CPO creates a private DNS zone that can shadow the *.apps domain, causing console and ingress to become unreachable.
For reference:
- OCPBUGS-83730 — original bug report
- hypershift#8480 — fix merged on 2026-05-21
- hypershift#8585 — fix reverted on 2026-05-26 due to CEL validation impact on ARO-HCP
As far as I know, the issue is still present. It might be worth adding a note to help users avoid this condition when picking their DNS zone name. @bryan-cox any additional context on the current state?
I added a note below the command.
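A pre-flight check along these lines could help users spot the risky value before they create the cluster. This is only a sketch of the comparison described above: `BASE_DOMAIN` is an assumed variable name that does not appear in the procedure, and all three values are placeholders.

```shell
# Placeholder values. BASE_DOMAIN is an assumed name, not taken from the procedure.
CLUSTER_NAME="example"
BASE_DOMAIN="hcp.example.com"
DNS_ZONE_NAME="example.hcp.example.com"

# The shadowing case described above: the external DNS domain equals
# <cluster-name>.<base-domain>.
if [ "$DNS_ZONE_NAME" = "${CLUSTER_NAME}.${BASE_DOMAIN}" ]; then
  shadow_risk="yes"
  echo "Warning: ${DNS_ZONE_NAME} matches <cluster-name>.<base-domain>; the private DNS zone can shadow *.apps."
else
  shadow_risk="no"
  echo "OK: ${DNS_ZONE_NAME} differs from <cluster-name>.<base-domain>."
fi
```

The check covers only the exact-match case named in the comment. Whether other overlapping zone names cause the same problem is not something this thread establishes.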
bryan-cox left a comment
SME review 2 (Bryan) — second-pass technical accuracy review. All 3 blocking issues from review 1 and all of Alessandro's findings are confirmed fixed. No new blocking issues found. 4 non-blocking suggestions below.
| --dns-zone-rg-name ${DNS_ZONE_RG_NAME} \ | ||
| --assign-service-principal-roles \ | ||
| --workload-identities-file ${WORKLOAD_IDENTITIES_FILE} \ | ||
| --diagnostics-storage-account-type Managed \ |
[non-blocking] --diagnostics-storage-account-type Managed is not specific to private clusters — it is a general VM diagnostics option. Including it here without explanation implies it is required for private connectivity. Consider either removing it (it is optional and unrelated to Private Link) or adding a note that it is a general configuration option, not a private-cluster requirement.
| The deletion process automatically cleans up Private Link resources in the following order: | ||
|
|
||
| . The Control Plane Operator removes the private endpoint, private DNS zones, VNet links, and A records. | ||
| . The HyperShift Operator removes Private Link. |
[non-blocking] The cleanup order is slightly simplified. The hcp destroy cluster azure command also cleans up RBAC role assignments before deleting infrastructure — the --azure-creds flag description below already mentions this. Consider adding a bullet for RBAC cleanup between steps 1 and 2 for consistency, e.g.:
- The Control Plane Operator removes the private endpoint, private DNS zones, VNet links, and A records.
- The hcp destroy command removes RBAC role assignments.
- The HyperShift Operator removes Private Link.
| ** `--azure-private-secret` specifies an existing Kubernetes secret that has {azure-short} credentials. This flag has a companion flag, `--azure-private-secret-key`. Its default value is `credentials`, but you can customize it for secrets that have non-standard key names. | ||
| ** `--azure-pls-managed-identity-client-id` specifies the client ID of a managed identity for Private Link operations through Workload Identity federation. If you specify this flag, you must also include the `--azure-pls-subscription-id` flag, which specifies the {azure-short} subscription ID for Private Link operations. | ||
| ==== | ||
| * `--azure-pls-resource-group` specifies the resource group where the Private Link resources are to be created. This resource group is the same as the resource group of the infrastructure for the management cluster. |
[non-blocking] The NOTE block covers alternative auth methods for Private Link credentials (--azure-private-secret, --azure-pls-managed-identity-client-id). For completeness, there is a parallel alternative for external DNS credentials: --external-dns-secret can be used instead of --external-dns-credentials to reference an existing Kubernetes secret. Minor omission since users typically follow one path.
| --output-file "${WORKLOAD_IDENTITIES_FILE}" | ||
| ---- | ||
| + | ||
| The command creates 8 Workload Identities. For `private` and `PublicAndPrivate` clusters, the Control Plane Operator identity allows the Operator to create and manage private endpoints, private DNS zones, {azure-short} Virtual Network (VNet) links, and DNS A records. The Control Plane Identity is assigned the `Contributor` role by default. To use a more restrictive role, use the `--assign-custom-hcp-roles` flag. |
[non-blocking] Minor wording suggestion — "The command creates 8 Workload Identities" is technically correct (the hcp create iam azure command always creates 8 regardless of topology), but the next sentence about the CPO identity could be slightly clearer. Consider leading with the private-cluster context, e.g.:
The command creates 8 Workload Identities. For Private and PublicAndPrivate clusters, the Control Plane Operator identity is used to create and manage private endpoints, private DNS zones, VNet links, and DNS A records.
bscott-rh left a comment
Nice work! This mostly looks good to me. I left a couple of formatting comments, and a batch of comments around environment variables.
| .. Find the VNet in the infrastructure resource group as shown in the following example: | ||
| + | ||
| [source,bash] | ||
| ---- | ||
| MGMT_VNET_NAME=$(az network vnet list --resource-group "${MGMT_INFRA_RG}" --query "[0].name" -o tsv) | ||
| MGMT_VNET_RG="${MGMT_INFRA_RG}" | ||
| ---- |
Same comment about this block of environment variables.
| [role="_abstract"] | ||
| Create a private hosted cluster to ensure that the communication between compute nodes and the hosted control plane occurs over {azure-short} Private Link. | ||
|
|
||
| .Prerequisites |
Since this procedure seems to rely on all of the environment variables from the previous steps, you might want to add an explicit prerequisite stating this. If a user completes the previous steps in a separate terminal, or they completed the steps on a prior day and then closed their terminal, the environment variables would be lost.
Good point. I'll mention the environment variables in the prereqs.
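If it helps, the prerequisite could be paired with a quick check that users run in a new terminal. The sketch below uses a hypothetical helper, `missing_vars`. The variable names passed to it are ones quoted in this thread, so the full list that the procedure needs might differ.

```shell
# Hypothetical helper: prints the names of variables that are unset or empty.
missing_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup by name; the names are fixed literals, so eval is safe here.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  # Strip the leading space before printing.
  echo "${missing# }"
}

# Names quoted in this thread; extend the list to match the procedure.
result=$(missing_vars CLUSTER_NAME AZURE_CREDS MANAGED_RG_NAME \
  WORKLOAD_IDENTITIES_FILE DNS_ZONE_NAME MGMT_INFRA_RG NAT_SUBNET_ID)

if [ -n "$result" ]; then
  echo "Unset or empty: $result"
else
  echo "All listed variables are set."
fi
```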
| + | ||
| [source,bash] | ||
| ---- | ||
| KUBECONFIG=${CLUSTER_NAME}-kubeconfig oc get nodes |
| KUBECONFIG=${CLUSTER_NAME}-kubeconfig oc get nodes | |
| $ KUBECONFIG=${CLUSTER_NAME}-kubeconfig oc get nodes |
| + | ||
| [source,bash] | ||
| ---- | ||
| PEERED_VNET_ID="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>" |
| PEERED_VNET_ID="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>" | |
| $ PEERED_VNET_ID="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>" |
| + | ||
| [source,bash] | ||
| ---- | ||
| KUBECONFIG=${CLUSTER_NAME}-kubeconfig oc get nodes |
| KUBECONFIG=${CLUSTER_NAME}-kubeconfig oc get nodes | |
| $ KUBECONFIG=${CLUSTER_NAME}-kubeconfig oc get nodes |
| + | ||
| [source,bash] | ||
| ---- | ||
| AZURE_PRIVATE_CREDS="/path/to/azure-private-credentials.json" |
| AZURE_PRIVATE_CREDS="/path/to/azure-private-credentials.json" | |
| $ AZURE_PRIVATE_CREDS="/path/to/azure-private-credentials.json" |
| + | ||
| [source,bash] | ||
| ---- | ||
| MGMT_INFRA_RG=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.resourceGroupName}') |
| MGMT_INFRA_RG=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.resourceGroupName}') | |
| $ MGMT_INFRA_RG=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.resourceGroupName}') |
|
|
||
| . Set the external DNS configuration variables as shown in the following example: | ||
| + | ||
| [source,bash] |
Same comment about this block of environment variables.
| + | ||
| [source,bash] | ||
| ---- | ||
| MGMT_INFRA_RG=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.resourceGroupName}') |
| MGMT_INFRA_RG=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.resourceGroupName}') | |
| $ MGMT_INFRA_RG=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.azure.resourceGroupName}') |
| + | ||
| [source,bash] | ||
| ---- | ||
| NAT_SUBNET_ID=$(az network vnet subnet show \ |
| NAT_SUBNET_ID=$(az network vnet subnet show \ | |
| $ NAT_SUBNET_ID=$(az network vnet subnet show \ |
New changes are detected. LGTM label has been removed.
@lahinson: all tests passed!
/cherrypick enterprise-4.22
/cherrypick enterprise-5.0
@lahinson: new pull request created: #113292
@lahinson: new pull request created: #113293
Version(s): 4.22+
Issue: https://redhat.atlassian.net/browse/OSDOCS-19926
Link to docs preview: https://113083--ocpdocs-pr.netlify.app/openshift-enterprise/latest/hosted_control_planes/hcp-deploy/hcp-deploy-azure.html#hcp-azure-private_hcp-deploy-azure
QE review: In lieu of QE review, the self-managed HCP on Azure docs have 2 SME reviews.
Additional information: