CKS node Instance stuck in "Starting" state (VM lifecycle) when a non-default (explicitly selected) template is used; works with the default SystemVM template #13471

Description

@akoskuczi-bw

problem

When creating a CKS Kubernetes cluster and explicitly selecting a non-default node template (the stock cks-ubuntu-2204-kvm CKS-ready image) in Advanced Settings, the node Instance stays in the Starting VM lifecycle state indefinitely and never transitions to Running. This happens even though the libvirt domain is actually up on the KVM host: the VNC console is reachable and the guest OS boots.

The same cluster creation succeeds when the default SystemVM template (systemvm-kvm-4.22.0-x86_64) is used for the nodes. The only variable that differs between the working and failing cases is the node template selection. This suggests the problem lies in the Flexible Kubernetes Clusters per-node template selection path rather than in in-guest provisioning (cloud-init / kubeadm / CNI).

STEPS TO REPRODUCE

  1. ACS 4.22.1.0 on KVM, CKS enabled.
  2. Register the stock CKS-ready Ubuntu 22.04 KVM template (cks-ubuntu-2204-kvm), marked "For CKS".
  3. Register a supported Kubernetes binaries ISO/version.
  4. Create a CloudManaged Kubernetes cluster (HA, 3 control nodes) on a VPC tier network, and in Advanced Settings explicitly select cks-ubuntu-2204-kvm as the node template.
  5. Observe the control node Instance in the UI (Instances → the control node): it stays in Starting.
  6. Repeat the exact same cluster creation but do not select a template (use the default systemvm-kvm-4.22.0-x86_64): the node reaches Running and the cluster provisions normally.
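
Steps 5 and 6 amount to watching the Instance state until it either reaches Running or a start window expires. A minimal Python sketch of that check follows. The helper `wait_for_state` and its parameters are hypothetical, and `fetch_state` is assumed to be a caller-supplied function that returns the Instance's current state string (for example, by querying the CloudStack `listVirtualMachines` API); none of this comes from the report itself.

```python
import time
from typing import Callable, List, Tuple


def wait_for_state(
    fetch_state: Callable[[], str],
    target: str = "Running",
    timeout_s: float = 600.0,   # assumed start window, not a CloudStack default
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[bool, List[str]]:
    """Poll fetch_state() until it returns `target` or the timeout expires.

    Returns (reached, history), where history lists each distinct state
    in the order it was observed.
    """
    deadline = clock() + timeout_s
    history: List[str] = []
    while True:
        state = fetch_state()
        # Record only state changes so the history stays short.
        if not history or history[-1] != state:
            history.append(state)
        if state == target:
            return True, history
        if clock() >= deadline:
            return False, history
        sleep(interval_s)
```

With the selected cks-ubuntu-2204-kvm template, the behaviour described in this report would correspond to a result of `(False, ["Starting"])`; with the default template, to a history ending in `"Running"`.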

EXPECTED RESULTS

Selecting a "For CKS" template in Advanced Settings should behave the same as the default template path: the node Instance transitions Starting → Running within the normal VM start window, after which CKS provisioning proceeds and the cluster reaches Running.

ACTUAL RESULTS

Behaviour depends solely on the node template selection:

| Scenario | Result |
|---|---|
| Plain Instance deployed from cks-ubuntu-2204-kvm (no CKS) | Reaches Running (works) |
| CKS node from default SystemVM template systemvm-kvm-4.22.0-x86_64 | Reaches Running, cluster provisions (works) |
| CKS node from selected cks-ubuntu-2204-kvm template | Instance stuck in Starting indefinitely |

In the failing case:

  • The control node Instance remains in Starting (VM lifecycle state) indefinitely, so the cluster also stays in Starting.
  • The libvirt domain is genuinely running on the KVM host: the VNC console is reachable and the guest OS boots.
  • SSH to the node (port 2222) is not usable; the UI-displayed password is not accepted on the VNC console (consistent with the node never being marked Running and/or the start/provisioning workflow not completing).
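
The first two observations describe a disagreement between the state recorded by the management server and the state of the domain on the hypervisor. The Python sketch below shows one way to label that comparison when triaging. The function `classify` and its return labels are hypothetical. `acs_state` is assumed to be the Instance state shown by CloudStack, and `libvirt_state` is assumed to be the text printed by `virsh domstate <domain>` on the KVM host.

```python
def classify(acs_state: str, libvirt_state: str) -> str:
    """Label the relationship between the CloudStack and libvirt views of a VM.

    The labels are illustrative only; they are not CloudStack terminology.
    """
    acs = acs_state.strip().lower()
    lv = libvirt_state.strip().lower()

    if acs == "running" and lv == "running":
        return "consistent"
    if acs == "starting" and lv == "running":
        # The situation in this report: the domain is up on the host,
        # but the management server never marks the Instance Running.
        return "management-state-lagging"
    if acs == "starting":
        # The domain is not up, so the start itself has not succeeded yet.
        return "domain-not-up"
    return "other"


if __name__ == "__main__":
    print(classify("Starting", "running"))
```

A result of `management-state-lagging` would point the investigation at the management server's start workflow rather than at the hypervisor or the guest.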

versions

Hypervisor version: KVM Ubuntu 24.04
ACS version: 4.22.1.0
Management server: Ubuntu 24.04
