Merged
2 changes: 1 addition & 1 deletion api/v1alpha2/cel_validation_test.go
@@ -477,7 +477,7 @@ func TestCEL_TLSPeerCertManagerAndSecretRefMutuallyExclusive(t *testing.T) {
_ = k8s.Delete(ctx, c)
t.Fatalf("apiserver accepted both peer.secretRef and peer.certManager; expected rejection")
}
- if !strings.Contains(err.Error(), "exactly one of spec.tls.peer.secretRef or spec.tls.peer.certManager") {
+ if !strings.Contains(err.Error(), "exactly one of spec.tls.peer.secretRef") {
t.Fatalf("error did not mention peer mutual exclusion: %v", err)
}
}
11 changes: 11 additions & 0 deletions api/v1alpha2/etcdmember_types.go
@@ -44,6 +44,17 @@ type EtcdMemberTLS struct {
// mTLS (--peer-client-cert-auth=true).
// +optional
PeerSecretRef *corev1.LocalObjectReference `json:"peerSecretRef,omitempty"`

// PeerAutoTLS is operator-managed plumbing: it carries the cluster's
// reserved "etcd-operator.cozystack.io/peer-auto-tls" annotation down to
// the member so buildPod renders etcd's --peer-auto-tls (self-signed, no
// shared CA) instead of mounting a peer secret. INSECURE — peer is
// encrypted but NOT authenticated. Set only on clusters adopted from a
// legacy --peer-auto-tls cluster, and never together with PeerSecretRef
// (an explicit peer secret supersedes the annotation). Users do not set
// this directly; the cluster controller derives it.
// +optional
PeerAutoTLS bool `json:"peerAutoTLS,omitempty"`
}

// Condition types for EtcdMember.
@@ -1336,6 +1336,17 @@ spec:
type: string
type: object
x-kubernetes-map-type: atomic
peerAutoTLS:
description: |-
PeerAutoTLS is operator-managed plumbing: it carries the cluster's
reserved "etcd-operator.cozystack.io/peer-auto-tls" annotation down to
the member so buildPod renders etcd's --peer-auto-tls (self-signed, no
shared CA) instead of mounting a peer secret. INSECURE — peer is
encrypted but NOT authenticated. Set only on clusters adopted from a
legacy --peer-auto-tls cluster, and never together with PeerSecretRef
(an explicit peer secret supersedes the annotation). Users do not set
this directly; the cluster controller derives it.
type: boolean
peerSecretRef:
description: |-
PeerSecretRef mirrors EtcdClusterTLS.Peer.SecretRef. When nil, the
1 change: 1 addition & 0 deletions cmd/etcd-migrate/main.go
@@ -221,6 +221,7 @@ func runMigration(ctx context.Context, cfg *Config, stdin io.Reader, stdout io.W
fmt.Fprintln(stdout, "\nNEXT: scale the new operator up — it will take over the adopted clusters without touching the pods:\n kubectl -n "+
mustNamespace(cfg.NewController)+" scale deploy "+mustName(cfg.NewController)+" --replicas=1")
}
renderSecuritySummary(stdout, plans)
printCRDNotice(stdout)
return errorIfPlanFailed(plans)
}
26 changes: 26 additions & 0 deletions cmd/etcd-migrate/output.go
@@ -30,6 +30,9 @@ func render(w io.Writer, plans []migrate.ResourcePlan) {
for _, e := range p.Errors {
fmt.Fprintf(w, " ERROR: %s\n", e)
}
for _, sw := range p.SecurityWarnings {
fmt.Fprintf(w, " ⚠️ SECURITY: %s\n", sw)
}
for _, warn := range p.Warnings {
fmt.Fprintf(w, " warning: %s\n", warn)
}
@@ -85,6 +88,29 @@ func renderManifest(w io.Writer, obj client.Object) {
_, _ = w.Write(data)
}

// renderSecuritySummary re-surfaces every SecurityWarning from the plans that
// were actually adopted, AFTER --apply has run. The pre-apply plan already
// shows them, but for a security-posture downgrade (e.g. an unauthenticated
// --peer-auto-tls peer plane) that is not enough: the plan scrolls past, so the
// operator must see the downgrade again in the closing summary, once it is a
// fait accompli. No-op when nothing was downgraded.
func renderSecuritySummary(w io.Writer, plans []migrate.ResourcePlan) {
var any bool
for i := range plans {
p := &plans[i]
if p.Action != migrate.ActionAdopt || len(p.SecurityWarnings) == 0 {
continue
}
if !any {
fmt.Fprintln(w, "\n⚠️ SECURITY — review before relying on the adopted clusters:")
any = true
}
for _, sw := range p.SecurityWarnings {
fmt.Fprintf(w, " • %s/%s: %s\n", p.Namespace, p.SourceName, sw)
}
}
}
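Reviewer sketch (not part of the PR): a minimal standalone version of the "print the header once, then one bullet per warning" pattern that `renderSecuritySummary` uses. The `plan` type and the `actionAdopt` constant are hypothetical stand-ins for `migrate.ResourcePlan` and `migrate.ActionAdopt`, and the output strings are simplified to plain ASCII.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// plan is a hypothetical stand-in for migrate.ResourcePlan; only the fields
// the summary reads are modelled here.
type plan struct {
	Action           string
	Namespace        string
	SourceName       string
	SecurityWarnings []string
}

// actionAdopt is an assumed value standing in for migrate.ActionAdopt.
const actionAdopt = "adopt"

// securitySummary prints one header the first time it meets an adopted plan
// that carries security warnings, then one bullet per warning. It writes
// nothing when no adopted plan has warnings.
func securitySummary(w io.Writer, plans []plan) {
	headerPrinted := false
	for i := range plans {
		p := &plans[i]
		if p.Action != actionAdopt || len(p.SecurityWarnings) == 0 {
			continue
		}
		if !headerPrinted {
			fmt.Fprintln(w, "\nSECURITY: review before relying on the adopted clusters:")
			headerPrinted = true
		}
		for _, sw := range p.SecurityWarnings {
			fmt.Fprintf(w, "  * %s/%s: %s\n", p.Namespace, p.SourceName, sw)
		}
	}
}

// summaryString captures the summary in a string, which is handy in tests.
func summaryString(plans []plan) string {
	var b strings.Builder
	securitySummary(&b, plans)
	return b.String()
}

func main() {
	securitySummary(os.Stdout, []plan{
		{Action: actionAdopt, Namespace: "prod", SourceName: "etcd-a",
			SecurityWarnings: []string{"peer plane is unauthenticated (--peer-auto-tls)"}},
		{Action: "skip", Namespace: "prod", SourceName: "etcd-b",
			SecurityWarnings: []string{"not shown: this plan was not adopted"}},
	})
}
```

The sketch names its flag `headerPrinted`; the PR's `any` works too, but it shadows Go's predeclared `any` identifier within the function.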

// printCRDNotice reminds about the one cleanup step the tool never performs.
func printCRDNotice(w io.Writer) {
fmt.Fprintln(w, `
11 changes: 10 additions & 1 deletion controllers/etcdmember_controller.go
@@ -712,7 +712,16 @@ func (r *EtcdMemberReconciler) buildPod(member *lll.EtcdMember) *corev1.Pod {
Name: "tls-client", MountPath: "/etc/etcd/tls/client", ReadOnly: true,
})
}
- if peerTLS {
+ switch {
+ case member.Spec.TLS != nil && member.Spec.TLS.PeerAutoTLS:
+ // INSECURE legacy-compat peer mode: etcd generates a self-signed peer
+ // cert per member with no shared CA, so peer is encrypted but NOT
+ // authenticated and there is nothing to mount. Only reached for
+ // clusters adopted from a --peer-auto-tls legacy cluster: the cluster
+ // controller derives this from the reserved AnnPeerAutoTLS annotation
+ // etcd-migrate stamps (see AnnPeerAutoTLS).
+ cmd = append(cmd, "--peer-auto-tls")
+ case peerTLS:
cmd = append(cmd,
"--peer-cert-file=/etc/etcd/tls/peer/tls.crt",
"--peer-key-file=/etc/etcd/tls/peer/tls.key",
60 changes: 60 additions & 0 deletions controllers/etcdmember_controller_test.go
@@ -2150,6 +2150,39 @@ func TestBuildPod_PeerTLSAlwaysMTLS(t *testing.T) {
}
}

// TestBuildPod_PeerAutoTLS: the legacy-compat insecure peer mode emits
// --peer-auto-tls on an https peer listener and mounts NO peer secret (etcd
// self-signs; there is no shared CA and no client-cert-auth).
func TestBuildPod_PeerAutoTLS(t *testing.T) {
r := &EtcdMemberReconciler{}
pod := r.buildPod(&lll.EtcdMember{
ObjectMeta: metav1.ObjectMeta{Name: "m", Namespace: "ns"},
Spec: lll.EtcdMemberSpec{
ClusterName: "test", Version: "3.5.17", Storage: lll.StorageSpec{Size: quickQty(t, "1Gi")},
TLS: &lll.EtcdMemberTLS{PeerAutoTLS: true},
},
})
cmd := pod.Spec.Containers[0].Command
if !cmdContains(cmd, "--listen-peer-urls=https://0.0.0.0:2380") {
t.Fatalf("peer listen URL not https: %v", cmd)
}
if !cmdContains(cmd, "--peer-auto-tls") {
t.Fatalf("expected --peer-auto-tls; got %v", cmd)
}
for _, unwanted := range []string{
"--peer-cert-file=/etc/etcd/tls/peer/tls.crt",
"--peer-trusted-ca-file=/etc/etcd/tls/peer/ca.crt",
"--peer-client-cert-auth=true",
} {
if cmdContains(cmd, unwanted) {
t.Fatalf("auto-tls must not set BYO peer flag %q: %v", unwanted, cmd)
}
}
if v := volumeFor(pod, "tls-peer"); v != nil {
t.Fatalf("auto-tls must mount no peer secret; got volume %+v", v)
}
}
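Reviewer sketch (not part of the PR): the flag selection this test pins down, reduced to a standalone function. `memberTLS` and `peerFlags` are hypothetical names; the flag strings are the ones quoted in the diff and in the test above. It assumes, as the PR's derivation does, that a member never carries both a peer secret and the auto-TLS marker.

```go
package main

import "fmt"

// memberTLS is a hypothetical, trimmed stand-in for EtcdMemberTLS.
type memberTLS struct {
	PeerSecretName string // empty means no peer secret is mounted
	PeerAutoTLS    bool
}

// peerFlags sketches the three-way choice buildPod makes for the peer plane:
// auto-TLS (self-signed, unauthenticated), bring-your-own mTLS, or plaintext.
func peerFlags(tls *memberTLS) []string {
	switch {
	case tls != nil && tls.PeerAutoTLS:
		// etcd self-signs, so there is nothing to mount and no CA to trust.
		return []string{"--peer-auto-tls"}
	case tls != nil && tls.PeerSecretName != "":
		return []string{
			"--peer-cert-file=/etc/etcd/tls/peer/tls.crt",
			"--peer-key-file=/etc/etcd/tls/peer/tls.key",
			"--peer-trusted-ca-file=/etc/etcd/tls/peer/ca.crt",
			"--peer-client-cert-auth=true",
		}
	default:
		// Plaintext peer listener: no TLS flags at all.
		return nil
	}
}

func main() {
	fmt.Println(peerFlags(&memberTLS{PeerAutoTLS: true}))
	fmt.Println(peerFlags(&memberTLS{PeerSecretName: "etcd-peer-tls"}))
	fmt.Println(peerFlags(nil))
}
```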

// TestBuildPod_AlwaysExposesMetricsPort guards the cozystack-shaped
// monitoring contract: VMPodScrape (and equivalent Prometheus scrapers)
// target the named "metrics" container port unconditionally, and the
@@ -2460,6 +2493,7 @@ func TestDeriveMemberTLS(t *testing.T) {
hasClient bool
hasPeer bool
clientMTLS bool
peerAutoTLS bool
serverSecret string
opSecret string
peerSecret string
@@ -2536,6 +2570,29 @@ func TestDeriveMemberTLS(t *testing.T) {
}}}),
want: want{hasPeer: true, peerSecret: "etcd-peer-tls"},
},
{
// Legacy-compat --peer-auto-tls carried on the reserved cluster
// annotation (no typed spec.tls.peer) projects to PeerAutoTLS.
name: "peer-auto-tls annotation only",
in: func() *lll.EtcdCluster {
c := withName(&lll.EtcdCluster{})
c.Annotations = map[string]string{AnnPeerAutoTLS: "true"}
return c
}(),
want: want{peerAutoTLS: true},
},
{
// An explicit peer secretRef supersedes the annotation.
name: "peer secretRef beats peer-auto-tls annotation",
in: func() *lll.EtcdCluster {
c := withName(&lll.EtcdCluster{Spec: lll.EtcdClusterSpec{TLS: &lll.EtcdClusterTLS{
Peer: &lll.PeerTLS{SecretRef: &corev1.LocalObjectReference{Name: "p"}},
}}})
c.Annotations = map[string]string{AnnPeerAutoTLS: "true"}
return c
}(),
want: want{hasPeer: true, peerSecret: "p"},
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
@@ -2555,6 +2612,9 @@ func TestDeriveMemberTLS(t *testing.T) {
if (got.PeerSecretRef != nil) != tc.want.hasPeer {
t.Fatalf("hasPeer = %v; want %v", got.PeerSecretRef != nil, tc.want.hasPeer)
}
if got.PeerAutoTLS != tc.want.peerAutoTLS {
t.Fatalf("PeerAutoTLS = %v; want %v", got.PeerAutoTLS, tc.want.peerAutoTLS)
}
if got.ClientMTLS != tc.want.clientMTLS {
t.Fatalf("ClientMTLS = %v; want %v", got.ClientMTLS, tc.want.clientMTLS)
}
66 changes: 55 additions & 11 deletions controllers/helpers.go
@@ -58,6 +58,22 @@ const (
// (validDataDirSubPath) — an annotation has no apiserver schema, so the
// controller fails closed against a mount-escaping value.
AnnDataDirSubPath = ReservedAnnotationPrefix + "data-dir-subpath"

// AnnPeerAutoTLS, set to "true" on an EtcdCluster, runs the peer plane
// with etcd's --peer-auto-tls: per-member self-signed certs, NO shared
// CA, so peer traffic is encrypted but NOT authenticated. This is a
// migration-only knob etcd-migrate stamps when adopting a legacy cluster
// that ran the previous operator's unconditional --peer-auto-tls default
// (no CA exists to do real mTLS, so a strict-mTLS replacement could never
// rejoin the still-auto-tls members). Unlike AnnHeadlessServiceName /
// AnnDataDirSubPath it is cluster-level and does NOT self-wipe: the
// controller propagates it to every member it builds so replacement/
// scaled members keep interoperating. Deliberately NOT a typed spec field
// — an unauthenticated peer plane must not be a discoverable, CEL-blessed
// option for new clusters; an undocumented reserved key is the lesser
// footgun. Superseded by an explicit spec.tls.peer.secretRef/certManager
// (real mTLS wins; precedence lives in clusterPeerAutoTLS).
AnnPeerAutoTLS = ReservedAnnotationPrefix + "peer-auto-tls"
)

// etcdDataDirRoot is the mount path of every member's data volume; --data-dir
@@ -136,14 +152,33 @@ func clusterClientScheme(cluster *lll.EtcdCluster) string {
}

// clusterPeerScheme returns "https" when the cluster has peer TLS configured,
- // "http" otherwise.
+ // "http" otherwise. The legacy-compat --peer-auto-tls mode (carried on the
+ // AnnPeerAutoTLS annotation, no typed spec.tls.peer) also serves peer over
+ // https, so it counts too.
func clusterPeerScheme(cluster *lll.EtcdCluster) string {
if cluster != nil && cluster.Spec.TLS != nil && cluster.Spec.TLS.Peer != nil {
return "https"
}
+ if clusterPeerAutoTLS(cluster) {
+ return "https"
+ }
return "http"
}

+ // clusterPeerAutoTLS reports whether the cluster runs the legacy-compat
+ // --peer-auto-tls peer mode, carried on the reserved AnnPeerAutoTLS annotation
+ // (see its doc). An explicit typed peer TLS mode (secretRef/certManager) always
+ // wins, so the annotation is honoured only when spec.tls.peer is unset.
+ func clusterPeerAutoTLS(cluster *lll.EtcdCluster) bool {
+ if cluster == nil {
+ return false
+ }
+ if cluster.Spec.TLS != nil && cluster.Spec.TLS.Peer != nil {
+ return false
+ }
+ return cluster.Annotations[AnnPeerAutoTLS] == "true"
+ }
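Reviewer sketch (not part of the PR): the precedence rule above, reduced to a standalone program. The `cluster` type and its `HasTypedPeer` field are hypothetical stand-ins for `EtcdCluster` and `spec.tls.peer != nil`; the annotation key is the one quoted in the diff.

```go
package main

import "fmt"

// annPeerAutoTLS is the reserved annotation key quoted in the PR.
const annPeerAutoTLS = "etcd-operator.cozystack.io/peer-auto-tls"

// cluster is a hypothetical, trimmed stand-in for EtcdCluster.
type cluster struct {
	Annotations  map[string]string
	HasTypedPeer bool // stands in for spec.tls.peer != nil
}

// peerAutoTLS honours the annotation only when no typed peer TLS mode is set,
// so real mTLS always wins over the legacy-compat marker.
func peerAutoTLS(c *cluster) bool {
	if c == nil || c.HasTypedPeer {
		return false
	}
	// Reading a missing key (or a nil map) yields "", which is not "true".
	return c.Annotations[annPeerAutoTLS] == "true"
}

// peerScheme returns "https" for both typed peer TLS and auto-TLS.
func peerScheme(c *cluster) string {
	if c != nil && (c.HasTypedPeer || peerAutoTLS(c)) {
		return "https"
	}
	return "http"
}

func main() {
	annotated := map[string]string{annPeerAutoTLS: "true"}
	fmt.Println(peerAutoTLS(&cluster{Annotations: annotated}), peerScheme(&cluster{Annotations: annotated}))
	fmt.Println(peerAutoTLS(&cluster{Annotations: annotated, HasTypedPeer: true}))
	fmt.Println(peerScheme(&cluster{}))
}
```

Note that the comparison is against the exact string `"true"`: under this rule a value such as `"True"` or `"1"` leaves auto-TLS off.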

// memberClientScheme is the per-member counterpart to clusterClientScheme,
// keyed off the propagated EtcdMemberSpec.TLS.
func memberClientScheme(member *lll.EtcdMember) string {
@@ -155,7 +190,8 @@ func memberClientScheme(member *lll.EtcdMember) string {

// memberPeerScheme is the per-member counterpart to clusterPeerScheme.
func memberPeerScheme(member *lll.EtcdMember) string {
- if member != nil && member.Spec.TLS != nil && member.Spec.TLS.PeerSecretRef != nil {
+ if member != nil && member.Spec.TLS != nil &&
+ (member.Spec.TLS.PeerSecretRef != nil || member.Spec.TLS.PeerAutoTLS) {
return "https"
}
return "http"
@@ -179,19 +215,27 @@ func buildInitialCluster(peerScheme string, names []string, service, namespace s
// Secret names regardless of source, so buildPod / ensurePod /
// buildOperatorTLSConfig stay source-agnostic.
func deriveMemberTLS(cluster *lll.EtcdCluster) *lll.EtcdMemberTLS {
- if cluster == nil || cluster.Spec.TLS == nil {
- return nil
- }
- if cluster.Spec.TLS.Client == nil && cluster.Spec.TLS.Peer == nil {
+ if cluster == nil {
return nil
}
out := &lll.EtcdMemberTLS{}
- if name := serverSecretName(cluster); name != "" {
- out.ClientServerSecretRef = &corev1.LocalObjectReference{Name: name}
- out.ClientMTLS = operatorClientSecretName(cluster) != ""
+ if cluster.Spec.TLS != nil {
+ if name := serverSecretName(cluster); name != "" {
+ out.ClientServerSecretRef = &corev1.LocalObjectReference{Name: name}
+ out.ClientMTLS = operatorClientSecretName(cluster) != ""
+ }
+ if name := peerSecretName(cluster); name != "" {
+ out.PeerSecretRef = &corev1.LocalObjectReference{Name: name}
+ }
}
+ // Carry the legacy-compat --peer-auto-tls posture (a cluster-level
+ // reserved annotation, not typed spec) down to the member. clusterPeerAutoTLS
+ // already yields false when an explicit peer mode is set, so real mTLS wins.
+ if out.PeerSecretRef == nil && clusterPeerAutoTLS(cluster) {
+ out.PeerAutoTLS = true
+ }
- if name := peerSecretName(cluster); name != "" {
- out.PeerSecretRef = &corev1.LocalObjectReference{Name: name}
+ if out.ClientServerSecretRef == nil && out.PeerSecretRef == nil && !out.PeerAutoTLS {
+ return nil
}
return out
}
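Reviewer sketch (not part of the PR): the reshaped derivation in miniature. The point it illustrates is that the function can no longer return early on a nil `spec.tls`, because the annotation alone must now produce a non-nil member TLS block, while a cluster with no TLS of any kind still collapses to nil. `clusterIn`, `memberTLS`, and `derive` are hypothetical names, and secret-name resolution is simplified to plain string fields.

```go
package main

import "fmt"

// clusterIn is a hypothetical, flattened view of what deriveMemberTLS reads.
type clusterIn struct {
	ServerSecret      string // resolved client-server secret name, "" if none
	PeerSecret        string // resolved peer secret name, "" if none
	AutoTLSAnnotation bool   // the reserved peer-auto-tls annotation is "true"
}

// memberTLS is a hypothetical stand-in for EtcdMemberTLS.
type memberTLS struct {
	ServerSecret string
	PeerSecret   string
	PeerAutoTLS  bool
}

// derive projects cluster-level TLS onto a member. It returns nil only when
// there is no client TLS, no peer secret, and no auto-TLS marker.
func derive(c *clusterIn) *memberTLS {
	if c == nil {
		return nil
	}
	out := &memberTLS{ServerSecret: c.ServerSecret, PeerSecret: c.PeerSecret}
	// An explicit peer secret supersedes the annotation.
	if out.PeerSecret == "" && c.AutoTLSAnnotation {
		out.PeerAutoTLS = true
	}
	if out.ServerSecret == "" && out.PeerSecret == "" && !out.PeerAutoTLS {
		return nil
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", derive(&clusterIn{AutoTLSAnnotation: true}))
	fmt.Printf("%+v\n", derive(&clusterIn{PeerSecret: "p", AutoTLSAnnotation: true}))
	fmt.Println(derive(&clusterIn{}) == nil)
}
```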
41 changes: 41 additions & 0 deletions docs/migration.md
@@ -159,6 +159,47 @@ TLS caveat: the legacy API kept CAs in separate Secrets
merge the CA into the referenced Secret **before** starting the new operator
(with cert-manager-issued secrets, `ca.crt` is typically already in place).

### Peer auto-TLS (legacy `--peer-auto-tls`)

The legacy operator ran etcd with `--peer-auto-tls` **unconditionally** unless
you supplied a BYO peer Secret. Under that flag each member generates its own
self-signed peer certificate and there is **no shared CA**: peer traffic is
encrypted but **not authenticated** — any TLS-capable workload that can reach a
member's `:2380` can peer with the cluster or impersonate a member. This is a
weaker posture than the real mutual-TLS the native operator offers via
`spec.tls.peer.secretRef` / `spec.tls.peer.certManager`, and it is **not** the
same thing as the [SAN-coverage caveat](#endpoint-compatibility) above (that is
about explicit mTLS certs needing both DNS domains during rollover — a different
scenario; don't conflate them).

The tool **detects this and carries it forward**, because it has to: with no CA
in existence there is nothing to mint real mTLS certs from, so a replacement or
scaled-up member running strict mTLS (or plaintext peer) could never rejoin the
still-auto-tls members. Carry-forward keeps replacement/scale working.

It is **not** exposed as a typed spec field — an unauthenticated peer plane must
not be a discoverable, first-class option for new clusters. Instead the tool
stamps a reserved cluster annotation:

```yaml
metadata:
annotations:
etcd-operator.cozystack.io/peer-auto-tls: "true"
```

The operator reads it and propagates `--peer-auto-tls` to every member it builds
for that cluster. It is superseded by an explicit `spec.tls.peer.secretRef` /
`certManager` (real mTLS always wins). The dry-run plan flags the adoption with a
loud `⚠️ SECURITY:` line, and the post-`--apply` summary re-surfaces it — you
cannot complete a migration without being told you adopted an unauthenticated
peer plane.

**There are only two off-ramps to real mTLS:** delete-and-recreate (`spec.tls` is
immutable), or a careful manual rolling restart onto BYO/cert-manager peer
certs. Because strict-mTLS and auto-tls members **cannot peer with each other**,
either route has a brief no-quorum window at the cutover; plan it like any
peer-cert rotation.

### The safety backup

Adoption rewires ownership of live storage, so the tool snapshots every