fix(main): add support peer-auto-tls#330
Conversation
|
Warning Review limit reached
More reviews will be available in 14 minutes and 45 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (12)
📝 WalkthroughWalkthroughAdds peer auto-TLS support: new EtcdMemberTLS.PeerAutoTLS and CRD field, detection of legacy ChangesPeer Auto-TLS for Legacy Adoption
Sequence DiagramsequenceDiagram
participant Migrate as etcd-migrate
participant BuildAdopt as BuildAdoption
participant LiveCluster as Live Member Facts
participant Plan as ResourcePlan
participant Render as Migration Output
Migrate->>BuildAdopt: start adoption detection
BuildAdopt->>LiveCluster: scan member peer URLs for "https://"
LiveCluster-->>BuildAdopt: found HTTPS peer URL
BuildAdopt->>Plan: set AnnPeerAutoTLS annotation
BuildAdopt->>Plan: append SecurityWarning ("peer-auto-tls")
Migrate->>Render: render dry-run plan
Render->>Render: print per-plan "⚠️ SECURITY:" warnings
Migrate->>Migrate: --apply (adopt)
Migrate->>Render: render post-apply summary
Render->>Render: re-surface SecurityWarnings for adopted plans
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
6707794 to
10b0bd1
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces support for etcd's --peer-auto-tls (AutoTLS) configuration to enable peer TLS with self-signed certificates and no shared CA, which is particularly useful for adopting legacy clusters. The changes include updating API types, adding CEL validation rules for mutual exclusion, modifying the controller to apply the --peer-auto-tls flag, and implementing migration logic to detect and carry forward this setting from legacy clusters. The review feedback suggests improving test assertions in cel_validation_test.go to check for the full validation error message and adding a CEL validation rule on EtcdMemberTLS to enforce the mutual exclusion of peerAutoTLS and peerSecretRef.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
charts/etcd-operator/crd-bases/etcd-operator.cozystack.io_etcdmembers.yaml (1)
1339-1345: ⚡ Quick winConsider adding CEL validation for mutual exclusivity.
While EtcdMembers are internal resources created by the EtcdCluster controller (and validation happens at the cluster level), adding a schema-level CEL rule here would provide defense-in-depth against misconfiguration. The documentation mentions mutual exclusivity with PeerSecretRef, but nothing prevents both fields from being set if a user directly creates an EtcdMember.
🛡️ Optional CEL validation to enforce mutual exclusivity
Add this x-kubernetes-validations rule under the
tlsproperty (around line 1363):type: object x-kubernetes-validations: - message: spec.tls.peerAutoTLS and spec.tls.peerSecretRef are mutually exclusive rule: '!(has(self.peerAutoTLS) && self.peerAutoTLS && has(self.peerSecretRef))'Also consider updating the description to clarify this mode is primarily intended for legacy cluster adoption where
--peer-auto-tlswas already in use, as the security trade-off (unauthenticated peers) makes it unsuitable for new deployments.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@charts/etcd-operator/crd-bases/etcd-operator.cozystack.io_etcdmembers.yaml` around lines 1339 - 1345, Add a CEL validation to the tls schema to enforce mutual exclusivity between peerAutoTLS and peerSecretRef: update the tls property (where peerAutoTLS and peerSecretRef are defined) to include an x-kubernetes-validations entry that fails when both are present/true, referencing the symbols peerAutoTLS and peerSecretRef; also update the peerAutoTLS description to clarify it’s for legacy/--peer-auto-tls use and highlights the unauthenticated peer security trade-off.api/v1alpha2/cel_validation_test.go (1)
485-513: ⚡ Quick winAdd test case for autoTLS + certManager combination.
The test covers autoTLS alone (accepted) and autoTLS + secretRef (rejected), but does not verify that autoTLS + certManager is also rejected by the exactly-one-of rule. Without this coverage, a faulty CEL rule that only validates secretRef/autoTLS pairs could pass.
📝 Add autoTLS + certManager test case
if !strings.Contains(err.Error(), "exactly one of spec.tls.peer.secretRef") { t.Fatalf("error did not mention peer mutual exclusion: %v", err) } + + // autoTLS + certManager — rejected. + badCM := validCluster("tls-peer-autotls-certmanager") + badCM.Spec.TLS = &lll.EtcdClusterTLS{Peer: &lll.PeerTLS{ + AutoTLS: &lll.PeerAutoTLS{}, + CertManager: &lll.PeerCertManagerTLS{IssuerRef: lll.IssuerReference{Name: "my-peer-ca"}}, + }} + err = k8s.Create(ctx, badCM) + if err == nil { + _ = k8s.Delete(ctx, badCM) + t.Fatalf("apiserver accepted peer.autoTLS + peer.certManager; expected rejection") + } + if !strings.Contains(err.Error(), "exactly one of spec.tls.peer.secretRef") { + t.Fatalf("error did not mention peer mutual exclusion for certManager combination: %v", err) + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/v1alpha2/cel_validation_test.go` around lines 485 - 513, Extend TestCEL_TLSPeerAutoTLS to include a case that constructs a cluster with Spec.TLS.Peer having both AutoTLS set (lll.PeerAutoTLS{}) and the cert-manager option populated (e.g., Spec.TLS.Peer.CertManager = &lll.PeerCertManager{...}), attempt to create it via k8s.Create(ctx, ...), assert creation is rejected (err != nil), and assert the error message mentions the same "exactly one of spec.tls.peer..." mutual-exclusion text; update TestCEL_TLSPeerAutoTLS (and reuse validCluster, ok/bad variables) so the new branch mirrors the existing autoTLS+secretRef negative test but uses the CertManager field instead of SecretRef.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/v1alpha2/etcdcluster_types.go`:
- Around line 216-228: Update the top-level comment for the peer TLS
configuration (the struct/field describing spec.tls.peer) to mention the new
unauthenticated autoTLS mode and its security implications: explicitly state
that spec.tls.peer can be mTLS (authenticated via SecretRef/CertManager) or
autoTLS (PeerAutoTLS/AutoTLS) which provides encryption without peer
authentication, and note mutual exclusivity with SecretRef/CertManager and that
autoTLS is insecure for untrusted networks; ensure the same wording/clarity is
applied to the CRD description that documents spec.tls.peer so both the
etcdcluster_types.go peer TLS doc and the generated CRD docs align with the
AutoTLS/PeerAutoTLS warning.
In `@internal/migrate/adopt.go`:
- Around line 188-213: The current logic unconditionally sets
cluster.Spec.TLS.Peer.AutoTLS when any member has an https:// PeerURL, which
incorrectly assumes a uniform peer mode; update the adoption logic (the block
using peerTLSDeclared and iterating members) to first scan all members' PeerURL
values and detect mixed schemes (both http:// and https://). If mixed schemes
are found, do not set spec.tls.peer.autoTLS and instead surface a clear adoption
error (e.g. append to plan.Errors or return a failure) so deriveAdoptedMemberTLS
and buildPod are not forced into a cluster-wide --peer-auto-tls mode; only set
spec.tls.peer.autoTLS when every observed PeerURL is https://. Ensure
plan.Warnings remains unchanged for the pure-https path and reference
deriveAdoptedMemberTLS and buildPod in the error message so callers know why
adoption was rejected.
---
Nitpick comments:
In `@api/v1alpha2/cel_validation_test.go`:
- Around line 485-513: Extend TestCEL_TLSPeerAutoTLS to include a case that
constructs a cluster with Spec.TLS.Peer having both AutoTLS set
(lll.PeerAutoTLS{}) and the cert-manager option populated (e.g.,
Spec.TLS.Peer.CertManager = &lll.PeerCertManager{...}), attempt to create it via
k8s.Create(ctx, ...), assert creation is rejected (err != nil), and assert the
error message mentions the same "exactly one of spec.tls.peer..."
mutual-exclusion text; update TestCEL_TLSPeerAutoTLS (and reuse validCluster,
ok/bad variables) so the new branch mirrors the existing autoTLS+secretRef
negative test but uses the CertManager field instead of SecretRef.
In `@charts/etcd-operator/crd-bases/etcd-operator.cozystack.io_etcdmembers.yaml`:
- Around line 1339-1345: Add a CEL validation to the tls schema to enforce
mutual exclusivity between peerAutoTLS and peerSecretRef: update the tls
property (where peerAutoTLS and peerSecretRef are defined) to include an
x-kubernetes-validations entry that fails when both are present/true,
referencing the symbols peerAutoTLS and peerSecretRef; also update the
peerAutoTLS description to clarify it’s for legacy/--peer-auto-tls use and
highlights the unauthenticated peer security trade-off.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4c9a5582-be3f-4638-a1f9-ed410799d6a1
📒 Files selected for processing (11)
api/v1alpha2/cel_validation_test.goapi/v1alpha2/etcdcluster_types.goapi/v1alpha2/etcdmember_types.goapi/v1alpha2/zz_generated.deepcopy.gocharts/etcd-operator/crd-bases/etcd-operator.cozystack.io_etcdclusters.yamlcharts/etcd-operator/crd-bases/etcd-operator.cozystack.io_etcdmembers.yamlcontrollers/etcdmember_controller.gocontrollers/etcdmember_controller_test.gocontrollers/helpers.gointernal/migrate/adopt.gointernal/migrate/adopt_test.go
Timofei Larkin (lllamnyp)
left a comment
There was a problem hiding this comment.
The capability is the right one — adopting a legacy --peer-auto-tls cluster needs replacement/scaled members to speak the same insecure peer mode, or they can't rejoin. Three changes before merge; the first is an API-shape consistency point, the other two are about making an insecure mode impossible to adopt unknowingly.
1. Don't expose this on the typed spec — make it a reserved annotation
spec.tls.peer.autoTLS adds a user-settable, first-class, insecure peer-TLS mode sitting right next to secretRef and certManager. Its only sanctioned use is legacy adoption (the field doc says so), yet nothing stops someone picking it for a brand-new cluster and getting an unauthenticated peer plane. "Migration-only" is enforced by a comment, not the API.
This is the same call we made for headlessServiceName / dataDirSubPath: migration-only knobs a user should never choose belong on a reserved annotation the operator interprets, not the typed spec. The mechanics differ here — peer-auto-tls is cluster-wide and must propagate to every member the operator creates (no shared CA ⇒ a replacement must also run --peer-auto-tls), so it doesn't self-wipe like those two. So the right home is a reserved cluster-level annotation, e.g. etcd-operator.cozystack.io/peer-auto-tls, that:
- the cluster controller reads (in
deriveMemberTLS, in place ofspec.tls.peer.autoTLS) and propagates to every member it builds; buildPodconsumes (the existing internal member-level signal is fine as operator-managed plumbing) to render--peer-auto-tls;- takes effect only when no explicit peer
secretRef/certManageris set, and is ignored/superseded otherwise (precedence has to live in code now, see below).
Then drop PeerAutoTLS from the public types and revert the CEL rule back to the two-way has(secretRef) != has(certManager) — auto-tls is no longer part of the typed union.
Acknowledged tradeoff (and it's the point): as an annotation you can't CEL-validate exclusivity or hard-prevent a user from setting it. That's acceptable — an undocumented reserved key is far less of a footgun than a discoverable typed autoTLS option, and it keeps the public TLS API to the two real, secure modes.
2. Document it in docs/migration.md
This PR ships an entire insecure peer mode plus migration carry-forward with no doc coverage (migration.md doesn't mention peer-auto-tls at all). Add a subsection under the TLS material covering:
- what
--peer-auto-tlsis — per-member self-signed certs, no shared CA, encrypted but not authenticated (any TLS-capable workload reaching:2380can peer); - that adoption detects and carries it forward (now via the reserved annotation) so replacement/scale keeps working — and why (no CA exists to do real mTLS, a plaintext-peer replacement could never join);
- that the only off-ramp to real mTLS is delete-and-recreate (or the careful manual rolling restart, given persistent storage — strict-mTLS and auto-tls members can't peer, so there's a brief no-quorum window at the cutover);
- that this is a distinct scenario from the explicit-mTLS SAN coverage issue already documented — don't conflate them.
3. Make the warning loud
The adopt.go warning is correct in content but renders identically to routine warning: lines in the plan and is easy to miss. For a security-posture downgrade, elevate it: a distinct marker (e.g. a SECURITY: / ⚠ callout), and re-surface it in the post---apply summary, not only in the pre-apply plan — so an operator cannot finish a migration without seeing that they just adopted an unauthenticated peer plane. (Update its spec.tls.peer.autoTLS references to the new annotation while you're there.)
Address review feedback (PR #330): an unauthenticated peer plane must not be a discoverable, CEL-blessed option for new clusters. - Drop spec.tls.peer.autoTLS (and the PeerAutoTLS type); revert the PeerTLS CEL rule to the two-way has(secretRef) != has(certManager) union. - Carry legacy --peer-auto-tls forward on a cluster-level reserved annotation etcd-operator.cozystack.io/peer-auto-tls instead. deriveMemberTLS reads it (superseded by an explicit peer secretRef/certManager) and propagates it to every member it builds; clusterPeerScheme treats it as https. - etcd-migrate stamps the annotation and raises it as a SecurityWarning, rendered with a loud marker in the plan AND re-surfaced in the post-apply summary so an unauthenticated peer plane can't be adopted unnoticed. - Document the mode, carry-forward and off-ramp in docs/migration.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
internal/migrate/adopt.go (1)
201-215:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winReject mixed peer URL schemes before inferring cluster-wide auto-TLS.
The past review comment is still valid. This code unconditionally sets
AnnPeerAutoTLS="true"after seeing the firsthttps://peer URL, but does not verify that ALL members use the same scheme. If the live cluster has mixed schemes (somehttp://, somehttps://), the annotation forces every future replacement or scale-out member intohttps + --peer-auto-tlsmode (viaderiveAdoptedMemberTLS→buildPod), which cannot interoperate with thehttp://peers that were also adopted.Scan all members first, detect mixed schemes, and block adoption with a clear error when schemes are inconsistent.
🛡️ Suggested guard (from past review)
peerTLSDeclared := cluster.Spec.TLS != nil && cluster.Spec.TLS.Peer != nil if !peerTLSDeclared { + sawHTTP := false + sawHTTPS := false for _, m := range members { - if strings.HasPrefix(m.PeerURL, "https://") { - if cluster.Annotations == nil { - cluster.Annotations = map[string]string{} - } - cluster.Annotations[controllers.AnnPeerAutoTLS] = "true" - plan.SecurityWarnings = append(plan.SecurityWarnings, fmt.Sprintf( - "cluster runs etcd --peer-auto-tls (member %q advertises %s; no peerSecret in the legacy spec): carried forward via the reserved %s annotation so members keep interoperating across replacement/scale. "+ - "The peer plane is encrypted but NOT authenticated (no shared CA) — any TLS-capable workload that reaches :2380 can peer. Move to real mTLS (spec.tls.peer.secretRef or certManager) when you can; that is a delete-and-recreate since spec.tls is immutable.", - m.Name, m.PeerURL, controllers.AnnPeerAutoTLS)) - break - } + switch { + case strings.HasPrefix(m.PeerURL, "https://"): + sawHTTPS = true + case strings.HasPrefix(m.PeerURL, "http://"): + sawHTTP = true + } + } + if sawHTTP && sawHTTPS { + plan.Errors = append(plan.Errors, + "legacy cluster advertises mixed peer URL schemes; adoption only supports a single cluster-wide peer mode") + } else if sawHTTPS { + if cluster.Annotations == nil { + cluster.Annotations = map[string]string{} + } + cluster.Annotations[controllers.AnnPeerAutoTLS] = "true" + plan.SecurityWarnings = append(plan.SecurityWarnings, "...") } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/migrate/adopt.go` around lines 201 - 215, The code currently sets controllers.AnnPeerAutoTLS="true" after seeing the first member with PeerURL starting "https://" (in the peerTLSDeclared check over members), which can force --peer-auto-tls for future pods that cannot interoperate with any existing http:// peers; instead, scan all members first to detect their schemes (iterate members and record whether any PeerURL starts with "https://" and whether any starts with "http://"), and if both schemes are present append a clear adoption-blocking error to plan (or return an error) indicating mixed peer URL schemes and refuse to set the annotation; only when all discovered members consistently use https should you set cluster.Annotations[controllers.AnnPeerAutoTLS]="true" and add the security warning (the involved code paths include deriveAdoptedMemberTLS and buildPod which will assume auto-TLS).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@internal/migrate/adopt.go`:
- Around line 201-215: The code currently sets controllers.AnnPeerAutoTLS="true"
after seeing the first member with PeerURL starting "https://" (in the
peerTLSDeclared check over members), which can force --peer-auto-tls for future
pods that cannot interoperate with any existing http:// peers; instead, scan all
members first to detect their schemes (iterate members and record whether any
PeerURL starts with "https://" and whether any starts with "http://"), and if
both schemes are present append a clear adoption-blocking error to plan (or
return an error) indicating mixed peer URL schemes and refuse to set the
annotation; only when all discovered members consistently use https should you
set cluster.Annotations[controllers.AnnPeerAutoTLS]="true" and add the security
warning (the involved code paths include deriveAdoptedMemberTLS and buildPod
which will assume auto-TLS).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3e080970-618e-4837-a5f6-63c7bb0755bb
📒 Files selected for processing (12)
api/v1alpha2/cel_validation_test.goapi/v1alpha2/etcdmember_types.gocharts/etcd-operator/crd-bases/etcd-operator.cozystack.io_etcdmembers.yamlcmd/etcd-migrate/main.gocmd/etcd-migrate/output.gocontrollers/etcdmember_controller.gocontrollers/etcdmember_controller_test.gocontrollers/helpers.godocs/migration.mdinternal/migrate/adopt.gointernal/migrate/adopt_test.gointernal/migrate/plan.go
💤 Files with no reviewable changes (1)
- api/v1alpha2/cel_validation_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
- controllers/etcdmember_controller.go
- api/v1alpha2/etcdmember_types.go
- internal/migrate/adopt_test.go
- controllers/etcdmember_controller_test.go
- charts/etcd-operator/crd-bases/etcd-operator.cozystack.io_etcdmembers.yaml
- controllers/helpers.go
4dab563 to
f4afff4
Compare
Signed-off-by: Andrey Kolkov <androndo@gmail.com>
f4afff4 to
109bfcc
Compare
Timofei Larkin (lllamnyp)
left a comment
There was a problem hiding this comment.
All three change-request items are addressed, cleanly. Approving.
- Off the typed spec → reserved annotation.
spec.tls.peer.autoTLS/ thePeerAutoTLSstruct are gone from the public types (thePeerTLSCEL rule is back to the two-waysecretRef != certManager), replaced by the reserved cluster annotationetcd-operator.cozystack.io/peer-auto-tls. Precedence is enforced in code:clusterPeerAutoTLSreturns false whenever any explicitspec.tls.peeris set, so real mTLS always wins and the annotation only applies when there's no typed peer TLS. - Documented. New
docs/migration.md"Peer auto-TLS" section — what it is (per-member self-signed, no shared CA, encrypted-not-authenticated), that adoption carries it forward via the annotation to keep replacement/scale working and why, and that the off-ramp to real mTLS is delete-and-recreate. - Loud warning. A distinct
SecurityWarningschannel (separate from routineWarnings), rendered inline as⚠️ SECURITY:and re-surfaced after--applyviarenderSecuritySummary— so an unauthenticated-peer adoption can't be missed.
What I checked beyond that:
- Peer-URL scheme consistency (the main correctness risk): both
memberPeerSchemeandclusterPeerSchemereturnhttpsfor the auto-tls case, and the seed / pending / existing peer URLs all route throughclusterPeerScheme, so advertised URLs,--initial-clusterentries, and the listener inbuildPodagree. A mismatch here would prevent quorum; it's consistent. - No nil-deref in
buildPod— the newswitchtakes thePeerAutoTLSbranch beforepeerTLS, and thePeerSecretRefderef only runs when it's non-nil. - Mutual exclusion of
PeerAutoTLSvsPeerSecretRefholds by construction in bothderiveMemberTLSandderiveAdoptedMemberTLS. - Detection stamps the annotation only when no peer TLS is declared and a live member advertises
https://— matching the legacy operator's unconditional---peer-auto-tls-unless-PeerSecretbehavior. - Build, full test suite, and
make manifests generate(no drift) all pass.
One non-blocking, cosmetic nit: api/v1alpha2/cel_validation_test.go:480 loosened its assertion from the full CEL message to a prefix. The rule message in etcdcluster_types.go is unchanged and still carries the full string, so the original full-message match would still pass — this loosening is unrelated to the feature and weakens the test slightly for no gain. Worth reverting to the full-message assertion, but not a blocker.
LGTM.
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests