route bake targets across pods with random loadbalance#3861
route bake targets across pods with random loadbalance#3861areebahmeddd wants to merge 1 commit into
Conversation
What about loadbalance=sticky ? |
|
Also, please squash the commits |
no change. it reuses the same shared connection path
sure 🙂 |
fb5e0c0 to
1050ee5
Compare
|
gentle ping! @AkihiroSuda the ci failure is not related to our changes. |
crazy-max
left a comment
There was a problem hiding this comment.
I was looking at
buildx/driver/kubernetes/podchooser/podchooser.go
Lines 29 to 45 in 0b358c7
Seems to still looks capable of picking the same pod for all targets in one bake no? ChoosePod() creates a new RNG on every call and seeds it with time.Now().Unix(). With this PR, multiple targets can create fresh clients and dial within the same second, so each call can get the same seed and produce the same first random value iiuc.
Maybe we should seed the RNG once at the package level, with concurrency handled, or use another safe random selection path? Otherwise this PR may still reproduce the original issue.
| func (d *DriverHandle) NewClient(ctx context.Context) (*client.Client, error) { | ||
| return d.Driver.Client(ctx, d.getClientOptions()...) | ||
| } | ||
|
|
||
| func (d *DriverHandle) NeedsNewClientPerBuild() bool { | ||
| type perBuildClientDriver interface { | ||
| NeedsNewClientPerBuild() bool | ||
| } | ||
| if p, ok := d.Driver.(perBuildClientDriver); ok { | ||
| return p.NeedsNewClientPerBuild() | ||
| } | ||
| return false | ||
| } |
There was a problem hiding this comment.
Actually after new review this doesn't feel like the right shape for the driver API. NeedsNewClientPerBuild() is added as an optional hidden interface inside DriverHandle, but the behavior is broader than the name says. The build code also uses it to disable shared local mount sessions.
Can we make that contract explicit near the Driver interface instead of hiding it in manager.go? Something like an optional RequiresUncachedClient() interface would make the build-side behavior easier to reason about. I would also rename NewClient() to something like UncachedClient() so it is clear this is deliberately bypassing DriverHandle.Client() caching, not just another constructor.
There was a problem hiding this comment.
Maybe we should seed the RNG once at the package level, with concurrency handled, or use another safe random selection path? Otherwise this PR may still reproduce the original issue.
yea, re-seeding in ChoosePod() could lead to identical selections for concurrent dials. i'll switch it to use a shared RandSource that's seeded once and reused
Can we make that contract explicit near the Driver interface instead of hiding it in manager.go? Something like an optional RequiresUncachedClient() interface would make the build-side behavior easier to reason about. I would also rename NewClient() to something like UncachedClient() so it is clear this is deliberately bypassing DriverHandle.Client() caching, not just another constructor.
okay, i've moved it next to the Driver interface as an optional RequiresUncachedClient() interface
|
Hm, let me take a look |
1050ee5 to
61d1cb4
Compare
With loadbalance=random, all bake targets were landing on the same pod. The client connection was cached after the first dial, so pod selection only happened once. Fix by dialing a fresh connection per target when loadbalance=random. Also skip the shared local-mount session optimisation in this mode since targets connect to different pods and cannot share a session. Fixes docker#3382 Signed-off-by: Areeb Ahmed <areebahmed0709@gmail.com>
61d1cb4 to
1a82b23
Compare
loadbalance=randomon the kubernetes driver was sending all bake targets to the same pod.the client connection was cached after the first dial, so pod selection only happened once.
fix this by creating a new connection per target when
loadbalance=randomis enabled.also disable the shared local mount session optimization in this mode.
Closes #3382