Module
Kafka
Testcontainers version
1.21.4
Using the latest Testcontainers version?
Yes
Host OS
Linux
Host Arch
x86-64
Docker version
Client:
Version: 20.10.24+dfsg1
API version: 1.41
Go version: go1.19.8
Git commit: 297e128
Built: Sat Jan 3 00:46:39 2026
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.24+dfsg1
API version: 1.41 (minimum version 1.12)
Go version: go1.19.8
Git commit: 5d6db84
Built: Sat Jan 3 00:46:39 2026
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.20~ds1
GitCommit: 1.6.20~ds1-1+deb12u2
runc:
Version: 1.1.5+ds1
GitCommit: 1.1.5+ds1-1+deb12u1
docker-init:
Version: 0.19.0
GitCommit:
What happened?
During some CI test runs, Kafka container startup fails with "Wait strategy failed. Container exited with code 126". The tests run in parallel on a relatively small virtual machine, which can come under high load during the tests.
The cause seems to be a race condition between installing the startup script and the container command that waits for it. The command is set to "while [ ! -f /tmp/testcontainers_start.sh ]; do sleep 0.1; done; /tmp/testcontainers_start.sh", so the loop only checks whether the file exists. The command can therefore try to execute the script before it has been fully written (or possibly before its permissions have been set, although I didn't manage to reproduce that exact scenario). The shell then exits with code 126, the status it returns when a command is found but cannot be executed.
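To illustrate why an existence check alone is not enough, here is a minimal local-filesystem analogy in plain Java (java.nio only, no Docker or Testcontainers involved). It assumes that the copy into the container creates the file before writing its content; ExistenceCheckDemo and observeAtCreation are illustrative names, not part of Testcontainers. It shows that a file can already exist while it is still empty and still open for writing. On Linux, executing a file in that state fails with ETXTBSY, which matches the "Text file busy" message in the log output.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExistenceCheckDemo {

    /**
     * Plays the role of the copier (assumption: the file is created first and
     * its content is written afterwards). Returns what an existence-only poll
     * would observe right after the file appears.
     */
    static String observeAtCreation(String content) {
        try {
            Path script = Files.createTempDirectory("tc-race").resolve("testcontainers_start.sh");
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            try (OutputStream out = Files.newOutputStream(script)) {
                // The file exists now, but nothing has been written yet:
                // a `while [ ! -f ... ]` loop would already stop polling here.
                boolean exists = Files.exists(script);
                long bytesVisible = Files.size(script);
                out.write(bytes);
                return "exists=" + exists + ", bytes=" + bytesVisible + "/" + bytes.length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints exists=true, bytes=0/23
        System.out.println(observeAtCreation("#!/bin/sh\necho started\n"));
    }
}
```

The same window exists inside the container: the sleep 0.1 in the default command only makes it less likely that the script is executed while it is in this state.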
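This can be reproduced by removing the sleep 0.1 from the loop. When the container is created like this:
import org.testcontainers.kafka.KafkaContainer;

KafkaContainer kafkaContainer = new KafkaContainer("apache/kafka:3.8.0")
        // Same command as the default, but polling in a tight loop (no sleep),
        // so the script is executed as soon as the file appears
        .withCommand("sh", "-c", "while [ ! -f /tmp/testcontainers_start.sh ]; do true; done; /tmp/testcontainers_start.sh");
container startup fails every time with: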
2026-04-10 21:48:06,243 [main] ERROR tc.apache/kafka:3.8.0 - Could not start container
java.lang.IllegalStateException: Wait strategy failed. Container exited with code 126
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:346)
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336)
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:322)
(...)
Caused by: org.testcontainers.containers.ContainerLaunchException: Timed out waiting for log output matching '.*Transitioning from RECOVERY to RUNNING.*'
at org.testcontainers.containers.wait.strategy.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:47)
at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:52)
at org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:909)
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:492)
... 58 common frames omitted
2026-04-10 21:48:06,254 [main] ERROR tc.apache/kafka:3.8.0 - Log output from the failed container:
sh: /tmp/testcontainers_start.sh: Text file busy
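It therefore seems likely that on busy hosts the default sleep 0.1 loop can hit the same error from time to time, since the sleep only delays the next existence check and does not guarantee that the script has been fully written.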
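An easy fix is to add a separate 'notify' step that runs only after the script has been fully copied. With the following changes:
import com.github.dockerjava.api.command.InspectContainerResponse;
import java.io.IOException;
import org.testcontainers.kafka.KafkaContainer;

final class FixedKafkaContainer extends KafkaContainer {

    FixedKafkaContainer(String imageName) {
        super(imageName);
        // Wait for a marker file instead of the script itself; the marker is created only after the copy
        withCommand("sh", "-c", "while [ ! -f /tmp/startup-done ]; do true; done; /tmp/testcontainers_start.sh");
    }

    @Override
    protected void containerIsStarting(InspectContainerResponse containerInfo) {
        super.containerIsStarting(containerInfo); // copy the script
        try {
            // Notify the waiting command *after* the copy has finished
            execInContainer("sh", "-c", "touch /tmp/startup-done");
        } catch (IOException | InterruptedException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
            throw new RuntimeException(e);
        }
    }
}
the container starts reliably across repeated runs, without any errors.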
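The marker-file handshake behind this fix can also be sketched locally in plain Java as an analogy: the copier writes and closes the script before creating the marker, and the waiter polls only for the marker. This sketch assumes that, in the container case, the copy has fully completed (and the file is closed) by the time super.containerIsStarting returns; MarkerHandshakeDemo and its methods are illustrative names, not Testcontainers API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MarkerHandshakeDemo {

    static final String SCRIPT_NAME = "testcontainers_start.sh";
    static final String MARKER_NAME = "startup-done";

    /** Copier side: write and close the script first, create the marker last. */
    static void copyThenNotify(Path dir, String content) {
        try {
            // writeString opens, writes and closes the file before returning
            Files.writeString(dir.resolve(SCRIPT_NAME), content, StandardCharsets.UTF_8);
            // Equivalent of `touch /tmp/startup-done`, issued only after the copy is complete
            Files.createFile(dir.resolve(MARKER_NAME));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Waiter side: poll for the marker only, never for the script itself. */
    static String waitThenRead(Path dir) throws InterruptedException, IOException {
        while (!Files.exists(dir.resolve(MARKER_NAME))) {
            Thread.sleep(10); // same role as the polling loop in the container command
        }
        // The marker exists, so the script has already been fully written and closed
        return Files.readString(dir.resolve(SCRIPT_NAME), StandardCharsets.UTF_8);
    }

    /** Runs both sides concurrently and returns the script content the waiter saw. */
    static String run(String content) {
        try {
            Path dir = Files.createTempDirectory("tc-handshake");
            Thread copier = new Thread(() -> copyThenNotify(dir, content));
            copier.start();
            String seen = waitThenRead(dir);
            copier.join();
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String script = "#!/bin/sh\necho started\n";
        System.out.println(run(script).equals(script)); // prints true
    }
}
```

Because the marker is created only after the script file has been closed, the waiter can never observe a partially written script, regardless of how the poll loop is timed.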
If you think this is a good solution, I'd be more than happy to create a PR.
Relevant log output
Additional Information
No response