Update compatibility matrix files by michaeldwan · Pull Request #2400 · replicate/cog

michaeldwan · 2025-06-09T20:54:27Z

Summary

Fix compatgen tool to correctly extract CuDNN versions from newer nvidia/cuda images, and regenerate all compatibility matrices.

Problem

Newer nvidia/cuda image tags no longer include the CuDNN version number (e.g. 12.9.1-cudnn-devel-ubuntu24.04 instead of 12.6.3-cudnn9-devel-ubuntu22.04). The tag parser in compatgen couldn't extract CuDNN from these tags, so new CUDA versions had to be manually patched into the JSON (see #2036).
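To illustrate the failure mode, here is a minimal sketch of tag-based extraction. `cudnnFromTag` is a hypothetical helper written for this example, not the parser in compatgen; it only shows that the newer tag format carries no CuDNN number to extract.

```go
package main

import (
	"fmt"
	"strings"
)

// cudnnFromTag is a hypothetical helper, not the compatgen parser. It looks
// for a "cudnn<N>" component in an nvidia/cuda tag and reports whether a
// version number followed the "cudnn" prefix.
func cudnnFromTag(tag string) (string, bool) {
	for _, part := range strings.Split(tag, "-") {
		if strings.HasPrefix(part, "cudnn") {
			v := strings.TrimPrefix(part, "cudnn")
			// Newer tags carry a bare "cudnn" with no number after it.
			return v, v != ""
		}
	}
	return "", false
}

func main() {
	for _, tag := range []string{
		"12.6.3-cudnn9-devel-ubuntu22.04",
		"12.9.1-cudnn-devel-ubuntu24.04",
	} {
		v, ok := cudnnFromTag(tag)
		fmt.Printf("%s -> cudnn=%q found=%t\n", tag, v, ok)
	}
}
```

The second tag yields no version, which is why the value has to come from somewhere other than the tag string.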

Fix

tools/compatgen/internal/cuda.go: Fetch the Docker image config via go-containerregistry and read NV_CUDNN_VERSION from environment variables instead of parsing the tag string. Uses authn.DefaultKeychain for Docker Hub auth to avoid rate limits. Adds deterministic sorting of output.
tools/compatgen/internal/torch.go: Filter out torch entries with no supported Python versions (all below 3.10 minimum) instead of emitting them with "Pythons": [].
tools/compatgen/main.go: Pass context.Context to support the image fetching.
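A rough sketch of the environment lookup, assuming the image config exposes its environment as a list of `KEY=VALUE` strings (the form used by OCI image configs). `envValue` is a hypothetical helper and the sample values are illustrative, not copied from a real image. With go-containerregistry, the list would presumably come from something like `remote.Image(ref, remote.WithAuthFromKeychain(authn.DefaultKeychain), remote.WithContext(ctx))` followed by `img.ConfigFile()` and its `Config.Env` field; that wiring is an assumption about how cuda.go does it and is left out so the example runs without network access.

```go
package main

import (
	"fmt"
	"strings"
)

// envValue returns the value for key from a list of "KEY=VALUE" strings.
// It is a hypothetical helper for illustration, not code from compatgen.
func envValue(env []string, key string) (string, bool) {
	prefix := key + "="
	for _, kv := range env {
		if strings.HasPrefix(kv, prefix) {
			return strings.TrimPrefix(kv, prefix), true
		}
	}
	return "", false
}

func main() {
	// Illustrative values only; a real image config may differ.
	env := []string{
		"PATH=/usr/local/cuda/bin:/usr/bin",
		"CUDA_VERSION=12.9.1",
		"NV_CUDNN_VERSION=9.10.2.21-1",
	}
	cudnn, ok := envValue(env, "NV_CUDNN_VERSION")
	fmt.Println(cudnn, ok)
}
```

Matching on `key + "="` rather than on the bare key avoids confusing `NV_CUDNN_VERSION` with a longer variable name that shares the same prefix.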

Regenerated files

cuda_compatibility.json: Adds CUDA 13.0.x and 13.1.x. CuDNN values now correctly extracted from image configs.
torch_compatibility.json: Adds torch 2.10.0, 2.9.x. Drops 41 entries for torch versions with no supported Python versions.
tf_compatibility.json: Adds TensorFlow 2.20.0.
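The torch filtering could look roughly like the sketch below. The `torchEntry` type, the function names, and the sample data are all hypothetical; only the `Pythons` field name and the 3.10 minimum come from the description above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// torchEntry is a hypothetical stand-in for a row of the torch matrix.
type torchEntry struct {
	Torch   string
	Pythons []string
}

// supportedPython reports whether v ("3.9", "3.10", ...) is at least 3.minMinor.
func supportedPython(v string, minMinor int) bool {
	parts := strings.Split(v, ".")
	if len(parts) < 2 || parts[0] != "3" {
		return false
	}
	minor, err := strconv.Atoi(parts[1])
	return err == nil && minor >= minMinor
}

// dropUnsupported keeps only the supported Pythons of each entry and omits
// entries that are left with none, instead of emitting an empty list.
func dropUnsupported(entries []torchEntry, minMinor int) []torchEntry {
	var out []torchEntry
	for _, e := range entries {
		var keep []string
		for _, p := range e.Pythons {
			if supportedPython(p, minMinor) {
				keep = append(keep, p)
			}
		}
		if len(keep) > 0 {
			out = append(out, torchEntry{Torch: e.Torch, Pythons: keep})
		}
	}
	return out
}

func main() {
	// Illustrative sample data, not real matrix contents.
	entries := []torchEntry{
		{Torch: "1.7.1", Pythons: []string{"3.6", "3.7", "3.8", "3.9"}},
		{Torch: "2.0.1", Pythons: []string{"3.8", "3.9", "3.10", "3.11"}},
	}
	fmt.Println(dropUnsupported(entries, 10))
}
```

Comparing the minor version numerically matters here: a plain string comparison would rank "3.9" above "3.10".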

Test updates

Updated 4 tests that referenced dropped torch versions (1.7.x, 1.8.0) to use versions still in the matrix (1.11.0, 1.13.1, 2.0.1).

markphelps · 2025-07-08T15:21:26Z

pkg/util/version/version.go


 func NewVersion(s string) (version *Version, err error) {
+	// TODO[md]: handle prerelease versions (0.1.2-rc1) so they aren't appended to the previous component
+	// todo[md]: tbh just switch to hashicorp/go-version or github.com/Masterminds/semver/v3


💯 I've used semver/v3 in the past; it's pretty nice

I always go to that one too, but I found out the other day that it doesn't support "invalid" semver input like Ubuntu's "22.04" with leading zeros. I was hoping to use the same version code in cog and the new base image generator code 😒
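For reference, a lenient parser that covers both cases raised in this thread (a prerelease suffix such as 0.1.2-rc1 and leading zeros such as 22.04) can be sketched with the standard library alone. `looseVersion` and `parseLoose` are hypothetical names for illustration; this is not cog's `NewVersion` and not the API of either library mentioned above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// looseVersion is a hypothetical type for illustration; it is not cog's Version.
type looseVersion struct {
	Parts      []int
	Prerelease string
}

// parseLoose splits off anything after the first "-" as a prerelease label
// and parses the remaining dot-separated components as integers.
func parseLoose(s string) (looseVersion, error) {
	var v looseVersion
	pieces := strings.SplitN(s, "-", 2)
	if len(pieces) == 2 {
		v.Prerelease = pieces[1]
	}
	for _, f := range strings.Split(pieces[0], ".") {
		// strconv.Atoi accepts leading zeros, so "04" parses as 4.
		n, err := strconv.Atoi(f)
		if err != nil {
			return looseVersion{}, fmt.Errorf("invalid version component %q in %q", f, s)
		}
		v.Parts = append(v.Parts, n)
	}
	return v, nil
}

func main() {
	for _, s := range []string{"22.04", "0.1.2-rc1"} {
		v, err := parseLoose(s)
		fmt.Println(s, v.Parts, v.Prerelease, err)
	}
}
```

Note that this sketch discards the leading zero, so "22.04" and "22.4" compare as equal; whether that is acceptable depends on how the versions are printed back out.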

markphelps

one question, but overall lgtm

markphelps · 2025-07-08T15:22:29Z

tools/compatgen/internal/cuda.go

-	if len(parts) != 4 {
-		return nil, fmt.Errorf("Tag must be in the format <cudaVersion>-cudnn<cudnnVersion>-{devel,runtime}-ubuntu<ubuntuVersion>. Invalid tag: %s", tag)
+func parseCUDABaseImage(ctx context.Context, tag string) (*config.CUDABaseImage, error) {
+	fmt.Println("parsing", tag)


Debug printlns? Do we want to keep these?

markphelps · 2025-07-08T15:24:29Z

tools/compatgen/internal/cuda.go

+	images := make([]config.CUDABaseImage, len(tags))
+	eg, egctx := errgroup.WithContext(context.TODO())
+	// set a concurrency limit to avoid throttling by the docker hub api (since these are authenticated requests)
+	eg.SetLimit(1)


why use error group at all then if we are running them serially?

natural evolution of intermittent issues and sloppy code. fixing :)

michaeldwan · 2025-07-08T15:46:04Z

@markphelps I removed the unnecessary errgroup and excessive print statements. I left one for each image since the process takes a few minutes and it's nice to see some output to know it's not hanging
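A sketch of what the serial version might look like once the errgroup is gone: a plain loop that checks the context, prints one progress line per image, and stops at the first error. `fetchAll`, `runDemo`, and the fake fetch function are hypothetical names for this example, not the code in cuda.go.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"
)

// fetchAll calls fetch for each tag in order and writes one progress line
// per tag to w. It stops at the first error or when ctx is cancelled.
func fetchAll(ctx context.Context, w io.Writer, tags []string,
	fetch func(context.Context, string) (string, error)) ([]string, error) {
	results := make([]string, 0, len(tags))
	for _, tag := range tags {
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		fmt.Fprintln(w, "parsing", tag)
		r, err := fetch(ctx, tag)
		if err != nil {
			return nil, fmt.Errorf("tag %s: %w", tag, err)
		}
		results = append(results, r)
	}
	return results, nil
}

// runDemo exercises fetchAll with a fake fetch function so that the example
// runs without any network access.
func runDemo() ([]string, error) {
	fake := func(_ context.Context, tag string) (string, error) {
		return "config for " + tag, nil
	}
	tags := []string{
		"12.6.3-cudnn9-devel-ubuntu22.04",
		"12.9.1-cudnn-devel-ubuntu24.04",
	}
	return fetchAll(context.Background(), os.Stdout, tags, fake)
}

func main() {
	out, err := runDemo()
	fmt.Println(len(out), err)
}
```

Processing tags in input order also keeps the result order stable without any extra coordination.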

markphelps

lgtm!

Newer nvidia/cuda image tags no longer include the CuDNN version number (e.g. '12.9.1-cudnn-devel-ubuntu24.04' instead of '12.6.3-cudnn9-devel-ubuntu22.04'). This fetches the Docker image config and reads CUDA_VERSION and NV_CUDNN_VERSION from environment variables instead of parsing the tag string. Also adds deterministic sorting of output and Docker Hub auth to avoid rate limits.

…hon versions
- Regenerated cuda_compatibility.json: adds CUDA 13.0.x and 13.1.x, CuDNN versions now correctly extracted from image configs instead of manual patches
- Regenerated torch_compatibility.json: adds torch 2.10.0, 2.9.x; drops 41 entries for torch versions with no supported Python versions (all < 3.10)
- Updated tests to use torch versions still in the compatibility matrix

michaeldwan marked this pull request as ready for review June 9, 2025 21:11

michaeldwan requested a review from a team June 9, 2025 21:12

This was referenced Jun 9, 2025

Support Torch 2.7.0, 2.6.0; cuda 12.9, 12.8, 12.6.3 #2320

Closed

Use Python 3.13 by default #2099

Closed

michaeldwan requested a review from andreasjansson June 10, 2025 00:11

michaeldwan force-pushed the md/fix-compatgen branch 2 times, most recently from ac77130 to dc9197c Compare July 3, 2025 23:21

michaeldwan requested a review from markphelps July 8, 2025 15:16

markphelps reviewed Jul 8, 2025

markphelps approved these changes Jul 8, 2025

markphelps previously approved these changes Jul 8, 2025

michaeldwan mentioned this pull request Jul 8, 2025

Remove python 3.12 from older torch versions #2455

Merged

michaeldwan dismissed markphelps’s stale review via eab374b February 6, 2026 00:48

tempusfrangit added this to the 0.17.0 Release milestone Feb 17, 2026

This was referenced Feb 17, 2026

Support CUDA 13 / Torch 2.9 #2563

Open

Update torch_compatibility.json #2718

Closed

michaeldwan force-pushed the md/fix-compatgen branch from eab374b to 9dd7959 Compare February 18, 2026 22:35

michaeldwan force-pushed the md/fix-compatgen branch from 60af790 to f70c096 Compare February 18, 2026 23:32

tempusfrangit approved these changes Feb 19, 2026

tempusfrangit merged commit faa27ae into main Feb 19, 2026
31 checks passed

tempusfrangit deleted the md/fix-compatgen branch February 19, 2026 17:59

This was referenced Feb 19, 2026

Create compatibility matrices manually #820

Closed

Resolve Torch to maximum compatible version #1951

Closed

tempusfrangit merged 2 commits into main from md/fix-compatgen
