Update compatibility matrix files #2400
ac77130 to dc9197c
pkg/util/version/version.go
Outdated
func NewVersion(s string) (version *Version, err error) {
	// TODO[md]: handle prerelease versions (0.1.2-rc1) so they aren't appended to the previous component
	// todo[md]: tbh just switch to hashicorp/go-version or github.com/Masterminds/semver/v3
💯 I've used semver/v3 in the past, it's pretty nice
I always go to that one too, but I found the other day that it doesn't support "invalid" semver input like Ubuntu's "22.04", which has a leading zero. I was hoping to use the same version code in cog and the new base image generator code 😒
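To make the trade-off above concrete, here is a minimal sketch of a lenient parser that accepts leading zeros ("22.04") and splits off a prerelease suffix ("0.1.2-rc1") before parsing the numeric components. The names `looseVersion` and `parseLoose` are hypothetical and are not taken from cog's `pkg/util/version`; this only illustrates the idea, not the project's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// looseVersion is a hypothetical type for this sketch; it is not cog's Version.
type looseVersion struct {
	Parts []int  // numeric components, e.g. [22 4] for "22.04"
	Pre   string // text after the first '-', e.g. "rc1"; empty if absent
}

// parseLoose splits off a prerelease suffix first, so it is never glued onto
// the last numeric component, then parses each dotted component with
// strconv.Atoi, which accepts leading zeros ("04" parses as 4).
func parseLoose(s string) (looseVersion, error) {
	core, pre, _ := strings.Cut(s, "-")
	fields := strings.Split(core, ".")
	parts := make([]int, 0, len(fields))
	for _, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return looseVersion{}, fmt.Errorf("invalid version component %q in %q", f, s)
		}
		parts = append(parts, n)
	}
	return looseVersion{Parts: parts, Pre: pre}, nil
}

func main() {
	for _, s := range []string{"22.04", "0.1.2-rc1", "1.x"} {
		v, err := parseLoose(s)
		fmt.Println(s, v.Parts, v.Pre, err)
	}
}
```

A strict semver library would likely be the better choice wherever inputs are guaranteed to be valid semver; a lenient parser like this only earns its keep for inputs such as distribution version numbers.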
markphelps left a comment
one question, but overall lgtm
if len(parts) != 4 {
	return nil, fmt.Errorf("Tag must be in the format <cudaVersion>-cudnn<cudnnVersion>-{devel,runtime}-ubuntu<ubuntuVersion>. Invalid tag: %s", tag)
func parseCUDABaseImage(ctx context.Context, tag string) (*config.CUDABaseImage, error) {
	fmt.Println("parsing", tag)
debug printlns? do we want to keep these?
tools/compatgen/internal/cuda.go
Outdated
images := make([]config.CUDABaseImage, len(tags))
eg, egctx := errgroup.WithContext(context.TODO())
// set a concurrency limit to avoid throttling by the docker hub api (since these are authenticated requests)
eg.SetLimit(1)
why use error group at all then if we are running them serially?
natural evolution of intermittent issues and sloppy code. fixing :)
@markphelps I removed the unnecessary errgroup and excessive print statements. I left one for each image since the process takes a few minutes and it's nice to see some output to know it's not hanging
Newer nvidia/cuda image tags no longer include the CuDNN version number (e.g. '12.9.1-cudnn-devel-ubuntu24.04' instead of '12.6.3-cudnn9-devel-ubuntu22.04'). This fetches the Docker image config and reads CUDA_VERSION and NV_CUDNN_VERSION from environment variables instead of parsing the tag string. Also adds deterministic sorting of output and Docker Hub auth to avoid rate limits.
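The environment-variable lookup described in this commit message can be sketched as follows. This is a stdlib-only illustration: it assumes the image config's environment is available as a slice of `KEY=VALUE` strings and leaves out the registry fetch entirely. The helper name `lookupEnv` and the sample values are hypothetical and not copied from a real image.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupEnv scans KEY=VALUE entries (the shape an image config's Env list
// takes) and returns the value for key, plus whether the key was present.
func lookupEnv(env []string, key string) (string, bool) {
	for _, kv := range env {
		k, v, ok := strings.Cut(kv, "=")
		if ok && k == key {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Illustrative values only; a real run would read these from the image config.
	env := []string{
		"PATH=/usr/local/cuda/bin:/usr/bin",
		"CUDA_VERSION=12.9.1",
		"NV_CUDNN_VERSION=9.10.2.21-1",
	}
	cuda, _ := lookupEnv(env, "CUDA_VERSION")
	cudnn, ok := lookupEnv(env, "NV_CUDNN_VERSION")
	fmt.Println(cuda, cudnn, ok)
}
```

Reading the values from the config sidesteps the tag format altogether, so a tag such as '12.9.1-cudnn-devel-ubuntu24.04' no longer needs to carry the CuDNN version.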
eab374b to 9dd7959
…hon versions
- Regenerated cuda_compatibility.json: adds CUDA 13.0.x and 13.1.x, CuDNN versions now correctly extracted from image configs instead of manual patches
- Regenerated torch_compatibility.json: adds torch 2.10.0, 2.9.x; drops 41 entries for torch versions with no supported Python versions (all < 3.10)
- Updated tests to use torch versions still in the compatibility matrix
60af790 to f70c096
Summary
Fix compatgen tool to correctly extract CuDNN versions from newer nvidia/cuda images, and regenerate all compatibility matrices.

Problem
Newer nvidia/cuda image tags no longer include the CuDNN version number (e.g. 12.9.1-cudnn-devel-ubuntu24.04 instead of 12.6.3-cudnn9-devel-ubuntu22.04). The tag parser in compatgen couldn't extract CuDNN from these tags, so new CUDA versions had to be manually patched into the JSON (see #2036).

Fix
- tools/compatgen/internal/cuda.go: Fetch the Docker image config via go-containerregistry and read NV_CUDNN_VERSION from environment variables instead of parsing the tag string. Uses authn.DefaultKeychain for Docker Hub auth to avoid rate limits. Adds deterministic sorting of output.
- tools/compatgen/internal/torch.go: Filter out torch entries with no supported Python versions (all below 3.10 minimum) instead of emitting them with "Pythons": [].
- tools/compatgen/main.go: Pass context.Context to support the image fetching.

Regenerated files
- cuda_compatibility.json: Adds CUDA 13.0.x and 13.1.x. CuDNN values now correctly extracted from image configs.
- torch_compatibility.json: Adds torch 2.10.0, 2.9.x. Drops 41 entries for torch versions with no supported Python versions.
- tf_compatibility.json: Adds TensorFlow 2.20.0.

Test updates
Updated 4 tests that referenced dropped torch versions (1.7.x, 1.8.0) to use versions still in the matrix (1.11.0, 1.13.1, 2.0.1).
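
The torch filtering described under Fix can be pictured with the sketch below: Python versions under the 3.10 minimum are removed, and any entry left with no Python versions is dropped rather than emitted with an empty list. The type `torchEntry`, the helper names, and the sample rows are hypothetical; the real generator's structs and data may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// torchEntry is a hypothetical, trimmed-down stand-in for one matrix row.
type torchEntry struct {
	Torch   string
	Pythons []string
}

// pythonAtLeast reports whether a "major.minor" string is at or above the
// given minimum. Components are compared as integers, so "3.9" sorts below
// "3.10" (a plain string comparison would get this wrong).
func pythonAtLeast(v string, minMajor, minMinor int) bool {
	majorStr, minorStr, ok := strings.Cut(v, ".")
	if !ok {
		return false
	}
	major, err1 := strconv.Atoi(majorStr)
	minor, err2 := strconv.Atoi(minorStr)
	if err1 != nil || err2 != nil {
		return false
	}
	return major > minMajor || (major == minMajor && minor >= minMinor)
}

// filterSupported keeps only Python versions >= 3.10 and drops entries that
// end up with none, instead of emitting them with an empty Pythons list.
func filterSupported(entries []torchEntry) []torchEntry {
	kept := make([]torchEntry, 0, len(entries))
	for _, e := range entries {
		var pys []string
		for _, p := range e.Pythons {
			if pythonAtLeast(p, 3, 10) {
				pys = append(pys, p)
			}
		}
		if len(pys) == 0 {
			continue
		}
		kept = append(kept, torchEntry{Torch: e.Torch, Pythons: pys})
	}
	return kept
}

func main() {
	// Illustrative rows only; not copied from torch_compatibility.json.
	entries := []torchEntry{
		{Torch: "1.7.1", Pythons: []string{"3.8", "3.9"}},
		{Torch: "2.0.1", Pythons: []string{"3.9", "3.10", "3.11"}},
	}
	fmt.Println(filterSupported(entries))
}
```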