Add golang.py to mirror Go releases from go.dev / dl.google.com #201
yaoge123 wants to merge 2 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a Python script to sync a local Golang mirror by fetching the file list from go.dev and downloading artifacts from dl.google.com, with cleanup of stale local files.
Changes:
- Fetch Go release file metadata from the go.dev download API.
- Download missing/out-of-date artifacts into a configured working directory.
- Remove local files not present in the current upstream file list.
| import json | ||
| import os | ||
| import sys | ||
| import subprocess | ||
| import hashlib |
hashlib is now used to verify each downloaded file, and to skip up-to-date local files when an expected sha256 is available. The hash is computed by streaming the file in chunks, so memory stays bounded.
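A minimal sketch of the streaming hash and the skip check described in this reply. The helper names `sha256_of_file` and `is_up_to_date` and the 1 MiB chunk size are illustrative assumptions, not necessarily what golang.py uses.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_up_to_date(path, expected_sha256):
    """True only if the file exists and its hash equals the expected one.

    Assumption: an empty expected hash means "unknown", so the file is
    not treated as verified.
    """
    if not expected_sha256 or not os.path.isfile(path):
        return False
    return sha256_of_file(path) == expected_sha256.lower()
```

Only one chunk is held in memory at a time, which matters for a mirror whose individual artifacts can be hundreds of megabytes.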
| ["curl", "-s", "-m", "60", API_URL], | ||
| capture_output=True, text=True, timeout=90 | ||
| ) |
Now using curl --fail -L plus an explicit returncode != 0 check, with stderr surfaced when fetch_versions fails.
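For illustration, the fetch with an explicit exit-code check could look roughly like the sketch below. The `run` parameter is a hypothetical injection point added here so the function can be exercised without network access, and raising `RuntimeError` is an assumption about how the failure is surfaced.

```python
import json
import subprocess

API_URL = "https://go.dev/dl/?mode=json&include=all"


def fetch_versions(url=API_URL, run=subprocess.run):
    """Fetch and decode the release index, raising on any curl failure."""
    result = run(
        # -f: fail on HTTP errors, -s: silent, -S: still print errors,
        # -L: follow redirects, -m: overall time limit in seconds.
        ["curl", "-fsSL", "-m", "60", url],
        capture_output=True, text=True, timeout=90,
    )
    if result.returncode != 0:
        # Surface curl's own message instead of parsing an empty body.
        raise RuntimeError(
            f"curl exited with {result.returncode}: {result.stderr.strip()}"
        )
    return json.loads(result.stdout)
```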
| def download_file(filename, url, filepath): | ||
| """Download a file if not present or size mismatch.""" | ||
| tmpfile = filepath + ".tmp" | ||
| try: | ||
| result = subprocess.run( | ||
| ["curl", "-s", "-L", "-m", "600", "-o", tmpfile, url], | ||
| timeout=630 | ||
| ) | ||
| if result.returncode == 0: | ||
| os.rename(tmpfile, filepath) | ||
| return True |
Switched to curl -fsSL for both download_file and fetch_versions so HTTP 4xx/5xx are treated as failures instead of producing empty bodies.
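A sketch of what the download path might look like with `-fsSL`, a temporary file, and `os.replace`. The `run` parameter and the cleanup in `finally` are assumptions made for this example, not confirmed details of the PR.

```python
import os
import subprocess


def download_file(url, filepath, run=subprocess.run):
    """Download url to filepath via a temporary file; return True on success."""
    tmpfile = filepath + ".tmp"
    try:
        result = run(
            ["curl", "-fsSL", "-m", "600", "-o", tmpfile, url],
            timeout=630,
        )
        if result.returncode == 0:
            # os.replace overwrites an existing target in one step when
            # source and destination are on the same filesystem.
            os.replace(tmpfile, filepath)
            return True
        return False
    except (subprocess.TimeoutExpired, OSError):
        return False
    finally:
        # Drop any partial download left behind by a failed attempt.
        if os.path.exists(tmpfile):
            os.remove(tmpfile)
```

Because the final path is only written by `os.replace`, a failed or interrupted transfer never leaves a truncated file under the real name.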
| ["curl", "-sI", "-m", "10", url], | ||
| capture_output=True, text=True, timeout=15 | ||
| ) | ||
| for line in result.stdout.splitlines(): | ||
| if line.lower().startswith("content-length:"): | ||
| return int(line.split(":", 1)[1].strip()) |
get_remote_size now uses curl -sIL so we follow redirects and only count the final response's Content-Length.
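With `-L`, `curl -I` prints one header block per hop, so the parser has to ignore Content-Length values from intermediate redirects. One way to do that, as a sketch (the function name `parse_final_content_length` is hypothetical):

```python
def parse_final_content_length(header_text):
    """Return Content-Length from the last response block, or None.

    Each response block in `curl -sIL` output starts with an "HTTP/"
    status line, so the remembered size is reset at every status line.
    """
    size = None
    for line in header_text.splitlines():
        if line.upper().startswith("HTTP/"):
            size = None  # a new response begins; forget earlier hops
        elif line.lower().startswith("content-length:"):
            value = line.split(":", 1)[1].strip()
            size = int(value) if value.isdigit() else None
    return size
```

Returning `None` when the final response carries no Content-Length lets the caller fall back to downloading rather than trusting a redirect's size.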
| filename = f["filename"] | ||
| sha256 = f.get("sha256", "") | ||
| files[filename] = { | ||
| "url": BASE_URL + filename, | ||
| "sha256": sha256, | ||
| } |
After downloading we verify the file against the upstream sha256 from the go.dev API and remove the file on mismatch (so the next run will retry).
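The verify-then-remove step might be sketched as follows. `verify_or_remove` is a hypothetical name, and treating a missing upstream hash as "nothing to check" is an assumption of this example.

```python
import hashlib
import os


def verify_or_remove(filepath, expected_sha256):
    """Check a downloaded file against its expected SHA-256.

    Returns True when the hash matches. On mismatch the file is deleted
    and False is returned, so a later run downloads it again.
    """
    if not expected_sha256:
        return True  # assumption: no upstream hash, nothing to verify
    digest = hashlib.sha256()
    with open(filepath, "rb") as fh:
        # Read in 1 MiB chunks until read() returns an empty bytes object.
        while chunk := fh.read(1024 * 1024):
            digest.update(chunk)
    if digest.hexdigest() == expected_sha256.lower():
        return True
    os.remove(filepath)
    return False
```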
| for fname in os.listdir(WORKDIR): | ||
| fpath = os.path.join(WORKDIR, fname) | ||
| if os.path.isfile(fpath) and fname not in expected: | ||
| print(f"Removing stale file: {fname}") | ||
| os.remove(fpath) |
Restricted stale cleanup to filenames matching Go's release naming convention (go1.*, getgo*) so unrelated files in WORKDIR are no longer touched.
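As a rough sketch, the restricted cleanup could be written with `fnmatch` against the two patterns named in this reply. The function name `remove_stale` and the returned list of removed names are assumptions for illustration.

```python
import fnmatch
import os

# Patterns named in the review reply.
RELEASE_PATTERNS = ("go1.*", "getgo*")


def remove_stale(workdir, expected):
    """Delete release-like files in workdir that are not in `expected`.

    Files whose names do not match a release pattern are left alone.
    Returns the sorted list of removed filenames.
    """
    removed = []
    for fname in sorted(os.listdir(workdir)):
        fpath = os.path.join(workdir, fname)
        if not os.path.isfile(fpath) or fname in expected:
            continue
        if any(fnmatch.fnmatchcase(fname, pat) for pat in RELEASE_PATTERNS):
            os.remove(fpath)
            removed.append(fname)
    return removed
```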
| return files | ||
| def download_file(filename, url, filepath): |
Removed; download_file no longer takes filename.
Mirror Go release tarballs and installers from go.dev. The script uses the official go.dev/dl/?mode=json&include=all listing as the index and downloads files from dl.google.com/go via curl, writing each file to a .tmp path and renaming it atomically. Compared to the previously common rsync upstream, rsync.mirrors.ustc.edu.cn::golang/, this is a direct from-origin sync that does not depend on another mirror. Used by NJU mirror's golang job (~646G, 8118 files).
Address review feedback:
- Use curl --fail -L so HTTP errors and redirects to error pages are treated as failures instead of producing empty/partial output.
- Check curl returncode for fetch_versions and download_file.
- Verify freshly-downloaded files against the expected sha256 from the go.dev API; reuse that sha to skip up-to-date local files.
- Stream sha256 over the file so memory stays bounded.
- Restrict stale cleanup to filenames matching Go's release naming convention (go1.* and getgo*), avoiding removal of unrelated files that happen to live in WORKDIR.
- Drop the unused filename parameter from download_file.
Summary
Add golang.py to mirror Go releases directly from upstream.
Why
The previously common golang upstream rsync://rsync.mirrors.ustc.edu.cn/golang/ no longer exposes the rsync module. This script gives tunasync a self-contained way to mirror Go releases without depending on another mirror.
How it works
- Fetch the file index from https://go.dev/dl/?mode=json&include=all
- Download each file from https://dl.google.com/go/<filename> with curl
- Write atomically via .tmp + os.replace
- Compare the local file size against Content-Length
No rsync, no apt-mirror; just curl + JSON.
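To make the index-to-URL step concrete, here is a small sketch that turns the JSON listing into a filename map. `build_file_map` is a hypothetical name, and the sample entry uses a placeholder sha256 rather than a real checksum.

```python
import json

BASE_URL = "https://dl.google.com/go/"


def build_file_map(releases):
    """Map each filename in the release index to its URL and sha256."""
    files = {}
    for release in releases:
        for f in release.get("files", []):
            filename = f["filename"]
            files[filename] = {
                "url": BASE_URL + filename,
                "sha256": f.get("sha256", ""),
            }
    return files


# Illustrative input in the shape of the go.dev listing; the hash is a
# placeholder, not the real checksum of this file.
SAMPLE = json.loads("""
[
  {
    "version": "go1.22.0",
    "stable": true,
    "files": [
      {"filename": "go1.22.0.src.tar.gz", "kind": "source",
       "sha256": "0000000000000000000000000000000000000000000000000000000000000000"}
    ]
  }
]
""")

if __name__ == "__main__":
    for name, info in build_file_map(SAMPLE).items():
        print(name, info["url"])
```

The resulting dictionary doubles as the set of expected filenames when deciding which local files are stale.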