Add seafile-download.sh to mirror Seafile desktop client downloads #203
yaoge123 wants to merge 2 commits
Mirror Seafile desktop client downloads from www.seafile.com/download. The official download page links directly to Aliyun OSS (seafile-downloads.oss-cn-shanghai.aliyuncs.com). This script:

1. fetches the page with wget,
2. parses out OSS download URLs with Python's html.parser,
3. atomically downloads new/changed files via wget,
4. deletes stale local files, bounded by TUNASYNC_MAX_DELETE.

wget is used throughout because, in the tunasync Docker bridge network on at least one site, curl fails to reach Seafile's AWS-hosted OSS IPs while wget succeeds.
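The link-extraction step (parsing OSS URLs out of the page with html.parser) might look roughly like the sketch below. It is not the code from the PR: the class name OSSLinkParser and the helper extract_oss_urls are hypothetical, and matching on the exact hostname is an assumption about how the links are filtered.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Host taken from the PR description.
OSS_HOST = "seafile-downloads.oss-cn-shanghai.aliyuncs.com"


class OSSLinkParser(HTMLParser):
    """Collect <a href> values that point at the OSS download host (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            # Assumption: only links whose hostname is exactly OSS_HOST are mirrored.
            if key == "href" and value and urlparse(value).hostname == OSS_HOST:
                if value not in self.urls:  # keep first occurrence, drop duplicates
                    self.urls.append(value)


def extract_oss_urls(html_text):
    """Return the de-duplicated OSS download URLs found in html_text, in page order."""
    parser = OSSLinkParser()
    parser.feed(html_text)
    return parser.urls
```

Comparing the parsed hostname, rather than searching for the host as a substring, avoids matching look-alike URLs such as a link that merely carries the OSS hostname in its query string.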
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a seafile-download.sh helper to mirror Seafile client downloads by scraping the official download page, downloading files via wget, and pruning stale local artifacts.
Changes:
- Introduces a Bash script that fetches Seafile’s download page and extracts OSS links using an embedded Python parser.
- Downloads files atomically and skips unchanged downloads via a remote size check.
- Removes local files no longer present upstream, bounded by TUNASYNC_MAX_DELETE.

    @@ -0,0 +1,122 @@
    #!/bin/bash

Confirmed: the shebang is at byte 0 (verified with xxd). The PR diff just renders it indented; actual file is correct.

    name = url.rstrip("/").rsplit("/", 1)[-1]
    name = urllib.parse.unquote(name)
    target = os.path.join(WORKDIR, name)
    tmp = target + ".tmp"

Now sanitizing each OSS URL basename via urllib.parse.unquote and rejecting filenames containing path separators, NUL, '..', or empty strings before joining with WORKDIR.
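A minimal sketch of the sanitization described in this reply, assuming the checks are applied after percent-decoding. The function name safe_basename is hypothetical, taking the basename from the URL path via urlsplit is an assumption (the diff splits the raw URL), and rejecting any name that contains '..' as a substring is a deliberately strict reading of the reply.

```python
import urllib.parse


def safe_basename(url):
    """Return a safe local filename for url, or None if it must be rejected.

    Hypothetical helper; the real script may structure this differently.
    """
    # Assumption: use only the path component so query strings never leak into the name.
    path = urllib.parse.urlsplit(url).path
    name = path.rstrip("/").rsplit("/", 1)[-1]
    # Decode first, so an encoded separator such as %2F is caught by the checks below.
    name = urllib.parse.unquote(name)
    if not name or name == ".":
        return None
    if "/" in name or "\\" in name or "\x00" in name or ".." in name:
        return None
    return name
```

Decoding before validating matters: checking the raw basename would let `..%2F..%2Fetc%2Fpasswd` through, and it would only become a traversal path after unquote.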

    cd "$WORKDIR"

    echo "Fetching download page via wget..."
    wget -qO /tmp/seafile-page.html --timeout=30 --tries=3 "$UPSTREAM" || {

Replaced /tmp/seafile-page.html with mktemp -t seafile-page.XXXXXX.html plus a trap cleanup, so symlink/clobber races on shared hosts no longer apply.

    # Delete stale local files not in the current remote set
    local_files = [f for f in os.listdir(WORKDIR) if os.path.isfile(os.path.join(WORKDIR, f))]
    stale = [f for f in local_files if f not in remote_names]
    if len(stale) > MAX_DELETE:
        print(f"WARNING: {len(stale)} stale files exceeds MAX_DELETE ({MAX_DELETE})", file=sys.stderr)
        sys.exit(1)
    for f in stale:
        fp = os.path.join(WORKDIR, f)
        print(f"Deleting stale: {f}", file=sys.stderr)
        os.remove(fp)

Stale-file deletion is now deferred until after every download succeeds; a transient upstream/network error can no longer wipe the mirror.
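The ordering described here (download everything first, prune only afterwards, and refuse to prune past a limit) could be sketched as follows. The names sync and prune_stale and the injected download callable are hypothetical; the default of 50 for max_delete follows the PR summary, and returning False instead of calling sys.exit is a simplification for illustration.

```python
import os
import sys


def prune_stale(workdir, remote_names, max_delete):
    """Delete local files absent from remote_names; refuse if there are too many."""
    local_files = [f for f in os.listdir(workdir)
                   if os.path.isfile(os.path.join(workdir, f))]
    stale = sorted(f for f in local_files if f not in remote_names)
    if len(stale) > max_delete:
        print(f"WARNING: {len(stale)} stale files exceeds max_delete ({max_delete})",
              file=sys.stderr)
        return False
    for name in stale:
        print(f"Deleting stale: {name}", file=sys.stderr)
        os.remove(os.path.join(workdir, name))
    return True


def sync(workdir, url_to_name, download, max_delete=50):
    """Run every download first; prune only if none of them failed.

    download(url, target_path) is a hypothetical callable returning True on success.
    """
    for url, name in url_to_name.items():
        if not download(url, os.path.join(workdir, name)):
            return False  # leave the existing mirror untouched on any failure
    return prune_stale(workdir, set(url_to_name.values()), max_delete)
```

With this ordering, a failed page fetch or a partial download run ends before any local file is removed, which is the property the reply claims.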

    # Check remote size with wget --spider
    r = subprocess.run(
        ["wget", "--spider", "--timeout=30", "--tries=1", "-S", url],
        capture_output=True, text=True)
    remote_size = 0
    for line in r.stderr.split("\n"):
        if "Content-Length:" in line:
            remote_size = int(line.split(":")[1].strip())
            break
    if r.returncode != 0:
        print(f"ERROR: spider {url}: {r.stderr[-200:]}", file=sys.stderr)
        sys.exit(1)

    if os.path.exists(target) and os.path.getsize(target) == remote_size:
        continue

Now falls through to the actual download when Content-Length is absent (chunked / 302 / some CDNs) and verifies the downloaded temp file size matches Content-Length when it is present, rather than treating size 0 as a match.
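One way to express the behaviour described in this reply is to treat a missing Content-Length as "unknown" (None) rather than 0. The sketch below is an illustration, not the PR's code: all three function names are hypothetical, taking the last Content-Length header (so that a redirect's final response wins) is an assumption, and so is the non-empty check used when no size is known.

```python
import os
import re


def parse_content_length(headers_text):
    """Return the last Content-Length found in `wget -S` header output, or None."""
    sizes = re.findall(r"^\s*Content-Length:\s*(\d+)\s*$", headers_text,
                       flags=re.IGNORECASE | re.MULTILINE)
    # Assumption: with redirects, the final response's header is the relevant one.
    return int(sizes[-1]) if sizes else None


def is_up_to_date(target, remote_size):
    """Skip the download only when the remote size is known and matches the local file."""
    if remote_size is None:
        return False  # unknown size: always fall through to the download
    return os.path.exists(target) and os.path.getsize(target) == remote_size


def download_is_complete(tmp_path, remote_size):
    """Check a finished temp file before it is promoted to its final name."""
    actual = os.path.getsize(tmp_path)
    if remote_size is None:
        return actual > 0  # assumption: without a known size, require non-empty
    return actual == remote_size
```

Using None for "absent" removes the failure mode in the reviewed snippet, where a default of 0 could compare equal to an empty local file and skip the download.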
Address review feedback:

- Move stale-file deletion after all downloads succeed so a transient network or upstream error cannot wipe the mirror.
- Sanitize the OSS URL basename via urllib.parse.unquote and reject filenames that contain path separators, NUL, '..' or are empty, preventing path traversal even if the upstream HTML is hostile.
- Replace the fixed /tmp/seafile-page.html path with mktemp + trap cleanup to avoid symlink/clobber races on shared hosts.
- When Content-Length is missing (chunked transfer / some CDNs), fall back to downloading and verify the temp file size before promoting it, instead of treating size 0 as a match.
- Confirmed shebang is at byte 0.
Summary
Add seafile-download.sh to mirror Seafile desktop client downloads from https://www.seafile.com/download/.
Why
The official Seafile download page links directly to Aliyun OSS (seafile-downloads.oss-cn-shanghai.aliyuncs.com). There is no rsync, ftp, or directory listing.
How it works
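- Fetches the download page with wget
- Parses out OSS download URLs with Python's html.parser
- Downloads atomically with wget (.tmp + os.replace)
- Skips unchanged files by comparing Content-Length
- Deletes stale local files, bounded by TUNASYNC_MAX_DELETE (default 50)
- Skips seafile-server URLs (out of scope)
Note on wget vs curl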
wget is used throughout instead of curl because, in at least one tunasync Docker bridge network, curl fails to reach Seafile's AWS-hosted OSS IPs while wget succeeds. The choice is deliberate, not a stylistic preference.
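The atomic download pattern named in the summary (.tmp + os.replace) can be illustrated with a small sketch. It is not taken from the script: atomic_fetch and wget_fetch are hypothetical names, and the fetch step is passed in as a callable so the promotion logic can be exercised without network access. The wget flags shown (-q, --timeout, --tries, -O) mirror those used elsewhere in this PR.

```python
import os
import subprocess


def atomic_fetch(target, fetch):
    """Write via fetch(tmp_path) and promote the result to target only on success."""
    tmp = target + ".tmp"
    try:
        if not fetch(tmp):
            return False
        # os.replace is atomic when tmp and target are on the same filesystem,
        # so readers see either the old file or the complete new one.
        os.replace(tmp, target)
        return True
    finally:
        # Clean up a partial temp file after a failed or interrupted fetch.
        if os.path.exists(tmp):
            os.remove(tmp)


def wget_fetch(url):
    """Build a fetch callable that downloads url with wget (hypothetical helper)."""
    def fetch(tmp_path):
        result = subprocess.run(
            ["wget", "-q", "--timeout=30", "--tries=3", "-O", tmp_path, url])
        return result.returncode == 0
    return fetch
```

A caller would combine the two as `atomic_fetch(target, wget_fetch(url))`. Keeping the temp file next to the target, rather than in /tmp, is what keeps the rename on one filesystem.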