Add seafile-download.sh to mirror Seafile desktop client downloads #203

Open
yaoge123 wants to merge 2 commits into tuna:master from yaoge123:add-seafile-download-sh

yaoge123 (Contributor) commented May 24, 2026

Summary

Add seafile-download.sh to mirror Seafile desktop client downloads from https://www.seafile.com/download/.

Why

The official Seafile download page links directly to Aliyun OSS (seafile-downloads.oss-cn-shanghai.aliyuncs.com). There is no rsync, ftp, or directory listing.

How it works

  1. Fetch the download page via wget
  2. Parse out OSS URLs with Python's html.parser
  3. Atomically download new/changed files via wget (.tmp + os.replace)
  4. Skip files whose local size matches Content-Length
  5. Delete stale local files bounded by TUNASYNC_MAX_DELETE (default 50)
  6. Skip seafile-server URLs (out of scope)
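
Steps 2 and 6 above can be sketched roughly as follows. This is an illustration, not the script's actual code: the class name OssLinkParser, the helper extract_oss_urls, and the substring-based filtering are assumptions; only the OSS hostname and the use of html.parser come from the description.

```python
from html.parser import HTMLParser

# Hostname taken from the PR description; the filter rules below are assumptions.
OSS_HOST = "seafile-downloads.oss-cn-shanghai.aliyuncs.com"


class OssLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at the OSS bucket."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        href = dict(attrs).get("href") or ""
        # Keep only OSS links and skip server packages (out of scope).
        if OSS_HOST in href and "seafile-server" not in href:
            if href not in self.urls:
                self.urls.append(href)


def extract_oss_urls(html_text):
    """Return the de-duplicated OSS client download URLs found in html_text."""
    parser = OssLinkParser()
    parser.feed(html_text)
    return parser.urls
```

Feeding the fetched page text to extract_oss_urls would yield the list of client download URLs to hand to the download step.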

Note on wget vs curl

wget is used throughout instead of curl because, in at least one tunasync Docker bridge network, curl fails to reach Seafile's Aliyun-hosted OSS IPs while wget succeeds. The choice is deliberate, not a stylistic preference.

Mirror Seafile desktop client downloads from www.seafile.com/download.

The official download page links directly to Aliyun OSS
(seafile-downloads.oss-cn-shanghai.aliyuncs.com). This script:
  1. fetches the page with wget,
  2. parses out OSS download URLs with Python's html.parser,
  3. atomically downloads new/changed files via wget,
  4. deletes stale local files bounded by TUNASYNC_MAX_DELETE.

wget is used throughout because, in the tunasync Docker bridge network
on at least one site, curl fails to reach Seafile's Aliyun-hosted OSS IPs
while wget succeeds.
Copilot AI review requested due to automatic review settings May 24, 2026 09:20
Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a seafile-download.sh helper to mirror Seafile client downloads by scraping the official download page, downloading files via wget, and pruning stale local artifacts.

Changes:

  • Introduces a Bash script that fetches Seafile’s download page and extracts OSS links using an embedded Python parser.
  • Downloads files atomically and skips unchanged downloads via a remote size check.
  • Removes local files no longer present upstream, bounded by TUNASYNC_MAX_DELETE.

Comment thread seafile-download.sh
@@ -0,0 +1,122 @@
#!/bin/bash
Contributor Author

Confirmed: the shebang is at byte 0 (verified with xxd). The PR diff view merely renders it indented; the actual file is correct.

Comment thread seafile-download.sh Outdated
Comment on lines +88 to +91
name = url.rstrip("/").rsplit("/", 1)[-1]
name = urllib.parse.unquote(name)
target = os.path.join(WORKDIR, name)
tmp = target + ".tmp"
Contributor Author

Each OSS URL basename is now decoded with urllib.parse.unquote, and names that are empty or contain path separators, NUL, or '..' are rejected before being joined with WORKDIR.
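
A minimal sketch of that check, for illustration only. The function name safe_basename is hypothetical, and using urlsplit to drop any query string is an assumption that goes beyond the quoted lines.

```python
import urllib.parse


def safe_basename(url):
    """Return the decoded filename for url, or None if it is unsafe to join."""
    # Assumption: only the path component matters, so any query is dropped.
    path = urllib.parse.urlsplit(url).path
    name = urllib.parse.unquote(path.rstrip("/").rsplit("/", 1)[-1])
    if not name:
        return None
    # Decoding can reintroduce separators (%2F), NUL (%00), or dot-dot segments.
    if "/" in name or "\\" in name or "\x00" in name or ".." in name:
        return None
    return name
```

A caller would skip (or abort on) any URL for which this returns None, and only then build the target path with os.path.join(WORKDIR, name).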

Comment thread seafile-download.sh Outdated
cd "$WORKDIR"

echo "Fetching download page via wget..."
wget -qO /tmp/seafile-page.html --timeout=30 --tries=3 "$UPSTREAM" || {
Contributor Author

Replaced /tmp/seafile-page.html with mktemp -t seafile-page.XXXXXX.html plus a trap cleanup, so symlink/clobber races on shared hosts no longer apply.
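
For comparison, the same idea (an unpredictable temp name plus guaranteed cleanup) looks like this in Python. This is only an analogy to the shell change: the script itself uses mktemp and trap, and the helper name temp_page is hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_page():
    """Yield the path of a private, uniquely named temp file; remove it on exit."""
    # mkstemp creates the file exclusively, so a pre-planted symlink cannot be followed.
    fd, path = tempfile.mkstemp(prefix="seafile-page.", suffix=".html")
    os.close(fd)
    try:
        yield path
    finally:
        # Plays the role of the shell trap: clean up on success or failure.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```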

Comment thread seafile-download.sh Outdated
Comment on lines +74 to +83
# Delete stale local files not in the current remote set
local_files = [f for f in os.listdir(WORKDIR) if os.path.isfile(os.path.join(WORKDIR, f))]
stale = [f for f in local_files if f not in remote_names]
if len(stale) > MAX_DELETE:
    print(f"WARNING: {len(stale)} stale files exceeds MAX_DELETE ({MAX_DELETE})", file=sys.stderr)
    sys.exit(1)
for f in stale:
    fp = os.path.join(WORKDIR, f)
    print(f"Deleting stale: {f}", file=sys.stderr)
    os.remove(fp)
Contributor Author

Stale-file deletion is now deferred until after every download succeeds; a transient upstream/network error can no longer wipe the mirror.
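
The ordering described here can be sketched as below. The function names sync and prune_stale and the download callback are hypothetical; the real script may structure this differently, and the default of 50 is taken from the PR description.

```python
import os


def prune_stale(workdir, remote_names, max_delete=50):
    """Delete local files absent from remote_names; refuse if there are too many."""
    stale = sorted(
        f for f in os.listdir(workdir)
        if os.path.isfile(os.path.join(workdir, f)) and f not in remote_names
    )
    if len(stale) > max_delete:
        raise RuntimeError(f"{len(stale)} stale files exceeds limit {max_delete}")
    for f in stale:
        os.remove(os.path.join(workdir, f))
    return stale


def sync(workdir, urls_by_name, download, max_delete=50):
    """Download everything first; prune only if no download raised."""
    for name, url in urls_by_name.items():
        # Any exception here propagates, so prune_stale is never reached.
        download(url, os.path.join(workdir, name))
    return prune_stale(workdir, set(urls_by_name), max_delete)
```

Because pruning runs only after the loop completes, a failed download leaves every existing local file in place for the next run.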

Comment thread seafile-download.sh Outdated
Comment on lines +93 to +107
# Check remote size with wget --spider
r = subprocess.run(
    ["wget", "--spider", "--timeout=30", "--tries=1", "-S", url],
    capture_output=True, text=True)
remote_size = 0
for line in r.stderr.split("\n"):
    if "Content-Length:" in line:
        remote_size = int(line.split(":")[1].strip())
        break
if r.returncode != 0:
    print(f"ERROR: spider {url}: {r.stderr[-200:]}", file=sys.stderr)
    sys.exit(1)

if os.path.exists(target) and os.path.getsize(target) == remote_size:
    continue
Contributor Author

Now falls through to the actual download when Content-Length is absent (chunked / 302 / some CDNs) and verifies the downloaded temp file size matches Content-Length when it is present, rather than treating size 0 as a match.
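
A rough sketch of that behaviour, under stated assumptions: the helper names parse_content_length, needs_download, and promote are hypothetical, and taking the last Content-Length in the wget -S output (so that a redirect's header is not mistaken for the final response's) is a design choice of this sketch, not something the PR states.

```python
import os
import re

_CL = re.compile(r"^[ \t]*Content-Length:[ \t]*(\d+)[ \t]*\r?$",
                 re.IGNORECASE | re.MULTILINE)


def parse_content_length(headers_text):
    """Return the last Content-Length found in the header dump, or None."""
    sizes = _CL.findall(headers_text)
    return int(sizes[-1]) if sizes else None


def needs_download(target, remote_size):
    """Skip only when the remote size is known and equals the local size."""
    if remote_size is None:
        return True  # size unknown: always fall through to the download
    return not (os.path.exists(target) and os.path.getsize(target) == remote_size)


def promote(tmp, target, remote_size):
    """Atomically move tmp over target once its size has been checked."""
    if remote_size is not None and os.path.getsize(tmp) != remote_size:
        os.remove(tmp)
        raise ValueError(f"size mismatch for {target}")
    os.replace(tmp, target)
```

Representing an absent header as None instead of 0 keeps "unknown size" distinct from a genuinely empty file, which is the confusion the original check was prone to.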

Address review feedback:
- Move stale-file deletion after all downloads succeed so a transient
  network or upstream error cannot wipe the mirror.
- Sanitize the OSS URL basename via urllib.parse.unquote and reject
  filenames that contain path separators, NUL, '..' or are empty,
  preventing path traversal even if the upstream HTML is hostile.
- Replace the fixed /tmp/seafile-page.html path with mktemp + trap
  cleanup to avoid symlink/clobber races on shared hosts.
- When Content-Length is missing (chunked transfer / some CDNs), fall
  back to downloading and verify the temp file size before promoting
  it, instead of treating size 0 as a match.
- Confirmed shebang is at byte 0.