Skip to content

fix: sanitize all invalid characters in checksum filenames#2886

Merged
vmaerten merged 1 commit into
go-task:mainfrom
s3onghyun:fix-checksum-filename-regex
Jun 21, 2026
Merged

fix: sanitize all invalid characters in checksum filenames#2886
vmaerten merged 1 commit into
go-task:mainfrom
s3onghyun:fix-checksum-filename-regex

Conversation

@s3onghyun

Copy link
Copy Markdown
Contributor

Description

normalizeFilename (in internal/fingerprint) turns a task name into the on-disk checksum/timestamp filename, replacing invalid characters with -. It uses:

var checksumFilenameRegexp = regexp.MustCompile("[^A-z0-9]")

A-z is the classic range bug: it spans ASCII 65 (A) to 122 (z), which includes the six characters between Z and a[ \\ ] ^ _ \``. Those are not sanitized. The intent (per the comment) was clearly [A-Za-z0-9]`.

Concrete impact: a task name containing one of these characters (notably \\, a path separator on Windows) leaks into the checksum/timestamp file path, which can corrupt the file location and break sources:/generates: up-to-date detection for that task.

Fix

[^A-z0-9][^A-Za-z0-9].

Test

Extended TestNormalizeFilename with cases for the leaked characters (\\, _, [, ], ^, ```). Fails before, passes after.

Note: for any existing task whose name contains one of these characters, the normalized filename changes once, so the first run after upgrade recomputes the checksum (treated as not-up-to-date) — expected and harmless.

{"foo/bar/baz", "foo-bar-baz"},
{"foo@bar/baz", "foo-bar-baz"},
{"foo1bar2baz3", "foo1bar2baz3"},
// Characters in the ASCII range between 'Z' (90) and 'a' (97) must

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment is necessary, but the checks are OK.

@trulede trulede left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change seems reasonable.

I would actually do this, but that is a matter of preference.

var timestampFilenameRegexp = regexp.MustCompile([^[:alnum:]])

normalizeFilename used the regexp [^A-z0-9], but A-z is the ASCII range
65-122, which includes the six characters [ \ ] ^ _ ` between Z and a.
Those were left unsanitized in the on-disk checksum/timestamp filename, so
a task name containing e.g. a backslash could corrupt the file path and
break up-to-date detection. Use [^A-Za-z0-9] as intended.
@s3onghyun s3onghyun force-pushed the fix-checksum-filename-regex branch from 037246c to 775b2e1 Compare June 21, 2026 12:23
@s3onghyun

Copy link
Copy Markdown
Contributor Author

Thanks for the review, @trulede! Adopted both:

  • Removed the explanatory comment in the test.
  • Switched the regexp to [^[:alnum:]] as you suggested — cleaner, and in Go's RE2 it's exactly equivalent to [^A-Za-z0-9] (ASCII alnum), so behavior is unchanged.

TestNormalizeFilename still passes.

@vmaerten vmaerten self-requested a review June 21, 2026 13:21

@vmaerten vmaerten left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@vmaerten vmaerten merged commit 91b9e42 into go-task:main Jun 21, 2026
15 checks passed
vmaerten added a commit that referenced this pull request Jun 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants