dave4420 commented Sep 8, 2025

Description

Instead of invoking editorconfig once, we use xargs to invoke it multiple times, in batches of up to 1000 files. When there are between 1 and 1000 files to check, editorconfig is still invoked exactly once. When there are no files to check, editorconfig is not invoked at all, so we remove the /dev/null hack.
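A minimal sketch of the batching behaviour, not the script's actual code: the generated filenames and the `sh -c 'echo "$#"'` command are stand-ins for the real file list and for editorconfig. Note that GNU xargs runs its command once even on empty input unless `-r` (`--no-run-if-empty`) is given, so the empty case is left out of this sketch.

```shell
#!/usr/bin/env bash
# Feed 2500 NUL-terminated dummy filenames to xargs in batches of at most 1000.
# The stand-in command prints how many filenames each invocation received.
batch_sizes=$(
  for i in $(seq 1 2500); do
    printf 'file %d.txt\0' "$i"
  done | xargs -0 -n 1000 sh -c 'echo "$#"' _
)

# One line per invocation: two full batches and one partial batch.
echo "$batch_sizes"
```

With 2500 names the stand-in runs three times; with 1000 or fewer it would run once.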

There are two consequences of using xargs.

First, we cannot run editorconfig via a shell function, as xargs cannot run shell functions. Instead we must run it via a script. I have taken the approach of having xargs call the same check-file-format.sh script that calls it, but with different flags. I am aware that some people strongly prefer having separate helper scripts for xargs to call. I am happy to rewrite this script in that fashion if that is your preference.
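The self-invoking pattern described above might look roughly like the following. This is a sketch under stated assumptions: the `--batch` flag, the demo script name, and the batch size of 2 are all hypothetical and chosen only to keep the example small; the real check-file-format.sh may be structured differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
script="$demo_dir/check-demo.sh"   # hypothetical stand-in for the real script

cat > "$script" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

if [ "${1:-}" = "--batch" ]; then
  # Worker mode (hypothetical flag): xargs passes one batch of filenames.
  shift
  # Stand-in for the real checker invocation.
  printf 'checking %d file(s)\n' "$#"
else
  # Driver mode: stream NUL-terminated names to xargs, which re-runs this
  # same script in worker mode for each batch.
  printf '%s\0' "$@" | xargs -0 -n 2 bash "$0" --batch
fi
EOF

output=$(bash "$script" "a b.txt" "c.txt" "d e f.txt")
echo "$output"
rm -rf "$demo_dir"
```

A separate helper script would replace `bash "$0" --batch` with the helper's path; the trade-off is one fewer mode flag against one more file to maintain.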

Second, instead of providing the list of files via the command line, we provide it via a pipe. This lets us use the -0 flag of xargs and the -z flag of the git tooling to pass the filenames NUL-terminated rather than LF-terminated, which prevents spaces in filenames from being interpreted as delimiters.
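To illustrate why the delimiter matters, here is a small comparison using two made-up filenames. In a real repository the producer would be a git command with `-z` (for example `git ls-files -z`) rather than `printf`; the counting command is again only a stand-in.

```shell
#!/usr/bin/env bash
names=("plain.txt" "has space.txt")

# Default xargs splits its input on blanks and newlines, so the second
# filename is broken into two arguments.
split_count=$(printf '%s\n' "${names[@]}" | xargs sh -c 'echo "$#"' _)

# With -0, only NUL separates items, so each filename stays intact.
nul_count=$(printf '%s\0' "${names[@]}" | xargs -0 sh -c 'echo "$#"' _)

echo "whitespace-delimited: $split_count arguments"
echo "NUL-delimited: $nul_count arguments"
```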

In the case where we run editorconfig via docker, we further need to shell-escape each filename so that filenames with spaces survive being parsed by the inner shell: hence the printf '%q ' nonsense.
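A sketch of what the `printf '%q '` escaping achieves, with a local `bash -c` standing in for the shell inside the container (an assumption made so the example runs without docker). The filenames are invented. Note that `%q` is a bash extension and can emit bash-specific quoting for unusual characters, so this assumes the inner shell is also bash.

```shell
#!/usr/bin/env bash
files=("has space.txt" "it's odd.txt" "plain.txt")

# Build one string in which every filename is shell-escaped.
escaped=$(printf '%q ' "${files[@]}")
echo "escaped: $escaped"

# The inner shell re-parses the string; because of the escaping, each
# filename comes back as exactly one argument.
inner=$(bash -c "set -- $escaped; printf '%s\n' \"\$@\"")
echo "$inner"
```

Without the `%q` step, the inner shell would split "has space.txt" into two words and choke on the unbalanced quote in "it's odd.txt".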

My intent is to squash when I merge.

Context

The BCSS team is migrating its codebase from GitLab to GitHub. As part of that, we have added the contents of this template repository to our own repositories.

We encountered two bugs:

  • This script does not correctly handle filenames that contain spaces. While you could argue that filenames shouldn't contain spaces, our codebase has been worked on for two decades by mostly Windows-based developers, and has a number of files whose names contain spaces.
  • This script does not correctly handle checking against a large number of files. When run with check=all, this script invokes editorconfig (possibly via docker) with the names of every file in the repository on its command line. This means that with a large enough repository, we hit the limit for the size of a process's command line and environment combined (typically 2MB on Linux but only 256KB on macOS).
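The second bug can be estimated on any machine with a rough check like the one below. The 200000 generated paths are a hypothetical stand-in for a large repository (in a real checkout, something like `git ls-files -z | wc -c` would give the actual figure). The estimate is deliberately crude: it ignores the environment and per-argument overhead, both of which also count against the limit.

```shell
#!/usr/bin/env bash
# Limit on the combined size of arguments and environment for a new process.
arg_max=$(getconf ARG_MAX)

# Total bytes in a hypothetical list of 200000 paths, one terminator per name.
list_bytes=$(( $(seq 1 200000 | sed 's|^|src/legacy module/file |; s|$|.txt|' | wc -c) ))

echo "ARG_MAX: $arg_max bytes, file list: $list_bytes bytes"
if [ "$list_bytes" -gt "$arg_max" ]; then
  echo "list is too large for a single command line; batching is required"
else
  echo "list would fit on a single command line on this system"
fi
```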

This PR fixes both bugs.

The same bugs are present in the check-english-usage and check-markdown-format scripts. I'm happy to extend this PR to fix both of those scripts, or to do so in separate PRs.

Type of changes

  • Refactoring (non-breaking change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would change existing functionality)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I am familiar with the contributing guidelines
  • I have followed the code style of the project
  • I have added tests to cover my changes
  • I have updated the documentation accordingly
  • This PR is a result of pair or mob programming

Sensitive Information Declaration

To ensure the utmost confidentiality and protect your and others' privacy, we kindly ask you NOT to include PII (Personal Identifiable Information) / PID (Personal Identifiable Data) or any other sensitive data in this PR (Pull Request) and the codebase changes. We will remove any PR that contains sensitive information. We really appreciate your cooperation in this matter.

  • I confirm that neither PII/PID nor sensitive data are included in this PR and the codebase changes.
