Rebase onto Git for Windows v2.53.0-rc0.windows.1 #845
base: vfs-2.53.0-rc0
Conversation
This adds a hard-coded call to GVFS.hooks.exe before and after each Git command runs. To make sure that this is only called in repositories cloned with GVFS, we test for the tell-tale `.gvfs`. 2021-10-30: The recent move of find_hook() to hook.c required moving these changes out of run-command.c into hook.c. 2025-11-06: The `warn_on_auto_comment_char` hack is so ugly that it forces us to pile similarly ugly code on top, because that hack _expects_ the config not to have been read yet when `cmd_commit()`, `cmd_revert()`, `cmd_cherry_pick()`, `cmd_merge()`, or `cmd_rebase()` set that flag. But with the `pre_command()` hook already run, that assumption no longer holds. Signed-off-by: Ben Peart <Ben.Peart@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Suggested by Ben Peart. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Verify that the core.hooksPath configuration is respected by the pre-command hook. The original regression test was written by Alejandro Pauly. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Alejandro Pauly <alpauly@microsoft.com>
When using the sparse-checkout feature, the file might not be on disk because the skip-worktree bit is on. This used to be a bug in the (since-deleted) `recursive` strategy. Let's ensure that this bug does not resurface. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The 'git worktree' command was marked as BLOCK_ON_GVFS_REPO because it does not interact well with the virtual filesystem of VFS for Git. When a Scalar clone uses the GVFS protocol, it enables the GVFS_BLOCK_COMMANDS flag, since commands like 'git gc' do not work well with the GVFS protocol. However, 'git worktree' works just fine with the GVFS protocol since it isn't doing anything special. It copies the sparse-checkout from the current worktree, so it does not have performance issues. This is a highly requested option. The solution is to stop using the BLOCK_ON_GVFS_REPO option and instead add a special-case check in cmd_worktree() specifically for a particular bit of the 'core_gvfs' global variable (loaded by very early config reading) that corresponds to the virtual filesystem. The bit that most closely resembled this behavior was non-obviously named, but does provide a signal that we are in a Scalar clone and not a VFS for Git clone. The error message is copied from git.c, so it will have the same output as before if a user runs this in a VFS for Git clone. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
When using the sparse-checkout feature, git should not write to the working directory for files with the skip-worktree bit on. With the skip-worktree bit on, the file may or may not be in the working directory, and if it is not, we don't want or need to create it by calling checkout_entry. There are two callers of checkout_target, both of which check that the file does not exist before calling it: load_current makes a call to lstat right before calling checkout_target, and check_preimage only runs checkout_target if stat_ret is less than zero. (It initializes stat_ret to zero and, only if !stat->cached, will lstat the file and set stat_ret to something other than zero.) This patch checks whether the skip-worktree bit is on in checkout_target and simply returns, so that the entry does not end up in the working directory. This ensures that apply will not create a file in the working directory and update the index while leaving the working directory out of sync with the changes that happened in the index. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
As of 9e59b38 (object-file: emit corruption errors when detected, 2022-12-14), Git will loudly complain about corrupt objects. That is fine, as long as the idea isn't to re-download locally-corrupted objects. But that's exactly what we want to do in VFS for Git via the `read-object` hook, as per the `GitCorruptObjectTests` code added in microsoft/VFSForGit@2db0c030eb25 (New features: [...] - GVFS can now recover from corrupted git object files [...] , 2018-02-16). So let's support precisely that, and add a regression test that ensures that re-downloading corrupt objects via the `read-object` hook works. While at it, avoid the XOR operator to flip the bits, when we actually want to make sure that they are turned off: Use the AND-NOT operator for that purpose. Helped-by: Matthew John Cheetham <mjcheetham@outlook.com> Helped-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Add the ability to block built-in commands based on if the `core.gvfs` setting has the `GVFS_USE_VIRTUAL_FILESYSTEM` bit set. This allows us to selectively block commands that use the GVFS protocol, but don't use VFS for Git (for example repos cloned via `scalar clone` against Azure DevOps). Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `repack` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `repack` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
String formatting can be a performance issue when there are hundreds of thousands of trees. Stop using strbuf_addf and add the strings or characters individually. There are a limited number of modes, so add a switch for the known ones and a default case for any value that Git does not recognize. In one scenario involving a huge worktree, this reduces the time required for a `git checkout <branch>` from 44 seconds to 38 seconds, i.e. it is a non-negligible performance improvement. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Loosen the blocking of the `fsck` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `fsck` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
The following commands and options are not currently supported when working in a GVFS repo. Add code to detect and block these commands from executing. 1) fsck 2) gc 3) prune 4) repack 5) submodule 6) update-index --split-index 7) update-index --index-version (other than 4) 8) update-index --[no-]skip-worktree 9) worktree Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Loosen the blocking of the `prune` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `prune` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
In earlier versions of `microsoft/git`, we found a user who had set `core.gvfs = false` in their global config. This should not have been necessary, but it also should not have caused a problem. However, it did. The reason was that `gvfs_load_config_value()` was called from `config.c` when reading config key/value pairs from all the config files. The local config should override the global config, and this is done by `config.c` reading the global config first and the local config second. However, our logic only allowed writing the `core_gvfs` variable once. In v2.51.0, we had to adapt to upstream changes that changed the way the `core.gvfs` config value is read, and the special handling is no longer necessary, yet we still want the test case that ensures this bug does not regress. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Replace the special casing of the `worktree` command being blocked on VFS-enabled repos with the new `BLOCK_ON_VFS_ENABLED` flag. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
On index load, clear/set the skip worktree bits based on the virtual file system data. Use virtual file system data to update skip-worktree bit in unpack-trees. Use virtual file system data to exclude files and folders not explicitly requested. Update 2022-04-05: disable the "present-despite-SKIP_WORKTREE" file removal behavior when 'core.virtualfilesystem' is enabled. Signed-off-by: Ben Peart <benpeart@microsoft.com>
Emit a warning message when the `gvfs.sharedCache` option is set that the `repack` command will not perform repacking on the shared cache. In the future we can teach `repack` to operate on the shared cache, at which point we can drop this commit. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
…x has been redirected Fixes #13 Some git commands spawn helpers and redirect the index to a different location. These include "difftool -d" and the sequencer (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) and others. In those instances we don't want to update their temporary index with our virtualization data. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Add check to see if a directory is included in the virtualfilesystem before checking the directory hashmap. This allows a directory entry like foo/ to find all untracked files in subdirectories.
When our patches to support that hook were upstreamed, the hook's name elicited some reviewer suggestions, and it was renamed to `post-index-change`. These patches (with the new name) made it into v2.22.0. However, VFS for Git users may very well have checkouts with that hook installed under the original name. To support this, introduce a hack: when we fail to find the `post-index-change` hook, look a bit more closely and allow any `post-indexchanged` hook to run instead (if it exists).
When using a virtual file system layer, the FSMonitor does not make sense. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When sparse-checkout is enabled, add the sparse-checkout percentage to the Trace2 data stream. This number was already computed and printed on the console in the "You are in a sparse checkout..." message. It would be helpful to log it too for performance monitoring. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Teach STATUS to optionally serialize the results of a status computation to a file. Teach STATUS to optionally read an existing serialization file and simply print the results, rather than actually scanning. This is intended for immediate status results on extremely large repos and assumes the use of a service/daemon to maintain a fresh current status snapshot. 2021-10-30: packet_read() changed its prototype in ec9a37d (pkt-line.[ch]: remove unused packet_read_line_buf(), 2021-10-14). 2021-10-30: sscanf() now does an extra check that "%d" goes into an "int" and complains about "uint32_t". Replacing with "%u" fixes the compile-time error. 2021-10-30: string_list_init() was removed by abf897b (string-list.[ch]: remove string_list_init() compatibility function, 2021-09-28), so we need to initialize manually. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Teach status serialization to take an optional pathname on
the command line to direct that cache data be written there
rather than to stdout. When used this way, normal status
results will still be written to stdout.
When no path is given, only binary serialization data is
written to stdout.
Usage:
git status --serialize[=<path>]
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach status deserialize code to reject the status cache when printing in porcelain V2 and there are unresolved conflicts in the cache file. A follow-on task might extend the cache format to include this additional data. See code for a longer explanation. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
…-and-fix-built-in-fsmonitor Fix the built-in FSMonitor, and run Scalar's Functional Tests as part of the automated builds
This is random stuff that probably all got upstreamed in the meantime.
Under certain circumstances, the `cmd` attribute is set to an `strdup()`ed value. This value needs to be released in the end! These circumstances can be observed easily in the Microsoft Git fork, where the `read-object` hook triggers that code path. Since other users assign a non-`strdup()`ed value, be careful to add _another_ attribute (called `to_free`) that can hold a reference to such a string that needs to be released once the sub process is done. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This fixes a leak that is not detected by Git's test suite (but by microsoft/git's). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This fixes a leak that is not detected by Git's own test suite (but by microsoft/git's, in the t9210-scalar.sh test). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Tests in t7900 assume a particular state of the `maintenance.strategy` config setting, which is set/unset by previous tests. Correct this by explicitly unsetting and re-setting the config at the start of the tests. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
The --path-walk option in `git pack-objects` is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within `git push` specifically.
While this config does enable the path-walk feature, it does not lead to
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations, better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via `git repack -Ad` helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set `gc.auto=0` before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in `git
push`, this change forces the spawned `git pack-objects` process to
use `--no-reuse-delta`.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU
resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
This commit also adds a test case that demonstrates that `git -c
pack.usePathWalk=true push` now avoids reusing deltas.
To do this, the test case constructs a pack with a horrendously
inefficient delta object, then verifies that the pack on the receiving
side of the `push` fails to have such an inefficient delta.
The test case would probably be a lot more readable if hex numbers were
used instead of octal numbers, but alas, `printf "\x<hex>"` is not
portable, only `printf "\<octal>"` is. For example, dash's built-in
`printf` function simply prints `\x` verbatim while bash's built-in
happily converts this construct to the corresponding byte.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git v2.48.0 has become even more stringent about leaks. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Migration of packfiles involves the following steps for each pack: 1. Hardlink (or copy): a. the .pack file b. the .keep file c. the .rev file 2. Move (or copy + delete) the .idx file 3. Delete/unlink: a. the .pack file b. the .keep file c. the .rev file Moving the index file after the others ensures the pack is not read from the new cache directory until all associated files (rev, keep) exist in the cache directory also. Moving loose objects operates as a move, or copy + delete. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.
While this config does enable the path-walk feature, it does not lead to
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.
In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations, better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.
I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).
In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.
As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:
1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.
2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.
There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).
The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.
One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.
Alternative options considered included:
* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.
* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.
Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
The microsoft/git fork includes pre- and post-command hooks, with the initial intention of using these for VFS for Git. In that environment, these are important hooks to avoid concurrency issues when the virtualization is incomplete. However, in the Office monorepo the post-command hook is used in a different way: a custom hook is used to update the sparse-checkout, if necessary. To keep this hook from being incredibly slow on every Git command, it checks for the existence of a "sentinel file" that is written by a custom post-index-change hook, and no-ops if that file does not exist. However, even this "no-op" is 200ms due to the use of two scripts (one simple script in .git/hooks/ does some environment checking and then calls a script from the working directory which actually contains the logic). Add a new config option, 'postCommand.strategy', that will allow for multiple possible strategies in the future. For now, the one we are adding is 'worktree-change', which states that we should write a sentinel file instead of running the 'post-index-change' hook and then skip the 'post-command' hook if the proper sentinel file doesn't exist. If it does exist, then delete it and run the hook. This behavior is _only_ triggered, however, if a part of the index changes that is within the sparse checkout; if only parts of the index change that are not even checked out on disk, the hook is still skipped. I originally planned to put this into the repo-settings, but this caused the repo settings to load in more cases than they did previously. When there is an invalid boolean config option, this causes failure in new places. This was caught by t3007. This behavior is tested in t0401-post-command-hook.sh. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
This helps t0401 pass while under SANITIZE=leak. Signed-off-by: Derrick Stolee <stolee@gmail.com>
When the top-level Git process is an alias, it doesn't load much config and thus the postCommand.strategy config setting is not loaded properly. This leads to the post-command hook being run more frequently than we want, including an alias for 'git add' running the hook even when the worktree did not change. Similar state is not loaded by 'git version' or 'git typo'. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Add the `cache-local-objects` maintenance task to the list of tasks run by the `scalar run` command. It's often easier for users to run the shorter `scalar run` command than the equivalent `git maintenance` command. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
When the post-command hook runs, we could be in several custom states: 1. The command is a regular builtin, but the repo setup is incomplete. (Example: "git version", "git rev-parse HEAD") 2. The command is a dashed command, but the top level process uses a space in its call. (Example: "git http-request") 3. The command is an alias that goes to a builtin. 4. The command has no arguments or is only helptext. Each of these cases leads to a place where we previously had not loaded the postCommand.strategy config and would execute the post-command hook without looking for a sentinel file. There are two fixes here: First, we use early config parsing so we can get config details without fully setting up the repository. This is how core.hooksPath works in these situations. Second, we need to make sure handle_hook_replacement() is called even when the repo's gitdir is NULL. This requires a small amount of care to say that a sentinel file cannot exist if the gitdir isn't set, as we would not have written one without initializing the gitdir. This gets all of the desired behaviors we want for this setting without doing anything extreme with how pre- and post-command hooks run otherwise. Signed-off-by: Derrick Stolee <stolee@gmail.com>
…windows#720) Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Fixes git-for-windows#716
Add ability to run Git commands for Scalar by passing a struct strvec rather than having to use varargs. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
The microsoft/git fork includes pre- and post-command hooks, with the
initial intention of using these for VFS for Git. In that environment,
these are important hooks to avoid concurrent issues when the
virtualization is incomplete.
However, in the Office monorepo the post-command hook is used in a
different way. A custom hook is used to update the sparse-checkout, if
necessary. To keep this hook from being incredibly slow on every Git
command, this hook checks for the existence of a "sentinel file" that is
written by a custom post-index-change hook and no-ops if that file does
not exist.
However, even this "no-op" is 200ms due to the use of two scripts (one
simple script in .git/hooks/ does some environment checking and then
calls a script from the working directory which actually contains the
logic).
Add a new config option, 'postCommand.strategy', that will allow for
multiple possible strategies in the future. For now, the one we are
adding is 'post-index-change' which states that we should write a
sentinel file instead of running the 'post-index-change' hook and then
skip the 'post-command' hook if the proper sentinel file doesn't exist.
(If it does exist, then delete it and run the hook.)
---
This fork contains changes specific to monorepo scenarios. If you are an
external contributor, then please detail your reason for submitting to
this fork:
* [ ] This is an early version of work already under review upstream.
* [ ] This change only applies to interactions with Azure DevOps and the
GVFS Protocol.
* [ ] This change only applies to the virtualization hook and VFS for
Git.
* [x] This change only applies to custom bits in the microsoft/git fork.
Add the `--ref-format` option to the `scalar clone` command. This will allow users to opt-in to creating a Scalar repository using alternative ref storage backends, such as reftable. Example: scalar clone --ref-format reftable $URL Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When we are installing a loose object, finalize_object_file() first checks whether the contents match what already exists in a loose object file of the target name. However, it does not check whether that target is valid; it assumes it is. In the case of a power outage or something like that, the file may be corrupt (for example: all NUL bytes). That is a common occurrence when we need to install a loose object _again_: we don't think we have it already, so any copy that exists is probably bogus. Use the flagged version with FOF_SKIP_COLLISION_CHECK to avoid these types of errors, as seen in GitHub issue microsoft#837. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Users sometimes see transient network errors, but they are actually due to some other problem within the installation of a packfile. Observed resolutions include freeing up space on a full disk or deleting the shared object cache because something was broken due to a file corruption or power outage. This change only provides the advice to suggest those workarounds to help users help themselves. This is our first advice custom to the microsoft/git fork, so I have partitioned the key away from the others to avoid adjacent change conflicts (at least until upstream adds a new change at the end of the alphabetical list). We could consider providing a tool that does a more robust check of the shared object cache, but since 'git fsck' isn't safe to run as it may download missing objects, we do not have that ability at the moment. The good news is that it is safe to delete and rebuild the shared object cache as long as all local branches are pushed. The branches must be pushed because the local .git/objects/ directory is moved to the shared object cache in the 'cache-local-objects' maintenance task. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Similar to a recent change to avoid the collision check for loose objects, do the same for prefetch packfiles. This should be more rare, but the same prefetch packfile could be downloaded from the same cache server so this isn't out of the range of possibility. Signed-off-by: Derrick Stolee <stolee@gmail.com>
…-for-windows#832) Add the `--ref-format` option to the `scalar clone` command. This will allow users to opt-in to creating a Scalar repository using alternative ref storage backends, such as reftable. Example:

```shell
scalar clone --ref-format reftable $URL
```
…ed object cache (git-for-windows#840) There have been a number of customer-reported problems with errors of the form

```
error: inflate: data stream error (unknown compression method)
error: unable to unpack a163b1302d4729ebdb0a12d3876ca5bca4e1a8c3 header
error: files 'D:/.scalarCache/id_49b0c9f4-555f-4624-8157-a57e6df513b3/pack/tempPacks/t-20260106-014520-049919-0001.temp' and 'D:/.scalarCache/id_49b0c9f4-555f-4624-8157-a57e6df513b3/a1/63b1302d4729ebdb0a12d3876ca5bca4e1a8c3' differ in contents
error: gvfs-helper error: 'could not install loose object 'D:/.scalarCache/id_49b0c9f4-555f-4624-8157-a57e6df513b3/a1/63b1302d4729ebdb0a12d3876ca5bca4e1a8c3': from GET a163b1302d4729ebdb0a12d3876ca5bca4e1a8c3'
```

or

```
Receiving packfile 1/1 with 1 objects (bytes received): 17367934, done.
Receiving packfile 1/1 with 1 objects [retry 1/6] (bytes received): 17367934, done.
Waiting to retry after network error (sec): 100% (8/8), done.
Receiving packfile 1/1 with 1 objects [retry 2/6] (bytes received): 17367934, done.
Waiting to retry after network error (sec): 100% (16/16), done.
Receiving packfile 1/1 with 1 objects [retry 3/6] (bytes received): 17367934, done.
```

These are not actually due to network issues, but they look like it because the retries happen in the networking layer. The real failures occur when installing the loose object or packfile into the shared object cache. The loose-object installs fail when the target loose object is already corrupt in some way, such as being all NUL bytes because the disk wasn't flushed when the machine shut down: the error results because we do a collision check without confirming that the existing contents are valid. The packfiles may hit similar comparison cases, though that is less likely; we update the packfile installations to also skip the collision check.

In both cases, if we have a transient network error, we add a new advice message that helps users with the two most common workarounds:

1. Your disk may be full. Make room.
2. Your shared object cache may be corrupt. Push all branches, delete it, and fetch to refill it.

I make special note of when the shared object cache doesn't exist and point out that it probably should, so the whole repo is suspect at that point.

* [X] This change only applies to interactions with Azure DevOps and the GVFS Protocol.

Resolves git-for-windows#837.
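The "all NUL bytes" failure mode described above can be illustrated with a small sketch (hypothetical file names; this is not the actual collision-check code in `gvfs-helper`): a loose-object file whose contents never made it to disk is trivially detectable before any byte-for-byte collision comparison is attempted.

```shell
# Sketch: flag a loose-object file that is empty or all NUL bytes,
# as happens when the machine shut down before the disk was flushed.
looks_corrupt () {
	! test -s "$1" || test -z "$(tr -d '\0' <"$1")"
}

printf '\0\0\0\0\0\0\0\0' >nul.obj    # simulated unflushed write
printf 'x\234K\313\317\a' >ok.obj     # zlib-deflate-looking bytes

looks_corrupt nul.obj && echo "nul.obj: suspect, do not trust for collision check"
looks_corrupt ok.obj || echo "ok.obj: contents present"
```

A real validity check would of course go further (inflate the stream and verify the object header and hash), but even this cheap test distinguishes the "disk was never flushed" case from a genuine content collision.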
This comment is unfortunately a little bit long. I tried to make it more readable via details and summary tags, but unfortunately those cannot really be nested with quoted areas, so it doesn't work, sadness.
This got quite a bit larger because a function that formerly had been file-local is now public, meaning that its declaration also needs to be edited, this time in the header file.
It is unfortunately not visible in the context, but upstream changed the surrounding text to enclose the terms in backticks. This now unfortunately looks a little strange. Basically, we dropped the commit.
Here, a function was dropped via a fixup commit.
This came from a fixup commit. We'll skip over the workflows fixups that have been squashed into the original commits. Here comes that range-diff:
This is actually in mimalloc version 2.2.6; I managed to upstream this patch, just not into Git, but into mimalloc instead. "That's all, folks!"
This PR rebases Microsoft Git patches onto Git for Windows v2.53.0-rc0.windows.1.
Previous base: vfs-2.52.0
Range-diff vs vfs-2.52.0
`size` variable is initialized

`lookup_commit()`

`sane_istest()` does not access array past end

`z` in `mallocz()`

`strbuf_read()` does NUL-terminate correctly

```diff
@@ gvfs-helper-client.c (new)
 +	if (!skip_prefix(line, "loose ", &v1_oid))
 +		BUG("update_loose_cache: invalid line '%s'", line);
 +
-+	odb_loose_cache_add_new_oid(gh_client__chosen_odb, &oid);
-+}
-+
-+/*
-+ * Update the packed-git list to include the newly created packfile.
-+ */
-+static void gh_client__update_packed_git(const char *line)
-+{
-+	struct strbuf path = STRBUF_INIT;
-+	const char *v1_filename;
-+	struct packed_git *p;
-+	int is_local;
-+
-+	if (!skip_prefix(line, "packfile ", &v1_filename))
-+		BUG("update_packed_git: invalid line '%s'", line);
-+
-+	/*
-+	 * ODB[0] is the local .git/objects. All others are alternates.
-+	 */
-+	is_local = (gh_client__chosen_odb == the_repository->objects->sources);
-+
-+	strbuf_addf(&path, "%s/pack/%s",
-+		    gh_client__chosen_odb->path, v1_filename);
-+	strbuf_strip_suffix(&path, ".pack");
-+	strbuf_addstr(&path, ".idx");
-+
-+	p = add_packed_git(the_repository, path.buf, path.len, is_local);
-+	if (p)
-+		packfile_store_add_pack_also_to_mru(the_repository, p);
-+	strbuf_release(&path);
++	odb_source_loose_cache_add_new_oid(gh_client__chosen_odb, &oid);
 +}
 +
 +/*
@@ gvfs-helper-client.c (new)
 +	}
 +
 +	else if (starts_with(line, "packfile")) {
-+		gh_client__update_packed_git(line);
 +		ghc |= GHC__CREATED__PACKFILE;
 +		*p_nr_packfile += 1;
 +	}
@@ gvfs-helper-client.c (new)
 +		}
 +	}
 +
++	if (ghc & GHC__CREATED__PACKFILE)
++		packfile_store_reprepare(the_repository->objects->packfiles);
++
 +	*p_ghc = ghc;
 +
 +	return err;
@@ odb.c: static int do_oid_object_info_extended(struct object_database *odb,
 	const struct object_id *real = oid;
 	int already_retried = 0;
 	int tried_hook = 0;
--
+	int tried_gvfs_helper = 0;

 	if (flags & OBJECT_INFO_LOOKUP_REPLACE)
 		real = lookup_replace_object(odb->repo, oid);
@@ odb.c: retry:
-	}
+	odb_prepare_alternates(odb);

 	while (1) {
+		extern int core_use_gvfs_helper;
-+
-		if (find_pack_entry(odb->repo, real, &e))
-			break;
+		struct odb_source *source;
+		if (!packfile_store_read_object_info(odb->packfiles, real, oi, flags))
@@ odb.c: retry:
-		if (!loose_object_info(odb->repo, real, oi, flags))
-			return 0;
+		if (!odb_source_loose_read_object_info(source, real, oi, flags))
+			return 0;
+		if (core_use_gvfs_helper && !tried_gvfs_helper) {
+			enum gh_client__created ghc;
@@ odb.c: retry:
 	/* Not a loose object; someone else may have just packed it. */
 	if (!(flags & OBJECT_INFO_QUICK)) {
 		odb_reprepare(odb->repo->objects);
-		if (find_pack_entry(odb->repo, real, &e))
-			break;
+		if (!packfile_store_read_object_info(odb->packfiles, real, oi, flags))
+			return 0;
 	if (gvfs_virtualize_objects(odb->repo) && !tried_hook) {
+		// TODO Assert or at least trace2 if gvfs-helper
+		// TODO was tried and failed and then read-object-hook
@@ gvfs-helper-client.c: static void gh_client__update_loose_cache(const char *line
 +	if (get_oid_hex(v1_oid, &oid))
 +		BUG("update_loose_cache: invalid line '%s'", line);
 +
-	odb_loose_cache_add_new_oid(gh_client__chosen_odb, &oid);
+	odb_source_loose_cache_add_new_oid(gh_client__chosen_odb, &oid);
 }
@@ t/t5799-gvfs-helper.sh: verify_prefetch_keeps () {
 	stop_gvfs_protocol_server
@@ t/t5799-gvfs-helper.sh: test_expect_success 'integration: fully implicit: diff 2 commits' '
-		>OUT.output 2>OUT.stderr
+	! trace_has_immediate_oid $oid <diff-trace.txt
 '
+#################################################################
```

`monitor-components` workflow in msft-git

`gvfs.fallback` config setting

```diff
@@ .github/workflows/build-git-installers.yml (new)
 +	eval $b/please.sh make_installers_from_mingw_w64_git --include-pdbs \
 +		--version=${{ needs.prereqs.outputs.tag_version }} \
 +		-o artifacts --${{matrix.type.name}} \
-+		--pkg=${{matrix.arch.artifact}}/mingw-w64-${{matrix.arch.toolchain}}-git-[0-9]*.tar.xz \
-+		--pkg=${{matrix.arch.artifact}}/mingw-w64-${{matrix.arch.toolchain}}-git-doc-html-[0-9]*.tar.xz &&
-+
++		$(ls ${{matrix.arch.artifact}}/mingw-w64-${{matrix.arch.toolchain}}-*.tar.* |
++		  sed '/\.sig$/d;/archimport/d;/cvs/d;/p4/d;/gitweb/d;/doc-man/d;s/^/--pkg=/' |
++		  tr '\n' ' ') &&
 +	if test portable = '${{matrix.type.name}}' && test -n "$(git config alias.signtool)"
 +	then
 +		git signtool artifacts/PortableGit-*.exe
```

`universal`

```diff
@@ Commit message
     - build & upload unsigned .deb package

     Co-authored-by: Lessley Dennington <ldennington@github.com>
+    Co-authored-by: Sverre Johansen <sverre.johansen@gmail.com>
+    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

 ## .github/workflows/build-git-installers.yml ##
@@ .github/workflows/build-git-installers.yml: jobs:
+
+  # Build unsigned Ubuntu package
+  create-linux-unsigned-artifacts:
-+    runs-on: ubuntu-latest
++    runs-on: ${{ matrix.arch.runner }}
++    strategy:
++      matrix:
++        arch:
++          - name: amd64
++            runner: ubuntu-latest
++            # EOL 04/2025: https://endoflife.date/ubuntu
++            container_image: ubuntu:20.04
++            # Use unofficial Node.js builds with glibc-217 for older Ubuntu
++            node_url: https://unofficial-builds.nodejs.org/download/release/v20.18.1/node-v20.18.1-linux-x64-glibc-217.tar.gz
++          - name: arm64
++            runner: ubuntu-24.04-arm
++            # EOL 04/2027: https://endoflife.date/ubuntu
++            container_image: ubuntu:22.04
++            # Use official Node.js builds for ARM64 (requires glibc 2.28+, Ubuntu 22.04 has 2.35)
++            node_url: https://nodejs.org/dist/v20.18.1/node-v20.18.1-linux-arm64.tar.gz
+    container:
-+      image: ubuntu:20.04 # security support until 04/02/2025, according to https://endoflife.date/ubuntu
++      image: ${{ matrix.arch.container_image }}
+      volumes:
+        # override /__e/node20 because GitHub Actions uses a version that requires too-recent glibc, see "Install dependencies" below
+        - /tmp:/__e/node20
@@ .github/workflows/build-git-installers.yml: jobs:
+          libcurl4-gnutls-dev libpcre2-dev zlib1g-dev libexpat-dev \
+          curl ca-certificates
+
-+        # Install a Node.js version that works in older Ubuntu containers (read: does not require very recent glibc)
-+        NODE_VERSION=v20.18.1 &&
-+        NODE_URL=https://unofficial-builds.nodejs.org/download/release/$NODE_VERSION/node-$NODE_VERSION-linux-x64-glibc-217.tar.gz &&
-+        curl -Lo /tmp/node.tar.gz $NODE_URL &&
++        # Install Node.js for GitHub Actions compatibility
++        curl -Lo /tmp/node.tar.gz "${{ matrix.arch.node_url }}" &&
+        tar -C /__e/node20 -x --strip-components=1 -f /tmp/node.tar.gz
+
+    - name: Clone git
@@ .github/workflows/build-git-installers.yml: jobs:
+          die "Could not determine host architecture!"
+        fi
+
-+        PKGNAME="microsoft-git_$VERSION"
++        PKGNAME="microsoft-git_${VERSION}_${ARCH}"
+        PKGDIR="$(dirname $(pwd))/$PKGNAME"
+
+        rm -rf "$PKGDIR"
@@ Commit message
     - job skipped if credentials for accessing certificate aren't present

     Co-authored-by: Lessley Dennington <ldennington@github.com>
+    Co-authored-by: Sverre Johansen <sverre.johansen@gmail.com>
+    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

 ## .github/workflows/build-git-installers.yml ##
@@ .github/workflows/build-git-installers.yml: on:
@@ .github/workflows/build-git-installers.yml: jobs:
-  # Build unsigned Ubuntu package
+  # Build and sign Debian package
   create-linux-unsigned-artifacts:
-    runs-on: ubuntu-latest
-    container:
+    runs-on: ${{ matrix.arch.runner }}
+    strategy:
@@ .github/workflows/build-git-installers.yml: jobs:
-        # Move Debian package for later artifact upload
-        mv "$PKGNAME.deb" "$GITHUB_WORKSPACE"
-
-+    - name: Upload artifacts
-+      uses: actions/upload-artifact@v4
-+      with:
-+        name: linux-unsigned-artifacts
+    - name: Upload artifacts
+      uses: actions/upload-artifact@v4
+      with:
+-        name: linux-artifacts
++        name: linux-unsigned-${{ matrix.arch.name }}
+        path: |
+          *.deb
+
+  create-linux-artifacts:
+    runs-on: ubuntu-latest
+    needs: [prereqs, create-linux-unsigned-artifacts]
++    strategy:
++      matrix:
++        arch: [amd64, arm64]
+    environment: release
+    steps:
+    - name: Log into Azure
@@ .github/workflows/build-git-installers.yml: jobs:
+    - name: Download artifacts
+      uses: actions/download-artifact@v4
+      with:
-+        name: linux-unsigned-artifacts
++        name: linux-unsigned-${{ matrix.arch }}
+
+    - name: Sign Debian package
+      run: |
+        # Sign Debian package
+        version="${{ needs.prereqs.outputs.tag_version }}"
-+        debsigs --sign=origin --verify --check microsoft-git_"$version".deb
++        debsigs --sign=origin --verify --check microsoft-git_"$version"_${{ matrix.arch }}.deb
+
-    - name: Upload artifacts
-      uses: actions/upload-artifact@v4
-      with:
-        name: linux-artifacts
++    - name: Upload artifacts
++      uses: actions/upload-artifact@v4
++      with:
++        name: linux-${{ matrix.arch }}
         path: |
           *.deb
-  # End build unsigned Debian package
@@ .github/workflows/build-git-installers.yml: jobs:
         success() ||
         (needs.create-linux-artifacts.result == 'skipped' &&
@@ .github/workflows/build-git-installers.yml: jobs:
-          name: linux-artifacts
+          name: linux-arm64
           path: deb-package

     - name: Log into Azure
@@ Commit message
     Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

+ ## Documentation/scalar.adoc ##
+@@ Documentation/scalar.adoc: status.aheadBehind=false::
+ 	message that can be disabled by disabling the `advice.statusAheadBehind`
+ 	config.
+
++core.configWriteLockTimeoutMS::
++	Sets a timeout to work gracefully around Git config write contention.
++
+ The following settings are different based on which platform is in use:
+
+ core.untrackedCache=(true|false)::

 ## scalar.c ##
@@ scalar.c: static int set_recommended_config(int reconfigure)
-		{ "core.safeCRLF", "false" },
-		{ "fetch.showForcedUpdates", "false" },
-		{ "pack.usePathWalk", "true" },
+		{ "commitGraph.changedPaths", "true" },
+		{ "commitGraph.generationVersion", "1" },
+		{ "core.autoCRLF", "false" },
+		{ "core.configWriteLockTimeoutMS", "150" },
-		{ NULL, NULL },
-	};
-	int i;
+		{ "core.logAllRefUpdates", "true" },
+		{ "core.safeCRLF", "false" },
+		{ "credential.https://dev.azure.com.useHttpPath", "true" },
@@ scalar.c: static int set_recommended_config(int reconfigure)
  */
 static int toggle_maintenance(int enable)
```

`config` command for backwards compatibility

`endpoint` command

`gvfs.sharedCache`

```diff
@@ scalar.c
 #include "setup.h"
+#include "wrapper.h"
 #include "trace2.h"
+ #include "path.h"
 #include "json-parser.h"
+#include "path.h"
+
```

`cache-server` command

`clone --no-fetch-commits-and-trees` for backwards compatibility

```diff
@@ scalar.c: static int set_recommended_config(int reconfigure)
+	}
+
 	for (i = 0; config[i].key; i++) {
-		if (set_scalar_config(config + i, reconfigure))
+		if (set_config_if_missing(config + i, reconfigure))
 			return error(_("could not configure %s=%s"),
```

`git stash -u` cmd

`submodule_from_path()`

`post-command` hook handling
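For reference, a range-diff like the one above is produced by `git range-diff`, which compares two versions of the same patch series commit by commit. A minimal self-contained demo (throwaway repository and hypothetical branch names; the real comparison in this PR is against `vfs-2.52.0`):

```shell
# Sketch: build a tiny repo with two variants of the same patch and
# compare them the way a rebase PR's range-diff is generated.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git commit -q --allow-empty -m "base"
git checkout -q -b old
echo "v1" >file && git add file && git commit -q -m "add file"
git checkout -q -b new main
echo "v2" >file && git add file && git commit -q -m "add file"
git range-diff main old new   # shows how the "add file" patch changed
```

The output pairs up the corresponding commits and shows an inner diff of their patches, which is exactly the format of the collapsed sections in this PR.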