Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebase to v2.48.0 rc2 #5348

Merged
merged 423 commits into from
Jan 9, 2025

Conversation

dscho
Copy link
Member

@dscho dscho commented Jan 7, 2025

This closes #5346.

Range-diff relative to -rc1

@dscho dscho self-assigned this Jan 7, 2025
@dscho dscho added this to the Next release milestone Jan 7, 2025
derrickstolee and others added 17 commits January 7, 2025 12:36
In anticipation of using the path-walk API to analyze tags or include
them in a pack-file, add the ability to walk the tags that were included
in the revision walk.

Signed-off-by: Derrick Stolee <[email protected]>
The sparse tree walk algorithm was created in d5d2e93 (revision:
implement sparse algorithm, 2019-01-16) and involves using the
mark_trees_uninteresting_sparse() method. This method takes a repository
and an oidset of tree IDs, some of which have the UNINTERESTING flag and
some of which do not.

Create a method that has an equivalent set of preconditions but uses a
"dense" walk (recursively visits all reachable trees, as long as they
have not previously been marked UNINTERESTING). This is an important
difference from mark_tree_uninteresting(), which short-circuits if the
given tree has the UNINTERESTING flag.

A use of this method will be added in a later change, with a condition
set whether the sparse or dense approach should be used.

Signed-off-by: Derrick Stolee <[email protected]>
This option causes the path-walk API to act like the sparse tree-walk
algorithm implemented by mark_trees_uninteresting_sparse() in
list-objects.c.

Starting from the commits marked as UNINTERESTING, their root trees and
all objects reachable from those trees are UNINTERSTING, at least as we
walk path-by-path. When we reach a path where all objects associated
with that path are marked UNINTERESTING, then do no continue walking the
children of that path.

We need to be careful to pass the UNINTERESTING flag in a deep way on
the UNINTERESTING objects before we start the path-walk, or else the
depth-first search for the path-walk API may accidentally report some
objects as interesting.

Signed-off-by: Derrick Stolee <[email protected]>
This will be helpful in a future change.

Signed-off-by: Derrick Stolee <[email protected]>
In order to more easily compute delta bases among objects that appear at the
exact same path, add a --path-walk option to 'git pack-objects'.

This option will use the path-walk API instead of the object walk given by
the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.

The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.

After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.

RFC TODO: It is important to note that this option is inherently
incompatible with using a bitmap index. This walk probably also does not
work with other advanced features, such as delta islands.

Getting ahead of myself, this option compares well with --full-name-hash
when the packfile is large enough, but also performs at least as well as
the default in all cases that I've seen.

RFC TODO: this should probably be recording the batch locations to another
list so they could be processed in a second phase using threads.

RFC TODO: list some examples of how this outperforms previous pack-objects
strategies. (This is coming in later commits that include performance
test changes.)

Signed-off-by: Derrick Stolee <[email protected]>
There are many tests that validate whether 'git pack-objects' works as
expected. Instead of duplicating these tests, add a new test environment
variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default
when specified.

This was useful in testing the implementation of the --path-walk
implementation, especially in conjunction with test such as:

 - t0411-clone-from-partial.sh : One test fetches from a repo that does
   not have the boundary objects. This causes the path-based walk to
   fail. Disable the variable for this test.

 - t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo
   without a boundary object.

 - t5310-pack-bitmaps.sh : One test compares the case when packing with
   bitmaps to the case when packing without them. Since we disable the
   test variable when writing bitmaps, this causes a difference in the
   object list (the --path-walk option adds an extra object). Specify
   --no-path-walk in both processes for the comparison. Another test
   checks for a specific delta base, but when computing dynamically
   without using bitmaps, the base object it too small to be considered
   in the delta calculations so no base is used.

 - t5316-pack-delta-depth.sh : This script cares about certain delta
   choices and their chain lengths. The --path-walk option changes how
   these chains are selected, and thus changes the results of this test.

 - t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of
   the --sparse option and how it combines with --path-walk.

 - t5332-multi-pack-reuse.sh : This test verifies that the preferred
   pack is used for delta reuse when possible. The --path-walk option is
   not currently aware of the preferred pack at all, so finds a
   different delta base.

 - t7406-submodule-update.sh : When using the variable, the --depth
   option collides with the --path-walk feature, resulting in a warning
   message. Disable the variable so this warning does not appear.

I want to call out one specific test change that is only temporary:

 - t5530-upload-pack-error.sh : One test cares specifically about an
   "unable to read" error message. Since the current implementation
   performs delta calculations within the path-walk API callback, a
   different "unable to get size" error message appears. When this
   is changed in a future refactoring, this test change can be reverted.

Signed-off-by: Derrick Stolee <[email protected]>
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.

Add the --path-walk option to the performance tests in p5313.

For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the results are very interesting:

Test                                           this tree
------------------------------------------------------------------
5313.2: thin pack                              0.40(0.47+0.04)
5313.3: thin pack size                                    1.2M
5313.4: thin pack with --full-name-hash        0.09(0.10+0.04)
5313.5: thin pack size with --full-name-hash             22.8K
5313.6: thin pack with --path-walk             0.08(0.06+0.02)
5313.7: thin pack size with --path-walk                  20.8K
5313.8: big pack                               2.16(8.43+0.23)
5313.9: big pack size                                    17.7M
5313.10: big pack with --full-name-hash        1.42(3.06+0.21)
5313.11: big pack size with --full-name-hash             18.0M
5313.12: big pack with --path-walk             2.21(8.39+0.24)
5313.13: big pack size with --path-walk                  17.8M
5313.14: repack                                98.05(662.37+2.64)
5313.15: repack size                                    449.1K
5313.16: repack with --full-name-hash          33.95(129.44+2.63)
5313.17: repack size with --full-name-hash              182.9K
5313.18: repack with --path-walk               106.21(121.58+0.82)
5313.19: repack size with --path-walk                   159.6K

[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08

This repo suffers from having a lot of paths that collide in the name
hash, so examining them in groups by path leads to better deltas. Also,
in this case, the single-threaded implementation is competitive with the
full repack. This is saving time diffing files that have significant
differences from each other.

A similar, but private, repo has even more extremes in the thin packs:

Test                                           this tree
--------------------------------------------------------------
5313.2: thin pack                              2.39(2.91+0.10)
5313.3: thin pack size                                    4.5M
5313.4: thin pack with --full-name-hash        0.29(0.47+0.12)
5313.5: thin pack size with --full-name-hash             15.5K
5313.6: thin pack with --path-walk             0.35(0.31+0.04)
5313.7: thin pack size with --path-walk                  14.2K

Notice, however, that while the --full-name-hash version is working
quite well in these cases for the thin pack, it does poorly for some
other standard cases, such as this test on the Linux kernel repository:

Test                                           this tree
--------------------------------------------------------------
5313.2: thin pack                              0.01(0.00+0.00)
5313.3: thin pack size                                     310
5313.4: thin pack with --full-name-hash        0.00(0.00+0.00)
5313.5: thin pack size with --full-name-hash              1.4K
5313.6: thin pack with --path-walk             0.00(0.00+0.00)
5313.7: thin pack size with --path-walk                    310

Here, the --full-name-hash option does much worse than the default name
hash, but the path-walk option does exactly as well.

Signed-off-by: Derrick Stolee <[email protected]>
Users may want to enable the --path-walk option for 'git pack-objects' by
default, especially underneath commands like 'git push' or 'git repack'.

This should be limited to client repositories, since the --path-walk option
disables bitmap walks, so would be bad to include in Git servers when
serving fetches and clones. There is potential that it may be helpful to
consider when repacking the repository, to take advantage of improved deltas
across historical versions of the same files.

Much like how "pack.useSparse" was introduced and included in
"feature.experimental" before being enabled by default, use the repository
settings infrastructure to make the new "pack.usePathWalk" config enabled by
"feature.experimental" and "feature.manyFiles".

Signed-off-by: Derrick Stolee <[email protected]>
Repositories registered with Scalar are expected to be client-only
repositories that are rather large. This means that they are more likely to
be good candidates for using the --path-walk option when running 'git
pack-objects', especially under the hood of 'git push'. Enable this config
in Scalar repositories.

Signed-off-by: Derrick Stolee <[email protected]>
Previously, the --path-walk option to 'git pack-objects' would compute
deltas inline with the path-walk logic. This would make the progress
indicator look like it is taking a long time to enumerate objects, and
then very quickly computed deltas.

Instead of computing deltas on each region of objects organized by tree,
store a list of regions corresponding to these groups. These can later
be pulled from the list for delta compression before doing the "global"
delta search.

This presents a new progress indicator that can be used in tests to
verify that this stage is happening.

The current implementation is not integrated with threads, but could be
done in a future update.

Since we do not attempt to sort objects by size until after exploring
all trees, we can remove the previous change to t5530 due to a different
error message appearing first.

Signed-off-by: Derrick Stolee <[email protected]>
Adapting the implementation of ll_find_deltas(), create a threaded
version of the --path-walk compression step in 'git pack-objects'.

This involves adding a 'regions' member to the thread_params struct,
allowing each thread to own a section of paths. We can simplify the way
jobs are split because there is no value in extending the batch based on
name-hash the way sections of the object entry array are attempted to be
grouped. We re-use the 'list_size' and 'remaining' items for the purpose
of borrowing work in progress from other "victim" threads when a thread
has finished its batch of work more quickly.

Using the Git repository as a test repo, the p5313 performance test
shows that the resulting size of the repo is the same, but the threaded
implementation gives gains of varying degrees depending on the number of
objects being packed. (This was tested on a 16-core machine.)

Test                                    HEAD~1    HEAD
-------------------------------------------------------------
5313.6: thin pack with --path-walk        0.01    0.01  +0.0%
5313.7: thin pack size with --path-walk    475     475  +0.0%
5313.12: big pack with --path-walk        1.99    1.87  -6.0%
5313.13: big pack size with --path-walk  14.4M   14.3M  -0.4%
5313.18: repack with --path-walk         98.14   41.46 -57.8%
5313.19: repack size with --path-walk   197.2M  197.3M  +0.0%

Signed-off-by: Derrick Stolee <[email protected]>
When adding tree objects, we are very careful to avoid adding the same
tree object more than once. There was one small gap in that logic,
though: when adding a root tree object. Two refs can easily share the
same root tree object, and we should still not add it more than once.

Signed-off-by: Johannes Schindelin <[email protected]>
This topic branch extends the protections introduced for Git GUI's
CVE-2022-41953 to cover `gitk`, too.

Signed-off-by: Johannes Schindelin <[email protected]>
These fixes were necessary for Sverre Rabbelier's remote-hg to work,
but for some magic reason they are not necessary for the current
remote-hg. Makes you wonder how that one gets away with it.

Signed-off-by: Johannes Schindelin <[email protected]>
This topic branch allows us to specify absolute paths without the drive
prefix e.g. when cloning.

Example:

	C:\Users\me> git clone https://github.com/git/git \upstream-git

This will clone into a new directory C:\upstream-git, in line with how
Windows interprets absolute paths.

Signed-off-by: Johannes Schindelin <[email protected]>
This topic branch teaches `git clean` to respect NTFS junctions and Unix
bind mounts: it will now stop at those boundaries.

Signed-off-by: Johannes Schindelin <[email protected]>
In MSYS2, we have two Python interpreters at our disposal, so we can
include the Python stuff in the build.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho force-pushed the rebase-to-v2.48.0-rc2 branch from 6e98293 to 6fc8ad7 Compare January 7, 2025 11:37
@dscho
Copy link
Member Author

dscho commented Jan 7, 2025

Oy. I had rebased to -rc0.

Range-diff relative to -rc1

@rimrul
Copy link
Member

rimrul commented Jan 7, 2025

the win test failures look weird.

unit-tests/bin/t-reftable-basics.exe (Wstat: 32512 (exited 127) Tests: 8 Failed: 0)
Non-zero exit status: 127
Parse errors: No plan found in TAP output

Why do they seemingly all test reftable basics?

@rimrul
Copy link
Member

rimrul commented Jan 7, 2025

Probable suspects for win test breakage: 8db127d
and 2cca185

@dscho
Copy link
Member Author

dscho commented Jan 7, 2025

Why do they seemingly all test reftable basics?

That's because the unit tests are not sharded between the matrix jobs. They are simply run in all of them.

Non-zero exit status: 127

That means that the executable (or, crucially, a DLL needed by that executable) is not found.

Another curiosity is that it does work in win+VS test.

@dscho
Copy link
Member Author

dscho commented Jan 7, 2025

Another curiosity is that it does work in win+VS test.

The reason is that we're not using mimalloc in the win+VS builds, and the new tests (both commits spot on @rimrul, great intuition!) introduce incorrect code that mixes back in regular MSVCRT's malloc()/realloc()/free().

dscho and others added 4 commits January 7, 2025 16:56
As indicated by the `#undef malloc` line in `reftable/basics.h`, it is
quite common to compile in allocators other than the default one by
defining `malloc` constants and friends.

This pattern is used e.g. in Git for Windows, which uses the powerful
and performant `mimalloc` allocator.

Furthermore, in `reftable/basics.c` this `#undef malloc` is
_specifically_ disabled by virtue of defining the
`REFTABLE_ALLOW_BANNED_ALLOCATORS` constant before including
`reftable/basic.h`.

However, in 8db127d (reftable: avoid leaks on realloc error,
2024-12-28) and in 2cca185 (reftable: fix allocation count on
realloc error, 2024-12-28), `reftable_set_alloc()` function calls were
introduced that pass `malloc`, `realloc` and `free` function pointers as
parameters _after_ `reftable/basics.h` ensured that they were no longer
`#define`d.

This causes problems because those calls happen after the initial
allocator has already been used to initialize an array, which is
subsequently resized using the overridden default `realloc()` allocator.

You cannot mix and match allocators like that, which leads to a
`STATUS_HEAP_CORRUPTION` (C0000374), and when running this unit test
through shell and/or `prove` (which only support 7-bit status codes), it
surfaces as exit code 127.

It is totally unnecessary to pass those function pointers to
`malloc`/`realloc`/`free` in, though: The `reftable` code goes out of
its way to fall back to the initial allocator when passing `NULL`
parameters instead. So let's do that instead of causing heap
corruptions.

Signed-off-by: Johannes Schindelin <[email protected]>
Start work on a new 'git survey' command to scan the repository
for monorepo performance and scaling problems.  The goal is to
measure the various known "dimensions of scale" and serve as a
foundation for adding additional measurements as we learn more
about Git monorepo scaling problems.

The initial goal is to complement the scanning and analysis performed
by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool.
It is hoped that by creating a builtin command, we may be able to take
advantage of internal Git data structures and code that is not
accessible from GO to gain further insight into potential scaling
problems.

Co-authored-by: Derrick Stolee <[email protected]>
Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
By default we will scan all references in "refs/heads/", "refs/tags/"
and "refs/remotes/".

Add command line opts let the use ask for all refs or a subset of them
and to include a detached HEAD.

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
When 'git survey' provides information to the user, this will be presented
in one of two formats: plaintext and JSON. The JSON implementation will be
delayed until the functionality is complete for the plaintext format.

The most important parts of the plaintext format are headers specifying the
different sections of the report and tables providing concreted data.

Create a custom table data structure that allows specifying a list of
strings for the row values. When printing the table, check each column for
the maximum width so we can create a table of the correct size from the
start.

The table structure is designed to be flexible to the different kinds of
output that will be implemented in future changes.

Signed-off-by: Derrick Stolee <[email protected]>
dscho and others added 18 commits January 7, 2025 16:57
Signed-off-by: Johannes Schindelin <[email protected]>
This was pull request git-for-windows#1645 from ZCube/master

Support windows container.

Signed-off-by: Johannes Schindelin <[email protected]>
…ws#4527)

With this patch, Git for Windows works as intended on mounted APFS
volumes (where renaming read-only files would fail).

Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
This patch introduces support to set special NTFS attributes that are
interpreted by the Windows Subsystem for Linux as file mode bits, UID
and GID.

Signed-off-by: Johannes Schindelin <[email protected]>
Handle Ctrl+C in Git Bash nicely

Signed-off-by: Johannes Schindelin <[email protected]>
A fix for calling `vim` in Windows Terminal caused a regression and was
reverted. We partially un-revert this, to get the fix again.

Signed-off-by: Johannes Schindelin <[email protected]>
This topic branch re-adds the deprecated --stdin/-z options to `git
reset`. Those patches were overridden by a different set of options in
the upstream Git project before we could propose `--stdin`.

We offered this in MinGit to applications that wanted a safer way to
pass lots of pathspecs to Git, and these applications will need to be
adjusted.

Instead of `--stdin`, `--pathspec-from-file=-` should be used, and
instead of `-z`, `--pathspec-file-nul`.

Signed-off-by: Johannes Schindelin <[email protected]>
Originally introduced as `core.useBuiltinFSMonitor` in Git for Windows
and developed, improved and stabilized there, the built-in FSMonitor
only made it into upstream Git (after unnecessarily long hemming and
hawing and throwing overly perfectionist style review sticks into the
spokes) as `core.fsmonitor = true`.

In Git for Windows, with this topic branch, we re-introduce the
now-obsolete config setting, with warnings suggesting to existing users
how to switch to the new config setting, with the intention to
ultimately drop the patch at some stage.

Signed-off-by: Johannes Schindelin <[email protected]>
We map WSAGetLastError() errors to errno errors in winsock_error_to_errno(),
but the MSVC strerror() implementation only produces "Unknown error" for
most of them. Produce some more meaningful error messages in these
cases.

Our builds for ARM64 link against the newer UCRT strerror() that does know
these errors, so we won't change the strerror() used there.

The wording of the messages is copied from glibc strerror() messages.

Reported-by: M Hickford <[email protected]>
Signed-off-by: Matthias Aßhauer <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
The part about keeping the original error number hasn't been accurate since
commit c11f75c (mingw: make sure errno is set correctly when socket
operations fail, 2019-11-25) and the part about strerror() not knowing
about these errors is untrue since the previous commit.

Signed-off-by: Matthias Aßhauer <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Commit 2406bf5 (Win32: detect unix socket support at runtime,
2024-04-03) introduced a runtime detection for whether the operating
system supports unix sockets for Windows, but a mistake snuck into the
tests. When building and testing Git without NO_UNIX_SOCKETS we
currently skip t0301-credential-cache on Windows if unix sockets are
supported and run the tests if they aren't.

Flip that logic to actually work the way it was intended.

Signed-off-by: Matthias Aßhauer <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
…updates

Start monitoring updates of Git for Windows' component in the open
In 245670c (credential-cache: check for windows specific errors, 2021-09-14)
we concluded that on Windows we would always encounter ENETDOWN where we
would expect ECONNREFUSED on POSIX systems, when connecting to unix sockets.
As reported in [1], we do encounter ECONNREFUSED on Windows if the
socket file doesn't exist, but the containing directory does and ENETDOWN if
neither exists. We should handle this case like we do on non-windows systems.

[1] git-for-windows#4762 (comment)

This fixes git-for-windows#5314

Helped-by: M Hickford <[email protected]>
Signed-off-by: Matthias Aßhauer <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Add a README.md for GitHub goodness.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho force-pushed the rebase-to-v2.48.0-rc2 branch from 6fc8ad7 to 783fac4 Compare January 7, 2025 16:04
@dscho
Copy link
Member Author

dscho commented Jan 7, 2025

Range-diff relative to the pre-force-push iteration
  • -: ------------ > 1: 1350d79 t-reftable-basics: stop assuming that malloc is not a constant

  • 1: 9eefe15 ! 2: 5de5395 backfill: add builtin boilerplate

    @@ Documentation/git-backfill.txt (new)
     +---
     +Part of the linkgit:git[1] suite
     
    + ## Documentation/meson.build ##
    +@@ Documentation/meson.build: manpages = {
    +   'git-apply.txt' : 1,
    +   'git-archimport.txt' : 1,
    +   'git-archive.txt' : 1,
    ++  'git-backfill.txt' : 1,
    +   'git-bisect.txt' : 1,
    +   'git-blame.txt' : 1,
    +   'git-branch.txt' : 1,
    +
      ## Makefile ##
     @@ Makefile: BUILTIN_OBJS += builtin/am.o
      BUILTIN_OBJS += builtin/annotate.o
  • 2: 01a22c9 = 3: ec146ba backfill: basic functionality and tests

  • 3: f0380ae = 4: dc94934 backfill: add --batch-size= option

  • 4: 90734e1 = 5: 35b7e38 backfill: add --sparse option

  • 5: 487ab66 = 6: 2264e15 backfill: assume --sparse when sparse-checkout is enabled

  • 6: 5643353 = 7: a86d017 backfill: mark it as experimental

  • 7: ca21769 ! 8: c6b7ce0 survey: stub in new experimental 'git-survey' command

    @@ Documentation/git-survey.txt (new)
     +---
     +Part of the linkgit:git[1] suite
     
    + ## Documentation/meson.build ##
    +@@ Documentation/meson.build: manpages = {
    +   'git-status.txt' : 1,
    +   'git-stripspace.txt' : 1,
    +   'git-submodule.txt' : 1,
    ++  'git-survey.txt' : 1,
    +   'git-svn.txt' : 1,
    +   'git-switch.txt' : 1,
    +   'git-symbolic-ref.txt' : 1,
    +
      ## Makefile ##
     @@ Makefile: BUILTIN_OBJS += builtin/sparse-checkout.o
      BUILTIN_OBJS += builtin/stash.o
  • 8: aadce39 = 9: 7d894d8 survey: add command line opts to select references

  • 9: 99e3587 = 10: 0d8393e survey: start pretty printing data in table form

  • 10: 5dbbcfa = 11: db19259 survey: add object count summary

  • 11: 2eac644 = 12: 4019c90 survey: summarize total sizes by object type

  • 12: 945f824 = 13: 1edff6d survey: show progress during object walk

  • 13: f97b2ba = 14: e65957e survey: add ability to track prioritized lists

  • 14: 6c9d9dc = 15: fc9fb68 survey: add report of "largest" paths

  • 15: 8ad64ab = 16: 5c03374 survey: add --top= option and config

  • 16: 603e711 = 17: c1267cc survey: clearly note the experimental nature in the output

  • 17: c281e25 = 18: 084fec0 path-walk: improve path-walk speed with many tags

  • 18: 79c4530 = 19: 09c9eaf Win32: make FILETIME conversion functions public

  • 19: ed0d5e9 = 20: 8a2cf44 Win32: dirent.c: Move opendir down

  • 20: d37b5aa = 21: 8bccc70 mingw: make the dirent implementation pluggable

  • 21: 3e7b2ce = 22: d70dde0 Win32: make the lstat implementation pluggable

  • 22: 4b33846 = 23: 79b7550 mingw: add infrastructure for read-only file system level caches

  • 23: 18e958c = 24: 31d1d18 mingw: add a cache below mingw's lstat and dirent implementations

  • 24: 548bd40 = 25: fedce44 fscache: load directories only once

  • 25: d53c448 = 26: 101b990 fscache: add key for GIT_TRACE_FSCACHE

  • 26: dc2a772 = 27: 537c684 fscache: remember not-found directories

  • 27: fd52c1c = 28: 26e514b fscache: add a test for the dir-not-found optimization

  • 28: c8ebbe3 = 29: 4a0b366 add: use preload-index and fscache for performance

  • 29: 4d6767e = 30: de1f16b dir.c: make add_excludes aware of fscache during status

  • 30: f27e5ad = 31: ccae4a7 fscache: make fscache_enabled() public

  • 31: bb54df4 = 32: 675d8dc dir.c: regression fix for add_excludes with fscache

  • 32: 62af0e4 = 33: 62a3d3c fetch-pack.c: enable fscache for stats under .git/objects

  • 33: f98020a = 34: 71c8974 checkout.c: enable fscache for checkout again

  • 34: 39f1de3 = 35: 7ffa2ec Enable the filesystem cache (fscache) in refresh_index().

  • 35: 7166713 = 36: 0c81d79 fscache: use FindFirstFileExW to avoid retrieving the short name

  • 36: b0f0f68 = 37: d27fb3c status: disable and free fscache at the end of the status command

  • 37: 2cc85e7 = 38: b0da78f fscache: add GIT_TEST_FSCACHE support

  • 38: 6f9bd24 = 39: c27092b fscache: add fscache hit statistics

  • 39: 5e194ad = 40: 409d6ec mem_pool: add GIT_TRACE_MEMPOOL support

  • 40: fb469fe = 41: 0d603f9 fscache: fscache takes an initial size

  • 41: bc302e5 = 42: 580ced9 fscache: update fscache to be thread specific instead of global

  • 42: a2458de = 43: 1ad0115 fscache: teach fscache to use mempool

  • 43: 7a1f611 = 44: 8e76d16 fscache: make fscache_enable() thread safe

  • 47: 6696e15 = 45: 8393a50 git-gui: provide question helper for retry fallback on Windows

  • 44: e1b3ae4 = 46: fe8536e fscache: teach fscache to use NtQueryDirectoryFile

  • 49: 90dc1d0 = 47: 6704d86 git gui: set GIT_ASKPASS=git-gui--askpass if not set yet

  • 45: 7d6feca = 48: f67147a unpack-trees: enable fscache for sparse-checkout

  • 52: 48773b5 = 49: 2bd9665 git-gui--askyesno: fix funny text wrapping

  • 46: e90146a = 50: 8ee5310 fscache: remember the reparse tag for each entry

  • 54: 336986a = 51: 7117b15 git-gui--askyesno: allow overriding the window title

  • 48: 2881294 = 52: 00da8ee fscache: implement an FSCache-aware is_mount_point()

  • 56: ca4cffd = 53: cd492f2 git-gui--askyesno (mingw): use Git for Windows' icon, if available

  • 50: ae99248 = 54: 36b6ed3 clean: make use of FSCache

  • 51: 49d707a = 55: 908c5a1 gitk: Unicode file name support

  • 53: 716a6ee = 56: 1a36bff gitk: Use an external icon file on Windows

  • 55: 9677438 = 57: 8999621 gitk: fix arrow keys in input fields with Tcl/Tk >= 8.6

  • 57: d0d564c = 58: cafd484 gitk: make the "list references" default window width wider

  • 58: da8e909 = 59: 327543d pack-objects (mingw): demonstrate a segmentation fault with large deltas

  • 59: eba8042 = 60: bfc3db8 mingw: support long paths

  • 60: a8b2d50 = 61: 05674cc Win32: fix 'lstat("dir/")' with long paths

  • 61: eac273d = 62: 4ec7736 win32(long path support): leave drive-less absolute paths intact

  • 62: 1a7d37e = 63: 884eda4 mingw: Support git_terminal_prompt with more terminals

  • 63: faed1e5 = 64: 448abe1 compat/terminal.c: only use the Windows console if bash 'read -r' fails

  • 64: 4b7fdec = 65: 7f91760 mingw (git_terminal_prompt): do fall back to CONIN$/CONOUT$ method

  • 65: b4bd0c5 = 66: af1cd45 strbuf_readlink: don't call readlink twice if hint is the exact link size

  • 66: fee6e2e = 67: dedb1ae strbuf_readlink: support link targets that exceed PATH_MAX

  • 67: 64a8746 = 68: 2e09e44 lockfile.c: use is_dir_sep() instead of hardcoded '/' checks

  • 68: 2acda4f = 69: aadf124 Win32: don't call GetFileAttributes twice in mingw_lstat()

  • 71: c27933b = 70: de191d1 Win32: implement stat() with symlink support

  • 72: 914992e = 71: af7734a Win32: remove separate do_lstat() function

  • 69: c0d4914 = 72: 5f4c415 compat/fsmonitor/fsm-*-win32: support long paths

  • 70: fbfba49 = 73: 4b2711a clean: suggest using core.longPaths if paths are too long to remove

  • 73: bc2f9ed = 74: 7a45cf7 Win32: let mingw_lstat() error early upon problems with reparse points

  • 74: 9cbea6a = 75: 6c99a15 mingw: teach fscache and dirent about symlinks

  • 75: ea0388c = 76: 41c5c14 Win32: lstat(): return adequate stat.st_size for symlinks

  • 76: a9adc7d = 77: 65b8430 Win32: factor out retry logic

  • 77: 9153540 = 78: a67d839 Win32: change default of 'core.symlinks' to false

  • 78: 10ce724 = 79: 24160db Win32: add symlink-specific error codes

  • 79: 7b9f40d = 80: d3df280 Win32: mingw_unlink: support symlinks to directories

  • 80: 0a7214b = 81: 87b35ec Win32: mingw_rename: support renaming symlinks

  • 81: 1fb7c27 = 82: 4a58657 Win32: mingw_chdir: change to symlink-resolved directory

  • 82: ceb5fdc = 83: 585b4bb Win32: implement readlink()

  • 83: 749cc6b = 84: 4236f8c mingw: lstat: compute correct size for symlinks

  • 84: 9f4c56a = 85: 17bf729 Win32: implement basic symlink() functionality (file symlinks only)

  • 85: aebe11e = 86: f7ab0fb Win32: symlink: add support for symlinks to directories

  • 86: 4febfba = 87: 96891c3 mingw: try to create symlinks without elevated permissions

  • 87: b1868ec = 88: 85d41e4 mingw: emulate stat() a little more faithfully

  • 88: 4fbb974 = 89: c80d141 mingw: special-case index entries for symlinks with buggy size

  • 89: 9e8a0d3 = 90: 06376e6 mingw: introduce code to detect whether we're inside a Windows container

  • 98: b544879 = 91: eb17c48 mingw: when running in a Windows container, try to rename() harder

  • 100: dc767a0 = 92: f33e574 mingw: move the file_attr_to_st_mode() function definition

  • 102: 01c1251 = 93: 167e81d mingw: Windows Docker volumes are not symbolic links

  • 90: 6797f79 = 94: e13c7c6 Win32: symlink: move phantom symlink creation to a separate function

  • 104: 5dd28c1 = 95: 3e617a8 mingw: work around rename() failing on a read-only file

  • 91: 51966f5 = 96: ad1b7a4 Introduce helper to create symlinks that knows about index_state

  • 92: d84cc14 = 97: 3199b1b mingw: allow to specify the symlink type in .gitattributes

  • 93: 51d9aea = 98: faf1f46 Win32: symlink: add test for symlink attribute

  • 94: 4c79173 = 99: dfbdd01 mingw: explicitly specify with which cmd to prefix the cmdline

  • 95: 72e08b9 = 100: 0612f86 mingw: when path_lookup() failed, try BusyBox

  • 96: ad65228 = 101: 3b9554e test-lib: avoid unnecessary Perl invocation

  • 97: 8b20fb4 = 102: f6719f8 test-tool: learn to act as a drop-in replacement for iconv

  • 99: 8018825 = 103: d7f4c1d tests(mingw): if iconv is unavailable, use test-helper --iconv

  • 101: 7e3ad88 = 104: f03dc97 gitattributes: mark .png files as binary

  • 103: 3ca328d = 105: 14fa4bf tests: move test PNGs into t/lib-diff/

  • 105: e1bbe76 = 106: 87dc864 tests: only override sort & find if there are usable ones in /usr/bin/

  • 106: 6b56a17 = 107: 3789353 tests: use the correct path separator with BusyBox

  • 107: 9055c7b = 108: 6a6623b mingw: only use Bash-ism builtin pwd -W when available

  • 108: 9346116 = 109: 7f70225 tests (mingw): remove Bash-specific pwd option

  • 109: a074d6e = 110: 5503200 test-lib: add BUSYBOX prerequisite

  • 110: 8c14466 = 111: 7ccc72d t5003: use binary file from t/lib-diff/

  • 111: 0b194e6 = 112: bf6204f t5532: workaround for BusyBox on Windows

  • 112: e474bb3 = 113: 7c97cdd t5605: special-case hardlink test for BusyBox-w32

  • 113: 2fc7ed6 = 114: 25a13ab t5813: allow for $PWD to be a Windows path

  • 114: b4a2f6f = 115: 7e1cf71 t9200: skip tests when $PWD contains a colon

  • 115: 7230b75 = 116: 77fb64f mingw: add a Makefile target to copy test artifacts

  • 116: c76367f = 117: 2e1f7ed mingw: optionally enable wsl compability file mode bits

  • 118: 04d630e = 118: 0adaeae mingw: kill child processes in a gentler way

  • 122: 64677a0 = 119: 64b0171 mingw: really handle SIGINT

  • 119: 5bab650 = 120: 3a711ce mingw: do not call xutftowcs_path in mingw_mktemp

  • 120: bb16c1b = 121: a0a9279 Add a GitHub workflow to monitor component updates

  • 123: 17726ee = 122: de57309 Partially un-revert "editor: save and reset terminal after calling EDITOR"

  • 124: 77094d1 = 123: db1e920 reset: reinstate support for the deprecated --stdin option

  • 125: 1c68d2a = 124: c359f03 fsmonitor: reintroduce core.useBuiltinFSMonitor

  • 126: 0cceb09 = 125: 8732d41 dependabot: help keeping GitHub Actions versions up to date

  • 117: 3af2a34 = 126: 49453d0 Describe Git for Windows' architecture [no ci]

  • 121: 7da69d9 = 127: 2890784 Modify the Code of Conduct for Git for Windows

  • 127: 3872f85 = 128: 2b556c9 CONTRIBUTING.md: add guide for first-time contributors

  • 128: 16bba27 = 129: b7c9a2b README.md: Add a Windows-specific preamble

  • 129: 8bb1ee3 = 130: 1f22ae1 Add an issue template

  • 130: 3fac7ef = 131: 5b004bc Modify the GitHub Pull Request template (to reflect Git for Windows)

  • 131: 53e7994 = 132: b6c04e4 SECURITY.md: document Git for Windows' policies

  • 132: 0b4c582 = 133: af9a2b6 compat/mingw: handle WSA errors in strerror

  • 133: 69cc7a6 = 134: 0cf45c8 compat/mingw: drop outdated comment

  • 134: e4d986a = 135: c4d7ea8 t0301: actually test credential-cache on Windows

  • 135: 0d2f7bf = 136: ef52c17 credential-cache: handle ECONNREFUSED gracefully

pks-t and others added 2 commits January 9, 2025 12:07
In 6411a0a (builtin/blame: fix type of `length` variable when
emitting object ID, 2024-12-06) we have fixed the type of the `length`
variable. In order to avoid a cast from `size_t` to `int` in the call to
printf(3p) with the "%.*s" formatter we have converted the code to
instead use fwrite(3p), which accepts the length as a `size_t`.

It was reported though that this makes us read over the end of the OID
array when the provided `--abbrev=` length exceeds the length of the
object ID. This is because fwrite(3p) of course doesn't stop when it
sees a NUL byte, where as printf(3p) does.

Fix the bug by reverting back to printf(3p) and culling the provided
length to keep it from overflowing when cast to an `int`.

Note that when calling `git blame --abbrev=<N>` with an `N` that is
larger than the maximal OID hex size, there is a subtle side effect that
makes it behave _differently_ than specifying said maximal hex size:
When the command outputs boundary, unblamable or ignored commits' OIDs,
those outputs are prefixed with characters indicating this, and the
`abbrev` value is used to align the information that comes after the
OID, clipping it as needed. Specifying a "too large" abbrev value here
tells Git that yes, we want the full OIDs and don't you worry about
alignment.

Thanks to SHA-256 being _larger_ than the default SHA-1-based OIDs, and
thanks to clipping at `GIT_MAX_HEXSZ`, this change of behavior can only
be observed when running the test in SHA-256 mode.

Nevertheless, this means that we cannot cull at `GIT_MAX_HEXSZ` but at a
slightly larger threshold value.

This patch is a slightly modified version of the one sent as
https://lore.kernel.org/git/20250109-b4-pks-blame-truncate-hash-length-v1-1-9ad4bb09e059@pks.im
to the Git mailing list.

Reported-by: Johannes Schindelin <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
"Why would one want to run it in parallel?" I hear you ask. I am glad
you are curious, because a curious story is what it is, indeed.

The `GIT-VERSION-GEN` script is quite a pillar of Git's source code,
with most lines being unchanged for the past 15 years. Until the v2.48.0
release candidate cycle.

Its original purpose was to generate the version string and store it in
the `GIT-VERSION-FILE`.

This paradigm changed quite dramatically when support for building with
Meson was introduced. Most crucially, a38edab (Makefile: generate
doc versions via GIT-VERSION-GEN, 2024-12-06) changed the way the
documentation is built by using the `GIT-VERSION-GEN` file to write out
the `asciidocor-extensions.rb` and `asciidoc.conf` files with now
hard-coded version strings.

Crucially, the Makefile rule to generate those files need to be run
in every build because `GIT_VERSION` could have been specified, which
would require these files to be changed.

This introduced a surprising race condition!

And this is how that race surfaces: When calling `make -j2 html man`
from the top-level directory (a variant of which is invoked in Git for
Windows' release process), two sub-processes are spawned, a `make -C
Documentation html` one and a `make -C Documentation man` one. Both run
the rule to (re-)generate `asciidoctor-extensions.rb` or
`asciidoc.conf`, invoking `GIT-VERSION-GEN` to do so. That script first
generates a temporary file (appending the `+` character to the
filename), then looks whether it contains something different than the
already existing file (if it exists, that is), and either replaces it if
needed, or removes the temporary file. If one of the two parallel
invocations removes that temporary file before the other can compare it,
or even worse: if one tries to replace the target file just after the
other _started_ writing the temporary file (but did not finish), that
race condition now causes bad builds.

This may sound highly theoretical, but due to Git's choices, Git for
Windows is forced to use a (slow) POSIX emulation layer to run that
script and in the blink of an eye it becomes very much not theoretical
at all. See these failed GitHub workflow runs as Exhibit A:

https://github.com/git-for-windows/git-sdk-32/actions/runs/12663456654
https://github.com/git-for-windows/git-sdk-32/actions/runs/12683174970
https://github.com/git-for-windows/git-sdk-64/actions/runs/12649348496

While it is undesirable to run this script over and over again,
certainly when this involves above-mentioned slow POSIX emulation layer,
the stage of the release cycle in which we are presently finding
ourselves dictates that a quick and reliable work-around be implemented
that works around the race condition without changing the overall
architecture of the build process.

This patch does that: By using a filename suffix for the temporary file
that includes the currently-executing script's process ID, We guarantee
that the two competing invocations cannot overwrite or remove each
others' temporary files.

Incidentally, this also fixes something else: The `+` character is
not even a valid filename character on Windows. The only reason why Git
for Windows did not need this is that above-mentioned POSIX emulation
layer also plays a couple of tricks with filenames (tricks that are not
interoperable with regular Windows programs, though), and previous
attempts to remedy this in git/git were unsuccessful, see e.g.
https://lore.kernel.org/git/[email protected]/

This commit fixes one of the issues that are currently delaying Git for
Windows v2.48.0-rc2.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Copy link
Member Author

dscho commented Jan 9, 2025

/git-artifacts

The tag-git workflow run was started

The git-artifacts-x86_64 workflow run was started.
The git-artifacts-i686 workflow run was started.
The git-artifacts-aarch64 workflow run was started.

@dscho
Copy link
Member Author

dscho commented Jan 9, 2025

/git-artifacts

git-artifacts-x86_64 run already exists at https://github.com/git-for-windows/git/runs/35367746556.
git-artifacts-i686 run already exists at https://github.com/git-for-windows/git/runs/35372241262.
The git-artifacts-aarch64 workflow run was started.

@dscho
Copy link
Member Author

dscho commented Jan 9, 2025

I am running /git-artifacts a second time because I want to verify that the packager is correctly recorded in the Pacman packages.

@dscho
Copy link
Member Author

dscho commented Jan 9, 2025

I am running /git-artifacts a second time because I want to verify that the packager is correctly recorded in the Pacman packages.

And of course that does not work because:

git-artifacts-x86_64 run already exists at https://github.com/git-for-windows/git/runs/35367746556.
git-artifacts-i686 run already exists at https://github.com/git-for-windows/git/runs/35372241262.

I guess it's back to starting new runs manually.

@dscho
Copy link
Member Author

dscho commented Jan 9, 2025

/release

The release-git workflow run was started

@gitforwindowshelper gitforwindowshelper bot merged commit 049f0cf into git-for-windows:main Jan 9, 2025
71 checks passed
@dscho dscho deleted the rebase-to-v2.48.0-rc2 branch January 9, 2025 17:49
@dscho dscho mentioned this pull request Jan 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New git version] v2.48.0-rc2