From 783819d968fdb4fb8225785956231d5882433248 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 6 Jan 2025 09:06:57 -0500 Subject: [PATCH 1/2] amend! path-walk: improve path-walk speed with many tags (#5205) path-walk: improve path-walk speed with many tags (#5205) In the presence of many tags, the use of oid_array_lookup() can become extremely slow. We should rely upon the SEEN bit instead. This affects the tag-peeling walk as well as the switch statement for adding the peeled object to the correct oid_array. ---- Derrick Stolee found this while testing the 2.47.0.vfs.0.0 pre-release against a repo with many annotated tags. This is a backport of https://github.com/microsoft/git/pull/695. Signed-off-by: Derrick Stolee Signed-off-by: Johannes Schindelin From 000e561da9d3589fcbd1c8788fa116601790c817 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Tue, 7 Jan 2025 09:00:53 +0100 Subject: [PATCH 2/2] amend! Add experimental 'git survey' builtin (#5174) Add experimental 'git survey' builtin (#5174) This introduces `git survey` to Git for Windows ahead of upstream for the express purpose of getting the path-based analysis in the hands of more folks. The inspiration of this builtin is [`git-sizer`](https://github.com/github/git-sizer), but since that command relies on `git cat-file --batch` to get the contents of objects, it has limits to how much information it can provide. This is mostly a rewrite of the `git survey` builtin that was introduced into the `microsoft/git` fork in microsoft/git#667. That version had a lot more bells and whistles, including an analysis much closer to what `git-sizer` provides. The biggest difference in this version is that this one is focused on using the path-walk API in order to visit batches of objects based on a common path. This allows identifying, for instance, the path that is contributing the most to the on-disk size across all versions at that path. For example, here are the top ten paths contributing to my local Git repository (which includes `microsoft/git` and `gitster/git`): ``` TOP FILES BY DISK SIZE ============================================================================ Path | Count | Disk Size | Inflated Size -----------------------------------------+-------+-----------+-------------- whats-cooking.txt | 1373 | 11637459 | 37226854 t/helper/test-gvfs-protocol | 2 | 6847105 | 17233072 git-rebase--helper | 1 | 6027849 | 15269664 compat/mingw.c | 6111 | 5194453 | 463466970 t/helper/test-parse-options | 1 | 3420385 | 8807968 t/helper/test-pkt-line | 1 | 3408661 | 8778960 t/helper/test-dump-untracked-cache | 1 | 3408645 | 8780816 t/helper/test-dump-fsmonitor | 1 | 3406639 | 8776656 po/vi.po | 104 | 1376337 | 51441603 po/de.po | 210 | 1360112 | 71198603 ``` This kind of analysis has been helpful in identifying the reasons for growth in a few internal monorepos. Those findings motivated the changes in #5157 and #5171. With this early version in Git for Windows, we can expand the reach of the experimental tool in advance of it being contributed to the upstream project. Unfortunately, this will mean that in the next `microsoft/git` rebase, Jeff Hostetler's version will need to be pulled out since there are enough conflicts. These conflicts include how tables are stored and generated, as the version in this PR is slightly more general to allow for different kinds of data. Signed-off-by: Johannes Schindelin