Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manifest install: Cache compiler information for change detection #362

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

autoantwort
Copy link
Contributor

@autoantwort autoantwort commented Feb 17, 2022

Fixes microsoft/vcpkg#20549 (comment), microsoft/vcpkg#14025 and microsoft/vcpkg#29857.

The noop case now needs 110 ms instead of 2.2 seconds on my arm mac.
And from 2.0 sec to 18 ms when build as release

include/vcpkg/vcpkgpaths.h Outdated Show resolved Hide resolved
src/vcpkg/install.cpp Outdated Show resolved Hide resolved
src/vcpkg/install.cpp Outdated Show resolved Hide resolved
@ras0219-msft
Copy link
Contributor

ras0219-msft commented Feb 23, 2022

A few independent thoughts:

MSBuild parity

I think we should consider changing our CMake integration to use the same algorithm as MSBuild. Currently, MSBuild considers:

  1. Timestamp check on vcpkg.json
  2. Timestamp check on vcpkg-configuration.json
  3. Triplet name
  4. Host triplet name

We should add "full vcpkg command line" to that list in both cases. My concern would be only that this will not pick up changes to overlay port contents or to the triplet files.

Locking

Right now, vcpkg generally performs a bunch of file locking in manifest mode that will cause this "lightweight" check to be globally serialized. For CMake usage this is less important since it's often called only once for the whole repo, but for more complex scenarios and other buildsystems it could be problematic.

Avoid hashing file contents

If the intent of this is to be a lightweight check, we should avoid hashing file contents altogether. Instead, I would suggest:

  1. For every file under consideration, compare their modified timestamp to the stamp's modified timestamp.
  2. If the timestamps disagree (a file is newer than the stamp), report a mismatch.
  3. For every file under consideration, hash the path. This ensures deleting files will be tracked correctly. (In practice I think this is easiest by hashing the list of paths)
  4. Read the stamp contents and compare against the paths hash. If not matching, report a mismatch.

If there's a mismatch, perform the install. Once the install completes, write the new paths hash to the stamp (which updates the timestamp).

We could potentially go even further here by instead having vcpkg actually report out the set of files considered as part of the build, a la /showIncludes/-M -MF. This would enable buildsystems to perform most of this check on their own without spawning a vcpkg process at all (very good for MSBuild!). One caveat with this would be correctly handling our "consider all files in this folder" cases.

@autoantwort
Copy link
Contributor Author

My concern would be only that this will not pick up changes to overlay port contents or to the triplet files.

Yeah I think we should also check the timestamp of the triplet files and the overlay port files.

Right now, vcpkg generally performs a bunch of file locking in manifest mode that will cause this "lightweight" check to be globally serialized.

For msbuild this is a good thing if we still use the builtin msbuild change detection. If you for example have 10 projects in your solution and a vcpkg related change is detected, vcpkg runs 10 times in parallel. If the check is done after the lock, the first vcpkg instance will install the dependencies and then all the others will detect that there are no changes and will skip installation. If we would do the check before the locking, every vcpkg instance will detect a change and tries to install everything.
If we don't want to use the builtin msbuild check, we optimally need two checks. One before the file locking and one after. But imho for now it is enough to do the check after the locking.

For every file under consideration, hash the path. This ensures deleting files will be tracked correctly. (In practice I think this is easiest by hashing the list of paths)

We can simply check the modified date of the folders. The modified date of a folder is updated if a file is deleted, added or renamed.

Otherwise sounds good :)

@autoantwort
Copy link
Contributor Author

@ras0219-msft Are you happy with the new changes? :)

@ras0219-msft ras0219-msft self-requested a review May 10, 2022 19:58
# Conflicts:
#	include/vcpkg/commands.setinstalled.h
#	locales/messages.en.json
#	locales/messages.json
#	src/vcpkg/commands.setinstalled.cpp
#	src/vcpkg/install.cpp
# Conflicts:
#	locales/messages.en.json
#	locales/messages.json
# Conflicts:
#	src/vcpkg/install.cpp
@zig-for
Copy link

zig-for commented Jul 6, 2022

It would be really neat if this were merged, this is really slowing down my builds.

# Conflicts:
#	include/vcpkg/commands.setinstalled.h
#	locales/messages.en.json
#	locales/messages.json
#	src/vcpkg/base/strings.cpp
#	src/vcpkg/commands.setinstalled.cpp
#	src/vcpkg/install.cpp
# Conflicts:
#	include/vcpkg/commands.setinstalled.h
#	locales/messages.en.json
#	locales/messages.json
#	src/vcpkg/commands.porthistory.cpp
#	src/vcpkg/commands.setinstalled.cpp
#	src/vcpkg/install.cpp
@autoantwort
Copy link
Contributor Author

@BillyONeal Any news? :)

@BillyONeal
Copy link
Member

I think for some reason I had in my mind that someone else specific was to review this one. Sorry, I don't have updated news for you.

@autoantwort
Copy link
Contributor Author

Thank you for the update :) Maybe you can poke the person.
This change would mean a big improvement in usability for cmake users (100 times faster configure runs).

@BillyONeal
Copy link
Member

Thank you for the update :) Maybe you can poke the person.

To clarify, I had in my mind that someone was supposed to look at this but that doesn't appear to be the case

@autoantwort
Copy link
Contributor Author

According to https://discord.com/channels/400588936151433218/687365466422902841/1038192801491464192 it will probably not be merged and they will not look at it before march

@ras0219-msft ras0219-msft self-assigned this Feb 1, 2023
# Conflicts:
#	src/vcpkg/base/json.cpp
# Conflicts:
#	include/vcpkg/base/messages.h
#	src/vcpkg/base/messages.cpp
@dchansen
Copy link

dchansen commented Oct 5, 2023

Any news on this PR?

@autoantwort
Copy link
Contributor Author

autoantwort commented Oct 8, 2023

No. There are concerns about cache invalidation. For example if you do the following with this PR:

  1. vcpkg install --cache-compiler-information
  2. Update your compiler
  3. vcpkg install --cache-compiler-information # This understands that the compiler has been updated and reinstalled
  4. export CXX=<path_to_different_compiler>
  5. vcpkg install --cache-compiler-information # This would not do anything, since the cached compiler was not updated
  6. Change vcpkg.json
  7. Run vcpkg install. This would use the compiler abi from 3., but the compiler from 4. ⚡

So a solution could be to only use --cache-compiler-information to detect changes. If there are changes, rerun compiler detection and install packages. This would fix point 7.

Another solution is to cache the compiler and use it later (For example by sharing the CMakeCache.txt). This would use the compiler from point 3 in point 7.

Another more simple solution would be to only check if the vcpkg.json file was changed. The point 3 would do nothing and point 7 would work normally.

@autoantwort autoantwort changed the title manifest install: Cache compiler information manifest install: Cache compiler information for change detection Oct 17, 2023
# Conflicts:
#	src/vcpkg/commands.build.cpp
@@ -666,14 +783,60 @@ namespace vcpkg
const PreBuildInfo& pre_build_info,
const Toolset& toolset)
{
static constexpr auto vcpkg_json = R"--(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have microsoft/vcpkg#36398 do you mind if I replace this attempt to move detect_compiler out with just using what @data-queue added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do it :)

@BillyONeal
Copy link
Member

@BillyONeal

I've convinced myself we should take this because changing the compiler under an installed tree is already doomed.

@ras0219-msft (paraphrased)

The thing I am concerned about is corrupting binary caches. If the installed tree is toast you can recover by deleting it and reinstalling, but if binary caches are toast recovering is difficult.

If the meat of this is saying the hash algorithm is too slow, we don't need to skip the cmake detection to cache the hash. We can have a database file saying "this path was this filesize / last mod time / sha".

We could also hash the first MB of the compiler or something like that.

@BillyONeal

OK, that makes sense to me. What if we kept the compiler from the metadata cache and used that to lock the version?

@ras0219-msft (paraphrased)

That would resolve my concerns.

@PatrickKa
Copy link

Any update/news on this?

@BillyONeal
Copy link
Member

BillyONeal commented Sep 25, 2024

Any update/news on this?

Given that the discussion in #362 (comment) isn't resolved I don't see progress happening here.

I did some more testing and it looks like the majority of the time is not SHA1-ing the compiler binaries, but is instead CMake detecting which compiler to use. Note that I intentionally used clang in this example, which is a 116MB binary, rather than cl.exe which is like 3MB:

PS D:\vcpkg> Measure-Command { cmake.exe -DCURRENT_PORT_DIR=D:\vcpkg\scripts\detect_compiler -DCURRENT_BUILDTREES_DIR=D:\vcpkg\buildtrees -DCURRENT_PACKAGES_DIR=D:\vcpkg\packages\detect_compiler_x64-windows -D_HOST_TRIPLET=x64-windows -DCMD=BUILD -DDOWNLOADS=D:\vcpkg-downloads -DTARGET_TRIPLET=x64-windows -DTARGET_TRIPLET_FILE=D:\vcpkg\triplets\x64-windows.cmake -DVCPKG_ROOT_DIR=D:\vcpkg -DPACKAGES_DIR=D:\vcpkg\packages -DBUILDTREES_DIR=D:\vcpkg\buildtrees -D_VCPKG_INSTALLED_DIR=D:\vcpkg\installed -DVCPKG_MANIFEST_INSTALL=OFF -DVCPKG_BASE_VERSION=2999-12-12 -DDO_HASH=OFF -P .\scripts\ports.cmake }
[1/1] "C:/Program Files/CMake/bin/cmake.exe" -E chdir ".." "C:/Program Files/CMake/bin/cmake.exe" "D:/vcpkg/scripts/detect_compiler" "-G" "Ninja" "-DCMAKE_BUILD_TYPE=Release" "-DCMAKE_INSTALL_PREFIX=D:\vcpkg\packages\detect_compiler_x64-windows" "-DDO_HASH=OFF" "-DCMAKE_MAKE_PROGRAM=D:/vcpkg-downloads/tools/ninja/1.10.2-windows/ninja.exe" "-DBUILD_SHARED_LIBS=ON" "-DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=D:/vcpkg/scripts/toolchains/windows.cmake" "-DVCPKG_TARGET_TRIPLET=x64-windows" "-DVCPKG_SET_CHARSET_FLAG=ON" "-DVCPKG_PLATFORM_TOOLSET=" "-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON" "-DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON" "-DCMAKE_FIND_PACKAGE_NO_SYSTEM_PACKAGE_REGISTRY=ON" "-DCMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_SKIP=TRUE" "-DCMAKE_VERBOSE_MAKEFILE=ON" "-DVCPKG_APPLOCAL_DEPS=OFF" "-DCMAKE_TOOLCHAIN_FILE=D:/vcpkg/scripts/buildsystems/vcpkg.cmake" "-DCMAKE_ERROR_ON_ABSOLUTE_INSTALL_DESTINATION=ON" "-DVCPKG_CXX_FLAGS=" "-DVCPKG_CXX_FLAGS_RELEASE=" "-DVCPKG_CXX_FLAGS_DEBUG=" "-DVCPKG_C_FLAGS=" "-DVCPKG_C_FLAGS_RELEASE=" "-DVCPKG_C_FLAGS_DEBUG=" "-DVCPKG_CRT_LINKAGE=dynamic" "-DVCPKG_LINKER_FLAGS=" "-DVCPKG_LINKER_FLAGS_RELEASE=" "-DVCPKG_LINKER_FLAGS_DEBUG=" "-DVCPKG_TARGET_ARCHITECTURE=x64" "-DCMAKE_INSTALL_LIBDIR:STRING=lib" "-DCMAKE_INSTALL_BINDIR:STRING=bin" "-D_VCPKG_ROOT_DIR=D:\vcpkg" "-DZ_VCPKG_ROOT_DIR=D:\vcpkg" "-D_VCPKG_INSTALLED_DIR=D:\vcpkg\installed" "-DVCPKG_MANIFEST_INSTALL=OFF"
-- The C compiler identification is Clang 18.1.6 with GNU-like command-line
-- The CXX compiler identification is Clang 18.1.6 with GNU-like command-line
#COMPILER_HASH#
#COMPILER_C_HASH#
#COMPILER_C_VERSION#18.1.6
#COMPILER_C_ID#Clang
#COMPILER_C_PATH#C:/Program Files/LLVM/bin/clang.exe
#COMPILER_CXX_HASH#
#COMPILER_CXX_VERSION#18.1.6
#COMPILER_CXX_ID#Clang
#COMPILER_CXX_PATH#C:/Program Files/LLVM/bin/clang++.exe
-- Configuring done (1.2s)
-- Generating done (0.0s)
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_SHARED_LIBS
    CMAKE_INSTALL_BINDIR
    CMAKE_INSTALL_LIBDIR
    _VCPKG_ROOT_DIR


-- Build files have been written to: D:/vcpkg/buildtrees/x64-windows-rel


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 332
Ticks             : 13323311
TotalDays         : 1.54204988425926E-05
TotalHours        : 0.000370091972222222
TotalMinutes      : 0.0222055183333333
TotalSeconds      : 1.3323311
TotalMilliseconds : 1332.3311

PS D:\vcpkg> Measure-Command { cmake.exe -DCURRENT_PORT_DIR=D:\vcpkg\scripts\detect_compiler -DCURRENT_BUILDTREES_DIR=D:\vcpkg\buildtrees -DCURRENT_PACKAGES_DIR=D:\vcpkg\packages\detect_compiler_x64-windows -D_HOST_TRIPLET=x64-windows -DCMD=BUILD -DDOWNLOADS=D:\vcpkg-downloads -DTARGET_TRIPLET=x64-windows -DTARGET_TRIPLET_FILE=D:\vcpkg\triplets\x64-windows.cmake -DVCPKG_ROOT_DIR=D:\vcpkg -DPACKAGES_DIR=D:\vcpkg\packages -DBUILDTREES_DIR=D:\vcpkg\buildtrees -D_VCPKG_INSTALLED_DIR=D:\vcpkg\installed -DVCPKG_MANIFEST_INSTALL=OFF -DVCPKG_BASE_VERSION=2999-12-12 -DDO_HASH=ON -P .\scripts\ports.cmake }
[1/1] "C:/Program Files/CMake/bin/cmake.exe" -E chdir ".." "C:/Program Files/CMake/bin/cmake.exe" "D:/vcpkg/scripts/detect_compiler" "-G" "Ninja" "-DCMAKE_BUILD_TYPE=Release" "-DCMAKE_INSTALL_PREFIX=D:\vcpkg\packages\detect_compiler_x64-windows" "-DDO_HASH=ON" "-DCMAKE_MAKE_PROGRAM=D:/vcpkg-downloads/tools/ninja/1.10.2-windows/ninja.exe" "-DBUILD_SHARED_LIBS=ON" "-DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=D:/vcpkg/scripts/toolchains/windows.cmake" "-DVCPKG_TARGET_TRIPLET=x64-windows" "-DVCPKG_SET_CHARSET_FLAG=ON" "-DVCPKG_PLATFORM_TOOLSET=" "-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON" "-DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON" "-DCMAKE_FIND_PACKAGE_NO_SYSTEM_PACKAGE_REGISTRY=ON" "-DCMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_SKIP=TRUE" "-DCMAKE_VERBOSE_MAKEFILE=ON" "-DVCPKG_APPLOCAL_DEPS=OFF" "-DCMAKE_TOOLCHAIN_FILE=D:/vcpkg/scripts/buildsystems/vcpkg.cmake" "-DCMAKE_ERROR_ON_ABSOLUTE_INSTALL_DESTINATION=ON" "-DVCPKG_CXX_FLAGS=" "-DVCPKG_CXX_FLAGS_RELEASE=" "-DVCPKG_CXX_FLAGS_DEBUG=" "-DVCPKG_C_FLAGS=" "-DVCPKG_C_FLAGS_RELEASE=" "-DVCPKG_C_FLAGS_DEBUG=" "-DVCPKG_CRT_LINKAGE=dynamic" "-DVCPKG_LINKER_FLAGS=" "-DVCPKG_LINKER_FLAGS_RELEASE=" "-DVCPKG_LINKER_FLAGS_DEBUG=" "-DVCPKG_TARGET_ARCHITECTURE=x64" "-DCMAKE_INSTALL_LIBDIR:STRING=lib" "-DCMAKE_INSTALL_BINDIR:STRING=bin" "-D_VCPKG_ROOT_DIR=D:\vcpkg" "-DZ_VCPKG_ROOT_DIR=D:\vcpkg" "-D_VCPKG_INSTALLED_DIR=D:\vcpkg\installed" "-DVCPKG_MANIFEST_INSTALL=OFF"
-- The C compiler identification is Clang 18.1.6 with GNU-like command-line
-- The CXX compiler identification is Clang 18.1.6 with GNU-like command-line
#COMPILER_HASH#0f4a11f3a8b3c6eee0390888b2ca585daca373eb
#COMPILER_C_HASH#cf5d8e7225d21ffe029bdf032edaaf8a66ea0594
#COMPILER_C_VERSION#18.1.6
#COMPILER_C_ID#Clang
#COMPILER_C_PATH#C:/Program Files/LLVM/bin/clang.exe
#COMPILER_CXX_HASH#cf5d8e7225d21ffe029bdf032edaaf8a66ea0594
#COMPILER_CXX_VERSION#18.1.6
#COMPILER_CXX_ID#Clang
#COMPILER_CXX_PATH#C:/Program Files/LLVM/bin/clang++.exe
-- Configuring done (1.6s)
-- Generating done (0.0s)
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_SHARED_LIBS
    CMAKE_INSTALL_BINDIR
    CMAKE_INSTALL_LIBDIR
    _VCPKG_ROOT_DIR


-- Build files have been written to: D:/vcpkg/buildtrees/x64-windows-rel


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 753
Ticks             : 17538336
TotalDays         : 2.0299E-05
TotalHours        : 0.000487176
TotalMinutes      : 0.02923056
TotalSeconds      : 1.7538336
TotalMilliseconds : 1753.8336

PS D:\vcpkg>

So the hashing is approximately (1.33-1.75)/1.33 ~= -31% of the time. Not nothing, but not the earthshattering difference in the original PR description.

Given that @ras0219-msft does not want to accept any change that skips the CMake compiler detection over the potential to damage binary caches, I am unsure what changes could be made to get this PR into a state we could merge.

@BillyONeal
Copy link
Member

To clarify, the problematic scenario is:

  1. Run vcpkg with this setting turned on.
  2. Change something about the environment that causes a different compiler to be selected.
  3. Run vcpkg with this setting turned on again. It sees that the compiler that was detected before has not changed, so it skips running the compiler detection step and uses the old compiler's SHAs.
  4. Binary caching uploads bits built with a different compiler as if they were built with the old compiler.
  5. Future binary cache hits with the old compiler are now poisoned by the new compiler.

so we would have to have high confidence that anything that could possibly change which compiler CMake could pick invalidated this detection, and it seems unlikely to me that we will ever reach that level of confidence given how lots of how CMake does this is a black box.

@autoantwort
Copy link
Contributor Author

autoantwort commented Sep 25, 2024

So the hashing is approximately (1.33-1.75)/1.33 ~= -31% of the time. Not nothing, but not the earthshattering difference in the original PR description.
I have tested it on my mac:
Baseline noop: 1.4s
Baseline noop without hashing: 0.5s
This PR noop: ~0.1s

Given that @ras0219-msft does not want to accept any change that skips the CMake compiler detection over the potential to damage binary caches, I am unsure what changes could be made to get this PR into a state we could merge.

I already sketched a solution but got no real response. If this is the only thing blocking this PR I will start implementing that :)

@BillyONeal
Copy link
Member

I already sketched a solution but got no real response.

Sorry I missed that, can you point specifically?

@autoantwort
Copy link
Contributor Author

#362 (comment)
Especially:

So a solution could be to only use --cache-compiler-information to detect changes. If there are changes, rerun compiler detection and install packages. This would fix point 7.

@BillyONeal
Copy link
Member

#362 (comment) Especially:

So a solution could be to only use --cache-compiler-information to detect changes. If there are changes, rerun compiler detection and install packages. This would fix point 7.

I'm not sure how such changes are going to be possible in a way that truly sells that it's safe.

@BillyONeal
Copy link
Member

If the meat of this is saying the hash algorithm is too slow, we don't need to skip the cmake detection to cache the hash. We can have a database file saying "this path was this filesize / last mod time / sha".

Would you be willing to do just this part? It would fix your Mac test and is clearly less controversial..

# Conflicts:
#	include/vcpkg/commands.build.h
#	include/vcpkg/installedpaths.h
#	src/vcpkg/commands.build.cpp
Comment on lines +254 to +271
if (used_cached_compiler_info)
{
if (!is_action_plan_fulfilled(action_plan, status_db))
{
// we have to install something, so get fresh compiler infos
for (auto& install_action : action_plan.install_actions)
{
install_action.abi_info.clear();
}
compute_all_abis(paths, action_plan, cmake_vars, {});
used_cached_compiler_info = false;
}
}
adjust_action_plan_to_status_db(action_plan, status_db);
if (used_cached_compiler_info)
{
Checks::check_exit(VCPKG_LINE_INFO, action_plan.empty());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the cached compiler info is only used to detect if there is something to install, if something gets installed, fresh compiler info are used.
(if you update your compiler the cache is not used)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants