system_binary "test binary" caches failures across pantsd invocations (and doesn't work with shim scripts) #21744

Open

trhodeos opened this issue Dec 11, 2024 · 4 comments

trhodeos commented Dec 11, 2024

Describe the bug
We use julia via juliaup (similar to pyenv, but for Julia). We have the following system_binary declaration:

system_binary(
    name="julia",
    binary_name="julia",
    fingerprint=r".*1.10.*",
    fingerprint_args=["-v"],
)

Caching this makes sense for "static" binaries. However, juliaup installs a shim julia script that dispatches to whatever the default juliaup Julia version currently is. The "test binary" step therefore gets cached, since its inputs (i.e. the shim script itself) haven't changed even when the version the shim dispatches to has.
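
To illustrate the mechanism, here's a hypothetical stand-in for a juliaup-style shim (not juliaup's actual implementation; the config key and install layout are assumptions for illustration). The shim's bytes on disk never change, so any content-based cache key for it is stable, even though the binary it executes depends on external mutable state:

#!/usr/bin/env python3
# Hypothetical juliaup-style shim: this file's content is constant, so
# fingerprinting the shim itself can never observe a version switch.
import json
import os
import sys

# External state that `juliaup default <version>` rewrites.
with open(os.path.expanduser("~/.julia/juliaup/juliaup.json")) as f:
    config = json.load(f)

version = config["Default"]  # key name is an assumption for illustration

# Dispatch to the real binary for the currently selected version.
os.execv(
    os.path.expanduser(f"~/.julia/juliaup/julia-{version}/bin/julia"),
    ["julia", *sys.argv[1:]],
)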

This means that if the "test binary" rule fails (or passes) once, it will keep returning that result until the cache is deleted.

Pants version
2.23.0

OS
macOS (but it most likely happens on Linux as well)

Additional info
Here's a repro case. Note that you need juliaup installed. Toggle between juliaup default 1.10 and juliaup default 1.9 to see the failure modes.

trhodeos added the bug label Dec 11, 2024
trhodeos changed the title to "system_binary "test binary" caches failures across pantsd invocations (and doesn't work with shim scripts)" Dec 11, 2024
huonw (Contributor) commented Dec 17, 2024

Sorry for the trouble. Thanks for tracking down that code.

There's a bit more context/info in the called function's docstring:

def executable_search_path_cache_scope(
    self, *, cache_failures: bool = False
) -> ProcessCacheScope:
    """Whether it's safe to cache executable search path discovery or not.

    If the environment may change on us, e.g. the user upgrades a binary, then it's not safe to
    cache the discovery to disk. Technically, in that case, we should recheck the binary every
    session (i.e. Pants run), but we instead settle for every Pantsd lifetime to have more
    acceptable performance.

    Meanwhile, when running with Docker, we already invalidate whenever the image changes
    thanks to https://github.com/pantsbuild/pants/pull/17101.

    Remote execution often is safe to cache, but depends on the remote execution server.
    So, we rely on the user telling us what is safe.
    """
    caching_allowed = self.val and (
        self.val.has_field(DockerImageField)
        or (
            self.val.has_field(RemoteEnvironmentCacheBinaryDiscovery)
            and self.val[RemoteEnvironmentCacheBinaryDiscovery].value
        )
    )
    if cache_failures:
        return (
            ProcessCacheScope.ALWAYS
            if caching_allowed
            else ProcessCacheScope.PER_RESTART_ALWAYS
        )
    return (
        ProcessCacheScope.SUCCESSFUL
        if caching_allowed
        else ProcessCacheScope.PER_RESTART_SUCCESSFUL
    )

In particular, when running locally, this is only cached "per restart", i.e. for a pantsd session. Thus, getting Pantsd to restart (rm -rf .pants.d/pids is one way) will cause the binary search to rerun and find the new one.

This ends up being a performance/"correctness" trade-off: no caching makes for a slower experience.
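
Per that docstring, full disk caching only kicks in when the target environment opts in. For remote execution that looks roughly like the BUILD snippet below (the field name is inferred from the RemoteEnvironmentCacheBinaryDiscovery field above, so check the remote_environment reference for your Pants version before relying on it):

remote_environment(
    name="remote",
    # Tell Pants the remote execution server is stable enough that binary
    # discovery results can safely be cached to disk.
    cache_binary_discovery=True,
)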

Does the pantsd tip help?

Maybe we could even improve the error message to tell people about this, since it's easy to trip over:

async def _find_binary(
    address: Address,
    binary_name: str,
    search_path: SearchPath,
    fingerprint_pattern: str | None,
    fingerprint_args: tuple[str, ...] | None,
    fingerprint_dependencies: tuple[str, ...] | None,
    log_fingerprinting_errors: bool,
) -> BinaryPath:
    binaries = await Get(
        BinaryPaths,
        BinaryPathRequest(
            binary_name=binary_name,
            search_path=search_path,
        ),
    )

    fingerprint_args = fingerprint_args or ()

    deps = await Get(
        ResolvedExecutionDependencies,
        ResolveExecutionDependenciesRequest(address, (), fingerprint_dependencies),
    )
    rds = deps.runnable_dependencies
    env: dict[str, str] = {}
    append_only_caches: Mapping[str, str] = {}
    immutable_input_digests: Mapping[str, Digest] = {}
    if rds:
        env = {"PATH": rds.path_component}
        env.update(**(rds.extra_env or {}))
        append_only_caches = rds.append_only_caches
        immutable_input_digests = rds.immutable_input_digests

    tests: tuple[FallibleProcessResult, ...] = await MultiGet(
        Get(
            FallibleProcessResult,
            Process(
                description=f"Testing candidate for `{binary_name}` at `{path.path}`",
                argv=(path.path,) + fingerprint_args,
                input_digest=deps.digest,
                env=env,
                append_only_caches=append_only_caches,
                immutable_input_digests=immutable_input_digests,
            ),
        )
        for path in binaries.paths
    )

    for test, binary in zip(tests, binaries.paths):
        if test.exit_code != 0:
            if log_fingerprinting_errors:
                logger.warning(
                    f"Error occurred while fingerprinting candidate binary `{binary.path}` "
                    f"for binary `{binary_name}` (exit code {test.exit_code}) (use the "
                    f"`{SystemBinaryLogFingerprintingErrorsField.alias}` field to control this warning):\n\n"
                    f"stdout:\n{test.stdout.decode(errors='ignore')}\n"
                    f"stderr:\n{test.stderr.decode(errors='ignore')}"
                )
            # Skip this binary since fingerprinting failed.
            continue
        if fingerprint_pattern:
            fingerprint = test.stdout.decode().strip()
            match = re.match(fingerprint_pattern, fingerprint)
            if not match:
                continue
        return binary

    message = StringIO()
    message.write(f"Could not find a binary with name `{binary_name}`")
    if fingerprint_pattern:
        message.write(
            f" with output matching `{fingerprint_pattern}` when run with arguments "
            f"`{' '.join(fingerprint_args or ())}`"
        )
    message.write(". The following paths were searched:\n")
    for sp in search_path:
        message.write(f"- {sp}\n")

    failed_tests = [
        (test, binary) for test, binary in zip(tests, binaries.paths) if test.exit_code != 0
    ]
    if failed_tests:
        message.write(
            "\n\nThe following binaries were skipped because each binary returned an error when invoked:"
        )
        for failed_test, failed_binary in failed_tests:
            message.write(f"\n\n- {failed_binary.path} (exit code {failed_test.exit_code})\n")
            message.write(f"  stdout:\n{failed_test.stdout.decode(errors='ignore')}\n")
            message.write(f"  stderr:\n{failed_test.stderr.decode(errors='ignore')}\n")

    raise ValueError(message.getvalue())
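
As the code above shows, candidates that fail fingerprinting are skipped, and the warning is only emitted when the log_fingerprinting_errors field is set. A sketch of opting into that warning (the field alias is taken from SystemBinaryLogFingerprintingErrorsField in the snippet above, so confirm it exists in your Pants version):

system_binary(
    name="julia",
    binary_name="julia",
    fingerprint=r".*1.10.*",
    fingerprint_args=["-v"],
    # Surface fingerprinting failures (with stdout/stderr) as warnings;
    # failing candidates are skipped either way.
    log_fingerprinting_errors=True,
)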

trhodeos (Author) commented Dec 17, 2024

It seemed to be "sticky" for me (i.e. it persisted after restarting pantsd). Let me double-check that, though; I may be misspeaking.

Yeah, I'm not sure what the answer is here. For this specific case, there's a config file that controls the "live" binary. It'd be nice if we could add that config file ($HOME/.julia/juliaup/juliaup.json in this case) to the fingerprint of system_binary, but I realize that comes with its own set of trade-offs (fingerprinting files outside of the "pants" realm).

tdyas (Contributor) commented Dec 26, 2024

An alternative solution would be to add an attribute to the system_binary target type that makes caching "per session", so every Pants invocation re-evaluates the binary path search. "Per session" is one of the ProcessCacheScope variants.

We already have a similar idea for shell_command and adhoc_tool target types coming in v2.24.x via a new cache_scope field; for example, https://www.pantsbuild.org/prerelease/reference/targets/adhoc_tool#cache_scope.
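
A hypothetical sketch of what that could look like on system_binary (no such field exists on system_binary today; the value name below is an assumption mirroring the "per session" ProcessCacheScope variant and the adhoc_tool field linked above):

system_binary(
    name="julia",
    binary_name="julia",
    fingerprint=r".*1.10.*",
    fingerprint_args=["-v"],
    # Hypothetical field: re-run the "test binary" search in every Pants
    # session instead of caching it for the pantsd lifetime.
    cache_scope="per_session",
)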

tdyas (Contributor) commented Dec 26, 2024

> Yeah, I'm not sure what the answer is here. For this specific case, there's a config file that controls the "live" binary. It'd be nice if we could add that config file ($HOME/.julia/juliaup/juliaup.json in this case) to the fingerprint of system_binary, but I realize that comes with its own set of trade-offs (fingerprinting files outside of the "pants" realm).

This is feasible to implement in local execution on Pants v2.23.x+ because the rules engine now has the ability to retrieve metadata of system paths outside of the repository (via the PathMetadataRequest / PathMetadataResult types).
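
A rough sketch of that idea follows. The PathMetadataRequest / PathMetadataResult types exist per the comment above, but the import path, constructor, and result attributes used here are assumptions, not working Pants code:

from pants.engine.fs import PathMetadataRequest, PathMetadataResult  # assumed location
from pants.engine.rules import Get, rule


@rule
async def juliaup_config_state() -> str:
    # Requesting metadata for the juliaup config file would let the engine
    # invalidate binary discovery whenever `juliaup default <version>`
    # rewrites it, even though the file lives outside the repository.
    result = await Get(
        PathMetadataResult,
        PathMetadataRequest("/Users/me/.julia/juliaup/juliaup.json"),  # absolute path; $HOME expansion hand-waved
    )
    return repr(result.metadata)  # attribute name is an assumption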
