RBE failed when build cmake target in custom image #8154

Closed
jingshi-ant opened this issue Jan 10, 2025 · 6 comments

Comments

@jingshi-ant

Describe the bug
Does RBE support cmake targets?
When I tried RBE, I hit the error below:
ERROR: /home/xx/.cache/bazel/_bazel_xx/b9cc035dadc1c468737aa42e804f91a0/external/com_github_utf8proc/BUILD.bazel:24:6: Foreign Cc - CMake: Building utf8proc failed: (Exit 34): Invalid action cache entry 2e54f202075beea79cd610ed360711f07e4f6d7b931ba95f446b5d7db9306ebb: expected output external/com_github_utf8proc/utf8proc/lib/libutf8proc.a does not exist.
Target @com_github_utf8proc//:utf8proc failed to build

To Reproduce
Steps to reproduce the behavior:

  1. Enable RBE:
build --remote_executor=grpcs://remote.buildbuddy.io
build --host_platform=//:docker_image_platform
build --platforms=//:docker_image_platform
build --extra_execution_platforms=//:docker_image_platform
platform(
    name = "docker_image_platform",
    constraint_values = [
        "@platforms//cpu:x86_64",
        "@platforms//os:linux",
        "@bazel_tools//tools/cpp:clang",
    ],
    exec_properties = {
        "OSFamily": "Linux",
        "dockerNetwork": "off",
        "container-image": "docker://secretflow/scql-ci:latest",
    },
)

  2. Build a cmake target:

load("@rules_foreign_cc//foreign_cc:defs.bzl", "cmake")

package(default_visibility = ["//visibility:public"])

filegroup(
    name = "all_srcs",
    srcs = glob(["**"]),
)

cmake(
    name = "utf8proc",
    cache_entries = {
        "BUILD_STATIC_LIBS": "ON",
    },
    lib_source = ":all_srcs",
    out_static_libs = [
        "libutf8proc.a",
    ],
)
  3. Run bazel build:
    bazelisk build @com_github_utf8proc//:utf8proc -c opt

Expected behavior

The output of the cmake target in Bazel should contain libutf8proc.a, but it is empty.


@sluongng
Contributor

Hey @jingshi-ant, if you can share the invocation URL here or in our Slack, I can help you troubleshoot the problem much more easily.

Invalid action cache entry 2e54f202075beea79cd610ed360711f07e4f6d7b931ba95f446b5d7db9306ebb: expected output external/com_github_utf8proc/utf8proc/lib/libutf8proc.a does not exist.

This means that your build action ran remotely and exited with code zero (0), which indicates successful execution. However, it did not produce the output that Bazel expected, libutf8proc.a.

Note that the Remote Cache most likely now contains the invalid ActionResult, which has no outputs. To avoid the old Remote Cache entry, change the remote cache namespace with --remote_instance_name=10-01-2025. The flag value gives you a new remote cache namespace, so the previous "wrong" result is ignored.

@jingshi-ant
Author

Thanks for your suggestions. I used --remote_instance_name=10-01-2025, but the error is similar.
Is it possible that there is a problem with how the toolchains and our image work together?
build --crosstool_top=@buildbuddy_toolchain//:toolchain
build --extra_toolchains=@buildbuddy_toolchain//:cc_toolchain
Invocation URL: https://app.buildbuddy.io/invocation/1e605533-d8e5-4806-a7cb-ebbed5fd03a8

@jingshi-ant
Author

If it helps, you can reproduce this from the branch in secretflow/scql#437.
It is not only cmake: a plain cc_library target also fails when the custom image is used:

bazelisk build @aws_c_common -c opt --remote_instance_name=10-01-2025
INFO: Invocation ID: 14364f10-da2f-4031-8227-11c0fd0c7b0c
INFO: Streaming build results to: https://app.buildbuddy.io/invocation/14364f10-da2f-4031-8227-11c0fd0c7b0c
INFO: Analyzed target @aws_c_common//:aws_c_common (4 packages loaded, 113 targets configured).
INFO: Found 1 target...
ERROR: /home/guojin/.cache/bazel/_bazel_guojin/b9cc035dadc1c468737aa42e804f91a0/external/aws_c_common/BUILD.bazel:50:8: Executing genrule @aws_c_common//:config_h failed: (Exit 34): Invalid action cache entry 475d12989f2d8b6f8d8b85353b77d6c123b11d73dbed0f761d941533603b4d4e: expected output external/aws_c_common/include/aws/common/config.h does not exist.
Target @aws_c_common//:aws_c_common failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 7.427s, Critical Path: 4.54s
INFO: 3 processes: 3 internal.
INFO: Streaming build results to: https://app.buildbuddy.io/invocation/14364f10-da2f-4031-8227-11c0fd0c7b0c
FAILED: Build did NOT complete successfully

@jingshi-ant
Author

One more thing: the custom image 'secretflow/scql-ci:latest' uses '/home/admin' as its working directory.
Will that affect Bazel's I/O directories?

@sluongng
Contributor

Take a look at this Execution in the Executions tab https://app.buildbuddy.io/invocation/1e605533-d8e5-4806-a7cb-ebbed5fd03a8?executionId=10-01-2025%2Fuploads%2F771922e1-84ae-4e8b-8c16-32bf6a93c81e%2Fblobs%2F2e5387d9087ee7259b0118cadadf5a44e072c30baee347a6a824660343a19596%2F244&actionDigest=2e5387d9087ee7259b0118cadadf5a44e072c30baee347a6a824660343a19596%2F244&executeResponseDigest=aef9815ed66fcca7f07a21854a1b1b6051789ef2e82d0b2e934b850199bfe167%2F130#action

You can see that we ran the wrapper_build_script.sh remotely and it exited with code zero.
However, none of the expected output files were created and the 2 output directories are empty.

I investigated a bit further and found that the container image that you are using here

    exec_properties = {
        "OSFamily": "Linux",
        "dockerNetwork": "off",
        "container-image": "docker://secretflow/scql-ci:latest",
    },

has an invalid ENTRYPOINT.
It's ["/bin/bash", "-lc"], which exits with /bin/bash: -c: option requires an argument.

This caused the action container to fail to start.
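
For anyone hitting the same error, the bash failure above can be reproduced locally without Docker. This is a minimal sketch, assuming the container runtime started the ENTRYPOINT with no command appended; the thread does not show the exact container invocation, so treat that as an assumption rather than a description of the executor's behavior.

```shell
# Run the same argv as ENTRYPOINT ["/bin/bash", "-lc"] with no command appended.
# Assumption: this mirrors how the container was started; the thread does not
# show the exact invocation.
status=0
err=$(LC_ALL=C /bin/bash -lc 2>&1) || status=$?

# bash rejects -c when no command string follows it and exits non-zero.
echo "stderr: $err"
echo "exit status: $status"
```

Because bash exits immediately, nothing stays running in a container whose ENTRYPOINT is this argv, which is consistent with the empty output directories seen in the execution.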

I tried rebuilding that action with this image

FROM secretflow/scql-ci@sha256:f2c775a0c1ab0cecb6648969d36a0cf988a244d00ed6a885a2fe6df0d1592e49

ENTRYPOINT []

and it worked.
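
A quick way to check whether an image needs this override is to look at its configured entrypoint before pointing RBE at it. The sketch below is only an illustration: `entrypoint_is_empty` is a hypothetical helper name, and it assumes the entrypoint is read with `docker inspect` on a locally pulled image.

```shell
# Hypothetical helper: succeeds when the entrypoint JSON printed by
# `docker inspect` is unset (null) or an empty list ([]), fails otherwise.
entrypoint_is_empty() {
  case "$1" in
    null|"[]") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs a local Docker daemon and a pulled image; the image name is
# the one from this thread):
#   ep=$(docker inspect --format '{{json .Config.Entrypoint}}' secretflow/scql-ci:latest)
#   entrypoint_is_empty "$ep" || echo "image sets ENTRYPOINT $ep; consider ENTRYPOINT []"
```

If the helper reports a non-empty entrypoint, a derived image with `ENTRYPOINT []`, like the one above, resets it.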

This is partly a UI/documentation bug on our end, since we do not properly validate ENTRYPOINT issues.
I will file a bug internally about this to improve how we handle this failure point.

cc: @bduffany

@jingshi-ant
Author

It works, Thanks a lot~ (set ENTRYPOINT to [] in new image)
