macOS nightly wheel builds failing since 2024-11-19 #7019
Comments
Inspection of PRs landed between the last good build and first bad build suggested the following:
Trial revert of #6837 in #7013 still failed the job; trialing revert of the other two PRs together |
I reran the last good workflow run; builds succeeded (there were some failures due to an unrelated issue). |
Found a failure with the same error message in a different job ( |
that job is green on trunk runs though! https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=llama-runner-mac%20(fp32%2C%20mps |
I'm late to this, so I'm not sure my comments will help, but was there any change related to an XNNPACK upgrade? I ask because the job failure is XNNPACK-related.
@larryliu0820 suggested that maybe the runner toolchain changed. It looks like we're using macos-m1-stable runners for test-llama-runner-mac: https://github.com/pytorch/executorch/blob/main/.github/workflows/trunk.yml#L236 (not sure what runner the wheel build uses). I don't know a whole lot about this runner type, but 1) it seems to be in-house: pytorch/pytorch#127490, and 2) I don't see recent activity in https://github.com/pytorch-labs/pytorch-gha-infra/ suggesting that there was a recent update.
as I mentioned above, I inspected all the commits (there aren't many) in the range of commit hashes flagged in the nightly builds. |
An example of a trunk job passing: https://github.com/pytorch/executorch/actions/runs/11962683652/job/33351640398
An example of a PR job failing: https://github.com/pytorch/executorch/actions/runs/11959891658/job/33342745520?pr=7010
I don't see an obvious difference between these two regarding environment setup. @huydhn anything obvious to you?
Another example: PR jobs failing on #7044; tbd if they fail consistently |
interesting that a large block of jobs all failed on the same PR. Points to some piece of shared state being the cause, either the repo state itself or sccache |
I am now able to repro! |
reverting backends/xnnpack/third-party/XNNPACK to ad0e62d69815946be92134a56ed3ff688e2549e8 (updated in #6101) does not fix it |
removing |
just reconfirmed that |
sccache uses the file path, the compiler name, and the compiler flags in the cache key, so there shouldn't be any issue from the 0.8.2 update on Ubuntu, as the platforms are well isolated.
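To illustrate the isolation argument in the comment above, here is a minimal sketch of a cache key built from the three inputs mentioned (file path, compiler name, flags). This is a toy model and not sccache's actual algorithm; the function name `cache_key` and the example paths and flags are hypothetical.

```python
import hashlib


def cache_key(source_path: str, compiler: str, flags: list[str]) -> str:
    """Toy cache key: a hash over source path, compiler name, and flags.

    Hypothetical simplification for illustration only; it does not
    reproduce how sccache really computes its keys.
    """
    h = hashlib.sha256()
    for part in (source_path, compiler, *flags):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


# Hypothetical build invocations on two different runner types.
linux_key = cache_key("/home/runner/src/kernel.c", "gcc", ["-O2"])
macos_key = cache_key("/Users/runner/src/kernel.c", "clang", ["-O2", "-arch", "arm64"])

print(linux_key == macos_key)
```

Under this model, a Linux entry and a macOS entry for the same source file get different keys because the path, compiler, and flags all differ, so one platform's cached objects would not be served to the other.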
There was a recent XNNPACK update in PyTorch. If ET directly depends on XNNPACK but its version is older, that can easily create a problem, as macOS, unlike Linux, does not have |
for clarity, the update is pytorch/pytorch#139913 and landed on 11/18, the day before nightlies started failing, so it's very suspicious. I've asked @digantdesai / @mcr229 about this internally; tagging them here as well for visibility. |
I think #6538 doesn't fix the issue as it's still showing up on the latest nightly with the change in place https://github.com/pytorch/executorch/actions/runs/12060458350/job/33630916538. I should have add |
Should we revisit commits between 11/18 nightly and 11/19 nightly? 8526d0a...04f6fcd |
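If the range is revisited, one option is to bisect it rather than inspect commits by eye. The sketch below shows the idea as a plain binary search. It assumes the commits are ordered oldest to newest, that the last one is bad, and that once a commit is bad every later one is bad too. The commit names and the `is_bad` check are hypothetical placeholders; in practice `is_bad` would build the wheel at that commit and report whether linking fails.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, the last commit is bad,
    and badness is monotonic (every commit after the first bad one is bad).
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid       # first bad commit is at mid or earlier
        else:
            lo = mid + 1   # first bad commit is after mid
    return commits[lo]


# Hypothetical placeholder history; "c4" is where the break is introduced.
commits = ["c1", "c2", "c3", "c4", "c5"]
culprit = first_bad(commits, lambda c: commits.index(c) >= 3)
print(culprit)
```

If the failure depends on shared state such as a compiler cache rather than on the repository contents alone, the monotonicity assumption may not hold and a bisect can point at the wrong commit.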
Repro steps:
|
closing as it was fixed |
🐛 Describe the bug
Status page: https://github.com/pytorch/executorch/actions/workflows/build-wheels-m1.yml
Note that the Python 3.9 build always fails, so even though the runs are red, they were successful through 2024-11-18.
Linking is failing with
ld: invalid use of ADRP in '_init_f32_vcopysign_config' to '_xnn_f32_vcopysign_ukernel__neon_u8'
Versions
N/A
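When comparing logs across jobs for the linker error quoted in the bug description, a small script can pull out the two symbols involved. This is a sketch only: the regular expression is written against the one error line quoted in this issue, and the surrounding lines in `sample_log` are made up for illustration.

```python
import re

# Matches the linker error quoted in this issue and captures both symbols.
ADRP_ERROR = re.compile(
    r"ld: invalid use of ADRP in '(?P<caller>[^']+)' to '(?P<target>[^']+)'"
)


def find_adrp_errors(log_text: str) -> list[tuple[str, str]]:
    """Return (caller, target) symbol pairs for every ADRP linker error found."""
    return [(m["caller"], m["target"]) for m in ADRP_ERROR.finditer(log_text)]


# Only the "ld:" line comes from the issue; the other lines are hypothetical.
sample_log = (
    "[hypothetical] linking shared library\n"
    "ld: invalid use of ADRP in '_init_f32_vcopysign_config' "
    "to '_xnn_f32_vcopysign_ukernel__neon_u8'\n"
    "[hypothetical] build failed\n"
)

errors = find_adrp_errors(sample_log)
print(errors)
```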