Change some of CI to do ARM testing #47

Draft
wants to merge 36 commits into base: main

bmhowe23
Collaborator

@bmhowe23 bmhowe23 commented Jan 8, 2025

Test-only for now

This reverts commit 4f08b6e.

Signed-off-by: Ben Howe <[email protected]>
Ref: https://github.com/quantumlib/Stim/blob/main/doc/usage_command_line.md

        When `--seed #` is set, the exact same simulation results will be
        produced every time ASSUMING:

        - the exact same other flags are specified
        - the exact same version of Stim is being used
        - the exact same machine architecture is being used (for example,
            you're not switching from a machine that has AVX2 instructions
            to one that doesn't).

Signed-off-by: Ben Howe <[email protected]>
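Because seeded results are only reproducible on the same machine architecture, a test that compares seeded samples against stored references may need one reference per architecture. The following is a minimal sketch of that idea, not code from this PR: `reference_key` and `check_samples` are hypothetical helpers, and it does not call any Stim API. Note that `platform.machine()` only separates broad architectures (for example `x86_64` from `aarch64`); it does not detect instruction-set differences such as AVX2 on the same architecture.

```python
import platform


def reference_key(test_name: str, seed: int) -> str:
    """Build a lookup key that includes the CPU architecture (hypothetical helper)."""
    arch = platform.machine().lower()  # e.g. "x86_64" or "aarch64"
    return f"{test_name}-seed{seed}-{arch}"


def check_samples(samples, references, test_name, seed):
    """Compare samples to the reference recorded for this architecture.

    Returns None when no reference exists for the current architecture,
    so the caller can skip the exact comparison instead of failing.
    """
    expected = references.get(reference_key(test_name, seed))
    if expected is None:
        return None
    return samples == expected
```

A caller could treat a `None` result as "skip the exact comparison" and fall back to a statistical check instead.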
This reverts commit eb720e5.
Signed-off-by: Ben Howe <[email protected]>
@bmhowe23
Collaborator Author

The ARM segfault in the Solvers library (which in this case appears only on A100, not H100) appears to be the same as the known LLVM resolveAArch64Relocation issue. Here is the captured backtrace for posterity's sake. (The core file was captured on arm-a100 and then backtraced on an H100 machine I could log into.) A secondary issue we should figure out is why an LLVM assertion shows up as a segfault in the Solvers library; getting the right error message out would have made this much easier to assess.
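On the secondary issue, one way a CI wrapper could report which signal actually ended a test process is to decode the child's return code. This is a sketch under assumptions, not part of this PR: `describe_exit` and `run_and_describe` are hypothetical names, and it relies on the POSIX behavior of Python's `subprocess`, where a return code of `-N` means the child was terminated by signal `N`.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number with no symbolic name on this platform.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd):
    """Run a command and describe how it ended (hypothetical CI helper)."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)
```

With a wrapper like this, an abort raised by a failed assertion would be reported as SIGABRT rather than being lumped together with SIGSEGV.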

Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=281473227491008, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0xffff97bd06c0 (LWP 2642))]
(gdb) bt
#0  __pthread_kill_implementation (threadid=281473227491008, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x0000ffff9795f254 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x0000ffff9791a67c in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#3  <signal handler called>
#4  __pthread_kill_implementation (threadid=281473227491008, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#5  0x0000ffff9795f254 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#6  0x0000ffff9791a67c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#7  0x0000ffff97907130 in __GI_abort () at ./stdlib/abort.c:79
#8  0x0000ffff97913fd0 in __assert_fail_base (fmt=0xffff97a2c550 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0xfffef21b6810 "isInt<33>(Result) && \"overflow check failed for relocation\"",
    file=file@entry=0xfffef21b5a90 "/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp", line=line@entry=514,
    function=function@entry=0xfffef21b65b8 "void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t)") at ./assert/assert.c:92
#9  0x0000ffff97914040 in __GI___assert_fail (assertion=0xfffef21b6810 "isInt<33>(Result) && \"overflow check failed for relocation\"",
    file=0xfffef21b5a90 "/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp", line=514,
    function=0xfffef21b65b8 "void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t)")
    at ./assert/assert.c:101
#10 0x0000fffeef42e440 in llvm::RuntimeDyldELF::resolveAArch64Relocation(llvm::SectionEntry const&, unsigned long, unsigned long, unsigned int, long) ()
   from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
#11 0x0000fffeef418148 in llvm::RuntimeDyldImpl::resolveRelocationList(llvm::SmallVector<llvm::RelocationEntry, 64u> const&, unsigned long) ()
   from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
#12 0x0000fffeef4182c8 in llvm::RuntimeDyldImpl::resolveLocalRelocations() ()
   from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
#13 0x0000fffeef41aa60 in ?? () from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
#14 0x0000fffeef41aebc in ?? () from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
#15 0x0000fffeef3cba28 in ?? () from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
#16 0x0000fffeef34e6b4 in ?? () from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
#17 0x0000fffeef344754 in ?? () from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
#18 0x0000fffeef3475a0 in ?? () from /usr/local/lib/python3.10/dist-packages/cudaq/mlir/_mlir_libs/libCUDAQuantumPythonCAPI.so
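
The failed assertion in frames #8 and #9, `isInt<33>(Result)`, checks that a relocation result fits in a signed 33-bit integer, i.e. within ±4 GiB. The sketch below illustrates that check in Python. It assumes the failing relocation is a page-relative (ADRP-style) one, which the backtrace does not confirm; `page_delta` is a hypothetical helper and the addresses are made up for illustration, not taken from the core file.

```python
def is_int(bits: int, value: int) -> bool:
    """Return True if value fits in a signed two's-complement integer of `bits` bits."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit


def page_delta(target: int, place: int) -> int:
    """Signed distance between the 4 KiB pages containing target and place."""
    return (target & ~0xFFF) - (place & ~0xFFF)


# Hypothetical addresses: JIT-loaded code and a symbol it refers to.
place = 0xFFFE_0000_0000
near_target = 0xFFFE_8000_1234   # about 2 GiB away: representable
far_target = 0xFFFF_9795_F254    # more than 4 GiB away: overflow

print(is_int(33, page_delta(near_target, place)))
print(is_int(33, page_delta(far_target, place)))
```

Under these assumptions, the check fails whenever the referenced page lies more than 4 GiB from the page of the instruction being patched.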
