Creating index causes Segmentation fault #53

Open
jcberentsen opened this issue Sep 26, 2019 · 24 comments

@jcberentsen

After installing with:

$ cabal new-install hw-json-simd

running the example causes a segmentation fault:

$ echo "{}" | pv -t -e -b -a | hw-json-simd create-index --method standard -i /dev/stdin --output-ib-file test.json.ib.idx --output-bp-file test.json.bp.idx

3.00 B 0:00:00 [53.3KiB/s]
Segmentation fault (core dumped)

The same happens with --method standard (and with JSON input larger than this ;)

This is an attempt to strip down a problem I had when using fromByteStringViaSimd from hw-json, which also segfaults. The same code using fromByteStringViaBlanking works fine.

The fromByteStringViaSimd problem was tested on two different x86 machines (one without bmi2 support).
Looking at the code, fromByteStringViaSimd seems to guard against missing CPU capabilities; maybe that check is not working as intended?

I also tried passing various flags (via stack.yaml) to enable sse42, bmi2 and avx2 for several packages in the hw-* ecosystem, without luck :(

@newhoggy
Member

newhoggy commented Sep 26, 2019

Is this a regression or is this the first time you're using this library?

Also, what OS and GHC version?

@newhoggy
Member

newhoggy commented Sep 26, 2019

This is my setup:

$ sysctl -a | grep machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.6.5
$ cabal --version
cabal-install version 2.4.0.0
compiled using version 2.4.0.1 of the Cabal library
$ uname -a
Darwin intlkymac.lan 18.7.0 Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64 x86_64
$ echo "{}" | pv -t -e -b -a | hw-json-simd create-index --method standard -i /dev/stdin --output-ib-file test.json.ib.idx --output-bp-file test.json.bp.idx
3.00 B 0:00:00 [13.4KiB/s]

@newhoggy
Member

I'm also interested to know whether a recent change in hw-prim is responsible.

Could you please change the hw-prim constraint to 0.6.2.32 and see if that makes a difference?

@jcberentsen
Author

Is this a regression or is this the first time you're using this library?

Also, what OS and GHC version?

This is my first time trying the library. The OSes are Ubuntu 16.04 and 18.04.
On both 16.04 and 18.04, the GHC used by stack lts-14.7 is ghc-8.6.5 (I had to pin a few extra dependencies).

The hw-json-simd command was installed with ghc-8.0.2 on the Ubuntu 18.04 box.
cabal version: 2.4.1.0

$ cat /proc/cpuinfo | grep flags
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
...

I'll have a go at hw-prim-0.6.2.32

@jcberentsen
Author

Pinning hw-prim-0.6.2.32 still has the problem (on the 18.04 box).

@jcberentsen
Author

Actually I think stack lts-14.7 was already on hw-prim-0.6.2.32
https://www.stackage.org/lts-14.7/package/hw-prim-0.6.2.32

@jcberentsen
Author

Pinning to hw-prim-0.6.2.33 also segfaults

@jcberentsen
Author

New problem: with hw-prim-0.6.2.33, fromByteStringViaBlanking also crashes:

Illegal instruction (core dumped)

I'll revert to 0.6.2.32 and try again.

@jcberentsen
Author

fromByteStringViaBlanking works fine again with hw-prim-0.6.2.32

@jcberentsen
Author

jcberentsen commented Sep 27, 2019

On the other machine, which actually has bmi2, the fromByteStringViaBlanking version works fine with hw-prim-0.6.2.33.

Here are the flags I used in the stack.yaml:

flags:
  hw-json:
    bmi2: true
    sse42: true
  hw-json-simd:
    avx2: true
    bmi2: true
    sse42: true
  bits-extra:
    bmi2: true
  hw-rankselect-base:
    bmi2: true
  hw-rankselect:
    bmi2: true
  hw-simd:
    bmi2: true
    avx2: true

The bmi2: true setting is a lie on the 18.04 machine, which probably explains the 'Illegal instruction'.
Setting bmi2: false on that machine resolves the illegal instruction problem.
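A cabal flag such as bmi2: true is fixed at build time, so nothing checks it against the CPU the binary later runs on; that is consistent with the false setting in stack.yaml turning into an 'Illegal instruction'. Below is a minimal sketch, assuming GCC or Clang on x86-64, of asking the running CPU what it actually supports via the compiler builtin __builtin_cpu_supports. The struct and the build_flags_match_cpu helper are hypothetical illustrations, not part of the hw-* packages.

```c
#include <stdio.h>

/* Extensions relevant to the stack.yaml flags above. */
struct cpu_features {
    int sse42;
    int avx2;
    int bmi2;
};

/* Query the running CPU using GCC/Clang builtins (x86 only). */
struct cpu_features detect_cpu_features(void) {
    struct cpu_features f;
    __builtin_cpu_init(); /* only required when called before constructors run */
    f.sse42 = __builtin_cpu_supports("sse4.2") != 0;
    f.avx2  = __builtin_cpu_supports("avx2") != 0;
    f.bmi2  = __builtin_cpu_supports("bmi2") != 0;
    return f;
}

/* Hypothetical helper: returns 1 only if every extension the build was
 * configured to use (nonzero argument) is present on this CPU. */
int build_flags_match_cpu(int want_sse42, int want_avx2, int want_bmi2) {
    struct cpu_features f = detect_cpu_features();
    if (want_sse42 && !f.sse42) return 0;
    if (want_avx2 && !f.avx2) return 0;
    if (want_bmi2 && !f.bmi2) return 0;
    return 1;
}

/* Print a short report, roughly what grepping /proc/cpuinfo shows. */
void print_cpu_features(void) {
    struct cpu_features f = detect_cpu_features();
    printf("sse4.2: %s\navx2:   %s\nbmi2:   %s\n",
           f.sse42 ? "yes" : "no",
           f.avx2 ? "yes" : "no",
           f.bmi2 ? "yes" : "no");
}
```

On the 18.04 machine, whose cpuinfo flags above list neither avx2 nor bmi2, build_flags_match_cpu(1, 1, 1) would be expected to return 0.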

The original problem still remains on the 16.04 machine, which has the following cpuinfo:

$ cat /proc/cpuinfo | head -n 28
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb00002e
cpu MHz         : 2194.711
cache size      : 56320 KB
physical id     : 0
siblings        : 22
core id         : 0
cpu cores       : 22
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm rdseed adx smap xsaveopt arat flush_l1d arch_capabilities
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
bogomips        : 4389.42
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:

@jcberentsen
Author

I am able to reproduce the problem in hw-json-simd HEAD, with this code added as hw-json-simd/test/Spec.hs:

{-# LANGUAGE OverloadedStrings #-}

import qualified HaskellWorks.Data.ByteString.Lazy          as LBS
import           HaskellWorks.Data.Json.Simd.Index.Standard

main :: IO ()
main = do
  -- let res = makeStandardJsonIbBps "{}"
  let res = makeStandardJsonIbBps . LBS.resegmentPadded 512 $ "{}"
  case res of
    Right chunks -> do
      putStrLn "Chunks:"
      let triggerBug = True
      -- With triggerBug = False the crash does not occur.
      if triggerBug then print (length chunks) else pure ()

    err ->
        putStrLn $ "No chunks: " ++ show err

$ ./project.sh test
Build profile: -w ghc-8.0.2 -O2
In order, the following will be built (use -v for more details):
 - hw-json-simd-0.1.0.2 (test:hw-json-simd-test) (file test/Spec.hs changed)
Preprocessing test suite 'hw-json-simd-test' for hw-json-simd-0.1.0.2..
Building test suite 'hw-json-simd-test' for hw-json-simd-0.1.0.2..
[2 of 2] Compiling Main             ( test/Spec.hs, /home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/build/hw-json-simd-test/hw-json-simd-test-tmp/Main.o )
Linking /home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/build/hw-json-simd-test/hw-json-simd-test ...
Running 1 test suites...
Test suite hw-json-simd-test: RUNNING...
Test suite hw-json-simd-test: FAIL
Test suite logged to:
/home/chrberen/github/hw-json-simd/dist-newstyle/build/x86_64-linux/ghc-8.0.2/hw-json-simd-0.1.0.2/t/hw-json-simd-test/opt/test/hw-json-simd-0.1.0.2-hw-json-simd-test.log
0 of 1 test suites (0 of 1 test cases) passed.
cabal: Tests failed for test:hw-json-simd-test from hw-json-simd-0.1.0.2.

Disabling the computation of length chunks does not trigger the crash, but I guess that is just because of lazy evaluation?

Skipping the resegmentPadded call also fails.

@newhoggy
Member

The ergonomics of figuring out why something fails is not so good :(

@newhoggy
Member

Is there an EC2 instance type, or similar, where this happens?

@newhoggy
Copy link
Member

newhoggy commented Sep 27, 2019

Try this:

$ cd cbits
$ make
$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
$ ./a.out sm simple.json simple.json.ib.idx simple.json.bp.idx

@jcberentsen
Author

Both invocations of a.out segfault on the machines in question (they are not available in EC2).

Program received signal SIGSEGV, Segmentation fault.
0x00000000004012f0 in hw_json_simd_sm_make_ib_op_cl_chunks ()
(gdb) bt
#0  0x00000000004012f0 in hw_json_simd_sm_make_ib_op_cl_chunks ()
#1  0x0000000000401692 in hw_simd_json_sm_main ()
#2  0x0000000000400819 in main ()
(gdb)

@jcberentsen
Author

The previous gdb backtrace was for the sm command

Backtrace for the sp command:

(gdb) r
Starting program: /home/chrberen/github/hw-json-simd/cbits/a.out sp simple.json simple.json.ib.idx simple.json.bp.idx

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400b66 in hw_json_simd_summarise ()
(gdb) bt
#0  0x0000000000400b66 in hw_json_simd_summarise ()
#1  0x0000000000400cdf in hw_json_simd_process_chunk ()
#2  0x0000000000401064 in hw_json_simd_main_spliced ()
#3  0x00000000004007fc in main ()
(gdb)

@jcberentsen
Author

Assembly, if this is of any use:

0x400b66 <hw_json_simd_summarise+22>    vmovdqa (%rdi),%ymm0
0x400b6a <hw_json_simd_summarise+26>    vpcmpeqb 0xe8e(%rip),%ymm0,%ymm1        # 0x401a00
0x400b72 <hw_json_simd_summarise+34>    vpmovmskb %ymm1,%r12d
0x400b76 <hw_json_simd_summarise+38>    vpcmpeqb 0xea2(%rip),%ymm0,%ymm1        # 0x401a20
0x400b7e <hw_json_simd_summarise+46>    vpmovmskb %ymm1,%r10d
0x400b82 <hw_json_simd_summarise+50>    vpcmpeqb 0xeb6(%rip),%ymm0,%ymm1        # 0x401a40
0x400b8a <hw_json_simd_summarise+58>    or     %r12d,%r10d
0x400b8d <hw_json_simd_summarise+61>    mov    %r10d,(%rsi)
0x400b90 <hw_json_simd_summarise+64>    vpmovmskb %ymm1,%ebx
0x400b94 <hw_json_simd_summarise+68>    vpcmpeqb 0xec4(%rip),%ymm0,%ymm1        # 0x401a60
0x400b9c <hw_json_simd_summarise+76>    vpmovmskb %ymm1,%r11d
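The faulting instruction at 0x400b66 is vmovdqa (%rdi),%ymm0, an aligned 256-bit load. It requires a 32-byte-aligned address and raises a general-protection fault otherwise (Linux reports that as SIGSEGV); vmovdqu has no such requirement. Below is a minimal sketch of the same compare-and-movemask pattern (vpcmpeqb followed by vpmovmskb) written with AVX2 intrinsics and an unaligned load, assuming GCC or Clang on x86-64. match_mask and its helpers are hypothetical names, not functions from hw-json-simd's cbits, and switching to unaligned loads is only one alternative to aligning the buffers.

```c
#include <immintrin.h>
#include <stdint.h>

/* Bit i of the result is set when p[i] == c, for the 32 bytes at p.
 * _mm256_loadu_si256 (vmovdqu) accepts any address; the aligned form
 * _mm256_load_si256 (vmovdqa) would require p % 32 == 0. */
__attribute__((target("avx2")))
static uint32_t match_mask_avx2(const uint8_t *p, uint8_t c) {
    __m256i data = _mm256_loadu_si256((const __m256i *)p);
    __m256i eq = _mm256_cmpeq_epi8(data, _mm256_set1_epi8((char)c)); /* vpcmpeqb */
    return (uint32_t)_mm256_movemask_epi8(eq);                       /* vpmovmskb */
}

/* Portable fallback producing the same mask without SIMD. */
static uint32_t match_mask_scalar(const uint8_t *p, uint8_t c) {
    uint32_t mask = 0;
    for (int i = 0; i < 32; i++)
        if (p[i] == c)
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Dispatch at run time so the AVX2 path only runs on CPUs that have it. */
uint32_t match_mask(const uint8_t *p, uint8_t c) {
    if (__builtin_cpu_supports("avx2"))
        return match_mask_avx2(p, c);
    return match_mask_scalar(p, c);
}
```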

@jcberentsen
Author

I suspect this is a memory alignment problem.
According to https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/malloc.3.html, malloc returns aligned memory on macOS; I don't think this is necessarily the case on other platforms.
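An aligned ymm load like the vmovdqa above needs a 32-byte boundary, whereas C's malloc only promises alignment suitable for any fundamental type, alignof(max_align_t), which is 16 bytes on x86-64 with glibc. Below is a minimal sketch for probing what a given platform's malloc actually returns. min_malloc_alignment is a hypothetical helper, and the result depends on the allocator, so treat it as a probe rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PROBE_COUNT 16

/* 1 if p lies on an `alignment`-byte boundary (alignment: power of two). */
int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Largest power of two, capped at `cap`, that divides the address. */
size_t alignment_of(const void *p, size_t cap) {
    size_t a = 1;
    while (a < cap && is_aligned(p, a * 2))
        a *= 2;
    return a;
}

/* Hypothetical probe: allocate PROBE_COUNT blocks of roughly `size`
 * bytes and return the weakest alignment seen (capped at 64), or 0 if
 * an allocation fails. */
size_t min_malloc_alignment(size_t size) {
    void *blocks[PROBE_COUNT];
    size_t weakest = 64;
    int n = 0;
    for (; n < PROBE_COUNT; n++) {
        /* Vary the size and keep every block alive so the allocator
         * cannot simply hand back the same address each time. */
        blocks[n] = malloc(size + (size_t)n * 8);
        if (blocks[n] == NULL)
            break;
        size_t a = alignment_of(blocks[n], 64);
        if (a < weakest)
            weakest = a;
    }
    for (int i = 0; i < n; i++)
        free(blocks[i]);
    return n == PROBE_COUNT ? weakest : 0;
}
```

A result below 32 from min_malloc_alignment(32768) means an aligned ymm load on such memory can fault, depending on where each block happens to land.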

@jcberentsen
Author

jcberentsen commented Sep 30, 2019

Here is some evidence suggesting alignment of the buffer on the stack may be the problem:
I added a print of the buffer address in simd-spliced.c and ran a.out multiple times...

chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe9a0c7040 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffcb9909b00 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffc9589e5a0 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe28f06b40 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffe340096b0 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fffb74b0e10 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fff6a0e48c0 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffde0d89500 of size 32768
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7fff5e5f8d30 of size 32768
Segmentation fault (core dumped)
chrberen@rdlearn02:~/github/hw-json-simd/cbits$ ./a.out sp simple.json simple.json.ib.idx simple.json.bp.idx
Buffer is at address 0x7ffdf992fc60 of size 32768

There seems to be a relationship between the buffer address alignment and whether it segfaults?!
It seems to segfault when the next-to-last hex digit of the address is odd.
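Every logged address ends in 0, so the buffer is always 16-byte aligned. The next-to-last hex digit holds bits 4 to 7, and it is odd exactly when bit 4 is set, i.e. when the address sits 16 bytes past a 32-byte boundary. The small sketch below tabulates the ten runs above and checks that reading; the addresses and outcomes are copied from the log, and nothing else is assumed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Buffer addresses printed by the ten a.out runs above, in order. */
static const uint64_t addrs[] = {
    0x7ffe9a0c7040, 0x7ffcb9909b00, 0x7ffc9589e5a0, 0x7ffe28f06b40,
    0x7ffe340096b0, 0x7fffb74b0e10, 0x7fff6a0e48c0, 0x7ffde0d89500,
    0x7fff5e5f8d30, 0x7ffdf992fc60,
};
/* 1 where that run ended in "Segmentation fault (core dumped)". */
static const int crashed[] = {0, 0, 0, 0, 1, 1, 0, 0, 1, 0};

#define RUNS (sizeof addrs / sizeof addrs[0])
_Static_assert(sizeof crashed / sizeof crashed[0] == RUNS,
               "one outcome per address");

/* Print addr % 32 next to each outcome and return how many runs
 * contradict "segfault <=> buffer not 32-byte aligned". */
size_t check_alignment_hypothesis(void) {
    size_t mismatches = 0;
    for (size_t i = 0; i < RUNS; i++) {
        /* Every logged address is at least 16-byte aligned, so only
         * bit 4 can make the remainder modulo 32 nonzero. */
        assert(addrs[i] % 16 == 0);
        unsigned rem = (unsigned)(addrs[i] % 32);
        int predicted_crash = rem != 0;
        printf("0x%" PRIx64 "  %% 32 = %2u  %s\n", addrs[i], rem,
               crashed[i] ? "segfault" : "ok");
        if (predicted_crash != crashed[i])
            mismatches++;
    }
    return mismatches;
}
```

All ten rows fit: the three segfaulting runs have remainder 16 and the seven successful runs have remainder 0.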

@jcberentsen
Author

The sm case seems to need the phi-buffer on a 32-byte boundary. PR #56 contains code that also fixes the sm segmentation fault for me.
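One way to place a stack buffer on a 32-byte boundary is C11's alignas (from <stdalign.h>), which GCC and Clang honour for stack arrays by realigning the frame. Below is a minimal sketch, assuming a C11 compiler. count_open_braces and CHUNK_SIZE are hypothetical stand-ins, and this is not the code from #56. The scalar loop is a placeholder for the SIMD work; the buffer declaration is what matters.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 32768 /* hypothetical; same size as the logged buffer */

/* Copy up to one chunk of input into a 32-byte-aligned stack buffer and
 * count the '{' bytes in it. */
size_t count_open_braces(const uint8_t *src, size_t len) {
    alignas(32) uint8_t buffer[CHUNK_SIZE];
    size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;
    size_t count = 0;

    /* An aligned 256-bit load (vmovdqa / _mm256_load_si256) on this
     * buffer relies on this invariant. */
    assert(((uintptr_t)buffer % 32) == 0);

    memcpy(buffer, src, n);
    for (size_t i = 0; i < n; i++)
        count += buffer[i] == '{';
    return count;
}
```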

@newhoggy
Member

Thanks so much for your PR.

One remaining thing worries me: I have no means to regression-test future code changes, given that this seems to be either compiler- or architecture-specific.

@jcberentsen
Author

Just to clarify: the a.out segfaults were fixed by aligning the buffers, but running the Haskell test still fails. I guess there needs to be some way of aligning the buffers passed to the C code?

@newhoggy
Member

newhoggy commented Sep 30, 2019

Good point. I'm guessing that it's possible to do that by adding a wrapper around mallocForeignPtrBytes with similar logic and calling that instead.
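One common way to write such a wrapper is to over-allocate by alignment - 1 bytes and round the pointer up, keeping the original pointer for release. Below is a minimal C sketch of that arithmetic, assuming a power-of-two alignment. aligned_block and its functions are hypothetical. In C itself, posix_memalign or C11 aligned_alloc do the same job. On the Haskell side, GHC.ForeignPtr also exports mallocForeignPtrAlignedBytes, which takes the alignment as an explicit argument and may make a hand-written wrapper unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* An allocation whose usable pointer `data` is rounded up to the
 * requested alignment. `base` is what malloc returned and is what
 * must eventually be freed. */
struct aligned_block {
    void *base;
    void *data;
};

/* alignment must be a power of two. Returns {NULL, NULL} on failure. */
struct aligned_block aligned_block_alloc(size_t size, size_t alignment) {
    struct aligned_block b = {NULL, NULL};
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (size > SIZE_MAX - (alignment - 1))
        return b; /* size + padding would overflow */
    b.base = malloc(size + alignment - 1);
    if (b.base == NULL)
        return b;
    uintptr_t p = (uintptr_t)b.base;
    uintptr_t rounded = (p + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
    /* Offset from base rather than casting `rounded` back to a pointer. */
    b.data = (char *)b.base + (rounded - p);
    return b;
}

void aligned_block_free(struct aligned_block *b) {
    free(b->base);
    b->base = NULL;
    b->data = NULL;
}
```

For example, aligned_block_alloc(32768, 32) yields a data pointer suitable for aligned 256-bit loads; pass b.data to the SIMD code and call aligned_block_free(&b) when done.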

@newhoggy
Member

newhoggy commented Oct 1, 2019

hw-json-simd-0.1.0.3 has been published.
