Crash (alignment fault) in nft_rhash_gc on 6.6.67-v8-16k+ #353

Open
ajorg opened this issue Jan 9, 2025 · 9 comments

@ajorg

ajorg commented Jan 9, 2025

I had both tailscale and crowdsec running on my Raspberry Pi 5. I'm not sure which one triggered the bug, but I suspect crowdsec, as it probably loads a very large number of rules. Shortly after boot, the system became unresponsive with the following traces.

Disabling crowdsec and tailscale works around the issue by not loading nftables modules.

[Dec28 21:29] Unable to handle kernel paging request at virtual address ffff8000807e814c
[  +0.007976] Mem abort info:
[  +0.002807]   ESR = 0x0000000096000021
[  +0.003764]   EC = 0x25: DABT (current EL), IL = 32 bits
[  +0.005335]   SET = 0, FnV = 0
[  +0.003059]   EA = 0, S1PTW = 0
[  +0.003159]   FSC = 0x21: alignment fault
[  +0.004024] Data abort info:
[  +0.002887]   ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000
[  +0.005507]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  +0.005070]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  +0.005334] swapper pgtable: 16k pages, 47-bit VAs, pgdp=0000000001398000
[  +0.006817] [ffff8000807e814c] pgd=18000001ffff4003, p4d=18000001ffff4003, pud=18000001ffff4003, pmd=18000001ffef0003, pte=00680000807e8f07
[  +0.012586] Internal error: Oops: 0000000096000021 [#1] PREEMPT SMP
[  +0.006292] Modules linked in: xt_MASQUERADE xt_tcpudp xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun nf_tables nfnetlink algif_hash algif_skcipher af_alg bnep rtl2832_sdr videobuf2_vmalloc vc4 r820t snd_soc_hdmi_codec rtl2832 drm_display_helper i2c_mux brcmfmac_wcc binfmt_misc regmap_i2c spidev brcmfmac cec hci_uart drm_dma_helper btbcm drm_kms_helper brcmutil bluetooth rpivid_hevc(C) pisp_be cfg80211 snd_soc_core aes_ce_blk aes_ce_cipher v4l2_mem2mem ghash_ce videobuf2_dma_contig gf128mul sha2_ce videobuf2_memops ecdh_generic sha256_arm64 dvb_usb_rtl28xxu snd_compress videobuf2_v4l2 ecc sha1_ce rfkill dvb_usb_v2 snd_pcm_dmaengine dvb_core videodev ftdi_sio cp210x snd_pcm libaes usbserial snd_timer v3d raspberrypi_hwmon videobuf2_common i2c_brcmstb snd mc spi_bcm2835 gpio_keys gpu_sched rp1_pio drm_shmem_helper rp1_mailbox raspberrypi_gpiomem rp1 rp1_adc nvmem_rmem uio_pdrv_genirq uio drm i2c_dev drm_panel_orientation_quirks backlight dm_mod fuse ip_tables x_tables ipv6
[  +0.090026] CPU: 1 PID: 23 Comm: kworker/1:0 Tainted: G         C         6.6.67-v8-16k+ #1833
[  +0.008650] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  +0.005853] Workqueue: events_power_efficient nft_rhash_gc [nf_tables]
[  +0.006580] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  +0.006989] pc : nft_rhash_gc+0x1e0/0x2c8 [nf_tables]
[  +0.005086] lr : nft_rhash_gc+0x108/0x2c8 [nf_tables]
[  +0.005084] sp : ffffc00080d5bcf0
[  +0.003321] x29: ffffc00080d5bcf0 x28: ffff8000807e8140 x27: ffffd000856a9000
[  +0.007166] x26: ffff800006200400 x25: ffff800040b376f0 x24: ffff800007458840
[  +0.007165] x23: 000000000000062a x22: 0000000000000000 x21: ffff800040b2f778
[  +0.007165] x20: ffff80010071e000 x19: ffff800040b37778 x18: 0000000000000000
[  +0.007165] x17: 0000000000000000 x16: ffffd000846c02c0 x15: ffffd00084d62a90
[  +0.007164] x14: 000000000000001e x13: ffff800040b376f0 x12: ffffd0008598ce30
[  +0.007165] x11: ffff8000807e8140 x10: ffffc00080d5bd68 x9 : 0000000000000c1a
[  +0.007165] x8 : 0000000000000000 x7 : ffff800080e0ec80 x6 : 0000000000000000
[  +0.007164] x5 : 0000000000000040 x4 : ffff800080e0ec80 x3 : 0000000000000000
[  +0.007164] x2 : ffff8000807e814c x1 : 0000000000000004 x0 : ffff8000807e814c
[  +0.007165] Call trace:
[  +0.002449]  nft_rhash_gc+0x1e0/0x2c8 [nf_tables]
[  +0.004736]  process_one_work+0x148/0x388
[  +0.002820] Unable to handle kernel paging request at virtual address ffff800180525bcc
[  +0.001204]  worker_thread+0x338/0x450
[  +0.000003]  kthread+0x120/0x130
[  +0.000026] Mem abort info:
[  +0.007928]  ret_from_fork+0x10/0x20
[  +0.000006] Code: 54fff984 d503201f 91003380 d2800081 (f821301f) 
[  +0.000002] ---[ end trace 0000000000000000 ]---
[  +0.024219]   ESR = 0x0000000096000021
[  +0.003777]   EC = 0x25: DABT (current EL), IL = 32 bits
[  +0.005367]   SET = 0, FnV = 0
[  +0.003067]   EA = 0, S1PTW = 0
[  +0.003175]   FSC = 0x21: alignment fault
[  +0.004047] Data abort info:
[  +0.002920]   ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000
[  +0.005538]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  +0.005100]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  +0.005365] swapper pgtable: 16k pages, 47-bit VAs, pgdp=0000000001398000
[  +0.006846] [ffff800180525bcc] pgd=18000001ffff4003, p4d=18000001ffff4003, pud=18000001ffff4003, pmd=18000001ffcf0003, pte=0068000180524f07
[  +0.012601] Internal error: Oops: 0000000096000021 [#2] PREEMPT SMP
[  +0.006291] Modules linked in: xt_MASQUERADE xt_tcpudp xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun nf_tables nfnetlink algif_hash algif_skcipher af_alg bnep rtl2832_sdr videobuf2_vmalloc vc4 r820t snd_soc_hdmi_codec rtl2832 drm_display_helper i2c_mux brcmfmac_wcc binfmt_misc regmap_i2c spidev brcmfmac cec hci_uart drm_dma_helper btbcm drm_kms_helper brcmutil bluetooth rpivid_hevc(C) pisp_be cfg80211 snd_soc_core aes_ce_blk aes_ce_cipher v4l2_mem2mem ghash_ce videobuf2_dma_contig gf128mul sha2_ce videobuf2_memops ecdh_generic sha256_arm64 dvb_usb_rtl28xxu snd_compress videobuf2_v4l2 ecc sha1_ce rfkill dvb_usb_v2 snd_pcm_dmaengine dvb_core videodev ftdi_sio cp210x snd_pcm libaes usbserial snd_timer v3d raspberrypi_hwmon videobuf2_common i2c_brcmstb snd mc spi_bcm2835 gpio_keys gpu_sched rp1_pio drm_shmem_helper rp1_mailbox raspberrypi_gpiomem rp1 rp1_adc nvmem_rmem uio_pdrv_genirq uio drm i2c_dev drm_panel_orientation_quirks backlight dm_mod fuse ip_tables x_tables ipv6
[  +0.090020] CPU: 0 PID: 96 Comm: kworker/0:5 Tainted: G      D  C         6.6.67-v8-16k+ #1833
[  +0.008649] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  +0.005853] Workqueue: events_power_efficient nft_rhash_gc [nf_tables]
[  +0.006578] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  +0.006989] pc : nft_rhash_gc+0x1e0/0x2c8 [nf_tables]
[  +0.005086] lr : nft_rhash_gc+0x108/0x2c8 [nf_tables]
[  +0.005084] sp : ffffc000822fbcf0
[  +0.003321] x29: ffffc000822fbcf0 x28: ffff800180525bc0 x27: ffffd000856a9000
[  +0.007166] x26: ffff800006cdda80 x25: ffff8000068416f0 x24: ffff800007458840
[  +0.007164] x23: 000000000000062a x22: 0000000000000000 x21: ffff800006839778
[  +0.007166] x20: ffff8000c0844000 x19: ffff800006841778 x18: 0000000000000000
[  +0.007164] x17: 0000000000000000 x16: ffffd000846c02c0 x15: 0000000000000000
[  +0.007165] x14: 0000000000000000 x13: ffff8000068416f0 x12: ffffd0008598ce30
[  +0.007165] x11: ffff800180525bc0 x10: ffffc000822fbd68 x9 : 00000000000000b9
[  +0.007165] x8 : 0000000000000000 x7 : ffff800180527100 x6 : 0000000000000000
[  +0.007165] x5 : 0000000000000040 x4 : ffff800180527100 x3 : 0000000000000000
[  +0.007164] x2 : ffff800180525bcc x1 : 0000000000000004 x0 : ffff800180525bcc
[  +0.007165] Call trace:
[  +0.002448]  nft_rhash_gc+0x1e0/0x2c8 [nf_tables]
[  +0.004736]  process_one_work+0x148/0x388
[  +0.004022]  worker_thread+0x338/0x450
[  +0.003759]  kthread+0x120/0x130
[  +0.003237]  ret_from_fork+0x10/0x20
[  +0.003586] Code: 54fff984 d503201f 91003380 d2800081 (f821301f) 
[  +0.006115] ---[ end trace 0000000000000000 ]---
@ajorg
Author

ajorg commented Jan 9, 2025

I wondered if this might be an upstream bug, something that isn't working correctly with a 16k page size that worked fine with a 4k page size.

@lurch
Collaborator

lurch commented Jan 9, 2025

ping @popcornmix - should this be transferred to the linux repo, or should it just be closed and added to the list in #107?

@popcornmix

I wondered if this might be an upstream bug, something that isn't working correctly with a 16k page size that worked fine with a 4k page size.

Your comment wasn't quite clear.
Are you saying on a Pi5, by default you get the kernel panic, but if you use kernel=kernel8.img on the same Pi5 there is no problem?

@ajorg
Author

ajorg commented Jan 9, 2025

I only know for certain that the crash started recently, sometime after installing the 6.6 16k kernel.

If there's a 6.6 4k build, I can try that and re-enable both services to compare. It was a bit tricky, but not impossible, to catch the system before it crashed and disable the services so that it was usable again.

@lurch
Collaborator

lurch commented Jan 9, 2025

If there's a 6.6 4k build

There is; that's what the kernel=kernel8.img setting (which you'd need to add to /boot/firmware/config.txt) in @popcornmix's previous message is about 🙂 See https://www.raspberrypi.com/documentation/computers/config_txt.html#kernel for more information.

@popcornmix

On your existing Pi 5 install that shows the issue, add kernel=kernel8.img to config.txt and reboot.
You should find that the output of getconf PAGESIZE changes from 16384 to 4096.

Does that avoid the issue?
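
The page size can also be read programmatically to confirm which kernel flavour is running after the reboot. This is only a rough sketch: the function name `kernel_flavour` is made up for illustration, and it assumes the only two values of interest are the 16384 and 4096 mentioned above.

```python
import os


def kernel_flavour(page_size: int) -> str:
    """Map a page size in bytes to the kernel build discussed in this thread."""
    if page_size == 16384:
        return "16k-page kernel"
    if page_size == 4096:
        return "4k-page kernel (kernel=kernel8.img)"
    return f"unexpected page size: {page_size}"


if __name__ == "__main__":
    # os.sysconf("SC_PAGE_SIZE") reports the same value as `getconf PAGESIZE`.
    size = os.sysconf("SC_PAGE_SIZE")
    print(size, "->", kernel_flavour(size))
```

Running it before and after editing config.txt should show the reported flavour switching if the change took effect.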

@ajorg
Author

ajorg commented Jan 9, 2025

Thanks. I'll give it a try later today and report back today or tomorrow.

@ajorg
Author

ajorg commented Jan 10, 2025

So far so good, but I didn't have kernel8.img installed, so I ran rpi-update and now I'm on 6.6.69-v8+. Previously it usually crashed very soon after startup. I'll leave it for an hour or so, then go back to the 16k kernel and see whether it crashes again there.

@ajorg
Author

ajorg commented Jan 10, 2025

Rebooted into the 16k kernel of the same version, with those services enabled so that nf_tables loads, and my Pi 5 became unresponsive again.

[ 3041.156796] Unable to handle kernel paging request at virtual address ffff80010084080c
[ 3041.164762] Mem abort info:
[ 3041.167564]   ESR = 0x0000000096000021
[ 3041.171325]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 3041.176658]   SET = 0, FnV = 0
[ 3041.179718]   EA = 0, S1PTW = 0
[ 3041.182866]   FSC = 0x21: alignment fault
[ 3041.186888] Data abort info:
[ 3041.189774]   ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000
[ 3041.195279]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 3041.200349]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 3041.205681] swapper pgtable: 16k pages, 47-bit VAs, pgdp=0000000001398000
[ 3041.212496] [ffff80010084080c] pgd=18000001ffff4003, p4d=18000001ffff4003, pud=18000001ffff4003, pmd=18000001ffdf0003, pte=0068000100840f07
[ 3041.225078] Internal error: Oops: 0000000096000021 [#1] PREEMPT SMP
[ 3041.231368] Modules linked in: xt_MASQUERADE xt_tcpudp xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun nf_tables nfnetlink algif_hash algif_skcipher af_alg bnep rtl2832_sdr videobuf2_vmalloc vc4 r820t rtl2832 i2c_mux brcmfmac_wcc binfmt_misc spidev regmap_i2c brcmfmac snd_soc_hdmi_codec hci_uart brcmutil btbcm drm_display_helper cec bluetooth cfg80211 drm_dma_helper aes_ce_blk drm_kms_helper aes_ce_cipher ghash_ce rpivid_hevc(C) gf128mul snd_soc_core dvb_usb_rtl28xxu sha2_ce pisp_be sha256_arm64 dvb_usb_v2 ecdh_generic sha1_ce ecc v4l2_mem2mem dvb_core snd_compress snd_pcm_dmaengine videobuf2_dma_contig rfkill videobuf2_memops cp210x ftdi_sio snd_pcm libaes videobuf2_v4l2 usbserial raspberrypi_hwmon videodev snd_timer snd videobuf2_common v3d i2c_brcmstb mc spi_bcm2835 gpu_sched rp1_pio drm_shmem_helper gpio_keys raspberrypi_gpiomem rp1_adc rp1_mailbox rp1 nvmem_rmem uio_pdrv_genirq uio i2c_dev drm fuse drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6
[ 3041.321373] CPU: 1 PID: 1947 Comm: kworker/1:0 Tainted: G         C         6.6.69-v8-16k+ #1836
[ 3041.330193] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[ 3041.336045] Workqueue: events_power_efficient nft_rhash_gc [nf_tables]
[ 3041.342614] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3041.349601] pc : nft_rhash_gc+0x1e0/0x2c8 [nf_tables]
[ 3041.354679] lr : nft_rhash_gc+0x108/0x2c8 [nf_tables]
[ 3041.359755] sp : ffffc00081b03cf0
[ 3041.363074] x29: ffffc00081b03cf0 x28: ffff800100840800 x27: ffffd000856a9000
[ 3041.370237] x26: ffff800007224400 x25: ffff8001c0ac48f0 x24: ffff8001c05892c0
[ 3041.377399] x23: 000000000000035a x22: 0000000000000000 x21: ffff8001c0abc978
[ 3041.384561] x20: ffff800140c9f000 x19: ffff8001c0ac4978 x18: 0000000000000000
[ 3041.391723] x17: 0000000000000000 x16: ffffd000846c0aa0 x15: 000000400004bea8
[ 3041.398885] x14: 0000000000000000 x13: ffff8001c0ac48f0 x12: ffffd0008598ce30
[ 3041.406047] x11: ffff800100840800 x10: ffffc00081b03d68 x9 : 0000000000000001
[ 3041.413209] x8 : 0000000000000000 x7 : ffffd000858ddc10 x6 : 0000000000000000
[ 3041.420370] x5 : 0000000000000040 x4 : ffffd000858ddc10 x3 : 0000000000000000
[ 3041.427532] x2 : ffff80010084080c x1 : 0000000000000004 x0 : ffff80010084080c
[ 3041.434695] Call trace:
[ 3041.437142]  nft_rhash_gc+0x1e0/0x2c8 [nf_tables]
[ 3041.441870]  process_one_work+0x148/0x388
[ 3041.445891]  worker_thread+0x338/0x450
[ 3041.449648]  kthread+0x120/0x130
[ 3041.452883]  ret_from_fork+0x10/0x20
[ 3041.456469] Code: 54fff984 d503201f 91003380 d2800081 (f821301f) 
[ 3041.462582] ---[ end trace 0000000000000000 ]---
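
For anyone cross-checking the traces, the ESR value and the three faulting addresses can be decoded with a few lines of Python. This is a minimal sketch, not an analysis of the root cause: the bit positions follow the architectural ESR_ELx layout for data aborts, the helper names are invented here, and the 8-byte access width is an assumption (nothing in this thread disassembles the faulting instruction).

```python
def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the fields the kernel prints for a data abort."""
    iss = esr & 0x1FFFFFF          # ISS  = bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,  # exception class = bits [31:26]
        "IL": (esr >> 25) & 0x1,   # instruction length bit = bit 25
        "ISS": iss,
        "FSC": iss & 0x3F,         # fault status code = ISS[5:0]
        "WnR": (iss >> 6) & 0x1,   # write-not-read = ISS[6]
    }


# Faulting virtual addresses copied from the three Oops reports in this issue.
FAULT_ADDRESSES = [0xFFFF8000807E814C, 0xFFFF800180525BCC, 0xFFFF80010084080C]


def misalignment(addr: int, width: int = 8) -> int:
    """Bytes by which addr misses a `width`-byte boundary (width is an assumption)."""
    return addr % width


if __name__ == "__main__":
    fields = decode_esr(0x96000021)
    print({name: hex(value) for name, value in fields.items()})
    for addr in FAULT_ADDRESSES:
        print(hex(addr), "offset from 8-byte boundary:", misalignment(addr))
```

The decoded EC and FSC match the `EC = 0x25` and `FSC = 0x21` lines in the logs, and all three addresses sit 4 bytes past an 8-byte boundary, which is consistent with the reported alignment fault if the access is indeed 8 bytes wide.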
