SR-IOV: mainlining? #33

Open
jgcodes2020 opened this issue Jun 27, 2022 · 112 comments

Comments

@jgcodes2020

Since drivers for SR-IOV have already been completed in this fork, could these drivers be mainlined into the kernel?

@daiaji

daiaji commented Jul 7, 2022

It's interesting. Has anyone tested whether the vGPU driver is available on Windows?

@krispan-intel

  1. The SR-IOV upstreaming task is ongoing; ETA Q4'23.
  2. Yes, we have already tested a Windows VM and it works well.

@daiaji

daiaji commented Jul 7, 2022

Does the test environment use Alder Lake's iGPU or an Arc dGPU? Was the test run on a normal desktop PC rather than a server? How much performance does the vGPU lose in a VM environment?

sys-oak pushed a commit that referenced this issue Sep 9, 2022
commit 23c2d49 upstream.

The kmemleak_*_phys() apis do not check the address for lowmem's min
boundary, while the caller may pass an address below lowmem, which will
trigger an oops:

  # echo scan > /sys/kernel/debug/kmemleak
  Unable to handle kernel paging request at virtual address ff5fffffffe00000
  Oops [#1]
  Modules linked in:
  CPU: 2 PID: 134 Comm: bash Not tainted 5.18.0-rc1-next-20220407 #33
  Hardware name: riscv-virtio,qemu (DT)
  epc : scan_block+0x74/0x15c
   ra : scan_block+0x72/0x15c
  epc : ffffffff801e5806 ra : ffffffff801e5804 sp : ff200000104abc30
   gp : ffffffff815cd4e8 tp : ff60000004cfa340 t0 : 0000000000000200
   t1 : 00aaaaaac23954cc t2 : 00000000000003ff s0 : ff200000104abc90
   s1 : ffffffff81b0ff28 a0 : 0000000000000000 a1 : ff5fffffffe01000
   a2 : ffffffff81b0ff28 a3 : 0000000000000002 a4 : 0000000000000001
   a5 : 0000000000000000 a6 : ff200000104abd7c a7 : 0000000000000005
   s2 : ff5fffffffe00ff9 s3 : ffffffff815cd998 s4 : ffffffff815d0e90
   s5 : ffffffff81b0ff28 s6 : 0000000000000020 s7 : ffffffff815d0eb0
   s8 : ffffffffffffffff s9 : ff5fffffffe00000 s10: ff5fffffffe01000
   s11: 0000000000000022 t3 : 00ffffffaa17db4c t4 : 000000000000000f
   t5 : 0000000000000001 t6 : 0000000000000000
  status: 0000000000000100 badaddr: ff5fffffffe00000 cause: 000000000000000d
    scan_gray_list+0x12e/0x1a6
    kmemleak_scan+0x2aa/0x57e
    kmemleak_write+0x32a/0x40c
    full_proxy_write+0x56/0x82
    vfs_write+0xa6/0x2a6
    ksys_write+0x6c/0xe2
    sys_write+0x22/0x2a
    ret_from_syscall+0x0/0x2

The callers may not quite know the actual address they pass(e.g. from
devicetree).  So the kmemleak_*_phys() apis should guarantee the address
they finally use is in lowmem range, so check the address for lowmem's
min boundary.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Patrick Wang <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
@strongtz

  1. The SR-IOV upstreaming task is ongoing; ETA Q4'23.
  2. Yes, we have already tested a Windows VM and it works well.

Hi, I managed to create a DKMS module using the driver source here, but the GPU driver in my Windows VM fails with Error 43. I wonder if you're using a modded driver?

My DKMS module: https://github.com/strongtz/i915-sriov-dkms
more information: https://www.reddit.com/r/VFIO/comments/xoeika/sriov_on_intel_uhd_graphics_770_xe_12th_gen

@ChristophSchmidpeter

Does the test environment use Alder Lake's iGPU or an Arc dGPU? Was the test run on a normal desktop PC rather than a server? How much performance does the vGPU lose in a VM environment?

Is there any update on whether SR-IOV will be possible "only" on iGPUs or also on Arc dGPUs?
Is it correct that this feature will be available both on desktop machines and notebooks?

@alx696

alx696 commented Oct 13, 2022

@krispan-intel Does Arc A770 support SR-IOV?

I noticed that the Flex 170 might support SR-IOV, but I have no card to verify this. It would be exciting if consumer graphics cards (Arc A770, A750) could also support SR-IOV. Many enthusiasts would buy Intel graphics cards because of this.

2023.04.18 Update :

I tested the Arc A770 under Ubuntu 22.04 Server. It supports GPU passthrough, but the Intel Indirect Display Driver must be installed for streaming (e.g. Sunshine) to work. It does not support SR-IOV!

# lspci -nn | grep "VGA"
00:02.0 VGA compatible controller [0300]: Intel Corporation AlderLake-S GT1 [8086:4680] (rev 0c)
03:00.0 VGA compatible controller [0300]: Intel Corporation Device [8086:56a0] (rev 08)

# lspci -vs 03:00.0
03:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
	Subsystem: Intel Corporation Device 1020
	Flags: bus master, fast devsel, latency 0, IRQ 154, IOMMU group 16
	Memory at 81000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 6000000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at 82000000 [disabled] [size=2M]
	Capabilities: [40] Vendor Specific Information: Len=0c <?>
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
	Capabilities: [d0] Power Management version 3
	Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [420] Physical Resizable BAR
	Capabilities: [400] Latency Tolerance Reporting
	Kernel driver in use: i915
	Kernel modules: i915

The output has no "Capabilities: [***] Single Root I/O Virtualization (SR-IOV)" line.
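
The check described here, looking for an SR-IOV capability line in `lspci -v` output, can be scripted. Below is a minimal sketch, assuming the capability is printed with the text `Single Root I/O Virtualization (SR-IOV)` as written above; the function name `has_sriov_capability` is hypothetical.

```python
import re

# Matches an lspci capability line such as
#   Capabilities: [320] Single Root I/O Virtualization (SR-IOV)
# The bracketed offset is device-specific, so any bracket content is accepted.
_SRIOV_CAP = re.compile(
    r"^\s*Capabilities:\s*\[[^\]]+\]\s*Single Root I/O Virtualization \(SR-IOV\)",
    re.MULTILINE,
)


def has_sriov_capability(lspci_text: str) -> bool:
    """Return True if `lspci -v` style text lists an SR-IOV capability."""
    return _SRIOV_CAP.search(lspci_text) is not None


if __name__ == "__main__":
    # Excerpt of the Arc A770 output quoted above: no SR-IOV line present.
    a770 = """\
03:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08)
    Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [420] Physical Resizable BAR
    Capabilities: [400] Latency Tolerance Reporting
"""
    print(has_sriov_capability(a770))  # False
```

To run it against a live device, feed it the text from `lspci -vs 03:00.0`. Capabilities are usually only listed when lspci runs as root.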


2023.03.22 Update :

I tested the Data Center GPU Flex 140 under Ubuntu 22.04 Server. It is currently only available in passthrough mode. A single card has two PCI addresses, so one Flex 140 card supports two VMs. It is relatively stable, with no crashes during testing.

# lspci -nn | grep "Display"
d0:00.0 Display controller [0380]: Intel Corporation Device [8086:56c1] (rev 05)
d4:00.0 Display controller [0380]: Intel Corporation Device [8086:56c1] (rev 05)

The SR-IOV driver is not ready yet.

@basncy

basncy commented Nov 24, 2022

It works on a Win10 guest with some code modifications of my own, though it is better to wait for the official stable release.

@ammgws

ammgws commented Nov 24, 2022

with some code modifications of my own

What does this mean? Your XML seems different, do you mean you edited the XML?

Mine (as generated by virt-manager) - getting Code43 in the guest:
image

Yours:
image

@basncy

basncy commented Nov 27, 2022

Unlike PCI passthrough, which binds the entire hardware resource to one specific VM, this feature lets you share one GPU among multiple VMs. You can then see that the GPU address shows 00:02.x, which is called a VF (virtual function), rather than 00:02.0, the physical hardware on the host. It requires support from both the motherboard chip (i915 in my case) and the GPU. Your SR-IOV function is not enabled, so either your hardware or your software driver is not compatible.
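
To make the address distinction concrete, here is a small sketch that splits a PCI address such as `00:02.1` into its parts. Treating function 0 as the physical function and any other function as a VF is an assumption that matches the `00:02.0` / `00:02.x` description in this comment, not a general PCI rule; `parse_bdf` and `looks_like_vf` are hypothetical names.

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# Accepts "00:02.1" as well as the full form "0000:00:02.1".
_BDF = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(text: str) -> PciAddress:
    """Parse a [domain:]bus:device.function string into integers."""
    match = _BDF.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, device, function = match.groups()
    return PciAddress(
        int(domain or "0", 16), int(bus, 16), int(device, 16), int(function, 16)
    )


def looks_like_vf(text: str) -> bool:
    """Assumption: on this iGPU the PF is function 0, VFs are functions 1..7."""
    return parse_bdf(text).function != 0


if __name__ == "__main__":
    print(looks_like_vf("00:02.0"))  # False
    print(looks_like_vf("00:02.1"))  # True
```

On a Linux host, a more reliable test than the function number is whether the device's sysfs directory contains a `physfn` link back to its parent device.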

@truvatech

@strongtz , question for you, does that modified driver only work on 6.0.2, as mentioned from your latest commit message, or would that work on 6.0.9?

Been trying to get it working for days but no luck.

Basically, I get to the point where you echo to the sysfs device, but no matter what, it returns permission denied.
I'm on Fedora though, if that's the issue.
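
For context, the "echo to the sys device" step usually means writing a VF count to the standard PCI sysfs attribute `sriov_numvfs`. Below is a rough sketch, assuming the iGPU sits at `0000:00:02.0` and that the loaded driver actually exposes SR-IOV. The helper names and the `sysfs_root` parameter (present only so the logic can be tried against a fake directory) are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def sriov_paths(bdf: str, sysfs_root: str = "/sys") -> Tuple[Path, Path]:
    """Return the (sriov_totalvfs, sriov_numvfs) attribute paths for a device."""
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf
    return dev / "sriov_totalvfs", dev / "sriov_numvfs"


def request_vfs(bdf: str, count: int, sysfs_root: str = "/sys") -> int:
    """Ask the kernel to create `count` VFs; needs root on a real system."""
    total_path, num_path = sriov_paths(bdf, sysfs_root)
    if not total_path.exists():
        # The attribute is absent when the device or driver has no SR-IOV support.
        raise RuntimeError(f"{bdf}: no sriov_totalvfs attribute")
    total = int(total_path.read_text().strip())
    if not 0 <= count <= total:
        raise ValueError(f"{bdf}: requested {count} VFs, device allows 0..{total}")
    num_path.write_text(f"{count}\n")
    return count


if __name__ == "__main__":
    # Assumed address of the iGPU; adjust to the output of lspci.
    request_vfs("0000:00:02.0", 2)
```

One common cause of "permission denied" at this step is running `sudo echo N > file`, where the redirection is still performed by the unprivileged shell. Whether that applies here cannot be told from the thread.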

Otherwise dkms status shows the sriov-dkms driver from your repo, and I know it is loaded as it throws me an error in dmesg, which isn't there with the in-tree i915 module.

[ 66.468702] snd_hda_codec_hdmi hdaudioC0D2: No i915 binding for Intel HDMI/DP codec
[ 66.468883] hdaudio hdaudioC0D2: Unable to configure, disabling

Thanks in advance!

@strongtz

@truvatech 6.0.9 works well on Arch Linux here, both in host and VM. VA-API video acceleration in VM works.
If you're using my modified dkms driver, you will see debug prints in dmesg like
i915 0000:0b:00.0: i915_virtualization_probe: entry

@darkbasic

Did anybody manage to verify if Arc supports any kind of SR-IOV?

@jsommr

jsommr commented Dec 26, 2022

Did anybody manage to verify if Arc supports any kind of SR-IOV?

SR-IOV is not supported according to https://www.intel.com/content/www/us/en/support/articles/000093216/graphics.html

:(

@darkbasic

Too bad, it looks like nobody will buy Intel Arc after all :(

@daiaji

daiaji commented Dec 29, 2022

Too bad, it looks like nobody will buy Intel Arc after all :(

Indeed, there is no performance advantage over competing products. The SR-IOV support mentioned in the marketing was arguably the only bright spot. Now even this advantage does not exist. I really can’t think of anyone who would buy such a meaningless product.

@darkbasic

I really can’t think of anyone who would buy such a meaningless product.

There is one very small use case left: av1 decoding/encoding on a budget. The lowest end Arc GPUs could make sense if all you need is a decoding/encoding accelerator, at least until RDNA3 budget cards get released.

@daiaji

daiaji commented Dec 29, 2022

There is one very small use case left: av1 decoding/encoding on a budget. The lowest end Arc GPUs could make sense if all you need is a decoding/encoding accelerator, at least until RDNA3 budget cards get released.

This is usually one of the few selling points in their marketing. The disadvantages are obvious: poor drivers, poor performance, and poor pricing. There are almost no advantages, but if SR-IOV were supported, these disadvantages could be tolerated.

Even limiting the number of VFs in SR-IOV to one would be much better than the current state.

The Arc dGPU is an even better example of wasted sand than the i7-11700K.

I learned that the multimedia codec block is the same size (in transistors) on all Arc dGPUs, which makes the Arc dGPU worthwhile only as the lowest-end A370.

@myownfriend

SR-IOV is not supported according to https://www.intel.com/content/www/us/en/support/articles/000093216/graphics.html

That's a huge bummer. I was just about to get one in hopes that it would support SR-IOV in the future.

@susanthenerd

That's quite sad.

sys-oak pushed a commit that referenced this issue Jan 18, 2023
[ Upstream commit bcd7026 ]

By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases
multiple times and eventually it will wrap around the maximum number
(i.e., 255).
This patch prevents this by adding a boundary check with
L2CAP_MAX_CONF_RSP

Btmon log:
Bluetooth monitor ver 5.64
= Note: Linux version 6.1.0-rc2 (x86_64)                               0.264594
= Note: Bluetooth subsystem version 2.22                               0.264636
@ MGMT Open: btmon (privileged) version 1.22                  {0x0001} 0.272191
= New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0)          [hci0] 13.877604
@ RAW Open: 9496 (privileged) version 2.22                   {0x0002} 13.890741
= Open Index: 00:00:00:00:00:00                                [hci0] 13.900426
(...)
> ACL Data RX: Handle 200 flags 0x00 dlen 1033             #32 [hci0] 14.273106
        invalid packet size (12 != 1033)
        08 00 01 00 02 01 04 00 01 10 ff ff              ............
> ACL Data RX: Handle 200 flags 0x00 dlen 1547             #33 [hci0] 14.273561
        invalid packet size (14 != 1547)
        0a 00 01 00 04 01 06 00 40 00 00 00 00 00        ........@.....
> ACL Data RX: Handle 200 flags 0x00 dlen 2061             #34 [hci0] 14.274390
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04  ........@.......
> ACL Data RX: Handle 200 flags 0x00 dlen 2061             #35 [hci0] 14.274932
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00  ........@.......
= bluetoothd: Bluetooth daemon 5.43                                   14.401828
> ACL Data RX: Handle 200 flags 0x00 dlen 1033             #36 [hci0] 14.275753
        invalid packet size (12 != 1033)
        08 00 01 00 04 01 04 00 40 00 00 00              ........@...

Signed-off-by: Sungwoo Kim <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
sys-oak pushed a commit that referenced this issue Jan 31, 2023
[ Upstream commit bcd7026 ]

@aviallon

aviallon commented Feb 1, 2023

SR-IOV is not supported according to https://www.intel.com/content/www/us/en/support/articles/000093216/graphics.html

That's a huge bummer. I was just about to get one in hopes that it would support SR-IOV in the future.

Could the drivers be patched to enable unofficial support? Or does this really require firmware support?

@ChristophSchmidpeter

SR-IOV is not supported according to https://www.intel.com/content/www/us/en/support/articles/000093216/graphics.html

That's a huge bummer. I was just about to get one in hopes that it would support SR-IOV in the future.

Could the drivers be patched to enable unofficial support? Or does this really require firmware support?

I would assume that it was missing there on a physical level altogether or was fused off.

@starquake

Could it be that it would only work for integrated graphics because of connections?

@darkbasic

I'd say just lack of willingness to sell any card.

@ChristophSchmidpeter

I'd say just lack of willingness to sell any card.

Nailed it.

sys-oak pushed a commit that referenced this issue Feb 17, 2023
[ Upstream commit bcd7026 ]

sys-oak pushed a commit that referenced this issue Mar 16, 2023
[ Upstream commit bcd7026 ]

@Daniel15

@Daniel15 Does the Intel XE LTS Kernel work with Proxmox and Intel Arc A770?

I'm not sure as I haven't tried it. It should work if you copy the configuration across from the Proxmox kernel (so that all the same options are enabled). What I'm not sure about is if Proxmox apply any extra patches to their kernel, which would need to be applied too.

@ionutnechita

Hello!

The latest kernel tree from Intel seems to work with SR-IOV on the A770. It is based on kernel version 6.11.0-rc2, so you might want to compile it manually.

You can find this kernel tree here: https://gitlab.freedesktop.org/drm/xe/kernel

Screenshot from 2024-08-27 16-40-11

This is what I used for my configs:

/etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on kvm.ignore_msrs=1 vfio_pci.ids=8086:56a0,8086:4f90 pcie_aspm=off"

/etc/modprobe.d/blacklist.conf

blacklist i915
blacklist xe
blacklist snd_hda_intel
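
As a quick sanity check on a command line like the one above, the `vfio_pci.ids` entry can be pulled apart into vendor/device pairs and compared with the `[8086:56a0]`-style IDs that `lspci -nn` prints. This is a minimal sketch that only handles the plain `vendor:device` form used in this comment; `vfio_pci_ids` is a hypothetical helper name.

```python
import shlex
from typing import List, Tuple


def vfio_pci_ids(cmdline: str) -> List[Tuple[int, int]]:
    """Extract (vendor, device) pairs from a vfio_pci.ids= kernel parameter."""
    ids: List[Tuple[int, int]] = []
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        if key.replace("-", "_") != "vfio_pci.ids":
            continue
        for entry in value.split(","):
            if not entry:
                continue
            vendor, device = entry.split(":")[:2]
            ids.append((int(vendor, 16), int(device, 16)))
    return ids


if __name__ == "__main__":
    cmdline = (
        "quiet splash amd_iommu=on kvm.ignore_msrs=1 "
        "vfio_pci.ids=8086:56a0,8086:4f90 pcie_aspm=off"
    )
    for vendor, device in vfio_pci_ids(cmdline):
        print(f"{vendor:04x}:{device:04x}")
```

On a running system the same function can be fed the contents of `/proc/cmdline`.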

Using the Xe driver is the best option to have SR-IOV currently.
i915 is not yet ready for SR-IOV.
For new platforms it is good to switch from i915 to Xe for improved performance. Changing the driver can be done quite easily.

@cRaZy-bisCuiT

Thanks for the feedback. I might wait for Proxmox to pick up >= 6.12 once it's released and buy an A770 afterwards.

@darkbasic

Using the Xe driver is the best option to have SR-IOV currently

Do you confirm that A770 supports SR-IOV with the Xe driver?

@paulzzh

paulzzh commented Oct 2, 2024

https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/63e0695597a044c96bf369e4d8ba031291449d95/drivers/gpu/drm/xe/xe_pci.c

Currently, no GPUs are marked as has_sriov = true,
so the SR-IOV support of the new xe driver just outputs "not supported" and then exits.

https://chatgpt.com/share/66fd8bbd-1434-8007-8e38-0b79ed59fdef

[   14.379213] xe 0000:00:02.0: [drm] Support for SR-IOV is not available
[   14.379215] xe 0000:00:02.0: [drm] Found ALDERLAKE_S/RPL-S (device ID a780) display version 12.00 stepping D0
[   14.379216] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] ALDERLAKE_S RPLS a780:0004 dgfx:0 gfx:Xe_LP (12.00) media:Xe_M (12.00) display:yes dma_m_s:39 tc:1 gscfi:0 cscfi:0
[   14.379278] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] Stepping = (G:D0, M:D0, B:**)
[   14.379315] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] SR-IOV support: no (mode: none)
[   14.379349] xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH

Tested on Ubuntu Server liveCD 24.10 beta (kernel 6.11.0-7)
I think the SR-IOV support is not ready yet.
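
The relevant line in the log above is `SR-IOV support: no (mode: none)`. Here is a small sketch for pulling that status out of saved `dmesg` text, assuming the message keeps the `SR-IOV support: <yes|no> (mode: <mode>)` shape shown here. The format may differ between kernel versions, and `xe_sriov_status` is a hypothetical helper name.

```python
import re
from typing import Dict, Tuple

# Example line this is written against (taken from the log above):
# [   14.379315] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] SR-IOV support: no (mode: none)
_STATUS = re.compile(
    r"xe (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):.*"
    r"SR-IOV support: (?P<supported>yes|no) \(mode: (?P<mode>[^)]+)\)"
)


def xe_sriov_status(dmesg_text: str) -> Dict[str, Tuple[bool, str]]:
    """Map each xe device address to (supported, mode) as reported at probe."""
    status: Dict[str, Tuple[bool, str]] = {}
    for line in dmesg_text.splitlines():
        match = _STATUS.search(line)
        if match:
            status[match["bdf"]] = (match["supported"] == "yes", match["mode"])
    return status


if __name__ == "__main__":
    log = (
        "[   14.379315] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] "
        "SR-IOV support: no (mode: none)"
    )
    print(xe_sriov_status(log))  # {'0000:00:02.0': (False, 'none')}
```

These `[drm:xe_pci_probe [xe]]` lines are debug-level messages, so they may not show up unless DRM debug output is enabled.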

 - TGL (up to 7 VFs)
 - ADL (up to 7 VFs)
 - MTL (up to 7 VFs)
 - ATSM (up to 31 VFs)
 - PVC (up to 63 VFs)

Also it's interesting raptorlake-s is just a subplatform of alderlake-s.
So if ADL is supported, RPL should also be supported.

static const struct xe_device_desc adl_s_desc = {
	.graphics = &graphics_xelp,
	.media = &media_xem,
	PLATFORM(ALDERLAKE_S),
	.has_display = true,
	.has_llc = true,
	.require_force_probe = true,
	.subplatforms = (const struct xe_subplatform_desc[]) {
		{ XE_SUBPLATFORM_ALDERLAKE_S_RPLS, "RPLS", adls_rpls_ids },
		{},
	},
};

static const struct pci_device_id pciidlist[] = {
	...
	XE_ADLS_IDS(INTEL_VGA_DEVICE, &adl_s_desc),
	...
	XE_RPLS_IDS(INTEL_VGA_DEVICE, &adl_s_desc),
	...
};

@johntdavis84

johntdavis84 commented Oct 2, 2024 via email

@ionutnechita

ionutnechita commented Oct 9, 2024

Using the Xe driver is the best option to have SR-IOV currently

Do you confirm that A770 supports SR-IOV with the Xe driver?

SR-IOV does not exist on the A770 video card because the GPU firmware (vBIOS) has this flag set to off. SR-IOV capabilities are not supported on this card.

03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation DG2 [Arc A770]
        Flags: bus master, fast devsel, latency 0, IRQ 146, IOMMU group 14
        Memory at 50000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 6000000000 (64-bit, prefetchable) [size=16G]
        Expansion ROM at 51000000 [disabled] [size=2M]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
        Capabilities: [d0] Power Management version 3
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [420] Physical Resizable BAR
        Capabilities: [400] Latency Tolerance Reporting
        Kernel driver in use: i915
        Kernel modules: i915, xe

@paulzzh

paulzzh commented Oct 10, 2024

https://lore.kernel.org/dri-devel/Zwekwrak12c5SSgo@fedora/

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20241010&id=e3e9cbb749889f13afd114dedeeb37ec232617a1

- Fixes and new development around SRIOV (Michal)
- More SRIOV work (Michal)

Let's wait for kernel 6.13

@johntdavis84

Let's wait for kernel 6.13

So, sometime around February 2025?

I know this is a technologically hard problem, and as an unannounced/alpha/skunkworks feature, we've really got no right to expect any sort of hard release date, but SR-IOV support on existing 11th/12th gen CPUs has been just around the corner since kernel 6.9 at least.

I've had great success with the StrongTZ DKMS SR-IOV driver, but using it involves high frequency updates/kernel rebuilds and complicates updating the OS itself. Any time a new kernel is pushed, the DKMS driver has to be updated, and it might not be compatible yet, so you'll have to wait. If you're running Proxmox, that applies to the hypervisor and all the guests.

(Aside: Anecdotally, 12th gen iGPUs seem to be the sweet spot for the SR-IOV DKMS driver. 13th gen works but seems a bit finicky, and 14th gen is a bit too new still for me to have a good feel for it yet.)

I'm really looking forward to this.

Kernel 6.12 feels really almost there, though, like we're just waiting for them to flip some switches to make certain CPUs actually show as having supported iGPUs.

Fingers crossed 6.13 gets over the line. :)

@darkbasic

I've had great success with the StrongTZ DKMS SR-IOV driver, but using it involves high frequency updates/kernel rebuilds and complicates updating the OS itself

Not only that but it quickly adds up: I also need to maintain a working dkms for zfs by backporting compat patches from the development branch plus constantly rebasing bnx2x-2_5g dkms to be able to use my Broadcom at 2.5Gbps. You quickly start to hate DKMS at some point.

@Daniel15

You quickly start to hate DKMS at some point.

@darkbasic at least it's better than the alternative - recompiling the whole kernel just to add or update one driver.

@cxcel

cxcel commented Nov 3, 2024

@jonas5

How can I contact you? I want to know more about A770 SR-IOV. My email is [email protected]

@x0wllaar

x0wllaar commented Nov 4, 2024

Fingers crossed 6.13 gets over the line. :)

There's no has_sriov = true in https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/gpu/drm/xe/xe_pci.c as of now :(

@paulzzh

paulzzh commented Nov 5, 2024

@x0wllaar

x0wllaar commented Nov 5, 2024

Guess I would be getting myself an Intel based laptop after all then.

@plantroon

Guess I would be getting myself an Intel based laptop after all then.

I would wait ... I regret buying an Intel laptop (1235U); it's a joke of technology, bordering on e-waste.

@stereomato

I'm happy with my Intel Alder Lake (i5-12500H) laptop.

@JeffWDH

JeffWDH commented Nov 6, 2024

Unless I'm misunderstanding, the GPU firmware itself will need to support SR-IOV. Having driver support alone isn't going to magically enable SR-IOV for all Intel GPUs?

@darkbasic

Exactly, no SR-IOV for discrete cards.

@johntdavis84

johntdavis84 commented Nov 6, 2024 via email

@cRaZy-bisCuiT

@paulzzh That CI testing is only about integrated GPUs, isn't it? Maybe it's just me, but I don't see any Arc A770 / dGPU support there.

@paulzzh

paulzzh commented Nov 11, 2024

@paulzzh That CI testing is only about integrated GPUs, isn't it? Maybe it's just me, but I don't see any Arc A770 / dGPU support there.

I don't think A770 supports SR-IOV :(

@faeizmahrus

Did SR-IOV support get mainline'd in kernel 6.13 ?

@darkbasic

Nope.

@johntdavis84

johntdavis84 commented Dec 16, 2024

6.14 is on the horizon for Q1 next year. There are some mentions of SR-IOV in what I've read, but nothing one way or the other on whether they've enabled it for 12th- and 13th-gen iGPUs.

Has anyone seen any news on this?

Meanwhile, the SR-IOV DKMS i915 driver is still in active development and currently being reworked to be based on linux-intel-lts 6.6/linux branch, so at least it'll be around for a while while we wait.

EDIT: The i915 DKMS driver is now being rebased to the 6.12 LTS kernel.

sys-oak pushed a commit that referenced this issue Dec 20, 2024
[ Upstream commit e28acc9 ]

Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the RCU read lock is not held, causing the
following error when `RCU_PROVE` is not held. The same problem might
show up in the IPv6 code path.

	6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G            E    N
	-----------------------------
	net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!

	rcu_scheduler_active = 2, debug_locks = 1
		   2 locks held by RetransmitAggre/3519:
		    #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
		    #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90

	stack backtrace:
		    lockdep_rcu_suspicious
		    mr_table_dump
		    ipmr_rtm_dumproute
		    rtnl_dump_all
		    rtnl_dumpit
		    netlink_dump
		    __netlink_dump_start
		    rtnetlink_rcv_msg
		    netlink_rcv_skb
		    netlink_unicast
		    netlink_sendmsg

This is not a problem per see, since the RTNL lock is held here, so, it
is safe to iterate in the list without the RCU read lock, as suggested
by Eric.

To alleviate the concern, modify the code to use
list_for_each_entry_rcu() with the RTNL-held argument.

The annotation will raise an error only if RTNL or RCU read lock are
missing during iteration, signaling a legitimate problem, otherwise it
will avoid this false positive.

This will solve the IPv6 case as well, since ip6mr_rtm_dumproute() calls
this function as well.

Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
@Sporesirius

Sporesirius commented Dec 23, 2024

6.14 is on the horizon for Q1 next year. There are some mentions of SR-IOV in what I've read, but nothing one way or the other on whether they've enabled it for 12th- and 13th-gen iGPUs.

Has anyone seen any news on this?

Meanwhile, the SR-IOV DKMS i915 driver is still in active development and currently being reworked to be based on linux-intel-lts 6.6/linux branch, so at least it'll be around for a while while we wait.

Hi, do you know if this also includes dGPU Battlemage support?

@johntdavis84

6.14 is on the horizon for Q1 next year. There are some mentions of SR-IOV in what I've read, but nothing one way or the other on whether they've enabled it for 12th- and 13th-gen iGPUs.
Has anyone seen any news on this?
Meanwhile, the SR-IOV DKMS i915 driver is still in active development and currently being reworked to be based on linux-intel-lts 6.6/linux branch, so at least it'll be around for a while while we wait.

Hi, do you know if this also includes dGPU Battlemage support?

Sadly, we don't really know anything for sure. This is a low priority, skunkworks project inside Intel. We'll have to wait and see what we get when it actually arrives. Which will … hopefully … be in 6.14. Keep in mind that we started hearing that SR-IOV for 12th/13th-gen iGPUs would drop as early as 6.9. They've been adding pieces of the plumbing since at least 6.9, but the whole thing isn't there yet.
