Me this past weekend trying to set up GPU passthrough to a VM. Bought an AMD card just to pass through my existing Nvidia one and have had nothing but issues with multiple distros 😔
That might be a questionable choice, given that this would leave the nvidia driver running on the host machine, and it's usually the most fucky part of this whole operation.
Passthrough isn't actually working: the VM either doesn't detect the GPU or QEMU doesn't load properly, even with everything loaded properly on the host. Tried on 3 different distros (Ubuntu- and Arch-based) and none worked. Might try the other suggestion to swap the cards. Just means I'll have to redo my water loop for the 2nd time this week 🙃
This is, indeed, uncommon. Typically the GPU either gets detected (albeit often with errors), or the VM doesn't start at all. Do you use libvirt by any chance?
But are you launching the VM via virsh/virt-manager or directly using qemu-system-x86_64? Could you provide the XML or the command line you're using? What does lspci -k say about your GPUs?
Do you want the overview XML or the XML for a specific category within virt-manager?
The full XML, unless you have something private in there, which you can remove. I just remember that with Nvidia cards, the thing preventing the driver from loading could be anywhere in the config. In my case, for example, it was booting a BIOS VM instead of a UEFI one.
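If you want to sanity-check the XML yourself before posting it, here's a rough Python sketch of what I'd look at: whether the domain looks like a UEFI one, and which PCI devices are handed to the guest. The function name summarize_domain is made up by me, and the UEFI check is only a heuristic (a firmware='efi' attribute on <os>, or a pflash-type <loader>), so treat a "False" as a hint, not a verdict.

```python
import xml.etree.ElementTree as ET


def summarize_domain(xml_text):
    """Summarize firmware type and PCI hostdevs from a libvirt domain XML string.

    Hypothetical helper for illustration; the UEFI detection is a heuristic.
    """
    root = ET.fromstring(xml_text)
    os_elem = root.find("os")
    loader = os_elem.find("loader") if os_elem is not None else None

    # Heuristic: <os firmware='efi'> or <loader type='pflash'> suggests UEFI.
    uefi = False
    if os_elem is not None and os_elem.get("firmware") == "efi":
        uefi = True
    if loader is not None and loader.get("type") == "pflash":
        uefi = True

    # Collect the host PCI addresses of every passed-through PCI device.
    hostdevs = []
    for addr in root.findall("./devices/hostdev[@type='pci']/source/address"):
        hostdevs.append("{:04x}:{:02x}:{:02x}.{:x}".format(
            int(addr.get("domain", "0"), 16),
            int(addr.get("bus", "0"), 16),
            int(addr.get("slot", "0"), 16),
            int(addr.get("function", "0"), 16),
        ))
    return {"uefi": uefi, "pci_hostdevs": hostdevs}
```

Feed it the text you get from virsh dumpxml for your VM. For a GPU you'd normally expect to see both the .0 and the .1 function in pci_hostdevs.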
lspci -k shows both my GPUs are there now
But which driver is in use? It should look something like this (my laptop, for example, with irrelevant lines removed)
01:00.0 VGA compatible controller: NVIDIA Corporation GA104 [Geforce RTX 3070 Ti Laptop GPU] (rev a1)
    Kernel driver in use: vfio-pci
01:00.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
    Kernel driver in use: vfio-pci
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt (rev c7)
    Kernel driver in use: amdgpu
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device 1640
    Kernel driver in use: snd_hda_intel
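If the output is long, a small Python sketch like this can pull out just the address-to-driver mapping from lspci -k. The function name drivers_in_use is my own, and the parsing assumes the default lspci -k layout shown above (a device line starting with the PCI address, followed by a "Kernel driver in use:" line), so it may need tweaking on your system.

```python
import re
import shutil
import subprocess

# A device line starts with a PCI address such as 01:00.0 or 0000:01:00.0.
DEVICE_LINE = re.compile(
    r"^((?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+(.*)$"
)


def drivers_in_use(lspci_text):
    """Map each PCI address in `lspci -k` output to its driver (None if unbound)."""
    result = {}
    current = None
    for raw in lspci_text.splitlines():
        line = raw.strip()
        match = DEVICE_LINE.match(line)
        if match:
            current = match.group(1)
            result[current] = None  # no driver seen yet for this device
        elif current and line.startswith("Kernel driver in use:"):
            result[current] = line.split(":", 1)[1].strip()
    return result


if __name__ == "__main__":
    if shutil.which("lspci") is None:
        print("lspci not found; install pciutils first")
    else:
        out = subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout
        for address, driver in drivers_in_use(out).items():
            print(address, driver or "(no driver bound)")
```

For passthrough, the GPU you want in the VM and its audio function should both report vfio-pci, like the 01:00.x pair in my output.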
The VM says the IOMMU group is not viable
Well, that's something. Check the script on the Arch Wiki page on VFIO, in the section "2.2 Ensuring that the groups are valid". It should print out the IOMMU groups you have on your system.
Basically, the thing with IOMMU is that within each IOMMU group you must pass either all or none of the devices down to the VM, even if you don't necessarily want them in your VM. In most cases, that means also passing the built-in sound card that feeds audio via the HDMI outputs (the .1's in the above example). In cases where there's something else crucial in that IOMMU group, there's the ACS patch, but that's a hack and should only be used as a last resort.
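If you'd rather not copy the wiki script, here's a rough Python sketch of the same idea (not the wiki's script itself): it walks /sys/kernel/iommu_groups, which is where the kernel normally exposes the groups, and lists the PCI addresses in each one. The function names iommu_groups and group_of are just mine, and if that directory is missing or empty the IOMMU is probably not enabled.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU likely disabled, or sysfs is laid out differently
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(address, groups):
    """Return the group number containing `address`, or None if not found."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None


if __name__ == "__main__":
    found = iommu_groups()
    if not found:
        print("No IOMMU groups found; check that IOMMU is enabled")
    for number in sorted(found):
        print(f"IOMMU group {number}: {', '.join(found[number])}")
```

Find the group your GPU's address lands in; everything else listed in that same group has to go to the VM along with it.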
Man, I've followed like 6 different guides to a T and tried with 4 different distros and still can't get it to work. I'm done fighting with it for a while. Maybe it's just an issue with 7000 series AMD cards, or the guides aren't up to date with the kernels, idk. I need to take a long break from it before I get upset and just return the GPU lol
Good to know, I was just thinking of doing this exact thing. I haven't pulled the trigger on the AMD card though. I wanted it for Wayland, but I still want to do CUDA things with my Nvidia card.