2025-10-31: GPU Instability Verdict
After a long 2 days of investigation & fix attempts… it was the BIOS.
What happened?
A couple of days prior to flying to Michigan, I SSH-ed into my PC and attempted a full system update, which bumped packages from Dec 2024 to Oct 2025 o_O. Little did I know, it caused my PC to freeze. I think this was the beginning of the pain.
The first day I got back to Michigan, I attempted to boot my PC but it didn’t POST due to VGA-related issues. To fix it, I cleared the CMOS.
I didn’t realize my GPU was acting up until I started a game of Dark Hours. It would crash 5 min
in. No input needed: just load the game, wait 5 min, and it crashes. Sometimes it would halt the
entire system.
I’m still not sure whether clearing CMOS reverts the BIOS to a factory firmware version. The
BIOS version was F2, which is from Aug 14, 2024. See Gigabyte’s X870-AORUS-ELITE-WIFI7 BIOS
for more info.
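To check which firmware is actually running without rebooting into the BIOS, the version can be read from Linux. A minimal sketch, assuming the kernel exposes DMI data under /sys/class/dmi/id (the usual location on Linux); the function name read_bios_info is made up for illustration:

```python
from pathlib import Path


def read_bios_info(dmi_dir="/sys/class/dmi/id"):
    """Return BIOS/board strings from the kernel's DMI sysfs directory.

    Fields that are missing or unreadable come back as None.
    """
    info = {}
    for field in ("bios_vendor", "bios_version", "bios_date", "board_name"):
        try:
            info[field] = (Path(dmi_dir) / field).read_text().strip()
        except OSError:
            info[field] = None
    return info


if __name__ == "__main__":
    for key, value in read_bios_info().items():
        print(f"{key}: {value}")
```

Running this before and after a CMOS clear would show whether the firmware version itself changes.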
Potential root causes:
- Clearing CMOS reverted the BIOS to a version that is unstable with RTX GPUs
- Some package updated before the freeze and the CMOS clear mutated the EFI, causing BIOS instability
From the data I gathered, the instability came from data stream corruption, which could be caused by
high PCIe throughput that needs newer BIOS firmware to handle properly. See the saved crash logs
in 2025_10_30. Looking through Gigabyte’s X870-AORUS-ELITE-WIFI7
BIOS firmware changelogs, the F4 version backs the PCIe instability theory (changes 3 & 4):
1. Checksum : 17DA
2. Update AMD AGESA 1.2.0.3a PatchA. Please also update AMD Chipset Driver to 7.01.08.129 or later
version to improve gaming performance for 2CCD Ryzen 7000 & 9000 CPUs
3. Optimized memory compatibility
4. Enhanced PCIe compatibility
5. Fix AMD CPU microcode signature verification vulnerability (CVE-2024-36347) for Ryzen 8000, 7000
series CPU
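The crash logs themselves live in 2025_10_30 and aren’t reproduced here. For future triage, here is a rough sketch of how a saved kernel log could be filtered for GPU/PCIe trouble. The patterns are assumptions about what to look for, not a record of what those logs contained: the NVIDIA driver reports faults on "NVRM: Xid" lines, and PCIe Advanced Error Reporting messages contain "AER:". The function name and the log path in the usage comment are hypothetical.

```python
import re
import sys

# Assumed signatures: NVIDIA driver faults ("NVRM: Xid") and
# PCIe Advanced Error Reporting messages ("AER:").
PATTERNS = {
    "nvidia_xid": re.compile(r"NVRM: Xid"),
    "pcie_aer": re.compile(r"\bAER:"),
}


def find_gpu_pcie_errors(lines):
    """Return (line_number, kind, text) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, kind, line.rstrip("\n")))
                break  # one label per line is enough
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py saved_kernel.log  (both names are hypothetical)
    with open(sys.argv[1], errors="replace") as log:
        for number, kind, text in find_gpu_pcie_errors(log):
            print(f"{number}: [{kind}] {text}")
```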
BIOS Update
From Gigabyte’s X870-AORUS-ELITE-WIFI7 BIOS, I updated my BIOS to the latest version, F8e,
released on Sep 18, 2025. This includes ALL stability updates.
Big lesson learned here: when DRAM training fails (POST code 1d), just hit the hardware reset
button until it gets past it. It takes 2-3 tries.
All that’s required is a working flash drive with a FAT32 partition. Just run mkfs.fat -F32 [path]. Then chuck the one file within the zip onto the drive, plug it in, enter the BIOS, and flash
using Q-Flash.
After updating, I noticed my Linux UEFI boot option had disappeared. See the section below for the fix, but it’s likely due to the BIOS update clearing the Secure Boot keys.
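The "chuck the file onto the drive" step can be scripted too. A small sketch, assuming the drive is already formatted and mounted; the function name and both paths are placeholders, and it copies every file in the archive flat into the drive root rather than guessing which one is the BIOS image:

```python
import shutil
import zipfile
from pathlib import Path


def unpack_to_drive(zip_path, mount_dir):
    """Copy every file in the zip into mount_dir, ignoring folder structure."""
    copied = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            # Keep only the base name so nothing lands outside mount_dir.
            target = Path(mount_dir) / Path(member.filename).name
            with archive.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            copied.append(target.name)
    return copied
```

Something like unpack_to_drive("bios.zip", "/mnt/usb") would do it (hypothetical paths); unmount the drive before pulling it so the write is flushed.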
Windows adventure
This OS is such trash man. Getting this to boot was a nightmare. I attempted to dual boot because I assumed the issue was with the NVIDIA drivers on Linux. If this OS just WORKED, it could’ve saved me so much time, assuming the GPU turned out to be unstable on Windows too.
Installing Windows via QEMU, by passing through the intended drive, succeeded at first. Within QEMU I suppose Windows didn’t see the drive as an external drive, prob because the block device was layered through Linux. I had to disable the Secure Boot checks during the installation (just googled it; it’s done through regedit).
After the installation was complete, I exited QEMU and updated GRUB with os-prober enabled to include Windows as one of the options. I also signed all the Windows boot files within my Linux boot partition, not knowing that this isn’t how it works, as indicated within ArchWiki: UEFI Dual Boot:
It is usually not possible to boot Windows by signing its boot loader (EFI/Microsoft/Boot/bootmgfw.efi) with a custom, personal key with Secure Boot Mode enabled, without enrolling the “Microsoft Windows Production PCA 2011” key in the UEFI Secure Boot variables
sigh… I learned this while fixing my Linux boot partition not appearing in the BIOS boot options.
Anyway, when booting into Windows, all I kept getting was “There was a problem. We’re restarting your PC for you”. Like TELL ME WHAT’S WRONG BRO! I only suspect Secure Boot because my Linux partition also stopped showing up until I re-enrolled custom Secure Boot keys. This damn OS is such shit. What a waste of time. I needed to vent this out in writing, as this had me reinstalling Windows 3 times:
- using Etcher, which kept failing for Secure Boot reasons
- using GitHub: Ventoy (which is an amazing idea!), which installed its Secure Boot keys and Windows, but hit the same problem when booting into Windows
- using another Windows PC to OFFICIALLY set up a Windows flash drive
All 3 ways led to the same problem. I suspect it’s due to the Secure Boot keys, but I’m too cooked from this whole investigation to try. The documentation here is sufficient for future me’s needs.
Fixing Linux boot partition post-BIOS update
So after updating the BIOS, my Linux boot partition stopped showing up, but Windows still did. With my Windows install on a SATA drive and Linux on an M.2 SSD, I thought it was a PCIe/M.2-related change in the update. I spent 2 hrs messing about with BIOS settings until I figured, let’s just redo the boot partition.
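A quicker first check than poking at BIOS settings is to list what the firmware’s boot manager actually knows about. A sketch, assuming the efibootmgr tool is installed; the parser is illustrative and only handles the usual "Boot0001* Label" lines, dropping anything after a tab (the device path that newer versions print):

```python
import re
import subprocess

# Matches entry lines such as "Boot0001* Label"; skips BootCurrent/BootOrder.
ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_boot_entries(text):
    """Parse efibootmgr output into (number, active, label) tuples."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            number, star, rest = match.groups()
            label = rest.split("\t", 1)[0].strip()
            entries.append((number, star == "*", label))
    return entries


def list_boot_entries():
    """Run efibootmgr (may need root) and return its parsed boot entries."""
    result = subprocess.run(
        ["efibootmgr"], capture_output=True, text=True, check=True
    )
    return parse_boot_entries(result.stdout)
```

If the Linux entry is missing from that list, the firmware has lost the boot entry itself, which points away from drive or PCIe settings.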
Redoing the boot partition required me to redo Secure Boot, which required putting the BIOS Secure Boot in “Setup” mode, which in turn required me to “clear secure keys”. THIS is the smoking gun that made me think Windows shitting itself was Secure Boot keys related. That and reading through the ArchWiki: Unified_Extensible_Firmware_Interface/Secure_Boot#Dual_booting_with_other_operating_systems.
After redoing all that, the Linux boot option appeared, and I booted in and sprinted to test the stability of my GPU!
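Before blaming the keys next time, the firmware’s Secure Boot state can be checked from Linux. A minimal sketch, assuming efivarfs is mounted at /sys/firmware/efi/efivars; SecureBoot and SetupMode are standard UEFI global variables, and each efivarfs file starts with 4 attribute bytes followed by the value. The helper name read_efi_flag is made up.

```python
from pathlib import Path

# GUID of the UEFI global variable namespace (EFI_GLOBAL_VARIABLE).
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"


def read_efi_flag(name, efivars_dir="/sys/firmware/efi/efivars"):
    """Return a one-byte UEFI variable as a bool, or None if unavailable."""
    path = Path(efivars_dir) / f"{name}-{EFI_GLOBAL_GUID}"
    try:
        raw = path.read_bytes()
    except OSError:
        return None
    if len(raw) < 5:
        return None
    # Skip the 4-byte attribute header that efivarfs prepends.
    return raw[4] == 1


if __name__ == "__main__":
    print("SecureBoot:", read_efi_flag("SecureBoot"))
    print("SetupMode:", read_efi_flag("SetupMode"))
```

SetupMode coming back True means the platform key is cleared and new keys can be enrolled; SecureBoot True means enforcement is currently on.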