Before anything, here's a quick glossary of terms, because I go into a bit of depth here.
- UEFI - Unified Extensible Firmware Interface, the modern replacement for the BIOS (people still call it BIOS).
- ACPI - Advanced Configuration and Power Interface, allows operating systems to discover & manage hardware.
- S3 - Suspend-to-RAM, traditional "deep sleep" where the system is powered off, but power is supplied to the memory to keep state.
- s2idle / S0ix - Modern Standby, the newer fancier version of S3, which doesn't work properly, drains laptop batteries, and keeps fans running.
- AML - ACPI Machine Language, basically machine code used in ACPI tables. Operating systems run an AML interpreter to do things like disabling a USB port.
- SMI - System Management Interrupt, a hardware interrupt that halts / pauses the operating system and directs the CPU to run code from the motherboard firmware.
- MMIO - Memory Mapped I/O, a way for the CPU to interact with hardware devices by reading / writing to specific memory addresses.
- PCH - Platform Controller Hub, the motherboard's "chipset", handles communication between the CPU and peripherals (USB, SATA, etc).
- I2C - Inter-Integrated Circuit, a low speed serial bus used to connect things like fan controllers and temperature sensors.
- UCSI - USB Type-C Connector System Software Interface, allows an operating system to manage power negotiation, data transfer, etc via the motherboard's Type-C controller.
In the beginning...
For the longest time, my PC has not been able to go into a full sleep. It can enter s2idle, but not the full deep S3 sleep I want. I can shut it down, it can hibernate, but it will. not. sleep. loginctl suspend will happily turn off my monitors, the sleep hooks for my LEDs will run, but then it hangs and will not come back without a hard reset. I'd mostly accepted this for a while and just chose to shut down, or disable my monitors (I even have Super + K bound to disable DPMS so my displays will turn off and come back on when I press a key / move the mouse), but it's been scratching at the back of my mind for long enough.
I had assumed for a long while that this was an NVIDIA bug, as there have been genuine sleep issues over the years that I've had trouble with, for example: this GitHub issue where suspend does not work when the nvidia PreserveVideoMemoryAllocations kernel parameter is enabled. People there described the exact issue I was having... then NVIDIA released a driver update that fixed it for everyone else, and I was still affected.
Completely separately from this, I also recently broke my Artix install on my laptop. Instead of fixing it like any sane person would, I decided that there wasn't enough for me to care about on the thing, wiped it, and replicated BTRFS snapshots over from my PC to the laptop. This meant they were running with the exact same configuration, similar hardware too. The laptop is a Razer Blade 16 (2023) with a 14th Gen i9 and a 4090. My PC has a 13900k and a 3090. And what would you know, my laptop sleeps absolutely fine. To me this confirmed that it was not an issue with NVIDIA. So I started looking deeper...
I ended up finding this Reddit comment, effectively complaining about how Gigabyte's ACPI tables are a broken mess and do not properly support S3. This doesn't sound quite right, because as far as I can remember, this motherboard sleeps fine in Windows. So unless Windows is using some sort of vendor-specific shim-fix to hack S3 together, I assumed the board must support legacy S3 to some extent. This post, however, combined with the fact that I was now reasonably sure the GPU had nothing to do with it, set my focus on the motherboard.
Chapter I: Down the Rabbit-Hole
Considering the Reddit post mentioned Gigabyte's ACPI tables, I thought that would be a good place to start. I'm reasonably familiar with ACPI so dumping, splitting, and decompiling them was not difficult.
(For the DSDT, it's worth feeding the SSDTs back into the decompile rather than treating DSDT.aml in isolation; iasl will go as far as to tell you this in its warning output. Without the extra tables the namespace is more difficult to read and cross-table references become easy to misinterpret.)
Alongside this, I wanted to see how far pm_test would get me before an actual hang... All passing, right up to the ASPM handoff - this gives me a much better indication that this is a problem on Gigabyte's side.
The first question I wanted to ask off the back of that Reddit comment was simple: "is the board actually exporting S3, or is it somehow lying?"
FADT was clear and conclusive:
- Hardware Reduced = 0
- Low Power S0 Idle = 0
- PM1A control @ I/O port 0x1804
- Newer s0ix sleep control/status registers are unused
This is good old-fashioned, legacy ACPI sleep. So the Reddit comment is incorrect: they haven't shipped a half-finished S0ix state pretending to be S3. They've shipped S3. Whether or not it works is a different story...
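The same FADT fields can be pulled out of a raw table dump with a few lines of Python. This is a minimal sketch, not the method used above: it assumes the standard FADT layout from the ACPI specification (PM1a_CNT_BLK at byte offset 64, Flags at offset 112, HW_REDUCED_ACPI in bit 20, LOW_POWER_S0_IDLE_CAPABLE in bit 21), and the function name summarize_fadt is made up for illustration.

```python
import struct

# Byte offsets and flag bits as laid out in the ACPI specification's FADT.
PM1A_CNT_BLK_OFFSET = 64
FLAGS_OFFSET = 112
HW_REDUCED_ACPI = 1 << 20
LOW_POWER_S0_IDLE_CAPABLE = 1 << 21


def summarize_fadt(table: bytes) -> dict:
    """Extract the sleep-related fields from a raw FADT (signature 'FACP')."""
    if table[:4] != b"FACP":
        raise ValueError("not a FADT: signature should be 'FACP'")
    if len(table) < FLAGS_OFFSET + 4:
        raise ValueError("table too short to contain the Flags field")
    (pm1a_cnt,) = struct.unpack_from("<I", table, PM1A_CNT_BLK_OFFSET)
    (flags,) = struct.unpack_from("<I", table, FLAGS_OFFSET)
    return {
        "pm1a_control_port": pm1a_cnt,
        "hardware_reduced": bool(flags & HW_REDUCED_ACPI),
        "low_power_s0_idle": bool(flags & LOW_POWER_S0_IDLE_CAPABLE),
    }
```

On Linux the raw table is normally readable as root from /sys/firmware/acpi/tables/FACP, so passing the contents of that file to summarize_fadt should give the same three answers as the decompiled output.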
Looking a bit further, the DSDT says the same thing:
Name (_S3, Package (0x04)
{
    0x05,
    Zero,
    Zero,
    Zero
})
The _S3 object tells us that S3 exists and that the SLP_TYP value for it is 0x05. At this point, we've at least got the "lying" portion out of the way. I already knew my Linux box could see S3 available from the options provided in mem_sleep and the boot log showing ACPI PM support for S0, S3, S4, and S5.
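For reference, /sys/power/mem_sleep lists the available suspend variants on one line and wraps the currently selected one in square brackets (for example "s2idle [deep]"). A small sketch of reading that format follows; the helper name parse_mem_sleep is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (all advertised modes, currently selected mode or None)."""
    modes: List[str] = []
    active: Optional[str] = None
    for token in text.split():
        # The kernel marks the selected mode with square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return modes, active
```

If "deep" appears in the list, the kernel considers ACPI S3 usable on the platform; if it is also the bracketed entry, that is what a suspend request will use.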
Chapter II: The Pool of Tears
Now that I know the board is actually advertising proper S3 support, the next step is to see how far Linux can get before anything at the vendor-level occurs. This (unfortunately) means reading through the decompiled AML for _PTS, RPTS, LPTS, SLPE, and SLPX and then walking down the suspend path from the top.
Root _PTS(3) looks standard, platform hooks, TPM, some board-specific methods, then LPTS. LPTS is where things get interesting:
Method (LPTS, 1, NotSerialized)
{
    SLPX = One
    SLPE = One
    ...
}
These fields live in an operation region at PMBASE + 0x30 (which on this board is 0x1830). Just before Linux writes the final sleep value to PM1A, the firmware arms a sleep SMI. This is important: pm_test passing tells me the AML methods returned, but pm_test never reaches the SMI handoff, so there is no visibility past that point. As my earlier tests had passed, the failure is most likely at the SMI handoff or somewhere beyond it.
To clarify the failure boundary a little, if we look at the timeline of events here: Linux selects deep (s3), evaluates _PTS(3), LPTS then arms the sleep SMI, Linux writes SLP_TYP = 0x05 and SLP_EN to 0x1804, and hands off to the board.
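To make that final write concrete, here is a sketch of how the value going to the PM1 control register is composed. It assumes the standard ACPI PM1 control layout (SLP_TYP in bits 10-12, SLP_EN in bit 13) and only computes the final word; it does not model how the kernel sequences its writes. The helper name pm1_sleep_value is made up.

```python
SLP_TYP_SHIFT = 10                   # SLP_TYPx occupies bits 10-12
SLP_TYP_MASK = 0x7 << SLP_TYP_SHIFT
SLP_EN = 1 << 13                     # setting this bit triggers the transition


def pm1_sleep_value(current: int, slp_typ: int) -> int:
    """Compose the PM1 control word for a sleep type, preserving other bits."""
    if not 0 <= slp_typ <= 7:
        raise ValueError("SLP_TYP is a 3-bit field")
    preserved = current & ~(SLP_TYP_MASK | SLP_EN)
    return preserved | (slp_typ << SLP_TYP_SHIFT) | SLP_EN


# With the _S3 package value 0x05 and no other bits set:
print(hex(pm1_sleep_value(0x0000, 0x05)))  # 0x3400
```

On this board that word would land on I/O port 0x1804, and with the sleep SMI armed by LPTS, the write is the point where control leaves Linux.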
Chapter III: A Caucus-Race and a long Tale
There's a limited amount of information I can actually gather from AML past the handoff boundary. But what I can do is have a look at Gigabyte's UEFI firmware image file to see if I can find where the failure is most likely to occur.
My UEFI version is F15, so I downloaded Z790GAMINGXAX.F15 from Gigabyte's support website. It is a 32M image, so the process is: split it into firmware volumes, decompress any interesting sections, then run strings over the results.
The results of this produced some interesting names:
- SleepSmi
- GbtSleepSmi
- S3MemoryVariable
- SmmS3SaveState
These were interesting enough to warrant further investigation by carving the embedded PE images out of the decompressed volume. This sounds overly-technical; but in practice it is just a case of finding MZ / PE\0\0 headers, splitting those ranges into .efi files, and then loading them into a disassembler.
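The header search can be sketched roughly like this. It is a simplified illustration rather than the exact tooling used: it only locates candidate image starts by checking that each "MZ" has an e_lfanew field (offset 0x3C of the DOS header) pointing at a "PE\0\0" signature, and it leaves out working out where each image ends, which needs the section headers. The function name find_pe_offsets is hypothetical.

```python
import struct
from typing import List


def find_pe_offsets(blob: bytes) -> List[int]:
    """Return offsets of 'MZ' headers whose e_lfanew points at 'PE\\0\\0'."""
    offsets: List[int] = []
    pos = blob.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew is its last 4-byte field.
        if pos + 0x40 <= len(blob):
            (e_lfanew,) = struct.unpack_from("<I", blob, pos + 0x3C)
            pe = pos + e_lfanew
            if pe + 4 <= len(blob) and blob[pe:pe + 4] == b"PE\0\0":
                offsets.append(pos)
        pos = blob.find(b"MZ", pos + 1)
    return offsets
```

The e_lfanew check matters because the two bytes "MZ" turn up by chance all over a 32M image; requiring a valid PE signature at the pointed-to offset filters most of those out.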
Once the binaries were separated it was easy to find the sleep modules.
I skipped over SleepSmi initially and had a look at GbtSleepSmi (Gigabyte Sleep SMI). It was more vendor specific and sounded like it had more scope than S3MemoryVariable and SmmS3SaveState.
Short note on SMM if you haven't heard of it: SMM (System Management Mode) is scary firmware-owned code that runs standalone outside of the operating system in response to an SMI (System Management Interrupt). When an SMI is received, the OS halts and yields to SMM so that code can execute. This sounds a lot like a backdoor because it is, and it's been used in the past to build SMM rootkits. There's a great blog post on it here.
Chapter IV: The Rabbit sends in a little Bill
Inside GbtSleepSmi, the init path sets up callbacks for the various sleep types. The module stores the small integer sleep-state constants 3, 4, and 5, then registers handlers through the same dispatch protocol. The callback address changes only for sleep type 3. S4 and S5 have a short and boring callback, but S3 is given its own handler.
So what special work is Gigabyte doing specifically for S3? Well, the S3 callback in GbtSleepSmi writes to several hard-coded MMIO addresses:
- 0xc00aa010
- 0xc00aa084
- 0xc00aa004
And then performs a timed sequence of transfers using raw address bytes 0x40 & 0x48. A lot of vendor firmware stores the 8-bit wire-format address byte, but Linux tooling and most documentation you read will refer to 7-bit I2C addresses. To convert between the two, you need to shift the values right by one bit, making the actual 7-bit I2C addresses 0x20 and 0x24.
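The conversion is a one-liner, shown here as a sketch with a made-up helper name. On the wire, the top seven bits of the address byte are the device address and the lowest bit is the read/write flag, which is why a right shift by one recovers the 7-bit form.

```python
def wire_to_7bit(addr_byte: int) -> int:
    """Convert an 8-bit wire-format I2C address byte to a 7-bit address."""
    if not 0 <= addr_byte <= 0xFF:
        raise ValueError("address byte must fit in 8 bits")
    # Bit 0 is the R/W flag; the remaining bits are the device address.
    return addr_byte >> 1


for raw in (0x40, 0x48):
    print(f"{raw:#04x} -> {wire_to_7bit(raw):#04x}")
# 0x40 -> 0x20
# 0x48 -> 0x24
```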
Tracing the helper logic showed this path programs PCI function 00:15.2, which the ACPI namespace exposes as \_SB.PC00.I2C2. (0x8000aa00 decodes to bus 0, device 0x15, function 2, register 0. The ACPI tables on the same machine expose 00:15.0 through 00:15.3 as I2C0 through I2C3, so 00:15.2 = I2C2.)
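That decode can be reproduced with a short sketch, assuming 0x8000aa00 is in the legacy PCI configuration address format (the 0xCF8 layout: enable in bit 31, bus in bits 16-23, device in bits 11-15, function in bits 8-10, register in the low byte). The function name decode_cf8 is hypothetical.

```python
def decode_cf8(address: int) -> dict:
    """Split a legacy PCI config address into its bus/device/function fields."""
    return {
        "enable": bool((address >> 31) & 1),
        "bus": (address >> 16) & 0xFF,
        "device": (address >> 11) & 0x1F,
        "function": (address >> 8) & 0x7,
        "register": address & 0xFC,  # registers are dword aligned
    }


fields = decode_cf8(0x8000AA00)
print(f"{fields['bus']:02x}:{fields['device']:02x}.{fields['function']}")  # 00:15.2
```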
The long and short of this is that Gigabyte's SMM code is configuring the PCH serial I/O controller and talking to hidden devices on I2C2 before entering S3.
One of the payloads in the sequence is 0x43725353, 53 53 72 43 in little-endian byte order. This seems like some sort of tag but I'm not 100% sure what device consumes it or what it's for.
Chapter V: Advice from a Caterpillar
So what lives behind the I2C transactions?
The first real answer came from the firmware setup strings. Once I had the decompressed volume and offsets from strings -el -td, this cluster kept showing up in the same area of the image:
TCSS Platform Setting
USBC connector manager selection
Select UCSI or UCMC device in ACPI support based on configuration
Enable UCSI Device
Enable UCMC Device
BIOS-TCSS handshake
USBC_GetUSBConnStatus timeout value in milliseconds
USBC_SxEntryWait timeout value in milliseconds
USBC_SxExitWait timeout value in milliseconds
This seems to refer to the Type-C subsystem, and has explicit mentions of Sx entry and exit!
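For anyone without binutils to hand, the strings -el -td pass can be approximated in Python. This is a rough sketch, not a drop-in replacement: it looks for runs of printable ASCII characters encoded as UTF-16LE and reports each run with its decimal offset. The function name utf16le_strings and the minimum length of 4 are illustrative choices.

```python
import re
from typing import List, Tuple


def utf16le_strings(blob: bytes, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (decimal offset, text) for printable UTF-16LE runs in blob."""
    # A printable ASCII character in UTF-16LE is the byte followed by 0x00.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [
        (match.start(), match.group().decode("utf-16-le"))
        for match in pattern.finditer(blob)
    ]
```

Firmware setup strings are usually stored as UTF-16, which is why a plain ASCII strings pass tends to miss them and the -el variant finds them.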
Looking from the ACPI side pointed me in the same direction. The firmware exposes \_SB.UBTC with HID USBC000 and \_SB.RHPX with HID MSFT8000. During boot, the generic UCSI device was not actually present at runtime:
/sys/bus/acpi/devices/USBC000:00/status = 0
This was worth digging into because if _STA returns zero, the OS treats the device as if it does not exist. In this case, _STA looks to be gated by GNVS fields USTC and UCMS. There was no ucsi_acpi driver bound, no Type-C devices registered in /sys/class/typec, and no I2C clients sitting on i2c-8.
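The status value is a bitfield, so a zero means more than "disabled". A small sketch of decoding it, using the _STA bit assignments from the ACPI specification (the helper name decode_sta is made up):

```python
# _STA bit assignments as defined by the ACPI specification.
STA_BITS = {
    "present": 1 << 0,
    "enabled": 1 << 1,
    "shown_in_ui": 1 << 2,
    "functioning": 1 << 3,
    "battery_present": 1 << 4,
}


def decode_sta(value: int) -> dict:
    """Expand an ACPI _STA value into named boolean flags."""
    return {name: bool(value & bit) for name, bit in STA_BITS.items()}


print(decode_sta(0))
# {'present': False, 'enabled': False, 'shown_in_ui': False,
#  'functioning': False, 'battery_present': False}
```

A normally working device typically reports 0xF (present, enabled, shown, functioning). A status of 0 clears every flag, including "present", which is why no driver ever tries to bind.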
So, the S3 SMM handler is issuing transactions on I2C2, while this same UCSI device is being hidden by the firmware. Windows may be using some vendor-specific shim to handle this properly, a different NVS layout, or something else entirely.
In the end...
I do not have the smoking gun I was after. I cannot name the devices at 0x20 & 0x24. I do not know the exact failure point. But what I do have is a good failure boundary, and a pretty good idea of where in the stack it is occurring.
The board advertises ACPI S3. Linux reaches the normal pre-suspend AML. LPTS arms a sleep SMI. Gigabyte's GbtSleepSmi module takes charge, and then the system hangs.
Either way, it's not my bug to fix. I opened a support case with Gigabyte to see if they'll do anything. I suspect they won't as most vendors treat Linux as a "fringe" operating system. Oh well.
If Gigabyte comes back with anything interesting, I'll drop an update.
Update 1.
Literally just as I finished writing this up, I asked AI to proofread it, and it came back with something quite interesting:
The 32-bit little-endian payload 0x43725353 translates in memory order to the ASCII bytes SSrC. "SSrC" is a standard four character code command used by Texas Instruments USB Type-C Power Delivery controllers. It stands for "Send Source Capabilities". According to the Texas Instruments hardware design guidelines for the TPS65987/8, the default 7-bit I2C slave addresses mapped to the chip's primary I2C interface are exactly 0x20 (Port 1) and 0x24 (Port 2), which matches what I'd found earlier.
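The byte-order part of that claim is easy to check independently. A minimal sketch, with a made-up helper name, that packs the 32-bit value in little-endian order and reads the result as ASCII:

```python
import struct


def fourcc_from_u32(value: int) -> str:
    """Interpret a 32-bit value's little-endian bytes as a 4-character code."""
    return struct.pack("<I", value).decode("ascii")


print(fourcc_from_u32(0x43725353))  # SSrC
```

This only confirms the bytes spell SSrC in memory order; what the command means and which chip consumes it still rests on the TI documentation.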
So, most likely, when S3 sleep is triggered, the system drops into SMM. The GbtSleepSmi module then blindly attempts to write the SSrC command to the TI PD controllers over I2C2. Because the controller is either power-gated or uninitialized by the OS, the SMM code hangs indefinitely while waiting for a hardware status register flag or an I2C ACK that will never arrive. Since the OS is halted during SMM, the entire machine hard locks.