I've been experiencing periodic lockups on my Ryzen-based Fedora Workstation. It's been a constant puzzle, so I went looking for a resolution. Apparently, I'm not the only one with this problem (Kernel Bug 196683).

As a workaround, I've opted to set the kernel boot parameter rcu_nocbs=0-15. Some have reported success with this; others have had to disable C6 states directly in the BIOS. I am opting for the former for now and hoping for the best. If I continue to have issues, I will update this accordingly.

These notes were written to remind me what I did, but they may be of use to others.

  1. Confirm that CONFIG_RCU_NOCB_CPU is set and compiled into the kernel. This is required for the rcu_nocbs setting to work. Luckily, it is compiled into the stock Fedora 27 kernel (source: Kernel Bug 196683: Comment 87).

    $ grep -F CONFIG_RCU_NOCB_CPU /boot/config-$(uname -r)
    CONFIG_RCU_NOCB_CPU=y
    $ 
    

If you're using Ubuntu, you can check out Programster's Ubuntu 16.04 - Compile Custom Kernel For Ryzen for help on compiling a custom kernel, or just disable C6 states.

  1. Add rcu_nocbs=0-15 to the boot parameters. This value is for the Ryzen 1700X, which has 16 threads. As stated in this comment, to find the range for your CPU, take its thread count (16 for the 1700X) and subtract one (15), since CPU numbering starts at 0.

    $ sudo vi /etc/default/grub
    

    Add rcu_nocbs=0-15 to the list of GRUB_CMDLINE_LINUX options.

    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX="rcu_nocbs=0-15"
    

    You'll probably have other options already listed in GRUB_CMDLINE_LINUX; just add this setting alongside them.

  2. Apply the changes to the boot config and reboot.

    $ sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
    $ sudo reboot
    

    This command is somewhat distro- and build-dependent. It works for Fedora 27 systems booting from UEFI. More information can be found in the Fedora 27 System Administrator’s Guide: Working with the GRUB 2 Boot Loader.
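The thread-count arithmetic behind the rcu_nocbs range can be sketched in shell. This is only a rough helper, not part of the original workaround: the function name rcu_nocbs_range is made up for illustration, and it assumes your CPUs are numbered contiguously starting at 0.

```shell
# Hypothetical helper: turn a thread count into an rcu_nocbs boot parameter.
# Assumes CPUs are numbered 0..(threads - 1) with no gaps.
rcu_nocbs_range() {
    echo "rcu_nocbs=0-$(($1 - 1))"
}

# nproc --all reports the number of installed logical CPUs (threads).
rcu_nocbs_range "$(nproc --all)"
```

On the 1700X, with its 16 threads, this should print the same rcu_nocbs=0-15 value used in the steps above.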

If it was effective, you should see something similar to this in your boot logs via journalctl:

kernel: Hierarchical RCU implementation.
kernel:         RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=16.
kernel:         Tasks RCU enabled.
kernel: RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
kernel: NR_IRQS: 524544, nr_irqs: 1096, preallocated irqs: 16
kernel:         Offload RCU callbacks from CPUs: 0-15.
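To avoid scrolling through the whole boot log, something like the following may help. It is a sketch under a few assumptions: the helper name find_rcu_nocbs is invented here, the kernel command line is read from /proc/cmdline, and your user is allowed to read kernel messages from the journal (otherwise prefix the journalctl call with sudo).

```shell
# Hypothetical helper: print the rcu_nocbs setting found in a
# kernel command line string; returns non-zero if it is absent.
find_rcu_nocbs() {
    printf '%s\n' "$1" | grep -o 'rcu_nocbs=[^ ]*'
}

# Check the command line the running kernel was booted with.
find_rcu_nocbs "$(cat /proc/cmdline)" || echo "rcu_nocbs is not set on this boot"

# Search this boot's kernel messages for the offload line.
journalctl -k -b --no-pager | grep -i 'Offload RCU callbacks' \
    || echo "no offload line found in this boot's kernel log"
```

If the first command prints nothing but the fallback message, the GRUB change probably did not make it into the generated grub.cfg.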

We'll see if this improves stability. I'm hoping it does. I was locking up (on average) at least once a day. The bug is still active and open, so for now the workaround appears to be the only option.