Power management/Processor

From Gentoo Wiki
This article has some todo items:
  • Add instructions for kernel configuration of remaining CPU frequency drivers
  • Describe AMD P-state EPP thoroughly

This article describes the setup of power management for processors.

CPU frequency scaling

CPU frequency scaling is a technique whereby the frequency (and voltage) of a processor is automatically adjusted "on the fly" to conserve power. This improves the battery life of mobile devices and reduces the amount of heat generated by the chip, which lessens the cooling requirements. The scaling can react to system load, be controlled by userspace tools, or react to ACPI events.

The ACPI specification describes the scaling mechanism as performance states - P-states or Processor Performance States.[1] The state labeled as P0 is used for the processor's highest possible frequency and P1-Pn states are used for lower frequencies.

Note
A lower processor frequency means fewer instructions are processed per unit of time, so it is necessary to find a balance between power saving and performance.

The kernel CPUFreq subsystem[2] is responsible for handling the frequency scaling. This subsystem provides two basic means of changing the scaling behavior:

  • Scaling Governors - provide different approaches to estimate the desired processor frequency using different scaling algorithms.
  • Scaling Drivers - provide an interface between scaling governors and the specific hardware. A scaling driver can read and write hardware-specific values on behalf of the governor.


The CPUFreq subsystem exposes multiple sysfs interfaces. The most useful one is the per-processor directory /sys/devices/system/cpu/cpu*/cpufreq/, which contains files such as:

  • cpuinfo_cur_freq - current frequency in kHz as reported by the processor.
  • cpuinfo_min_freq - minimal possible frequency in kHz as reported by the processor.
  • cpuinfo_max_freq - maximal possible frequency in kHz as reported by the processor.
  • scaling_governor - currently used scaling governor. It can be changed by writing to this file.
  • scaling_driver - currently used scaling driver. This file is read-only.
  • scaling_min_freq - minimal processor frequency in kHz to be used by the governor. It can be set by writing to this file.
  • scaling_max_freq - maximal processor frequency in kHz to be used by the governor. It can be set by writing to this file.
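
The files listed above can be inspected with a short shell function. The following is a minimal sketch rather than an official tool: the function name show_cpufreq and its optional root argument are made up for illustration, and only attributes that exist and are readable on the running system are printed.

```shell
# Print selected CPUFreq attributes for every processor found under the
# given root directory (defaults to the real sysfs location).
# show_cpufreq is a hypothetical helper name used only in this example.
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip the unexpanded glob when no processor exposes a cpufreq directory.
        [ -d "$dir" ] || continue
        cpu="${dir%/cpufreq}"
        cpu="${cpu##*/}"
        for attr in scaling_driver scaling_governor scaling_min_freq scaling_max_freq; do
            # Attributes that are missing or unreadable are skipped quietly.
            [ -r "$dir/$attr" ] || continue
            printf '%s %s=%s\n' "$cpu" "$attr" "$(cat "$dir/$attr")"
        done
    done
}
```

Calling show_cpufreq without arguments reads the real sysfs tree. If no scaling driver is loaded, nothing is printed.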

Installation

BIOS

Some functions can be enabled or disabled in the BIOS. Check that the following, if available, are enabled:

  • "Processor C1E support"
  • "Enhanced Intel SpeedStep (EIST)"
  • "AMD Cool'n'Quiet (C&Q)"
  • "AMD PowerNow!"

Kernel

Activate the following kernel options:

KERNEL Enabling CPU power management options (CONFIG_ACPI_PROCESSOR, CONFIG_CPU_FREQ_STAT)
Power management and ACPI options  --->
     [*] ACPI (Advanced Configuration and Power Interface) Support  --->
         <*>   Processor
     CPU Frequency scaling  --->
         -*- CPU Frequency scaling
             [*]   CPU frequency transition statistics
             Default CPUFreq governor (ondemand)  --->
                 Select a default governor; see the table below
                 Default is 'ondemand'
             *** CPU frequency scaling drivers ***
                 Select a driver; see the table below

At least one CPUFreq governor and one scaling driver need to be enabled:

Default CPUFreq governor
Option | Module | Note
'performance' governor | cpufreq_performance | Sets the frequency statically to the highest available processor frequency as defined by the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq. For recent Intel Core processors, this should be selected as default.[3][4]
'powersave' governor | cpufreq_powersave | Sets the frequency statically to the lowest available processor frequency as defined by the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq. Can't be set as default.
'userspace' governor for userspace frequency scaling | cpufreq_userspace | Used to set the CPU frequency manually (via the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed) or to let a userspace program set the processor frequency dynamically.
'ondemand' cpufreq policy governor | cpufreq_ondemand | Periodically polls the processor load and immediately changes the frequency based on it. For processors other than Intel Core, this should be selected as default.
'conservative' cpufreq governor | cpufreq_conservative | Similar to 'ondemand'. The frequency is gracefully increased and decreased rather than jumping to 100% when speed is required.
'schedutil' cpufreq policy governor | cpufreq_schedutil | Aimed at driving the frequency changes by the kernel scheduler.[5]
Tip
The name of the active CPUFreq governor is available in: /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

The behavior of the active governor can be further configured via tunables exposed through sysfs. For more details see the dedicated documentation. Commonly used sysfs tunables include:

  • schedutil - /sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us sets the minimal interval in μs between consecutive governor runs.
  • ondemand - /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate sets the interval in μs between consecutive load sampling runs.
  • conservative - /sys/devices/system/cpu/cpufreq/conservative/freq_step sets the maximal frequency change step as % of scaling_max_freq.
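
To see which tunables a governor exposes and their current values, the governor directory can be listed. The sketch below assumes the global location named above; depending on the scaling driver, the tunables may instead live in per-policy directories. The function name show_tunables and its second argument are hypothetical.

```shell
# Print every readable tunable of the given governor as name=value.
# show_tunables is a hypothetical helper; the second argument overrides the
# directory that holds the per-governor subdirectories.
show_tunables() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu/cpufreq}"
    dir="$root/$governor"
    if [ ! -d "$dir" ]; then
        echo "no tunables directory for governor '$governor'" >&2
        return 1
    fi
    for f in "$dir"/*; do
        # Only regular, readable files are tunables worth printing.
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

For example, show_tunables ondemand prints sampling_rate together with the other ondemand tunables. A value is changed by writing to the matching file as root.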


CPU frequency scaling drivers
Option | Module / Kernel symbol | Supported Processors | Note
Intel P state control | intel_pstate (CONFIG_X86_INTEL_PSTATE) | recent (Sandy Bridge+) Intel Core | Implements an internal scaling governor. Shows itself as intel_cpufreq on Intel processors lacking Hardware P-States (HWP) support (no hwp CPU flag).[6]
AMD Processor P-State driver[7] | amd-pstate (CONFIG_X86_AMD_PSTATE) | AMD Zen 2 and newer | Provides more fine-grained frequency steps than the standard acpi-cpufreq driver.[7] Shows itself as amd_pstate_epp when its internal scaling governor implementation is active. Requires kernel v5.17 or newer.
ACPI Processor P-States driver | acpi-cpufreq (CONFIG_X86_ACPI_CPUFREQ) | AMD Zen 1-based EPYC/Ryzen, older Intel Core (pre-Sandy Bridge)/Xeon, AMD Opteron/Phenom, Intel Atom, Intel Pentium M | Acts as a generic CPUFreq driver. Utilizes ACPI Performance States. Note that for AMD processors it is limited to only 3 frequency steps, unlike amd-pstate.[7]
AMD Opteron/Athlon64 PowerNow! | powernow-k8 (CONFIG_X86_POWERNOW_K8) | K8-based AMD Opteron, AMD Athlon 64, AMD Turion 64 | Supports older AMD K8-based processors.
Intel Enhanced SpeedStep (deprecated) | speedstep-centrino (CONFIG_X86_SPEEDSTEP_CENTRINO) | Intel Pentium M (Centrino)/Xeon | Deprecated, use the ACPI Processor P-States driver instead.
Intel Pentium 4 clock modulation | p4-clockmod (CONFIG_X86_P4_CLOCKMOD) | Intel Pentium 4/Xeon | Not recommended - causes severe slowdowns and noticeable latency.
Processor Clocking Control interface driver | pcc-cpufreq (CONFIG_X86_PCC_CPUFREQ) | x86 processors supporting the Processor Clocking Control (PCC) interface | Adds support for the PCC interface. Might be useful for HP servers supporting the interface.[8]
Note
The availability of drivers depends on the processor architecture.
Tip
The name of the active CPUFreq driver is available in: /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver

Specific CPU scaling drivers settings

Intel P-state

This driver implements internal scaling governors (roughly similar to CPUFreq's powersave and performance) and works based on the processor load. It is intended for recent Intel Core series of processors (based on the Sandy Bridge microarchitecture or newer).

This driver works in either active mode (intel_pstate), for processors featuring Hardware P-States (HWP), or passive mode (intel_cpufreq). Passive mode is used on processors that do not support HWP, which are the generations prior to the Skylake microarchitecture; on those no hwp CPU flag is present.

In active mode the processor autonomously sets the frequency based on the provided CPUFreq parameters, which passes control of frequency scaling to the processor itself. In passive mode, on the other hand, the driver behaves similarly to the generic acpi-cpufreq driver: it collaborates with the regular scaling governors, but it can use the full range of frequency steps.[9]

In active mode, the userspace, ondemand, and conservative scaling governors are unnecessary. The performance governor should be selected as the default.[10]

KERNEL Setup for Intel Sandy Bridge and newer Intel Core processors
Power management and ACPI options ---> 
  [*] CPU Frequency scaling --->
        Default CPUFreq governor (performance)  --->
    -*- 'performance' governor
    <*> Intel P state control

The driver exposes a sysfs interface rooted at the /sys/devices/system/cpu/intel_pstate/ directory. It contains files such as:

  • no_turbo - controls the Intel Turbo Boost feature (1 means disabled and 0 means enabled). The state can be changed by writing to this file.
  • status - displays the status of the driver. Possible values are off, passive, and active.
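
Both files can be read together to get a quick overview of the driver. The following is a minimal sketch: the function name intel_pstate_summary and its optional directory argument are hypothetical, and it relies only on the meaning of no_turbo and status described above.

```shell
# Summarize the intel_pstate driver state from its sysfs directory.
# intel_pstate_summary is a hypothetical helper; the optional argument
# overrides the directory that is read.
intel_pstate_summary() {
    root="${1:-/sys/devices/system/cpu/intel_pstate}"
    if [ ! -d "$root" ]; then
        echo "intel_pstate sysfs interface not present" >&2
        return 1
    fi
    echo "status: $(cat "$root/status")"
    # no_turbo uses inverted logic: 1 means Turbo Boost is disabled.
    case "$(cat "$root/no_turbo")" in
        0) echo "turbo: enabled" ;;
        1) echo "turbo: disabled" ;;
        *) echo "turbo: unknown" ;;
    esac
}
```

Turbo Boost is then disabled by writing 1 to the no_turbo file as root, for example with echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo.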


AMD P-State

This driver is available in kernel v5.17 or newer.[11] It aims to provide a more effective alternative to the generic acpi-cpufreq driver and is based on Collaborative Processor Performance Control (CPPC)[12] to provide fine-grained frequency steps. The motivation is that acpi-cpufreq provides only 3 frequency control options, and its lowest frequency is typically higher than the lowest one available with amd-pstate, which makes acpi-cpufreq less effective at maximizing battery life.

It is intended for AMD Ryzen/EPYC processors based on the Zen 2 or newer microarchitecture. If the hardware support and the configuration do not match, the scaling driver falls back to acpi-cpufreq.

Tip
To verify that the currently used driver did not fall back to acpi-cpufreq, read: /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver.
Important
In order to use this driver, the "CPPC", "ACPI CPPC", or similar BIOS setting must be set to enabled or auto.
KERNEL Kernel setup for amd-pstate (for Zen 2 or newer)
Power management and ACPI options --->
  [*] CPU Frequency scaling --->
         Default CPUFreq governor (performance)  --->
  -*-  'performance' governor
  [*]   AMD Processor P-State driver
  <M>   selftest for AMD Processor P-State driver

The driver exposes a sysfs interface rooted at the /sys/devices/system/cpu/amd_pstate/ directory. It contains files such as:

  • status - displays the status of the driver. Possible values are active, passive, guided, and disable.


If the system falls back to the acpi-cpufreq driver, the following kernel command-line parameters may allow the amd-pstate driver to load:

  • Zen 2 processors: Add amd_pstate.shared_mem=1 to enable amd-pstate using its shared memory implementation.[13]
  • Zen 3 or newer processors: Add amd_pstate=passive. Zen 3 or newer also supports CPPC.[12]
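
After rebooting, it can be useful to confirm that the parameter actually reached the kernel. The sketch below looks a parameter up in /proc/cmdline. The function name cmdline_value and its optional file argument are hypothetical, and it only handles parameters of the form name=value.

```shell
# Print the value of a name=value kernel command-line parameter.
# Returns a non-zero status when the parameter is absent.
# cmdline_value is a hypothetical helper; the optional second argument
# overrides the file that is read.
cmdline_value() {
    name="$1"
    file="${2:-/proc/cmdline}"
    awk -v key="$name=" '
        {
            for (i = 1; i <= NF; i++) {
                # A field matches when it starts with "name=".
                if (index($i, key) == 1) {
                    print substr($i, length(key) + 1)
                    found = 1
                    exit
                }
            }
        }
        END { exit !found }
    ' "$file"
}
```

For instance, cmdline_value amd_pstate prints passive when the system was booted with that setting, and returns a failure status when the parameter was not passed at all.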


Kernel 6.3 extended the available AMD P-State options with Energy Performance Preference (EPP) modes.[14] This new driver is referred to as amd_pstate_epp. It allows new combinations of drivers and governors such as "amd_pstate_epp powersave performance" or "amd_pstate_epp performance performance". Some benchmarks are available.

For further details on the AMD P-state driver see the documentation available upstream.

Manual governor/driver change

It is possible to change the active scaling governor at runtime using a simple command:

root #echo ondemand | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

This command can be executed on startup by means of the system's init system.
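
For use in a startup script, it may be worth checking that the requested governor is actually available before writing it. The following is a minimal sketch under that assumption: the function name set_governor and its optional root argument are hypothetical, while scaling_available_governors is the sysfs file listing the governors usable for a processor.

```shell
# Set the given scaling governor on every processor, refusing to write when
# the governor is not listed as available. Must run as root on a real system.
# set_governor is a hypothetical helper; the second argument overrides the
# directory that contains the cpuN subdirectories.
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        available="$(cat "$dir/scaling_available_governors" 2>/dev/null)"
        # Pad with spaces so only whole governor names match.
        case " $available " in
            *" $governor "*) ;;
            *)
                echo "governor '$governor' is not available in $dir" >&2
                return 1
                ;;
        esac
        echo "$governor" > "$dir/scaling_governor" || return 1
    done
}
```

A call such as set_governor ondemand can then be placed in whatever local startup hook the init system provides.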

Set governor at boot time

The default governor can be set via the cpufreq.default_governor kernel command-line parameter.

Important
This parameter requires kernel v5.9+.[15]
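
How the parameter is passed depends on the boot loader. As an illustration only, assuming GRUB is used and configured through /etc/default/grub, the fragment below adds the parameter; schedutil is just an example value.

```shell
# /etc/default/grub (fragment, assuming GRUB is the boot loader)
# schedutil is only an example value for the default governor.
GRUB_CMDLINE_LINUX="cpufreq.default_governor=schedutil"
```

After editing the file, the GRUB configuration has to be regenerated (for example with grub-mkconfig -o /boot/grub/grub.cfg) and the system rebooted.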

Scheduling-Clock Ticks

The processor saves the most energy when it stays in its power saving mode for longer periods, so it is desirable to reduce the number of actions that wake the processor up. One such action is the scheduling-clock interrupt, also known as a "tick". Details about the available "tickless" modes can be found in the kernel documentation.

Installation

BIOS

Some functions can be enabled or disabled in the BIOS. Check that the following settings are enabled:

  • "High Precision Event Timer"
  • "HPET"
  • "Multimedia timer"

Kernel

Activate the following kernel options for power saving features:

KERNEL Enabling tick optimizing functions in the kernel (CONFIG_NO_HZ_IDLE, CONFIG_HIGH_RES_TIMERS, CONFIG_HPET)
General setup  --->
   Timers subsystem  --->
      [*] Idle dynticks system (tickless idle)
      [*] High Resolution Timer Support
Device Drivers  --->
   Character devices  --->
      [*] HPET Timer Support
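
Whether these options are set can be checked against a kernel configuration file. The sketch below assumes the configuration is at /usr/src/linux/.config, which may differ per system. The function name check_kconfig is hypothetical.

```shell
# Report whether each given option (without the CONFIG_ prefix) is enabled
# as built-in (y) or module (m) in a kernel configuration file.
# check_kconfig is a hypothetical helper name used only in this example.
check_kconfig() {
    config="$1"
    shift
    rc=0
    for sym in "$@"; do
        if grep -q "^CONFIG_${sym}=[ym]" "$config"; then
            echo "CONFIG_${sym}: enabled"
        else
            echo "CONFIG_${sym}: missing"
            rc=1
        fi
    done
    # Non-zero when at least one option is missing.
    return "$rc"
}
```

A typical call would be check_kconfig /usr/src/linux/.config NO_HZ_IDLE HIGH_RES_TIMERS HPET.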

CPU Idle

Modern multi-core processors are often not fully loaded, which brings an opportunity to suspend the unused parts and save power. The hardware transitions the unused parts to idle states. The kernel then schedules only special idle tasks on the idle parts instead of regular tasks.

The ACPI specification describes those idle states as C-states or Processor Power States.[16] There are usually multiple C-states implemented, starting from the C0 state for a regularly running processor and continuing to C1, C2, and deeper idle states. The deeper the idle state, the greater the power saving but also the longer the transition back to the running state.

The kernel CPUIdle subsystem[17] is responsible for handling the idle state management. Similarly to CPUFreq, this subsystem provides two basic means of idle state management - a governor and a driver. The governor attempts to predict the optimal C-state and the driver performs the operation on the hardware.

The CPUIdle subsystem exposes a sysfs interface. It is available at /sys/devices/system/cpu/cpuidle/. This directory contains various files, like:

  • current_governor - currently used idle governor. It can be changed by writing to this file.
  • available_governors - list of available idle governors.
  • current_driver - currently used idle driver information.
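
The CPUIdle files can be read in the same way as the CPUFreq ones. The sketch below prints the files listed above and, as an addition not covered in this article, the names of the idle states of cpu0 from its per-processor cpuidle/state*/name files. The function name show_cpuidle and its optional root argument are hypothetical.

```shell
# Print the CPUIdle driver and governor, followed by the idle state names
# of cpu0. show_cpuidle is a hypothetical helper; the optional argument
# overrides the directory that is read.
show_cpuidle() {
    root="${1:-/sys/devices/system/cpu}"
    for attr in current_driver current_governor available_governors; do
        [ -r "$root/cpuidle/$attr" ] || continue
        printf '%s: %s\n' "$attr" "$(cat "$root/cpuidle/$attr")"
    done
    # Each stateN directory describes one idle state of the processor.
    for state in "$root"/cpu0/cpuidle/state[0-9]*; do
        [ -r "$state/name" ] || continue
        printf '%s: %s\n' "${state##*/}" "$(cat "$state/name")"
    done
}
```

The set of states that is reported depends on the processor and on the active idle driver.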

Installation

BIOS

Check that the following settings are enabled in BIOS:

  • "C-States"
  • "ACPI C states"

Kernel

CPU idle drivers
Name | Module / Kernel symbol | Supported Processors | Note
Intel Idle Time Driver | intel_idle (CONFIG_INTEL_IDLE) | recent (Nehalem+) Intel Core[18] | Asks the processor part to enter the idle state using the MWAIT instruction.
ACPI Idle Driver | acpi_idle (CONFIG_ACPI_PROCESSOR_IDLE) | AMD processors, old Intel processors | Generic idle driver.

CPU idle governors
Name | Module / Kernel symbol | Note
Ladder Governor | ladder (CONFIG_CPU_IDLE_GOV_LADDER) | Default governor for systems where the scheduler tick keeps running in idle - CONFIG_NO_HZ_IDLE=n.
Menu Governor | menu (CONFIG_CPU_IDLE_GOV_MENU) | Default governor for tickless systems - CONFIG_NO_HZ_IDLE=y.
Timer events oriented (TEO) governor | teo (CONFIG_CPU_IDLE_GOV_TEO) | Alternative governor for tickless systems - CONFIG_NO_HZ_IDLE=y.

Tools

PowerTOP

PowerTOP is a utility designed to measure, explain and minimize a computer's electrical power consumption.

When it is run, it sorts the running processes in order of how often they cause the processor to wake up. For details on installation, configuration and usage see the separate PowerTOP article.

cpupower

The sys-power/cpupower package provides a set of tools to comfortably manage and monitor processor powersaving features. The tools include cpupower frequency-info, cpupower frequency-set, and cpupower monitor.
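
A brief usage sketch follows. The wrapper function cpupower_overview is hypothetical and only checks that the cpupower binary is present before calling the two read-only subcommands named above.

```shell
# Show frequency information and idle/frequency statistics using cpupower.
# cpupower_overview is a hypothetical wrapper name used only in this example.
cpupower_overview() {
    if ! command -v cpupower >/dev/null 2>&1; then
        echo "cpupower not found; install sys-power/cpupower" >&2
        return 1
    fi
    # Driver, governor and frequency limits.
    cpupower frequency-info
    # One sampling run of per-processor idle and frequency statistics.
    cpupower monitor
}
```

The governor can be switched with cpupower frequency-set -g followed by the governor name, which requires root privileges.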

hprofile

hprofile automates some of the decisions about governing the CPU frequency. For instance, when not connected to AC power, most users would like to have the system in a power saving mode.

This is where hprofile comes into play. Please refer to its article for more information and configuration.

References

  1. 8. Processor Configuration and Control — ACPI Specification 6.4 documentation, UEFI Forum, Inc. Retrieved 9 September 2023.
  2. CPU Performance Scaling, The kernel development community. Retrieved 9 September 2023.
  3. Dominik Brodowski. Intel P-State driver, CPU frequency and voltage scaling code in the Linux(TM) kernel. Retrieved 12 June 2016.
  4. Michael Larabel. Linux's "Ondemand" Governor Is No Longer Fit. Retrieved 15 October 2016.
  5. Improvements in CPU frequency management, LWN.net, Neil Brown, 6 April 2016. Retrieved 12 January 2022.
  6. intel_pstate CPU Performance Scaling Driver, kernel.org, Rafael J. Wysocki. Retrieved 12 January 2022.
  7. amd-pstate CPU Performance Scaling Driver, The kernel development community. Retrieved 9 September 2023.
  8. Platform-based Power Management and Linux, Bdale Garbee and Naga Chumbalkar. Retrieved 9 September 2023.
  9. intel_pstate CPU Performance Scaling Driver, The kernel development community. Retrieved 9 September 2023.
  10. Dominik Brodowski. Intel P-State driver, CPU frequency and voltage scaling code in the Linux(TM) kernel. Retrieved 12 June 2016.
  11. AMD P-State Driver To Premiere In Linux 5.17 With Aim To Deliver Better Power Efficiency, Michael Larabel. Retrieved 9 September 2023.
  12. Collaborative Processor Performance Control (CPPC), The kernel development community. Retrieved 9 September 2023.
  13. How to enable amd-pstate?, Manjaro.org. Retrieved 9 September 2023.
  14. Ryzen Mobile Power/Performance With Linux 6.3's New AMD P-State EPP Driver, Michael Larabel. Retrieved 9 September 2023.
  15. The kernel’s command-line parameters, The kernel development community. Retrieved 9 September 2023.
  16. 8.1. Processor Power States — ACPI Specification 6.4 documentation, UEFI Forum, Inc. Retrieved 10 September 2023.
  17. CPU Idle Time Management, The kernel development community. Retrieved 10 September 2023.
  18. intel_idle CPU Idle Time Management Driver, The kernel development community. Retrieved 10 September 2023.