\documentclass[12pt]{llncs}

\usepackage{indentfirst}
\markright{Power management in Linux}
\begin{document}

\title{Power management in Linux}
\author{Pavel Machek}

\institute{SuSE CR, s.r.o., Drahobejlova 27, Praha 9, Czech Republic\\
\email{pavel@suse.cz}}

\maketitle              % typeset the title of the contribution

\begin{abstract}
As CPUs get more powerful (both in MIPS and in watts) and notebooks
get smaller, power and thermal management become more and more
important. Modern machines usually support several sleep states
(suspend to RAM, suspend to disk), a few CPU halt states (hlt, C1, C2,
C3, C4, ...), CPU throttling, CPU frequency and voltage scaling, various disk
sleep states, and control over the backlight. A modern operating system is
expected to keep the processor cool and keep the battery from
exploding. Linux is expected to deal with all this mess, and (around
2.6.13) does a good job in most areas, with the exception of suspend
to RAM (and, to a smaller extent, backlight control and vendor-specific
features). I will talk in more detail about
swsusp (aka suspend to disk), because of its ``interesting'' design,
which keeps confusing users and developers. I'll also present a brief
overview of APM, ACPI and the cpufreq modules.
\end{abstract}

\sloppy

\section{Overview}

Several things are necessary for an operating system to run smoothly on
a notebook computer: slowing the CPU down when idle, switching between
different CPU speeds and voltages, thermal management, spinning the disk
down when it is not in use, suspending the machine to RAM and to disk,
controlling the LCD backlight, reading battery status, and handling the
extra keys notebooks like to introduce.

Much of this stuff is beneficial to desktop and server PCs, too,
although it is not strictly necessary there. On desktop machines the
main goals are reducing noise and power consumption. Servers care most
about thermal output; battery life on a UPS is also important.

The old method of doing power management on a PC is called APM (Advanced
Power Management). Most things get done directly by hardware (or by the
SMM BIOS, which is very similar to hardware from the operating system's
perspective). Backlight control actually tends to work, and slowing the
CPU on idle is simplistic but usually works. Suspend to disk and suspend
to RAM are either not present or done by hardware, and they usually have
various glitches. There is also a design problem with APM suspend: the
BIOS needs to know about all the hardware in the machine for this to work
correctly, but newer notebooks have replaceable mini-PCI (usually wifi)
cards and CardBus slots, so it cannot really work. APM does not scale CPU
frequency or voltage, or does so in hardware in a very simplistic way.
Battery status is only reported as percent left and time to empty.

There is a newer method of PC power management, called ACPI (Advanced
Configuration and Power Interface). It is better than APM, but I'm not
sure if it is the right way to do things. The kernel interprets byte code
from the ACPI BIOS (yes, we have a language interpreter in the kernel!).
The ACPI standard is very long and pretty hard to read. ACPI handles all
of the above aspects; we still have problems with backlight control (a
standard exists, but not everyone uses it) and suspend to RAM.

Battery support in ACPI is more advanced. It provides low-level
information that enables the operating system to calculate times and
capacities itself (battery design capacity, capacity at last full charge,
design and current voltage, current power drain). It also allows
measuring power consumption (which has some problems on many notebooks)
and lets the operating system determine battery age (but some
manufacturers fake it).
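
As a minimal sketch of the kind of calculation this enables, the
following Python function derives percent left, a rough battery-health
figure and time to empty from the raw values listed above. The function
and parameter names are made up for illustration, and the units (mWh and
mW) are an assumption; real batteries may report mAh and mA instead.

\begin{verbatim}
```python
def battery_estimate(design_mwh, last_full_mwh, remaining_mwh, drain_mw):
    """Derive user-visible figures from raw battery values.

    All names are illustrative; capacities are in mWh, drain is in mW.
    Returns (percent_left, health_percent, hours_to_empty).
    """
    if design_mwh <= 0 or last_full_mwh <= 0:
        raise ValueError("capacities must be positive")
    # Charge level relative to what the battery can hold today.
    percent_left = 100.0 * remaining_mwh / last_full_mwh
    # Battery "age": share of the design capacity a full charge still holds.
    health_percent = 100.0 * last_full_mwh / design_mwh
    # Without a measurable drain (e.g. on AC power) no estimate is possible.
    hours_to_empty = remaining_mwh / drain_mw if drain_mw > 0 else None
    return percent_left, health_percent, hours_to_empty
```
\end{verbatim}

For example, a battery designed for 50000 mWh that now charges to 40000
mWh, currently holds 20000 mWh and drains at 10000 mW is at 50\% charge
and 80\% health, with 2 hours left.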

Intel uses ACPI as a kitchen sink, so it also provides support
for discovering legacy devices on mainboards, docking, LEDs,
lid/sleep/power buttons, wifi enable and similar tasks.

On non-PC machines, the kernel usually knows all the hardware details,
and power management tends to just work. For most embedded systems,
hardware manufacturers do their own Linux ports, which certainly helps
a lot.

\section{Suspend to disk}

swsusp (originally SoftWare SUSPend, now also SWap SUSPend) is a kernel
subsystem that is able to suspend the machine to disk even without BIOS
support. It started as a 2.4 patch from Gabor Kuti; I am now working on
the 2.6 version (with the help of Nigel Cunningham, Patrick Mochel and
many other hackers). swsusp was merged into Linus' kernel in the middle
of the 2.5 series.

Nigel also has a second code base called suspend2, which
contains many additional features over swsusp. It is faster, supports
compressed images, provides a nice progress bar, and can be aborted by
pressing ``Esc''. Unfortunately, it is also quite big and needs some
cleanups. Nigel is working on merging it.

How swsusp works: to ease implementation and to prevent user processes
from talking to the network after they are suspended, suspend starts by
stopping all user processes. Then it frees at least 50\% of memory, so
that there is room for an atomic copy, and suspends devices, so that no
DMA is running while the snapshot is taken. Then it can disable
interrupts and do the atomic copy. Before the atomic copy can be written
to disk, devices have to be resumed. The image is then written to disk,
devices are suspended once again, and the machine is powered down. Seeing
devices transition three times during suspend is strange to users, but it
is okay (and in 2.6.13-mm, the code was enhanced to transition ``slow''
devices such as disks only once). During
resume, a normal boot is done, up to the point just before mounting the
file system. Then devices are suspended, memory is copied back
atomically with interrupts disabled, devices are resumed and the machine
continues normally.
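
The ordering of the suspend steps can be sketched as a toy Python model.
This is only an illustration of the sequence described above, not kernel
code: the function name, the log strings and the page accounting are
invented, and re-enabling interrupts after the copy is an assumption
that the description only implies.

\begin{verbatim}
```python
def swsusp_suspend(total_pages, used_pages):
    """Toy model of the suspend path; returns (log, image_pages)."""
    log = ["freeze user processes"]
    # The atomic copy needs one free page per page in use, so memory
    # is shrunk until at most half of it is occupied.
    limit = total_pages // 2
    if used_pages > limit:
        log.append("shrink memory by %d pages" % (used_pages - limit))
        used_pages = limit
    log.append("suspend devices")       # no DMA during the snapshot
    log.append("disable interrupts")
    log.append("atomic copy")
    image_pages = used_pages            # the copy lives in the free half
    log.append("enable interrupts")     # assumed, see the text above
    log.append("resume devices")        # the disk is needed for writing
    log.append("write image to disk")
    log.append("suspend devices")
    log.append("power down")
    return log, image_pages
```
\end{verbatim}

Calling \texttt{swsusp\_suspend(1000, 800)} shrinks memory by 300 pages
and yields a 500-page image; the log shows the three device transitions
(suspend, resume, suspend) that users find strange.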

Drivers have to provide support for saving and restoring hardware state
for this to work. All drivers should register their hardware in the
device tree and then fill in the suspend and resume callbacks. A
reference implementation is given in Documentation/power/pci.txt.
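
What the callbacks have to do can be illustrated with a toy model in
Python. Real drivers are written in C against the kernel's driver
interfaces; every class and method name below is hypothetical. Each
device saves its state on suspend and restores it on resume. Suspending
in reverse registration order and resuming in registration order is an
assumption of this sketch, chosen so that a device is always resumed
after the bus it sits on.

\begin{verbatim}
```python
class ToyDevice:
    """Stands in for one piece of hardware with volatile registers."""

    def __init__(self, name, registers):
        self.name = name
        self.registers = dict(registers)
        self.saved = None

    def suspend(self):
        # Save everything the hardware forgets once power is cut.
        self.saved = dict(self.registers)
        self.registers = {}

    def resume(self):
        # Reprogram the hardware from the saved copy.
        self.registers = dict(self.saved)
        self.saved = None


class ToyDeviceTree:
    """Keeps devices in registration order and drives the callbacks."""

    def __init__(self):
        self.devices = []
        self.trace = []

    def register(self, device):
        self.devices.append(device)

    def suspend_all(self):
        for device in reversed(self.devices):
            device.suspend()
            self.trace.append("suspend " + device.name)

    def resume_all(self):
        for device in self.devices:
            device.resume()
            self.trace.append("resume " + device.name)
```
\end{verbatim}

A driver whose \texttt{suspend} forgets to save a register comes back
from \texttt{resume} with that register missing, which is the typical
kind of failure seen with incomplete drivers.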

The swsusp core is pretty stable these days, but there are still problems
with drivers. A useful method of debugging them is booting with
init=/bin/bash and a minimal config, then starting suspend manually. If
that fails (unlikely), the problem is probably in the swsusp core.
Otherwise, drivers should be loaded one by one to find which one breaks
suspend. swsusp documentation is available in Doc*/power/swsusp.txt.
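
The one-by-one procedure amounts to a simple linear search. The Python
sketch below assumes a caller-supplied \texttt{suspend\_works} callback
that loads the given drivers, attempts a suspend/resume cycle and
reports success; both names are hypothetical, and in practice this step
is done by hand.

\begin{verbatim}
```python
def find_breaking_driver(drivers, suspend_works):
    """Load drivers one by one; return the first one that breaks suspend.

    suspend_works(loaded) must return True if a suspend/resume cycle
    succeeds with exactly the drivers in the list `loaded`.
    Returns None if suspend works with every driver loaded.
    """
    loaded = []
    for driver in drivers:
        loaded.append(driver)
        if not suspend_works(list(loaded)):
            return driver
    return None
```
\end{verbatim}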

\section{Suspend to RAM}

At first sight, everyone assumes that suspend to RAM is easier
than suspend to disk. That is actually not the case, because drivers
need to work much better for suspend to RAM to work. In the swsusp case,
devices have already been initialised by a normal boot. In the suspend
to RAM case, devices are in some rather weird state. The problem is
especially bad with video drivers, because video cards have ROMs that
need to be run... (And video devices are the only ones without ``real''
support in the Linux kernel.)

Things are getting better because Intel is pushing hardware vendors to
always initialize video cards during resume from suspend to RAM,
so new systems should work without any problems. For older systems,
some tricks can be done; see Documentation/power/video.txt. (If you
manage to get an unlisted notebook to work, be sure to send me an update.)

There is an easy-to-use pseudo-suspend to RAM mode called ACPI S1. In
that mode, the machine is not running, but its state is preserved. It was
very common in the early days, when Windows could not do suspend to RAM
properly. Unfortunately, it does not provide interesting power savings.

\section{Power management}

Modern computers eat lots of power... which is quite a problem for
notebooks. CPUs are especially power-hungry. To keep battery life
reasonable, there are three different mechanisms. First, there are CPU
sleep states (ACPI Cx). When the operating system is not doing any
computation (waiting for the disk or for the user), it can place the CPU
into one of the CPU sleep states. Deeper sleep (bigger x in Cx) means
better power saving, but also takes longer to enter and exit, potentially
hurting performance. Then there are CPU power states: modern CPUs can run
at various voltage/frequency combinations. Of course, lower frequency
means slower computation, but the advantage in battery life is worth it,
and some batteries cannot support high frequencies. And then there's
throttling; it is independent of the previous two mechanisms and modifies
the duty cycle of the CPU. It is mostly useful in emergency overheat
conditions, because its energy savings are only linear.
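
The difference between throttling and voltage/frequency scaling can be
made concrete with the usual first-order model of dynamic CMOS power.
This is a textbook approximation that ignores leakage, not a property of
any particular CPU; $C$ is the switched capacitance, $V$ the supply
voltage and $f$ the clock frequency:
\[
P \approx C V^2 f .
\]
Throttling to a duty cycle $d$ ($0 < d \le 1$) leaves $V$ and $f$ alone,
so average power drops only linearly, to $dP$, while a fixed job takes
$1/d$ times longer; the energy spent on the job does not change. If, as a
simplifying assumption, a lower power state scales both voltage and
frequency by the same factor $s$ ($0 < s \le 1$), then
\[
P_s \approx C (sV)^2 (sf) = s^3 P ,
\qquad
E_s \approx P_s \cdot \frac{T}{s} = s^2 P T ,
\]
where $T$ is the time the job takes at full speed. At $s = 0.5$ the job
takes twice as long but needs only about a quarter of the energy.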

Linux supports CPU sleep states. Unfortunately, C3 (and deeper,
lower-power states) can't be used
while DMA is going on, and that is all the time if you use USB. That
means that USB power management is very important for modern notebooks
(otherwise a USB mouse can increase power consumption by as much as 2~W,
and that's 20\% on a small notebook). Power states and throttling are
supported, too, using the cpufreq framework.

Cpufreq allows so-called governors to control the frequency scaling
policy. A few of them are available, ranging from lowest-possible
and highest-possible frequency governors, through a userspace one
(powersaved controls the frequency from userland), to the in-kernel
ondemand governor.
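
The policies behind these governors can be sketched in a few lines of
Python. This is a simplified illustration, not the kernel's code: the
function name and parameters are invented, the governor names are used
only as labels, and the ondemand rule shown here (jump to the maximum
above a load threshold, otherwise pick the lowest frequency that keeps
the load under it) is an assumed simplification of the real heuristic.

\begin{verbatim}
```python
def pick_frequency(governor, available_khz, cur_khz=None,
                   load_percent=0, requested_khz=None, up_threshold=80):
    """Return the frequency (kHz) a toy governor would select."""
    freqs = sorted(available_khz)
    if governor == "powersave":        # always the lowest frequency
        return freqs[0]
    if governor == "performance":      # always the highest frequency
        return freqs[-1]
    if governor == "userspace":
        # A userland daemon decides; snap its request down to a
        # frequency the hardware supports.
        if requested_khz is None:
            raise ValueError("userspace governor needs requested_khz")
        fitting = [f for f in freqs if f <= requested_khz]
        return fitting[-1] if fitting else freqs[0]
    if governor == "ondemand":
        if cur_khz is None:
            raise ValueError("ondemand governor needs cur_khz")
        if load_percent > up_threshold:
            return freqs[-1]
        # Lowest frequency at which the same work stays under threshold.
        needed_khz = cur_khz * load_percent / up_threshold
        return next((f for f in freqs if f >= needed_khz), freqs[-1])
    raise ValueError("unknown governor: %r" % governor)
```
\end{verbatim}

With frequencies of 600, 800, 1000 and 1600 MHz available, a CPU running
at 1600 MHz with 20\% load would be dropped to 600 MHz by this ondemand
rule, and pushed back to 1600 MHz as soon as the load exceeds 80\%.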

Then there are backlight, hard drive and vendor-specific features. The
effort on controlling the backlight with ACPI is ongoing; laptop mode is
available in 2.6 kernels to make spinning the disk down useful. Hotkeys
are supported by vendor-specific drivers, and there is an ongoing effort
to provide common infrastructure.

\section{Thermal management}

Modern machines allow the operating system to control machine cooling up
to a certain point; if the machine overheats badly and the operating
system fails to react, the hardware must protect itself from damage. It
even works on some systems ;-). The original specification allows the
operating system to control fans, but most manufacturers do not allow
that these days. Fans are often controlled by hardware, but software can
still do ``passive cooling'', which basically means slowing things down.
That is useful to keep the machine running even with failed CPU fans.

The ACPI BIOS provides a table of temperature thresholds for each cooling
method. Manufacturers often got these limits wrong in the past, but it
seems to be getting better. You can find the table in
/proc/acpi/thermal*/*/trip*, and it can be overridden by writing into
that file. Linux then watches temperatures and acts accordingly.
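
How such a threshold table is used can be sketched in Python. The
function name, the dictionary keys and the action strings are
hypothetical, and the threshold values in the example are made up; real
tables come from the BIOS and differ between machines.

\begin{verbatim}
```python
def cooling_actions(temp_c, trips):
    """Return the actions for a temperature and a table of trip points.

    trips maps a cooling method ("critical", "passive", "active") to
    its threshold in degrees Celsius; missing entries are ignored.
    """
    # Past the critical point nothing else matters: shut down.
    if "critical" in trips and temp_c >= trips["critical"]:
        return ["shut down"]
    actions = []
    # Passive cooling: slow things down to produce less heat.
    if "passive" in trips and temp_c >= trips["passive"]:
        actions.append("slow CPU down")
    # Active cooling: only possible if the OS may control the fan.
    if "active" in trips and temp_c >= trips["active"]:
        actions.append("turn fan on")
    return actions
```
\end{verbatim}

With a table of \texttt{\{"critical": 100, "passive": 85, "active": 70\}},
a reading of 75 degrees only turns the fan on, 90 degrees also slows the
CPU down, and 105 degrees shuts the machine down.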

Even on some machines without proper ACPI BIOS support, thermal
information can be accessed directly over the i2c bus. This gives
more detailed information than ACPI, but it is highly hardware specific
and does not play well with ACPI.

\section{Conclusion}

Notebooks are getting more and more important these days, and they
include lots of advanced stuff to conserve power. Linux still fails in
some important areas (suspend to RAM), but things are getting better and
Linux is already very usable on notebooks.

In the future, more work needs to be done in the area of runtime power
management, that is, saving power even while the user is using the
system. That includes powering down the USB subsystem when a mouse is
plugged in but the user has not used it for a specific amount of time,
powering down devices that are seldom used (parallel port), and powering
down devices that are currently not in use (such as powering down audio
when no song is being played).

\end{document}

