Understanding load averages

Load averages are at once hugely simple and hideously complex. Understanding what these numbers mean is key to using this simple indicator of system health correctly.

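First, a load average is not CPU percentage. CPU percentage is simply a snapshot of how often a process was found executing on the CPU. The load average differs in that it counts all demand for the CPU: processes waiting in the run queue as well as those currently running. (On Linux, it also counts processes in uninterruptible sleep, which are usually waiting on disk I/O.)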

A useful analogy

A four-processor machine can be visualized as a four-lane freeway. Each lane provides the path on which instructions can execute. A vehicle can represent those instructions. Additionally, there are vehicles on the entrance lanes ready to travel down the freeway, and the four lanes either are ready to accommodate that demand or they’re not. If all freeway lanes are jammed, the cars entering have to wait for an opening. If we now apply the CPU percentage and CPU load-average measurements to this situation, percentage examines the relative amount of time each vehicle was found occupying a freeway lane, which inherently ignores the pent-up demand for the freeway – that is, the cars lined up on the entrances.

The load average gives us that view because it includes the cars that are queuing up to get on the freeway. It could be the case that it is a nonrush-hour time of day, and there is little demand for the freeway, but there just happens to be a lot of cars on the road. The CPU percentage shows us how much the cars are using the freeway, but the load averages show us the whole picture, including pent-up demand.

(Ray Walker, "Examining load averages")
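To make the analogy concrete, here is a minimal Python sketch of a single instant on a hypothetical four-core machine. The function `snapshot` is made up for illustration: it treats CPU utilization as the share of lanes (cores) occupied and the instantaneous load as the count of cars (runnable tasks), including those queued on the entrance ramps. This is a simplification; the kernel averages that count over time rather than reporting one instant.

```python
def snapshot(runnable: int, cores: int) -> tuple[float, int]:
    """Hypothetical model of one instant: return (CPU utilization %, load)."""
    running = min(runnable, cores)          # cars actually in a lane
    utilization = 100.0 * running / cores   # tops out once every lane is full
    return utilization, runnable            # load also counts cars still queued

for n in (2, 4, 6, 10):
    util, load = snapshot(n, cores=4)
    print(f"{n:2d} runnable on 4 cores: CPU {util:5.1f}%  load {load}")
```

Once every lane is full, CPU percentage stays at 100% while the load keeps climbing with the number of tasks waiting to run.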

So, perfect utilization of a single CPU gives a load average of 1.00: anything above that represents unmet demand, and anything below it represents unused supply. For a two-core machine, the perfect load average is 2.00 (1.00 for each of the two cores). The catch is that the raw number means little until you know how many cores are available. (A load average of 0.00, at the other extreme, means the system was entirely idle: nothing running and nothing waiting.)
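A minimal sketch of the per-core reading in Python. The helpers `per_core_load` and `describe` are hypothetical; the only real APIs used are `os.getloadavg()` (Unix only) and `os.cpu_count()`. Note that `os.cpu_count()` reports logical CPUs, so hyperthreads count as separate cores here; whether that matches your notion of a "core" is an assumption.

```python
import os

def per_core_load(load: float, cores: int) -> float:
    """Normalize a load average so that 1.00 means every core is exactly busy."""
    return load / cores

def describe(load: float, cores: int) -> str:
    """Hypothetical helper: label a load average relative to core count."""
    per_core = per_core_load(load, cores)
    if per_core > 1.0:
        return f"unmet demand: {per_core - 1.0:.0%} more than the CPUs can serve"
    if per_core < 1.0:
        return f"unused supply: {1.0 - per_core:.0%} of CPU capacity idle"
    return "perfect utilization"

if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # Unix only; raises OSError if unavailable
    cores = os.cpu_count() or 1           # logical CPUs; cpu_count() may return None
    print(f"1-min load {one:.2f} on {cores} CPUs: {describe(one, cores)}")
```

With this normalization, a load of 3.00 on two cores and 6.00 on four cores both read as 50% more demand than the machine can serve.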

uptime and other tools such as w provide three load averages: the 1-, 5-, and 15-minute averages. They are actually exponentially damped moving averages, which weight recent load more heavily because it is more likely than older load to affect current system performance. The numbers are printed most recent first (1, 5, then 15 minutes), the reverse of the left-to-right order in which most people read a trend; read them right to left to see whether load is rising or falling. For the mathematics behind calculating the load averages, read "UNIX Load Average Part 1: How It Works".
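The damping can be simulated in a few lines of Python. This sketch assumes the update rule described for Linux in articles such as the one cited above: roughly every 5 seconds, each average is multiplied by e^(-5/(60m)), where m is its window in minutes, and moved toward the current count of active tasks by the remaining fraction. The kernel does this in fixed-point arithmetic; this floating-point version and the names `decay`, `update`, and `simulate` are illustrative only.

```python
import math

SAMPLE_SECONDS = 5.0      # assumed sampling interval (Linux samples about every 5 s)
WINDOWS_MIN = (1, 5, 15)  # the three averages reported by uptime and w

def decay(window_min: int) -> float:
    """Per-sample decay factor e^(-interval / window) for one average."""
    return math.exp(-SAMPLE_SECONDS / (window_min * 60.0))

def update(loads: list[float], n_active: int) -> list[float]:
    """One sampling step: move each average part of the way toward n_active."""
    new_loads = []
    for old, minutes in zip(loads, WINDOWS_MIN):
        d = decay(minutes)
        new_loads.append(old * d + n_active * (1.0 - d))
    return new_loads

def simulate(counts: list[int]) -> list[float]:
    """Feed a sequence of per-sample task counts, starting from an idle system."""
    loads = [0.0, 0.0, 0.0]
    for n in counts:
        loads = update(loads, n)
    return loads

# One minute (12 samples) of a single busy task, starting from idle:
one, five, fifteen = simulate([1] * 12)
print(f"{one:.2f} {five:.2f} {fifteen:.2f}")  # prints 0.63 0.18 0.06
```

After one minute of a single busy task, the 1-minute average has reached only about 0.63 (that is, 1 - e^-1), and the longer windows lag further behind. Read right to left, 0.06, 0.18, 0.63 shows the load rising.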

Conclusion

Load averages show how much demand for CPU there is (run queue length), not simply how much use there is. Consequently, load averages give a fuller measure of system load than CPU percentage, which tops out at 100% no matter how much work is waiting. However, one must know the number of CPUs to interpret a load average.

Further reading