Originally presented Sep 4, 2002
Updated by NJG Feb 21, 2003
Other averages are taken over time i.e., time-dependent averages. A particular example of such a time-dependent average is the load average metric that appears in certain UNIX (and therefore Linux) commands. Have you ever wondered how those three little numbers are produced?
In this presentation, I shall start at the surface (the shell) and gradually submerge into the depths of the Linux kernel to find out how the Linux load average gets calculated.
Finally, I'll compare the load average with other averaging techniques used in performance analysis and capacity planning.
[pax:~]% uptime
9:40am up 9 days, 10:36, 4 users, load average: 0.02, 0.01, 0.00
And on Linux systems ...
[pax:~]% procinfo
Linux 2.0.36 (root@pax) (gcc 2.7.2.3) #1 Wed Jul 25 21:40:16 EST 2001 [pax]

Memory:      Total        Used        Free      Shared     Buffers      Cached
Mem:         95564       90252        5312       31412       33104       26412
Swap:        68508           0       68508

Bootup: Sun Jul 21 15:21:15 2002    Load average: 0.15 0.03 0.01 2/58 8557
...
Three numbers: 1-, 5-, and 15-minute averages of ... ?
[pax:~]% man "load average"
No manual entry for load average

Tim O'Reilly and Crew, p.726
The load average tries to measure the number of active processes at any time. As a measure of CPU utilization, the load average is simplistic, poorly defined, but far from useless.
Adrian Cockcroft, p.229
The load average is the sum of the run queue length and the number of jobs currently running on the CPUs. In Solaris 2.0 and 2.2 the load average did not include the running jobs but this bug was fixed in Solaris 2.3.

What's high? ... Ideally, you'd like a load average under, say, 3, ... Ultimately, 'high' means high enough so that you don't need uptime to tell you that the system is overloaded.
... different systems will behave differently under the same load average. ... running a single cpu-bound background job .... can bring response to a crawl even though the load avg remains quite low.
Blair Zajac (ORCA Author)
If long term trends indicate increasing figures, more or faster CPUs will eventually be necessary unless load can be displaced. For ideal utilization of your CPU, the maximum value here should be equal to the number of CPUs in the box.
Some hedging because the load average is not your average kind of
average. It's a time-dependent average ... a damped
time-dependent average.
But you're a Linux expert and you knew this already. Right?
Let's find out ...

Random Samples
A. load average: 6.85, 7.37, 7.83
B. load average: 8.50, 10.93, 8.61
C. load average: 37.34, 9.47, 3.30
is the load:
Sequential Samples
 8:00am  load average:  1.21   0.81  0.13
 8:10am  load average: 37.34   9.47  3.30
 8:50am  load average: 19.21  16.02  7.40
 9:15am  load average: 13.92  15.13  8.18
 9:40am  load average: 10.51  13.50  8.47
10:30am  load average:  8.50  10.93  8.61
11:00am  load average:  8.15   9.84  8.55
11:20am  load average:  7.72   9.20  8.44
 1:00pm  load average:  6.85   7.37  7.83
Imagine a sysadm running the uptime command at those wall-clock times. In which LA sample does maximum load occur?
Excluding the first LA sample at 8am, in which sample does the least load occur?
Visual Hints
The 3 dots correspond to the 3 numeric LA values. The y-axis shows the load values and the x-axis shows a range of time between 1 and 15 minutes. The left-most point represents the 1-minute load average, the middle point represents the 5-minute load average and the right-most the 15-minute load average. Here is an animation of the above sequence.
A Perl script sampled the load average every 5 minutes using uptime.

(Resembles the charging/discharging of an RC circuit)
http://lxr.linux.no/source/kernel/...
unsigned long avenrun[3];

static inline void calc_load(unsigned long ticks)
{
        unsigned long active_tasks; /* fixed-point */
        static int count = LOAD_FREQ;

        count -= ticks;
        if (count < 0) {
                count += LOAD_FREQ;
                active_tasks = count_active_tasks();
                CALC_LOAD(avenrun[0], EXP_1, active_tasks);
                CALC_LOAD(avenrun[1], EXP_5, active_tasks);
                CALC_LOAD(avenrun[2], EXP_15, active_tasks);
        }
}
The sampling interval, LOAD_FREQ, is 5 HZ. How often is that?
    1 HZ = 100 ticks
    5 HZ = 500 ticks
Therefore:
    1 tick    = 10 milliseconds
    500 ticks = 5000 milliseconds (or 5 seconds)
So 5 HZ means that CALC_LOAD is called every 5 seconds.
Don't confuse this period with the reporting periods {1-, 5-, 15-} minutes.
extern unsigned long avenrun[ ];        /* Load averages */

#define FSHIFT          11              /* nr of bits of precision */
#define FIXED_1         (1<<FSHIFT)     /* 1.0 as fixed-point */
#define LOAD_FREQ       (5*HZ)          /* 5 sec intervals */
#define EXP_1           1884            /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5           2014            /* 1/exp(5sec/5min) */
#define EXP_15          2037            /* 1/exp(5sec/15min) */

#define CALC_LOAD(load,exp,n) \
        load *= exp; \
        load += n*(FIXED_1-exp); \
        load >>= FSHIFT;
There are two points of interest here:
Calculate magic numbers directly from the formula:
#define EXP_1           1884            /* 1/exp(5sec/1min) */
#define EXP_5           2014            /* 1/exp(5sec/5min) */
#define EXP_15          2037            /* 1/exp(5sec/15min) */
If the sampling interval were decreased to 2 seconds ...
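#define CALC_LOAD(load,exp,n) \
        load *= exp; \
        load += n*(FIXED_1-exp); \

It's the fixed-point arithmetic version of (shown here for the 1-minute average):

\[ \text{load}(t) = \text{load}(t-1)\, e^{-5/60} + n\,\bigl(1 - e^{-5/60}\bigr) \tag{1} \]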
| (2) |

| (3) |
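To watch the damping at work, the macro can be replayed in user space. The sketch below wraps the same three arithmetic steps in a function and applies them once per 5 second sample. The helper names, and the choice of a constant number of active tasks, are assumptions made for illustration.

```c
#define FSHIFT  11                  /* nr of bits of precision */
#define FIXED_1 (1UL << FSHIFT)     /* 1.0 as fixed-point */
#define EXP_1   1884UL              /* 1/exp(5sec/1min) as fixed-point */

/* Same arithmetic as CALC_LOAD, written as a function (hypothetical name). */
static unsigned long calc_load_step(unsigned long load, unsigned long decay,
                                    unsigned long active_fixed)
{
    load *= decay;
    load += active_fixed * (FIXED_1 - decay);
    return load >> FSHIFT;
}

/*
 * Hypothetical helper: the 1-minute load average after `samples`
 * updates, starting from `start`, with `n` tasks active at every sample.
 */
double simulate_load_1min(double start, unsigned long n, int samples)
{
    unsigned long load = (unsigned long)(start * FIXED_1);
    unsigned long active = n * FIXED_1;     /* task count as fixed-point */

    for (int i = 0; i < samples; i++)
        load = calc_load_step(load, EXP_1, active);

    return (double)load / FIXED_1;          /* back to a plain number */
}
```

Starting from an idle system with two tasks that stay active, simulate_load_1min(0.0, 2, 12) covers one minute of samples and returns roughly 1.26, not 2. Going the other way, simulate_load_1min(2.0, 0, 12) decays to roughly 0.73 rather than dropping to zero. That is the charging and discharging behaviour of the RC circuit mentioned earlier.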

General form of smoothed data is:
| (4) |
| (5) |
Moving Average (MA) ≡ Arithmetic average with lag-k (see shortly).
Load Average (LA) ≡ Exponentially-damped MA (Exp-MA)
where a = 1 - exp(-5/(60R)) and R is the reporting period in minutes (1, 5, or 15).
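As a sketch of why a is the natural damping parameter, the load-average update can be rearranged into the usual exponential-smoothing shape. The symbols load(t) and n(t) are my labels for the damped value and the number of active tasks at sample t; the rearrangement itself is plain algebra on the CALC_LOAD arithmetic.

```latex
\begin{align*}
\text{load}(t) &= \text{load}(t-1)\, e^{-5/60R} + n(t)\,\bigl(1 - e^{-5/60R}\bigr) \\
               &= (1 - a)\,\text{load}(t-1) + a\, n(t),
                  \qquad a = 1 - e^{-5/60R} \\
               &= \text{load}(t-1) + a\,\bigl[\, n(t) - \text{load}(t-1) \,\bigr]
\end{align*}
```

Each sample therefore moves the old value a fraction a of the way toward the current n(t). For R = 1, 5 and 15 that fraction is about 0.080, 0.0165 and 0.0055, which is why the 15-minute number responds so sluggishly.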

The time-averaged queue length: [(Σ Q(Δt) × Δt) / T] → Q
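For comparison with the damped average, here is a minimal sketch of the time-averaged queue length: each observed queue length is weighted by how long it lasted, and the sum is divided by the total time T. The function name and the array-based interface are assumptions chosen for illustration.

```c
#include <stddef.h>

/*
 * Time-averaged queue length: sum of q[i] * dt[i] over total elapsed time.
 * q[i]  = queue length observed during interval i
 * dt[i] = duration of interval i (any consistent time unit)
 * time_avg_queue() is a hypothetical helper name.
 */
double time_avg_queue(const double *q, const double *dt, size_t n)
{
    double area = 0.0;      /* accumulated Q x dt */
    double total = 0.0;     /* T, the total observation time */

    for (size_t i = 0; i < n; i++) {
        area += q[i] * dt[i];
        total += dt[i];
    }
    return total > 0.0 ? area / total : 0.0;
}
```

For instance, a queue that is empty for 5 seconds, holds 2 jobs for 10 seconds, and then 1 job for 5 seconds averages (0×5 + 2×10 + 1×5)/20 = 1.25. Unlike the load average, every interval counts equally no matter how long ago it occurred.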

This is the kind of model I used in my previous LUV talk (July 11, 2000) in which I analyzed the average performance metrics associated with a fair-share scheduler.
The same kind of averages are used in my performance analyzer tool called Pretty Damn Quick.
Published in: Performance Engineering: State of the Art and Current Trends, Springer Lecture Notes in Computer Science, 2001.
Download a copy from www.perfdynamics.com/papers.html



Week 20 was Y2K.

Here are the solutions to the quiz given earlier.
Time Series
This is the original time series during the 300 minutes in which the samples were collected.

Load Averages
A plot of the load averages over 300 minutes.

An Easier Way?
Just reverse the time axis. As described in the Visual Hints section of the quiz, the 3 dots correspond to the 3 numeric LA values and the y-axis shows the load values. But here, the x-axis shows a range of time between -15 and 0 minutes. The left-most point now represents the 15-minute load average, the middle point represents the 5-minute load average and the right-most the 1-minute load average. This representation more closely represents the trend in time.
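The reversed-axis reading can be turned into a rough rule of thumb in code. This is only a sketch of one way to do it, not a method given in the talk: the enum labels and the function name are hypothetical, and the rule simply compares the three values in time order (15-minute, then 5-minute, then 1-minute).

```c
/* Hypothetical labels; not part of uptime or the kernel. */
enum la_trend { LA_RISING, LA_FALLING, LA_MIXED };

/*
 * Read the three load averages oldest-first (15, 5, 1 minute) and
 * report whether they climb, fall, or do neither.
 */
enum la_trend classify_load(double la1, double la5, double la15)
{
    if (la15 < la5 && la5 < la1)
        return LA_RISING;       /* each shorter window is higher */
    if (la15 > la5 && la5 > la1)
        return LA_FALLING;      /* each shorter window is lower */
    return LA_MIXED;            /* e.g. a peak or dip in the middle */
}
```

Applied to the quiz samples, classify_load(37.34, 9.47, 3.30) reports a rising load, classify_load(6.85, 7.37, 7.83) a falling one, and classify_load(8.50, 10.93, 8.61) is mixed because the 5-minute value is the largest of the three.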

Guerrilla Capacity Tools

November 2003
Then ... Go forth and Kong-ka! 
1 Copyright © 2002 - 2003 Performance Dynamics Company. All Rights Reserved.
2 Thanks to Mirko Fluher for letting me use pax.apana.org.au