Intuitively, the load average is an average over time of the number of
processes in the run queue. uptime reports load averages over 1-, 5- and
15-minute intervals. Typically, load averages are
divided by the number of CPU cores to find the load per CPU. Load averages
above 1 per CPU indicate that the CPUs are fully utilized. Depending on the
type of load and the I/O requirements, user-visible performance may
not be affected until levels of 2 per CPU are reached.
A general rule of thumb is that load averages that are persistently above
4 times the number of CPUs will result in sluggish performance.
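The per-CPU arithmetic and the rule-of-thumb bands above can be sketched in a few lines of Python. This is only an illustration: the function names and band labels are hypothetical, and the cutoffs at 1, 2 and 4 per CPU are taken directly from the text rather than from any Solaris tool.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def classify_load(load_average, cpu_count):
    """Map a load average onto the rule-of-thumb bands from the text."""
    per_cpu = load_per_cpu(load_average, cpu_count)
    if per_cpu <= 1:
        return "below saturation"
    if per_cpu <= 2:
        return "fully utilized"          # may not yet be user-visible
    if per_cpu <= 4:
        return "possibly user-visible"
    return "likely sluggish"             # if the level persists


def report_current_load():
    """Print the per-CPU load for the 1-, 5- and 15-minute averages."""
    cpus = os.cpu_count() or 1
    for label, value in zip(("1 min", "5 min", "15 min"), os.getloadavg()):
        print(label, round(load_per_cpu(value, cpus), 2),
              classify_load(value, cpus))
```

Calling report_current_load() on a Unix-like host prints one line per interval. A single reading above a band says little on its own; the thresholds in the text apply to load that persists.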
Prior to Solaris 10, the load average was computed directly by
periodically sampling the length of the run queue. Because this
measurement can be skewed by threads that enter and leave the run queue
more quickly than the sampling interval, Solaris 10 changed the
algorithm to use microstate accounting instead.
Solaris 10 applies an exponential decay algorithm to a combination of
high-resolution usr, sys and thread wait times. The resulting numbers
are comparable to a traditional load average.
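To make "exponential decay" concrete, here is a minimal sketch of a generic exponentially damped average. It is not the Solaris kernel code: the sample values, the 5-second update interval and the 60-second period are assumptions chosen for illustration, and the real input in Solaris 10 comes from microstate accounting rather than from a list of numbers.

```python
import math


def decayed_average(previous, sample, interval_s, period_s):
    """Blend a new sample into the running average.

    Older history is weighted by exp(-interval/period), so its influence
    fades smoothly instead of dropping out of a fixed window.
    """
    decay = math.exp(-interval_s / period_s)
    return previous * decay + sample * (1.0 - decay)


def run_average(samples, interval_s=5.0, period_s=60.0, start=0.0):
    """Feed a sequence of samples through the decayed average."""
    average = start
    for sample in samples:
        average = decayed_average(average, sample, interval_s, period_s)
    return average
```

With a constant input the average converges to that input, and a longer period makes it respond more slowly, which is why the 15-minute figure lags behind the 1-minute figure after a burst of activity.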
The load averages can be monitored intermittently via
uptime, or over extended time periods by looking at run
queue lengths and the amount of time that the run queue is occupied.
One issue to watch for is the number of processes that are blocked while
waiting for I/O. See the disk I/O page for information on monitoring this.
Solaris 10 allows us to directly monitor the amount of time threads
wait for a processor via the prstat -mL command.
For non-NFS servers, another danger sign is when the system consistently
spends more time in sys than usr mode. (nfsd operates in the kernel
in sys mode.)
McDougall and Mauro comment that a typical usr/sys
ratio is in the neighborhood of 70/30 on a reasonably loaded system.
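As a small worked example of the usr/sys comparison, the sketch below normalizes a pair of usr and sys percentages so they can be compared with the 70/30 figure, and flags the case where sys exceeds usr. The function names are hypothetical, and the input percentages are assumed to come from whatever CPU utilization report is in use.

```python
def usr_sys_split(usr_pct, sys_pct):
    """Return (usr, sys) as shares of busy time, each out of 100.

    Returns None when the CPU was idle for the whole sample.
    """
    busy = usr_pct + sys_pct
    if busy <= 0:
        return None
    return (100.0 * usr_pct / busy, 100.0 * sys_pct / busy)


def sys_dominates(usr_pct, sys_pct):
    """True when more time is spent in sys mode than in usr mode."""
    return sys_pct > usr_pct
```

For instance, 56% usr and 24% sys normalizes to a 70/30 split, while 20% usr against 45% sys would be flagged. As the text notes, the flag is only a danger sign when it is consistent and the host is not an NFS server.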
Another issue to watch for is a high number of system calls per
second per processor. With today's faster CPUs, 20,000 would
represent a reasonable threshold. This can be monitored via
In particular, large numbers of execs may represent
excessive context switching. (Slower processors will be able to
handle fewer system calls per second.) Context switching is