kmastat
Kernel memory size can be tracked using the sar -k command. The total
of the "alloc" fields is the kernel memory size. If it appears to be
growing without bound, you may have a memory leak. Not all buckets are
tracked by sar -k, so the reported memory size is less accurate than
the one reported by crash.
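As a rough sketch of that total, the shell function below sums the alloc columns of sar -k output. It assumes the usual nine-column layout (time, sml_mem, alloc, fail, lg_mem, alloc, fail, ovsz_alloc, fail) and treats ovsz_alloc as one of the "alloc" fields. The name sum_alloc is made up for illustration, so check the header line on your release before relying on it.

```sh
#!/bin/sh
# Sum the "alloc" columns of sar -k style output read from stdin.
# ASSUMED column layout (verify against the header on your system):
#   time sml_mem alloc fail lg_mem alloc fail ovsz_alloc fail
sum_alloc() {
    awk '
        # Data rows have 9 fields and a numeric second field; this
        # skips the banner, blank lines and the column header.
        NF == 9 && $2 ~ /^[0-9]+$/ {
            printf "%s %.0f\n", $1, $3 + $6 + $8
        }'
}

# Typical use on a live system:
#   sar -k | sum_alloc
```

Each output line carries the sample time followed by the summed allocation in bytes, so a steadily rising second column is the signal to look for.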
On occasion there are problems related to memory leaks in the kernel or
one of the associated modules. In these cases, kmastat
can provide useful information pinpointing the source of the leak.
To check on kernel memory allocations on a running system, run a
crash session as follows:
# crash
dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
> kmastat
The first number on the "Total" line represents the total amount of memory
allocated by the kernel. If this is a significant fraction of available
system memory and keeps growing, a kernel memory leak is the likely cause.
The output from the kmastat command also contains information
on a number of "buckets" or categories for memory allocation.
Additional information can be obtained via the kmausers
command, but this requires booting under kadb and setting kmem_flags
before the kernel starts. To do this, reach the ok> prompt, then:
ok> boot kadb -d
kadb: (hit the "return" key)
kadb[0]: kmem_flags/W 01
kadb[0]: :c
Because kadb is loaded this way, it remains in effect only for the
current boot session.
Once the system is up, we can either force a crash dump (STOP-A, then
sync at the ok> prompt) or examine the live system. In either case,
inside the crash session we would type:
> kmausers bucket_name
The result will show memory allocations inside that bucket. The function
names listed for each allocation are a tip-off to what is actually
requesting the memory.
A script can be run from cron to capture this information. The format of
this script would be something like:
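#!/bin/sh
# Timestamp this sample.
date >> log_file
# Record the per-bucket allocation summary.
echo "kmastat" | /usr/sbin/crash -w log_file
sleep 20
# Record the users of one suspect bucket (kmem_alloc_2048 is only an example).
echo "kmausers kmem_alloc_2048" | /usr/sbin/crash -w log_file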
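Once several samples have been collected, the trend can be summarized with something like the following sketch. It assumes every kmastat sample in the log contains one line whose first field is "Total" and that the first all-digit field on that line is the memory in use. The function name kma_growth is hypothetical, and the parsing should be checked against the output of your own release.

```sh
#!/bin/sh
# Summarize how the kmastat "Total" figure changes across samples.
# ASSUMPTION: each sample has one line starting with "Total", and the
# first all-digit field on that line is the memory in use.
kma_growth() {
    awk '
        $1 == "Total" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9]+$/) {
                    if (n == 0) first = $i   # remember the oldest sample
                    last = $i                # keep overwriting with the newest
                    n++
                    break
                }
        }
        END {
            if (n > 0)
                printf "samples=%d first=%.0f last=%.0f growth=%.0f\n",
                       n, first, last, last - first
        }'
}

# Typical use:
#   kma_growth < log_file
```

A growth figure that stays positive over many samples on an otherwise steady workload is what suggests a leak; a single rise between two samples is not conclusive.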
Slab Allocator
Solaris 2.4+ uses a kernel memory allocator known as a slab
allocator.
Objects managed by a kernel memory allocator pass through the following
life cycle:
- Allocate memory
- Initialize objects/structures
- Use objects/structures
- Deconstruct objects/structures
- Free memory
The structures in the memory objects include sub-objects such as
linked list headers, mutexes, reference counts and condition
variables. In the case of Solaris, the deconstruction step returns
objects to their initial state, which saves time when the objects
are reused because they do not have to be initialized again.
A translation lookaside buffer (TLB) is an associative cache of
recent address translations. When the MMU (memory management unit)
cannot find a translation in the TLB, it looks it up in the address
maps and loads the translation into the TLB. Entries in the TLB are
replaced on a least recently used basis.
The slab allocator is organized as a collection of object caches.
Each of these caches contains only one type of object (proc structures,
vnodes, etc). The kernel is responsible for restoring each object
to its initial state when it is released. When a cache requires
additional space, the allocator obtains a slab of memory from the
page-level allocator and creates objects from it. The slab contains
enough memory for several object instances. A small part of the
slab is used by the cache to manage memory in the slab; the rest
is divided into buffers that are the size of the object. The allocator
then initializes these buffers with the appropriate constructor.
When the page-level allocator needs to recover memory, unused
slabs are reaped by deconstructing the objects on slabs
whose objects are all free, then removing the slab from the cache
in question.
The structure for each slab includes unused space at the beginning
of the slab (coloring area), the set of objects, more unused space
(the amount left over after the maximum number of objects has been
created), and a slab data area. Each object also includes a four-byte
area for a free list pointer. The slab data area includes a count
of in-use objects, pointers for a doubly-linked list of slabs in
the same cache, and a pointer to the first free area in the slab.
The coloring areas are different sizes for each slab in a cache
(where possible). This allows a balanced distribution of traffic
on the hardware caches and memory busses by varying the offsets for
the different slabs.
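The layout and coloring arithmetic described above can be illustrated with a small sketch. It is not the kernel's actual code: the function names, the 32-byte slab data area, the 8-byte alignment and the rounding of each buffer up to the alignment are all assumptions chosen to make the numbers concrete.

```sh
#!/bin/sh
# slab_layout SLAB_SIZE OBJ_SIZE DATA_SIZE ALIGN
# Illustrative arithmetic only; all sizes passed in are hypothetical.
slab_layout() {
    slab=$1; obj=$2; data=$3; align=$4
    # Each buffer holds the object plus a four-byte free-list pointer,
    # rounded up to the alignment (the rounding is an assumption).
    buf=$(( (obj + 4 + align - 1) / align * align ))
    usable=$(( slab - data ))            # space left after the slab data area
    count=$(( usable / buf ))            # maximum number of objects
    spare=$(( usable - count * buf ))    # unused bytes (coloring + leftover)
    colors=$(( spare / align + 1 ))      # distinct starting offsets available
    echo "bufsize=$buf objects=$count spare=$spare colors=$colors"
}

# slab_color_offset SLAB_INDEX COLORS ALIGN
# Size of the coloring area for the Nth slab in a cache, cycling
# through the available offsets.
slab_color_offset() {
    echo $(( ($1 % $2) * $3 ))
}
```

With these assumed values, slab_layout 4096 200 32 8 prints bufsize=208 objects=19 spare=112 colors=15, and slab_color_offset gives 0 for the first slab, 8 for the second, and wraps back to 0 at the sixteenth. Successive slabs therefore start their objects at different offsets, which is what spreads them across hardware cache lines.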
Large object slabs are slightly different in that management data
is stored in a separate pool of memory, since large slabs are usually
multiples of a page in size. A hash table is also maintained to
provide lookups between the management area and the slabs.