Solaris Processes
The process is one of the fundamental abstractions of Unix. Every object
in Unix is represented as either a file or a process. (With the
introduction of the /proc structure, there has been an
effort to represent even processes as files.)
Processes are usually created with fork or a less resource-intensive
alternative such as fork1 or vfork. fork duplicates the entire process
context, while fork1 only duplicates the context of the calling thread.
The lighter variants are useful, for example, when exec will be called
shortly.
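The fork-then-exec pattern described above can be sketched as follows. This is a minimal illustration in Python, whose os.fork and os.execvp wrap the platform's own fork and exec calls; the helper name spawn_and_wait is hypothetical and not part of any Solaris interface.

```python
import os
import sys

def spawn_and_wait(argv):
    """Fork a child, exec argv in it, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace the duplicated process image with the new program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; never fall back into parent code.
            os._exit(127)
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal
```

For example, spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"]) returns the child's exit code. Because the child calls exec immediately, nearly all of the context duplicated by fork is discarded, which is the case the lighter-weight variants are meant for.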
Solaris, like other Unix systems, provides two modes of operation: user
mode, and kernel (or system) mode. Kernel mode is a more privileged
mode of operation. Processes can be executed in either mode, but user
processes usually operate in user mode.
Per-process Virtual Memory
Each process has its own virtual memory space. References to real
memory are provided through a process-specific set of address
translation maps. The computer's Memory Management Unit (MMU) contains
a set of registers that point to the current process's address
translation maps. When the current process changes, the MMU must load
the translation maps for the new process. This is called a
context switch.
The MMU is only addressable in kernel mode, for obvious security
reasons.
The kernel text and data structures are mapped in a portion of each
process's virtual memory space. This area is called the kernel space
(or system space).
In addition, each process contains these two important kernel-owned
areas in virtual memory: u area and kernel stack. The u
area contains information about the process such as information about
open files, identification information and process registers. The
kernel stack is provided on a per-process basis to allow the kernel to
be re-entrant (i.e., several processes can be executing in the kernel,
and may even be executing the same routine concurrently). Each
process's kernel stack keeps track of its function call sequence when
executing in the kernel.
The kernel can access the memory maps for non-current processes by using
temporary maps.
The kernel can operate in either process context or system (or
interrupt) context. In process context, the kernel has access to the
process's memory space (including u area and kernel stack). It can also
block the current process while waiting for a resource. In system
context, the kernel cannot access the process's address space, u area or
kernel stack. System context is used for handling certain system-wide issues
such as device interrupt handling or process priority computation.
Additional information is available on the
Process Virtual Memory page.
Each process's context contains information about the process, including
the following:
- Hardware context:
- Program counter: address of the next instruction.
- Stack pointer: address of the last element on the stack.
- Processor status word: information about system state, with bits
devoted to things like execution modes, interrupt priority levels,
overflow bits, carry bits, etc.
- Memory management registers: Mapping of the address translation
tables of the process.
- Floating point unit registers.
- User address space: program text, data, user stack, shared memory
regions, etc.
- Control information: u area, proc structure, kernel stack, address
translation maps.
- Credentials: user and group IDs (real and effective).
- Environment variables: strings of the form variable=value.
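Some of this context is visible from user space. The following sketch, in Python, collects the credentials and environment of the calling process. The function name and dictionary keys are illustrative choices, not Solaris names, and the hardware context and kernel-owned control information are deliberately absent because a user process cannot read them this way.

```python
import os

def process_context_summary():
    """Return the parts of the process context visible from user space."""
    return {
        "pid": os.getpid(),
        "real_uid": os.getuid(),
        "effective_uid": os.geteuid(),
        "real_gid": os.getgid(),
        "effective_gid": os.getegid(),
        # Environment variables, rendered as variable=value strings.
        "environment": [f"{name}={value}"
                        for name, value in os.environ.items()],
    }
```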
During a context switch, the hardware context registers are stored in the
Process Control Block in the u area.
The u area includes the following:
- Process control block.
- Pointer to the proc structure.
- Real/effective UID/GID.
- Information regarding current system call.
- Signal handlers.
- Memory management information (text, data, stack sizes).
- Table of open file descriptors.
- Pointers to the current directory vnode and the controlling terminal
vnode.
- CPU usage statistics.
- Resource limitations (disk quotas, etc.).
The proc structure includes the following:
- Identification: process ID and session ID
- Kernel address map location.
- Current process state.
- Pointers linking the process to a scheduler queue or sleep queue.
- Pointers linking this process to lists of active, free or zombie
processes.
- Pointers keeping this structure in a hash queue based on PID.
- Sleep channel (if the process is blocked).
- Scheduling priority.
- Signal handling information.
- Memory management information.
- Flags.
- Information on the relationship of this process and other
processes.
Kernel Services
The Solaris kernel may be seen as a bundle of
kernel threads. It uses synchronization primitives to prevent
priority inversion. These include mutexes, semaphores, condition
variables and read/write locks.
The kernel provides service to processes in the following four ways:
- System Calls: The kernel executes requests submitted by
processes via system calls. The system call interface invokes a special
trap instruction.
- Hardware Exceptions: The kernel notifies a process that
attempts any of several illegal activities, such as dividing by zero or
overflowing the user stack.
- Hardware Interrupts: Devices use interrupts to notify the
kernel of status changes (such as I/O completions).
- Resource Management: The kernel manages resources via
special processes such as the pagedaemon.
In addition, some system services (such as NFS service) are contained
within the kernel in order to reduce overhead from context switching.
An application's parallelism is the degree of parallel execution
achieved. In the real world, this is limited by the number of
processors available in the hardware configuration. Concurrency is the
maximum achievable parallelism in a theoretical machine that has an
unlimited number of processors. Threads are frequently used to increase
an application's concurrency.
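Read this way, the relationship between the two terms is a simple upper bound: the parallelism actually achieved can be no higher than either the application's concurrency or the processor count. A small sketch in Python, with a hypothetical function name:

```python
import os

def achievable_parallelism(concurrency, processors=None):
    """Upper bound on parallelism for a given concurrency and CPU count."""
    if processors is None:
        # Fall back to the number of CPUs reported by the operating system.
        processors = os.cpu_count() or 1
    return min(concurrency, processors)
```

An application with a concurrency of 8 on a 4-processor machine is bounded at 4; raising its concurrency further does not help until processors are added.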
A thread represents a relatively independent set of instructions within
a program. A thread is a control point within a process. It shares
global resources within the context of the process (address space, open
files, user credentials, quotas, etc). Threads also have private
resources (program counter, stack, register context, etc).
The main benefit of threads (as compared to multiple processes) is that the
context switches are much cheaper than those required to change current
processes. Sun reports that a fork() takes 30 times as
long as an unbound thread creation and 5 times as long as a bound thread
creation.
Even within a single-processor environment, multiple threads
are advantageous because one thread may be able to progress even though
another thread is blocked while waiting for a resource.
Interprocess communication also takes
considerably less time for threads than for processes, since global data
can be shared instantly.
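The difference in data sharing can be demonstrated with a short Python sketch. Threads update one shared counter directly (guarded by a lock), while a forked child only changes its own copy of the data, so the parent would need an IPC mechanism to see the result. All names here are illustrative.

```python
import os
import threading

counter = {"value": 0}      # global data in the process address space
lock = threading.Lock()     # threads must coordinate their updates

def bump(times):
    for _ in range(times):
        with lock:
            counter["value"] += 1

def bump_in_threads(num_threads, times):
    """Every thread updates the same counter; no IPC is involved."""
    counter["value"] = 0
    threads = [threading.Thread(target=bump, args=(times,))
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]

def bump_in_child_process(times):
    """A forked child updates its private copy; the parent sees nothing."""
    counter["value"] = 0
    pid = os.fork()
    if pid == 0:
        try:
            bump(times)
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return counter["value"]
```

bump_in_threads(4, 1000) returns 4000, while bump_in_child_process(1000) returns 0 in the parent.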
A kernel thread is the entity that is scheduled by the kernel. If no
lightweight process is attached, it is also known as a system
thread. It uses kernel text and global data, but has its own
kernel stack, as well as a data structure to hold scheduling and
synchronization information.
Kernel threads store the following in their data structure:
- Copy of the kernel registers.
- Priority and scheduling information.
- Pointers to put the thread on the scheduler or wait queue.
- Pointer to the stack.
- Pointers to associated LWP and proc structures.
- Pointers to maintain queues of threads in a process and threads
in the system.
- Information about the associated LWP (as appropriate).
Kernel threads can be independently scheduled on CPUs. Context
switching between kernel threads is very fast because memory mappings
do not have to be flushed.
A lightweight process can be considered as the swappable portion of a
kernel thread.
Another way to look at lightweight processes is to think of them as
"virtual CPUs" which perform the processing for applications.
Application threads are attached to available lightweight processes,
each of which is attached to a kernel thread, which is scheduled on the
system's CPU dispatch queue.
LWPs can make system calls and can block while waiting for resources.
All LWPs in a process share a common address space.
IPC (interprocess communication) facilities exist for coordinating
access to shared resources.
LWPs contain the following information in their data structure:
- Saved values of user-level registers (if the LWP is not active)
- System call arguments, results, error codes.
- Signal handling information.
- Data for resource usage and profiling.
- Virtual time alarms.
- User time/CPU usage.
- Pointer to the associated kernel thread.
- Pointer to the associated proc structure.
By default, one LWP is assigned to each process; additional LWPs are
created if all the process's LWPs are sleeping and there are additional
user threads that libthread can schedule. The programmer
can specify that threads are bound to LWPs.
Lightweight process information for a process can be examined with
ps -elcL.
User threads are scheduled on their LWPs via a scheduler in
libthread. This scheduler does implement priorities, but
does not implement time slicing. If time slicing is desired, it must be
programmed in.
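The consequence of having priorities without time slicing can be seen in a toy model. The Python sketch below is not libthread and makes no claim about its internals; it only assumes a scheduler that always runs the highest-priority runnable task until that task gives up the CPU voluntarily. Tasks are modelled as generators, and each yield stands for a voluntary yield point.

```python
import heapq

def worker(steps):
    """A task that yields the CPU voluntarily after each step."""
    for _ in range(steps):
        yield

def run(tasks):
    """Run (priority, name, generator) tasks; higher priority goes first.

    Returns the order in which task steps were executed.
    """
    # heapq is a min-heap, so priorities are negated; the sequence number
    # breaks ties so that generators are never compared with each other.
    heap = [(-priority, seq, name, task)
            for seq, (priority, name, task) in enumerate(tasks)]
    heapq.heapify(heap)
    seq = len(heap)
    order = []
    while heap:
        neg_priority, _, name, task = heapq.heappop(heap)
        try:
            next(task)          # runs until the task yields on its own
        except StopIteration:
            continue            # task finished; drop it
        order.append(name)
        heapq.heappush(heap, (neg_priority, seq, name, task))
        seq += 1
    return order
```

Running run([(1, "low", worker(2)), (2, "high", worker(2))]) gives ["high", "high", "low", "low"]: the low-priority task makes no progress until the high-priority one is finished. A long-running task of equal or higher priority would likewise monopolise the scheduler unless the programmer adds explicit yield points.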
Locking issues must also be carefully considered by the programmer
in order to prevent several threads from blocking on a single resource.
User threads are also responsible for handling of
SIGSEGV (segmentation violation) signals, since the kernel does not
keep track of user thread stacks.
Each thread has the following characteristics:
- Has its own stack.
- Shares the process address space.
- Executes independently (and perhaps concurrently with other threads).
- Completely invisible from outside the process.
- Cannot be controlled from the command line.
- No system protection between threads in a process; the programmer
is responsible for interactions.
- Can share information between threads without
IPC overhead.
Priorities
Higher numbered priorities are given precedence. The scheduling page contains additional information
on how priorities are set.
When a process dies, it becomes a zombie process. Normally, the parent
performs a wait() and cleans up the PID. Sometimes, the
parent receives too many SIGCHLD signals at once, but can
only handle one at a time. It is possible to resend the signal on
behalf of the child via kill -18 PPID (signal 18 is SIGCHLD on
Solaris). Killing the
parent or rebooting will also clean up zombies. The correct answer is
to fix the buggy parent code that failed to perform the
wait() properly.
Aside from their inherent sloppiness, the only problem with zombies is
that they take up a place in the process table.
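One common way to write a parent that does not leave zombies behind is to reap in a loop: since several child exits may be reported by a single SIGCHLD, the handler keeps calling a non-blocking wait until no exited child remains. The following Python sketch shows the idea. The function names are hypothetical, and it assumes the parent is free to reap any of its children with waitpid(-1, ...).

```python
import os
import signal
import time

reaped = {}   # pid -> exit code (None if the child was killed by a signal)

def reap_exited_children():
    """Collect every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return      # this process has no children left
        if pid == 0:
            return      # children exist, but none has exited yet
        reaped[pid] = (os.WEXITSTATUS(status)
                       if os.WIFEXITED(status) else None)

def install_reaper():
    """Reap on every SIGCHLD; must be called from the main thread."""
    signal.signal(signal.SIGCHLD,
                  lambda signum, frame: reap_exited_children())

def spawn_children(count):
    """Fork count children that exit immediately with codes 0..count-1."""
    pids = []
    for code in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(code)
        pids.append(pid)
    return pids

def reap_until(pids, timeout=5.0):
    """Poll until every pid has been reaped or the timeout expires."""
    deadline = time.monotonic() + timeout
    while (not all(pid in reaped for pid in pids)
           and time.monotonic() < deadline):
        reap_exited_children()
        time.sleep(0.01)
```

A parent that calls only one wait() per signal is the kind of buggy code described above: when several children exit close together, the extra ones remain zombies until something prompts another wait.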
Kernel Tunables
The following kernel tunables are important when looking at
processes:
- maxusers: By default, this is set to 2 less than the number
of MB of physical memory, up to 1024. It can be set up to 2048 manually
in the /etc/system file.
- max_nprocs: Maximum number of processes that can be
active simultaneously on the system. The default for this is
(16 x maxusers) + 10.
The minimum setting for this is 138, the maximum is 30,000.
- maxuprc: The default setting for this is
max_nprocs - 5. The minimum is 133, the maximum is . This
is the number of processes a single non-root user can create.
- ndquot: This is the number of disk quota structures. The
default for this is
(maxusers x 10) + max_nprocs. The
minimum is 213.
- pt_cnt: Sets the number of System V ptys.
- npty: Sets the number of BSD ptys. (Should be set to pt_cnt.)
- sad_cnt: Sets the number of STREAMS addressable devices. (Should be
set to 2 x pt_cnt.)
- nautopush: Sets the number of STREAMS autopush entries. (Should be
set to pt_cnt.)
- ncsize: Sets DNLC size.
- ufs_ninode: Sets inode cache size.
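The default formulas quoted in the list above chain together, so a worked calculation may help. The Python sketch below applies them as written; the function name is hypothetical, only the 1024 cap on maxusers is applied, and the stated minimum and maximum settings of the other tunables are not enforced.

```python
def default_tunables(physical_memory_mb):
    """Compute default values from the formulas quoted in the list above."""
    maxusers = min(physical_memory_mb - 2, 1024)
    max_nprocs = 16 * maxusers + 10
    maxuprc = max_nprocs - 5
    ndquot = maxusers * 10 + max_nprocs
    return {
        "maxusers": maxusers,
        "max_nprocs": max_nprocs,
        "maxuprc": maxuprc,
        "ndquot": ndquot,
    }
```

For a machine with 512 MB of memory this gives maxusers = 510, max_nprocs = 8170, maxuprc = 8165 and ndquot = 13270. With 4096 MB, maxusers is capped at 1024, so max_nprocs = 16394.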
proc Commands
The proc tools are useful for tracing attributes of processes.
These utilities include:
- pflags: Prints the tracing flags, pending and held signals
and other /proc status information for each LWP.
- pcred: Prints credentials (i.e., EUID/EGID, RUID/RGID, saved UID/GIDs).
- pmap: Prints process address space map.
- pldd: Lists dynamic libraries linked to the process.
- psig: Lists signal actions.
- pstack: Prints a stack trace for each LWP in the process.
- pfiles: Reports fstat and fcntl information for all open files.
- pwdx: Prints each process's working directory.
- pstop: Stops process.
- prun: Starts stopped process.
- pwait: Waits for specified processes to terminate.
- ptree: Prints process tree for process.
- ptime: Times the command using microstate accounting; does
not time children.
These commands can be run against a specific process, but most of them
can also be run against all processes on the system. See the
above-referenced man page for details.
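When these tools are driven from a script, a thin wrapper is often enough. The Python sketch below assumes the usual "tool pid ..." invocation form and limits itself to the inspection tools from the list above; pstop, prun and pwait are left out because they change or wait on process state, and ptime because it takes a command rather than a PID. The function names are hypothetical.

```python
import shutil
import subprocess

# Inspection tools taken from the list above.
PROC_TOOLS = {"pflags", "pcred", "pmap", "pldd", "psig",
              "pstack", "pfiles", "pwdx", "ptree"}

def proc_tool_command(tool, pids):
    """Build the argument list for running one proc tool against pids."""
    if tool not in PROC_TOOLS:
        raise ValueError(f"not a supported proc tool: {tool}")
    return [tool] + [str(pid) for pid in pids]

def run_proc_tool(tool, pids):
    """Run the tool and return its output, or None if it is not installed."""
    command = proc_tool_command(tool, pids)
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout
```

For example, run_proc_tool("pstack", [1234]) returns the stack traces for the LWPs of process 1234 as text, where 1234 is a placeholder PID.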