Linux processes and threads
Table of Contents
Program #
- a set of instructions that uses internal and/or external data
- internal data: strings, integers, etc…
- external data: databases, files, etc…
Process vs thread #
Process #
- an executing instance of a program
- each process has its own resource
- the same program maybe executing more than once simultaneously
- every process has one or more thread
- multi-threaded process: multiple flow of execution, every resources are shared between threads
- heavy weight process: a process with its own resources
- light weight process: a thread of a process with shared resources with other threads
Thread #
- started from a process
- can share resources (eg.: memory)
- each thread is considered individually, as a standalone process
- each thread has the same process ID (thread group ID) and different thread ID
Processes in details #
- has attributes, eg.:
- command: command that executed to start the process
- state:
- running: executing or in the run queue
- sleeping: waiting on a request
- stopped: suspended
- zombie: exit state not properly handled
- PID: Process ID
- PPID: Parent Process ID
- PGID: Process Group ID
- environment: environment where the process running
- UID: which user called the process
- allocated memory
- init is the first process, the parent of all processes, except kernel initiated processes (marked with [] in
ps
) - if the parent dies before the child:
- without systemd: the child’s PPID will set to 1, init will be the parent
- with systemd: the PPID will set to 2, kthreadd will be the parent
- zombie process: the parent process is not programmed properly to handle the die of the child process, all resources are released, just the child are stick around in memory
- init process is kills gracefully every adopted zombie processes
- processes are controlled by Linux Process Scheduler, a component of the kernel which decides which process to run next
- the maximum number of PIDs can be viewed or modified in
/proc/sys/kernel/pid_max
- when the PID number reaches the maximum, the kernel starts to assign from the released PIDs from 300
- context: information about the process (eg.: state of the CPU, environment, content of the memory, etc…)
- setuid programs: marked with an “s” execute bit, runs the program as the owner of the program, not the caller (eg.: sudo, passwd)
- processes do not have direct access to the hardware, they have to use system calls
- execution modes:
- Ring 3/user mode:
- lesser privileges
- isolated process resources (eg.: not able to read/write other process’s memory)
- programs with root privileges also run in user mode
- limited access to hardwares: processes has to use system calls to access the hardware
- applications never runs in kernel mode
- Ring 0/system/kernel mode:
- full access to hardwares
- only kernel codes are running in Ring 0
- Ring 3/user mode:
- daemon:
- process running in the background
- mostly used to provide services to the user
- only running when needed
- mostly ends with a “d” (eg.: sshd)
- may respond to events (systemd-udevd) or ellapsed time (crond)
- often do not have controlling terminal or I/O devices
- provides better security control
- managed by a deamon manager (eg.: systemd, OpenRC, SysV)
- daemon scripts stored under
/etc/init.d
fork()
: creates a new process (child) by duplicating the calling processexec()
: replace the existing process with the new process- a command executed in a shell (foreground process):
- a new shell forked from the shell
- puts the parent shell to sleep, waiting for the child to exit
- replacing the parent shell with the child shell
- command loaded onto the child with exec, replaces the child shell
- the command exit with
exit()
system call, the child process dies - the parent shell re-awakened
- everything starts from the beginning
- background process: add an “&” to the end of the command, child process forked to the background, the parent shell and the child process running parallel
- shell’s built in commands (eg.:
echo
) do not uses the above method, because no program files involved - process priority:
- nicer processes runs later
- use
nice
andrenice
commands to set priority - priority is used when no resource available, and the scheduler needs to select the next process
- static libraries: libraries built into the code at compile time
- shared libraries / Dynamic Link Libraries:
- librarties used in runtime
- more efficient than static libraries (eg.: better load time)
- suffix/extension: .so
- carefully versioned
ldd
command is used to list the shared libraries needed to a binaryLD_LIBRARY_PATH
environment variable is used to search libraries here beforeldconfig
’s cache
- Inter Process Communication (IPC):
- how processes can communicate with each other
- shared memory
- message passing
ipcs
is used to list these IPC facilities
- how processes can communicate with each other
Reference #
- LFS201 (this is just my note from LFS201, not meant to replace the original, which is explain better the above)