What a computer can actually do is very simple: for example, calculating the sum of two numbers or locating an address in memory. These basic operations are called instructions. A program is a sequence of such instructions. Through programs, we can make the computer carry out complex operations. Most of the time, programs are stored as executable files. An executable file is like a recipe, and the computer can cook a delicious meal by following it.
So, what is the difference between a program and a process?
A process is a program in execution. A recipe alone is useless: we have to follow its instructions step by step before we get a dish. A process is the act of executing a program, much like actually cooking from the recipe. The same program can be executed many times, and each time the system sets aside a separate region of memory to load it into, producing multiple processes. Different processes can also have their own independent I/O interfaces.
An important job of the operating system is to support processes, for example by allocating memory for each process and keeping track of information about it. It is like preparing a well-equipped kitchen for us.
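To make the "one program, many processes" idea concrete, here is a minimal sketch in Python. The "program" is a one-line script that prints its own PID. The function name run_same_program_twice is made up for this illustration. It assumes a system where the Python interpreter can start copies of itself.

```python
import subprocess
import sys

def run_same_program_twice():
    """Start the same program twice and return the PID each copy reports."""
    # The same "recipe" for both runs: a tiny script that prints its own PID.
    program = [sys.executable, "-c", "import os; print(os.getpid())"]
    # Start both copies before collecting any output, so that the two
    # processes exist side by side.
    procs = [
        subprocess.Popen(program, stdout=subprocess.PIPE, text=True)
        for _ in range(2)
    ]
    return [int(proc.communicate()[0].strip()) for proc in procs]

if __name__ == "__main__":
    first, second = run_same_program_twice()
    print("first copy ran as PID", first)
    print("second copy ran as PID", second)
```

The two runs load the same program file but report different PIDs, because each run is a separate process.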
Taking a look at processes
First, we can use the ps command to list the running processes, for example $ ps -eo pid,comm,cmd. The following figure shows the execution results:
(-e means list all processes; -o pid,comm,cmd means we want the PID, COMMAND, and CMD columns.)
Each row represents a process and has three columns. The first column, PID (process ID), is an integer. Every process has a unique PID that identifies it, and a process can refer to other processes by their PIDs. The second column, COMMAND, is the short name of the program the process is running. The third column, CMD, is the full command line: the program together with the arguments it was started with.
(Some entries in the third column are enclosed in square brackets []. They are parts of the kernel's own functionality, presented as processes so that the operating system can manage them uniformly. We can ignore them.)
Let's look at the first row. Its PID is 1 and its name is init. This process is created by executing the file /bin/init. When Linux boots, init is the first process the system creates, and it exists until we shut the computer down. This process is particularly important, and we will come back to it repeatedly.
How to create a process
In fact, when the computer boots, the kernel creates only the init process. The Linux kernel does not provide a system call that builds a new process from scratch. Every other process is created, directly or indirectly, from init through the fork mechanism: a new process is made by an existing process copying itself. fork is a system call. A process lives in memory, and each process is allocated its own address space. When a process forks, Linux sets up a new address space for the new process and copies the contents of the old process's space into it, after which both processes run side by side.
The old process becomes the parent of the new process and, correspondingly, the new process is the child of the old one. Besides its PID, each process also records the PID of its parent, called the PPID (parent PID). If we follow the PPIDs upward from any process, we eventually arrive at init. All processes therefore form a tree with init as its root.
Below, we list the processes under the current shell:
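The copying described above can be observed with a small experiment. The following is only a sketch, using Python's os.fork wrapper around the system call. The function name fork_and_modify and the starting value 100 are chosen for illustration. The child changes a variable and reports its value back through a pipe, while the parent checks its own copy.

```python
import os

def fork_and_modify():
    """Fork, change a variable in the child, and return what each side sees."""
    counter = 100
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it works on its own copy of the parent's memory.
        try:
            os.close(read_end)
            counter += 1
            os.write(write_end, str(counter).encode())
        finally:
            os._exit(0)  # leave the child without running the parent's code
    # Parent: read the value the child saw, then collect the child.
    os.close(write_end)
    child_value = int(os.read(read_end, 32).decode())
    os.close(read_end)
    os.waitpid(pid, 0)
    return counter, child_value

if __name__ == "__main__":
    parent_value, child_value = fork_and_modify()
    print("parent still sees", parent_value)
    print("child saw", child_value)
```

The child's increment does not show up in the parent, because after the fork the two processes no longer share that memory.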
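Tracing PPIDs upward can be sketched in a few lines. This example assumes a Linux system with /proc mounted and readable, and it reads the PPid field of /proc/<pid>/status. The function names parent_of and ancestry are made up for the illustration. The walk stops at the first process whose PPID is 0, which on an ordinary system is init (inside a container it may be some other process).

```python
import os

def parent_of(pid):
    """Return the PPID of `pid`, read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("PPid:"):
                return int(line.split()[1])
    raise RuntimeError(f"no PPid field found for PID {pid}")

def ancestry(pid):
    """Return [pid, parent, grandparent, ...] up to the root of the tree."""
    chain = [pid]
    while True:
        ppid = parent_of(chain[-1])
        if ppid == 0:
            # A PPID of 0 means there is no visible parent: this is the root.
            return chain
        chain.append(ppid)

if __name__ == "__main__":
    print(" <- ".join(str(p) for p in ancestry(os.getpid())))
```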
The code is as follows:
root@vamei:~# ps -o pid,ppid,cmd
PID PPID CMD
16935 3101 sudo -i
16939 16935 -bash
23774 16939 ps -o pid,ppid,cmd
We can see that the second process bash is a child process of the first process sudo, and the third process ps is a child process of the second process.
You can also use the pstree command to display the entire process tree:
The code is as follows:
init─┬─NetworkManager─┬─dhclient
     │                └─2*[{NetworkManager}]
     ├─accounts-daemon───{accounts-daemon}
     ├─acpid
     ├─apache2─┬─apache2
     │         └─2*[apache2───26*[{apache2}]]
     ├─at-spi-bus-laun───2*[{at-spi-bus-laun}]
     ├─atd
     ├─avahi-daemon───avahi-daemon
     ├─bluetoothd
     ├─colord───2*[{colord}]
     ├─console-kit-dae───64*[{console-kit-dae}]
     ├─cron
     ├─cupsd───2*[dbus]
     ├─2*[dbus-daemon]
     ├─dbus-launch
     ├─dconf-service───2*[{dconf-service}]
     ├─dropbox───15*[{dropbox}]
     ├─firefox───27*[{firefox}]
     ├─gconfd-2
     ├─geoclue-master
     ├─6*[getty]
     ├─gnome-keyring-d───7*[{gnome-keyring-d}]
     ├─gnome-terminal─┬─bash
     │                ├─bash───pstree
     │                ├─gnome-pty-helpe
     │                ├─sh───R───{R}
     │                └─3*[{gnome-terminal}]
fork is usually called as a function. The function returns twice: it returns the child's PID to the parent process and 0 to the child process. The child, for its part, can look up its PPID at any time to find out who its parent is, so a parent and child can always identify each other.
Usually, a program follows the call to fork with an if statement. When the returned PID is 0, the process is the child, and it is made to execute certain instructions, for example using one of the exec library functions to read another program file and run it in the current process's address space (this is in fact one of the main uses of fork: creating a process for a program). When the returned PID is a positive integer, the process is the parent, and it executes other instructions. In this way, once the child has been created, it can do something different from its parent.
Termination of the child process
When a child process terminates, it notifies its parent, releases the memory it occupied, and leaves its exit information in the kernel (the exit code: 0 if it ran smoothly, an integer greater than 0 if there was an error or abnormal condition). This information explains why the process exited. When the parent learns that the child has terminated, it is responsible for calling the wait system call on the child. wait retrieves the child's exit information from the kernel and frees the space that information occupied there. However, if the parent terminates before the child, the child becomes an orphan process. An orphan is adopted by init, which becomes its new parent, and when the orphan terminates, init is responsible for calling wait on it.
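The two return values, and the way parent and child can identify each other, can be checked with a short sketch. It uses Python's os.fork, os.getpid, and os.getppid. The function name fork_and_compare and the dictionary keys are invented for this example. The child sends its own PID and PPID to the parent through a pipe.

```python
import os

def fork_and_compare():
    """Fork once and return the PIDs seen from both sides of the fork."""
    parent_pid = os.getpid()
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0 here. Report our own PID and our PPID.
        try:
            os.close(read_end)
            report = f"{os.getpid()} {os.getppid()}"
            os.write(write_end, report.encode())
        finally:
            os._exit(0)
    # Parent: fork returned the child's PID here.
    os.close(write_end)
    child_pid, child_ppid = map(int, os.read(read_end, 64).decode().split())
    os.close(read_end)
    os.waitpid(pid, 0)
    return {
        "fork_returned": pid,
        "child_pid": child_pid,
        "child_ppid": child_ppid,
        "parent_pid": parent_pid,
    }

if __name__ == "__main__":
    for name, value in fork_and_compare().items():
        print(name, value)
```

The value fork returned in the parent matches the PID the child reports for itself, and the PPID the child reports matches the parent's PID.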
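The fork-then-exec pattern described above might look like the following sketch. The helper name spawn is hypothetical, and the exit code 127 for "exec failed" is a convention borrowed from shells rather than anything this article prescribes. os.execvp is one member of the exec family. It searches PATH for the program and replaces the child's program with it.

```python
import os
import sys

def spawn(argv):
    """Run the program argv[0] in a new process and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process's program with the requested one.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # Reached only if exec failed, e.g. the program was not found.
            os._exit(127)
    # Parent: fork returned the child's PID, so wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

if __name__ == "__main__":
    code = spawn([sys.executable, "-c", "print('hello from the child')"])
    print("child exited with code", code)
```

Because exec replaces the program inside the existing child process, the PID the parent received from fork remains valid for waiting on the new program.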
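The parent's side of this can be sketched with os.waitpid, Python's wrapper around the wait family of calls. The function name collect_exit_code is made up for the example. After the first wait has retrieved the exit information, a second wait on the same PID fails, which is one way to observe that the kernel no longer holds that information.

```python
import os

def collect_exit_code(exit_code):
    """Fork a child that exits with `exit_code`; return (code, cleared)."""
    pid = os.fork()
    if pid == 0:
        # Child: terminate at once with the requested exit code.
        os._exit(exit_code)
    # Parent: wait retrieves the exit information left in the kernel.
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    # A second wait on the same child fails: nothing is left to collect.
    try:
        os.waitpid(pid, 0)
        cleared = False
    except ChildProcessError:
        cleared = True
    return code, cleared

if __name__ == "__main__":
    print(collect_exit_code(0))
    print(collect_exit_code(3))
```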
Of course, a badly written program may leave a child's exit information stranded in the kernel (when the parent never calls wait on the child). In that case, the child becomes a zombie process. When a large number of zombie processes accumulate, they eat into memory space.
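A zombie can be produced on purpose to see what one looks like. The sketch below assumes Linux with /proc available and default signal handling. It reads the one-letter process state from /proc/<pid>/stat, where Z stands for zombie. The names process_state and make_zombie are invented for the example, and the two-second polling limit is arbitrary.

```python
import os
import time

def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat:
        # The state letter comes right after the parenthesised command name.
        return stat.read().rsplit(")", 1)[1].split()[0]

def make_zombie():
    """Create a short-lived zombie; return (state seen, listed after wait)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: deliberately do not wait yet, and watch the child's state.
    state = None
    for _ in range(200):  # poll for up to about two seconds
        state = process_state(pid)
        if state == "Z":
            break
        time.sleep(0.01)
    # Now do the parent's duty: wait removes the zombie.
    os.waitpid(pid, 0)
    still_listed = os.path.exists(f"/proc/{pid}")
    return state, still_listed

if __name__ == "__main__":
    state, still_listed = make_zombie()
    print("state before wait:", state)
    print("still listed after wait:", still_listed)
```

Until the parent calls wait, the terminated child still shows up in the process list in state Z. Once wait returns, its entry is gone.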
Processes and threads
Although in UNIX processes and threads are related but distinct things, in Linux a thread is just a special kind of process. Multiple threads can share the same memory space and I/O interfaces. Processes are therefore the only way a program gets executed on Linux.
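The difference in memory sharing can be seen in a small experiment. This is only a sketch using Python's threading module and os.fork, and the function name thread_vs_process is made up for it. A thread appends to a list and the change is visible to the rest of the program. A forked child appends to the same list and the parent never sees it.

```python
import os
import threading

def thread_vs_process():
    """Return the shared list as seen after a thread and after a fork."""
    shared = []

    # A thread runs inside the same memory space, so its change is visible.
    worker = threading.Thread(target=shared.append, args=("from thread",))
    worker.start()
    worker.join()
    after_thread = list(shared)

    # A forked child gets its own copy, so its change stays in the child.
    pid = os.fork()
    if pid == 0:
        shared.append("from child process")
        os._exit(0)
    os.waitpid(pid, 0)
    after_fork = list(shared)

    return after_thread, after_fork

if __name__ == "__main__":
    after_thread, after_fork = thread_vs_process()
    print("after thread:", after_thread)
    print("after fork:  ", after_fork)
```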