Operating System Summary#
I. Overview#
Concurrency and Parallelism#
Concurrency
: A single processor handles multiple tasks "simultaneously", where "simultaneously" is a macro-level notion: tasks are executed alternately within a certain period of time.
Parallelism
: Multiple processors or a multi-core processor handle different tasks simultaneously, where "simultaneously" means truly executing multiple tasks at the same moment.
Sharing#
Sharing means that resources in the system can be used by multiple concurrent processes. There are two types of sharing: mutually exclusive sharing and simultaneous sharing. Mutually exclusive resources are called critical resources, such as printers: only one process may access them at a time, and synchronization mechanisms are required to manage access to critical resources.
Virtualization#
Virtualization technology turns one physical entity into multiple logical entities. There are two main virtualization techniques: time-division multiplexing and space-division multiplexing. For example, multiple processes can execute concurrently on the same processor through time-division multiplexing: each process takes turns occupying the processor, running for only a small time slice at a time before quickly switching to the next.
Synchronous and Asynchronous#
- Synchronous: The entire process executes in order, and results are returned only after every step has completed. It is a linear execution model in which the flows of execution cannot interleave. It is generally used where strict ordering is required, such as user login: the system cannot log the user in until verification has completed.
- Asynchronous: Only the call is issued; the caller does not wait for the called method to complete but continues with the subsequent steps. It is a non-blocking form of processing in which there is no need to wait for one task to finish before executing others, such as loading page data: the page does not need to wait for all data to arrive before displaying.
Kernel Mode and User Mode#
- Kernel Mode: The CPU can access all data in memory as well as peripheral devices such as hard drives and network cards, and it can switch itself from one program to another.
- User Mode: Only limited access to memory is allowed, and access to peripheral devices is not permitted; the program cannot monopolize the CPU, so CPU time can be taken by other programs.
- There are three ways to switch from user mode to kernel mode: system calls, exceptions, and interrupts from peripheral devices.
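As a minimal, hedged illustration of a system call triggering the user-to-kernel transition (assuming a POSIX system; the message text is arbitrary):

```c
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *msg = "hello from user mode\n";
    /* write() is a system call: the C library issues a trap instruction,
     * the CPU switches from user mode to kernel mode, the kernel performs
     * the device I/O, and control returns to user mode with the result. */
    ssize_t n = write(STDOUT_FILENO, msg, strlen(msg));
    return n < 0 ? 1 : 0;
}
```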
II. Processes and Threads#
Process#
A process is an instance of a program in execution, with its own program counter and internal state. It is the independent unit of resource allocation and scheduling in the system. The Process Control Block (PCB) describes a process's basic information and running state; creating or terminating a process essentially means creating or deleting its PCB.
Thread#
A thread is the basic unit of independent scheduling. A process can have multiple threads, which share the process's resources. For example, a browser process contains many threads, such as HTTP request threads, event response threads, and rendering threads. The concurrent execution of threads lets the browser keep responding to other user events while a click on a new link initiates an HTTP request.
Differences#
- Resource Ownership: A process is the basic unit of resource allocation; a thread does not own resources itself but can access the resources of its process.
- Scheduling: A thread is the basic unit of independent scheduling. Switching between threads of the same process does not cause a process switch; switching from a thread in one process to a thread in another process does.
- System Overhead: Creating or terminating a process requires the system to allocate or reclaim resources such as memory space and I/O devices, which costs far more than creating or terminating a thread. Likewise, a process switch involves saving the CPU environment of the running process and setting up the CPU environment of the newly scheduled one, whereas a thread switch only requires saving and restoring a small number of registers, so its overhead is minimal.
- Communication: Inter-process communication (IPC) requires synchronization and mutual exclusion mechanisms to ensure data consistency. Threads within the same process can communicate simply by reading and writing shared data (such as global variables).
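To make the communication difference concrete, here is a small sketch (assuming POSIX threads; the counter and iteration count are arbitrary example values) in which two threads of one process update a shared global directly, something separate processes could only do through an IPC mechanism:

```c
#include <pthread.h>
#include <stdio.h>

/* Threads of the same process share the address space, so a plain global
 * is visible to all of them; a mutex keeps the updates consistent. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                 /* direct access to shared process data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* expected: 200000 */
    return 0;
}
```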
Process States#
- Ready State: The process has acquired all resources other than the CPU and can execute as soon as CPU time is allocated to it.
- Running State: The process has acquired the CPU and is executing.
- Blocked State: The process lacks some condition it needs to continue (for example, it is waiting for an I/O operation to finish) and must wait until the condition is met; this is called the blocked state.
Thread States#
- New: A thread is created within a process; it can be spawned either by the process or by another thread.
- Blocked: A thread that must wait for an event during execution becomes blocked.
- Unblocked: When the event a thread was blocked on occurs, the thread is activated and enters the ready queue.
- Scheduled: A ready thread is selected and enters the running state.
- Finished: When a thread completes execution, its register context and stack contents are released.
Thread Implementation Methods#
- User-Space Thread Implementation
  User-level threads are implemented in user programs without kernel support. They do not depend on the operating system core: the application process creates, synchronizes, schedules, and manages threads through a thread library. No user/kernel mode switch is needed, so these operations execute quickly. However, the kernel is unaware that multiple threads exist, so if one thread blocks, the entire process (including all its threads) blocks; and since processor time slices are allocated per process, each thread's share of execution time is correspondingly reduced.
- Kernel-Level Thread Implementation
  Kernel-level threads are created and terminated by the operating system kernel, which maintains the context of processes and threads and performs thread switching. A kernel thread blocked on I/O does not affect the execution of the other threads.
- Advantages of User-Space Thread Implementation
  - Can be implemented on operating systems that do not support threads.
  - The cost of creating, destroying, and switching threads is much lower than for kernel threads.
  - Each process can customize its own scheduling algorithm, making thread management more flexible.
  - Threads can use more table space and stack space than kernel-level threads.
- Disadvantages of User-Space Thread Implementation
  - Only one thread can run at a time within the same process; if a thread blocks in a system call, the entire process is suspended.
  - A page fault also suspends the entire process.
- The advantages and disadvantages of kernel threads are exactly the opposite of those of user threads. In practice, an operating system can implement threads with a hybrid approach.
Coroutine#
Coroutines are a lighter-weight alternative to threads. Just as a process can have multiple threads, a thread can also have multiple coroutines. Importantly, coroutines are not managed by the operating system kernel but are entirely controlled by the program (executed in user mode). This brings significant performance benefits, as switching between coroutines does not consume resources like thread switching does. Essentially, they are threads in user space (i.e., there is no context switching between user and kernel address spaces, but coroutines run within a single thread).
Coroutine Applications#
- Lua has supported coroutines since version 5.0, implemented through the coroutine extension library.
- Python can implement coroutines with yield/send; since Python 3.5, async/await has become the better alternative.
- Go has a powerful yet concise coroutine implementation that makes it easy to create hundreds or thousands of coroutines running concurrently.
- Java has no native coroutine support, but some open-source frameworks simulate coroutine functionality, such as the Kilim framework.
Questions and Thoughts#
- There is too much mixed information online about the states of processes and threads, making it difficult to understand their relationship. Currently, in my understanding, processes should also include new and finished states, which is similar to threads. I believe only running, ready, and blocked states can be considered basic states, while others like new, finished, and awakened should only be considered as operations for state transitions.
- In Java, threads also have a waiting state; this is just a finer-grained division of the blocked state into BLOCKED and WAITING. The difference is that waiting is active waiting, while blocking is passive: a thread in the WAITING state returns to the ready state only when it is explicitly woken up, whereas a thread in the BLOCKED state keeps contending for the lock itself and, once it acquires it, moves to the ready state without needing to be woken by anyone.
III. Inter-Process Communication Methods#
Pipe#
A pipe is a shared file that connects a reading process and a writing process for data exchange.
- It is half-duplex (i.e., data can only flow in one direction) and has fixed read and write ends; a pipe only has meaning when both read and write ends exist.
- When the read end finds the pipe empty, it needs to sleep and wait until data is available to be awakened. Similarly, the write end waits when the pipe is full until it is awakened.
- It can be considered a special file, and ordinary read, write, etc., functions can be used for its read and write operations. However, it is not an ordinary file and does not belong to any other file system; it only exists in memory.
- Mutual exclusion: Only one process can read or write to the pipe at any given time.
- There are unnamed pipes and named pipes; the former is used for communication between parent and child processes, while the latter is used for communication between any two processes running on the same machine.
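A minimal sketch of an anonymous pipe between a parent and child process (assuming a POSIX system; the message string is arbitrary):

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];                      /* fd[0]: read end, fd[1]: write end */
    if (pipe(fd) == -1) return 1;

    if (fork() == 0) {              /* child: reads from the pipe */
        close(fd[1]);               /* close the unused write end */
        char buf[64];
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) { buf[n] = '\0'; printf("child read: %s\n", buf); }
        close(fd[0]);
        _exit(0);
    }
    /* parent: writes into the pipe */
    close(fd[0]);                   /* close the unused read end */
    const char *msg = "hello through the pipe";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);
    wait(NULL);
    return 0;
}
```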
Message Queue#
A message queue is a linked list of messages stored in the kernel. A message queue is identified by an identifier (i.e., queue ID).
- Message queues are record-oriented, where messages have specific formats and priorities.
- Message queues are independent of sending and receiving processes; when a process terminates, the message queue and its contents are not deleted.
- Message queues can implement random querying of messages; messages do not necessarily have to be read in first-in-first-out order and can also be read by message type.
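A rough sketch using the System V message-queue calls msgget/msgsnd/msgrcv (the key 0x1234 and the message type are arbitrary example values):

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct msg { long mtype; char mtext[64]; };    /* record-oriented message */

int main(void) {
    /* 0x1234 is an arbitrary key chosen for this example */
    int qid = msgget(0x1234, IPC_CREAT | 0666);
    if (qid == -1) return 1;

    struct msg m = { .mtype = 1 };
    strcpy(m.mtext, "hello queue");
    msgsnd(qid, &m, sizeof(m.mtext), 0);        /* send a type-1 message */

    struct msg r;
    msgrcv(qid, &r, sizeof(r.mtext), 1, 0);     /* receive by message type */
    printf("received: %s\n", r.mtext);

    msgctl(qid, IPC_RMID, NULL);                /* remove the queue */
    return 0;
}
```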
Signal#
A more complex communication method used to notify the receiving process that a certain event has occurred. It is the only asynchronous communication mechanism in inter-process communication.
Semaphore#
Unlike the IPC structures already introduced, it is a counter. Semaphores are used to implement mutual exclusion and synchronization between processes, rather than to store inter-process communication data.
- Semaphores are used for inter-process synchronization; if data needs to be passed between processes, they must be combined with a mechanism such as shared memory.
- Semaphores are based on the PV operations of the operating system, and operations on semaphores are atomic operations.
- Each PV operation on a semaphore is not limited to adding or subtracting 1 from the semaphore value; it can add or subtract any positive integer.
- Supports semaphore groups.
- Note⚠️: Semaphores and signals are not the same concept; there is a significant difference between the two.
Shared Memory#
Refers to two or more processes sharing a given storage area.
- Shared memory is the fastest form of IPC because processes directly access memory.
- Since multiple processes can operate simultaneously, synchronization is required.
- Semaphores and shared memory are usually used together, with semaphores synchronizing access to shared memory.
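A minimal shared-memory sketch using the System V calls shmget/shmat (the key and segment size are arbitrary example values); a real program would guard the segment with a semaphore as noted above, while this demo just uses wait() for simplicity:

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* 0x2345 is an arbitrary key; 4096 bytes of shared memory */
    int shmid = shmget(0x2345, 4096, IPC_CREAT | 0666);
    char *mem = shmat(shmid, NULL, 0);      /* map the segment into this process */

    if (fork() == 0) {                      /* child writes directly to memory */
        strcpy(mem, "written by child");
        shmdt(mem);
        _exit(0);
    }
    wait(NULL);                             /* crude synchronization for the demo */
    printf("parent read: %s\n", mem);       /* no copy through the kernel needed */

    shmdt(mem);
    shmctl(shmid, IPC_RMID, NULL);          /* remove the segment */
    return 0;
}
```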
Socket#
A socket is also a mechanism for inter-process communication that can achieve communication between processes on different hosts. A socket can be seen as an endpoint for inter-process communication, with each socket having a unique name that other processes can access, connect to, and communicate data.
Summary#
- Pipe: Slow speed, limited capacity.
- Message Queue: Capacity is limited by the system, and care must be taken when reading for the first time to consider any previously unread data.
- Semaphore: Cannot pass complex messages, only used for synchronization.
- Shared Memory: Can easily control capacity, fast speed, but synchronization must be maintained. For example, when one process is writing, another process must pay attention to read/write issues, similar to thread safety. Of course, shared memory can also be used for communication between threads, but it is unnecessary since threads already share a block of memory within the same process.
IV. Synchronization Mechanisms#
Background#
- Concurrent execution of multiple processes (or threads) can lead to situations where processes mutually restrict each other. For example, two processes may need to:
- Share a unique hardware device.
- Share the same memory area.
- The execution of one process depends on the execution results of another process on shared resources.
If multiple processes have a temporal ordering and must cooperate to complete a task, the relationship is called synchronization; if there is no such cooperation and the relationship arises solely from sharing exclusive resources, it is called mutual exclusion. Mutual exclusion is a special form of synchronization.
- For synchronization mechanisms, the following four rules must be followed:
- Free to enter: When no process/thread is in the critical section, any process may enter.
- Wait when busy: When a process/thread is in the critical section, other processes must wait and may not enter.
- Bounded waiting: A process/thread waiting to enter the critical section must not be kept waiting indefinitely.
- Yield while waiting (optional): A process/thread that cannot enter the critical section should release the CPU, for example by switching to the blocked state.
Semaphore#
- Semaphores and PV primitive operations were invented by Dijkstra and are among the most widely used mutual-exclusion methods.
- The principle of semaphores and PV operations:
  - When a process wants to enter the shared region, it first performs the P operation: S - 1.
  - When a process wants to leave the shared region, it performs the V operation: S + 1.
  - Entering and leaving the shared region are atomic operations (their execution cannot be interrupted).
- Drawback: program readability is relatively poor, and semaphore management is scattered across the participating objects, which can lead to deadlock, process starvation, and other problems.
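A small sketch of P and V using POSIX unnamed semaphores on Linux (sem_wait plays the role of P, sem_post the role of V); the shared counter is only an illustrative stand-in for a critical resource:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t mutex;         /* binary semaphore guarding the shared region */
static int shared = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);   /* P: S - 1, blocks if S would drop below 0 */
        shared++;           /* critical section */
        sem_post(&mutex);   /* V: S + 1, possibly waking a waiter */
    }
    return NULL;
}

int main(void) {
    sem_init(&mutex, 0, 1); /* initial value 1 => mutual exclusion */
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("shared = %d\n", shared);
    sem_destroy(&mutex);
    return 0;
}
```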
Monitor (Unique to Process Synchronization)#
- A monitor is an extension and improvement of the semaphore mechanism, providing a simpler synchronization method.
- A monitor is an object or module that can be safely accessed by multiple processes/threads.
- The methods within a monitor are protected by a mutex, meaning that only one visitor is allowed to use them at any given time.
- Characteristics: safety, mutual exclusion, and sharing.
Critical Section (Thread Synchronization)#
Multiple threads access a shared resource or a code segment in a serialized manner; this is fast and well suited to controlling access to data. At any moment only one thread is allowed in: once a thread has entered, any other thread that tries to access the shared resource is suspended and waits until the thread in the critical section leaves, after which another thread may claim it.
Mutual Exclusion#
A mutex is more complex than a critical section. It can achieve safe sharing of a common resource not only among threads within the same application but also among threads of different applications.
Event (Thread Synchronization)#
Maintaining thread synchronization through notification operations also facilitates operations for comparing the priorities of multiple threads.
Differences in Synchronization Mechanisms Between Processes and Threads#
- Semaphores, monitors, and mutexes are synchronization mechanisms for processes, while semaphores and mutexes can also be used for thread synchronization, but monitors are only used in process synchronization.
- In addition to semaphores and mutexes, thread synchronization also includes critical sections and events; I have not seen these two described as process-synchronization methods (uncertain).
Classic Synchronization Problems#
- Producer-Consumer Problem (a semaphore-based sketch appears below)
- Dining Philosophers Problem
- Reader-Writer Problem
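A hedged sketch of the producer-consumer problem using POSIX semaphores plus a mutex over a fixed-size ring buffer (the buffer size and item count are arbitrary example values):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 4                         /* buffer capacity (example value) */

static int buf[N], in = 0, out = 0;
static sem_t empty_slots, full_slots;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= 10; i++) {
        sem_wait(&empty_slots);     /* P(empty): wait for a free slot */
        pthread_mutex_lock(&lock);
        buf[in] = i; in = (in + 1) % N;
        pthread_mutex_unlock(&lock);
        sem_post(&full_slots);      /* V(full): one more item available */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 10; i++) {
        sem_wait(&full_slots);      /* P(full): wait for an item */
        pthread_mutex_lock(&lock);
        int item = buf[out]; out = (out + 1) % N;
        pthread_mutex_unlock(&lock);
        sem_post(&empty_slots);     /* V(empty): one more free slot */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    sem_init(&empty_slots, 0, N);   /* all slots initially empty */
    sem_init(&full_slots, 0, 0);    /* no items initially */
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```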
For specific references:
Reference Links#
- Process Synchronization and Communication, Differences Between Process and Thread Synchronization, Differences Between Process and Thread Communication
- Implementation of Synchronization Mechanisms in Operating Systems
V. Deadlock#
Deadlock refers to a situation where multiple processes are in a stalemate during execution due to resource contention. If no external force intervenes, processes in a deadlock cannot continue executing.
Causes of Deadlock#
- Insufficient system resources.
- Improper order of process execution.
- Improper resource allocation, etc.
Four Necessary Conditions for Deadlock#
- Mutual Exclusion Condition: processes use the resources allocated to them exclusively.
- Hold and Wait Condition: a blocked process keeps holding the resources it has already acquired.
- No Preemption Condition: resources a process has already acquired cannot be forcibly taken away before it has finished using them.
- Circular Wait Condition: when deadlock occurs, there is a circular chain of processes waiting for resources held by one another.
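Not from the source text, but a common way to see hold-and-wait and circular wait in practice: two threads acquire two mutexes in opposite orders, which can deadlock (the sleep only makes the bad interleaving likely):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

static void *t1(void *arg) {
    (void)arg;
    pthread_mutex_lock(&A);      /* holds A ... */
    sleep(1);
    pthread_mutex_lock(&B);      /* ... and waits for B */
    pthread_mutex_unlock(&B);
    pthread_mutex_unlock(&A);
    return NULL;
}

static void *t2(void *arg) {
    (void)arg;
    pthread_mutex_lock(&B);      /* holds B ... */
    sleep(1);
    pthread_mutex_lock(&A);      /* ... and waits for A: circular wait */
    pthread_mutex_unlock(&A);
    pthread_mutex_unlock(&B);
    return NULL;
}

int main(void) {
    pthread_t x, y;
    pthread_create(&x, NULL, t1, NULL);
    pthread_create(&y, NULL, t2, NULL);
    pthread_join(x, NULL);       /* with the bad lock order this likely never returns */
    pthread_join(y, NULL);
    puts("no deadlock this time");
    return 0;
}
```

Having both threads acquire the locks in one fixed global order breaks the circular-wait condition, which is the kind of prevention described in the next section.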
Deadlock Handling#
- Preventing Deadlock: Break one or more of the four necessary conditions for deadlock; this is relatively simple to implement, but if the restrictions are too strict, it may reduce system resource utilization and throughput.
- Avoiding Deadlock: Prevent the system from entering an unsafe state (a state that may lead to deadlock) during dynamic resource allocation, such as the Banker's algorithm. Refer to Banker's Algorithm.
- Detecting Deadlock: Allow the system to run and potentially enter deadlock; after deadlock occurs, use certain algorithms to detect it and identify the resources and processes involved in the deadlock, taking relevant measures to eliminate the detected deadlock. This is more challenging to implement.
- Resolving Deadlock: In conjunction with deadlock detection, free the system from deadlock (by terminating processes or preempting resources). For processes and resources detected as being involved in deadlock, release some resources through termination or suspension, allowing them to be allocated to processes in a blocked state, transitioning them to the ready state. This is also challenging to implement.
VI. Scheduling Algorithms#
First-Come, First-Served (FCFS)#
Can be used as both job scheduling and process scheduling; schedules jobs or processes in the order they arrive; therefore, it is more favorable for long jobs.
Shortest Job First (SJF)#
A job scheduling algorithm that selects the job with the shortest estimated time from the ready queue for processing until a result is obtained or it cannot continue executing; disadvantage: unfavorable for long jobs; does not consider job importance; running time is estimated and unreliable.
Highest Priority First (HPF)#
Can be used as both job scheduling and process scheduling; when scheduling jobs, it selects the job with the highest priority from the ready queue for processing; since it involves priority, it can be divided into preemptive and non-preemptive; the determination of priority can also be divided into static priority (a fixed value determined in advance based on process type, resource demands, user requirements, etc.) and dynamic priority (which increases or decreases with the progress of the process or waiting time).
Highest Response Ratio Next (HRRN)#
FCFS may cause dissatisfaction among short job users, and SJF may cause dissatisfaction among long job users, leading to the proposal of HRRN, which selects the job with the highest response ratio for execution. Response ratio = 1 + (job waiting time / job processing time).
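A tiny worked example of the response-ratio formula above (the waiting and processing times are made-up values):

```c
#include <stdio.h>

int main(void) {
    /* made-up jobs: waiting time and estimated processing time */
    const char *name[] = { "J1", "J2", "J3" };
    double wait[] = { 10.0, 4.0, 1.0 };
    double run[]  = {  5.0, 1.0, 2.0 };

    int best = 0;
    for (int i = 0; i < 3; i++) {
        double ratio = 1.0 + wait[i] / run[i];   /* HRRN response ratio */
        printf("%s: response ratio = %.2f\n", name[i], ratio);
        if (1.0 + wait[best] / run[best] < ratio) best = i;
    }
    printf("schedule %s next\n", name[best]);    /* J2: 1 + 4/1 = 5.00 */
    return 0;
}
```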
Round Robin#
Processes are placed in a queue in the order they arrive, and the CPU time slice is allocated to the first process in the queue. When the time slice expires, a timer interrupt occurs, pausing the current process and placing it at the end of the queue, repeating the cycle.
Multilevel Feedback Queue#
Currently recognized as a better scheduling algorithm; multiple ready queues are set up, each with different priorities. The first queue has the highest priority, and the others decrease in priority. Higher priority queues are allocated shorter time slices. When a process arrives, it is placed in the first queue according to FCFS. If it does not complete after scheduling, it is placed at the end of the second queue for waiting. If it still does not complete after the second scheduling, it is placed at the end of the third queue, and so on. Only when the previous queue is empty will the next queue's processes be scheduled.
VII. Memory Management#
Basic Knowledge#
Linking#
Linking generates the load module and forms the complete logical address space.
Three linking methods:
- Static Linking: linking is done before loading.
- Dynamic Linking at Load Time: linking is done while the module is being loaded.
- Dynamic Linking at Runtime: linking is done on demand during execution.
Purpose: to determine the complete logical address.
Loading#
Loading brings the load module into memory and forms the physical addresses.
Three loading methods:
- Absolute Loading: absolute addresses are generated at compile time; used in single-job environments.
- Static Relocation: logical addresses are converted to physical addresses at load time; used in multi-job batch-processing operating systems.
- Dynamic Relocation: logical addresses are converted to physical addresses at run time, which requires a relocation register; used in modern operating systems.
Purpose: to determine the final physical address.
Fragmentation#
- Internal Fragmentation (allocated): Memory areas allocated to a process where some parts are not used.
- External Fragmentation (unallocated): Some free partitions in memory are too small to be utilized.
Contiguous Allocation Management Methods#
- Single Contiguous Allocation: memory is divided into a system area and a user area, and only one user program is loaded (no external fragmentation, but internal fragmentation exists).
- Fixed Partition Allocation: the entire user space is divided into several fixed-size partitions, which may be equal or unequal in size; each partition holds only one job, managed through a partition description table (no external fragmentation, but internal fragmentation exists).
- Dynamic Partition Allocation: partitions are created dynamically according to process size, managed with a free-partition table or free-partition chain; a dynamic partition algorithm selects a suitable free partition to allocate (external fragmentation exists, no internal fragmentation).
Dynamic Partition Allocation Algorithms#
First Fit Algorithm (FF)#
When allocating memory, it starts searching from the head of the free partition table until it finds the first free partition that can satisfy the size requirement. This algorithm is simple to implement and prioritizes using lower address free partitions, thus preserving larger free partitions in higher addresses for later allocation of large jobs. However, the continuous allocation of lower address partitions leaves many small, unusable free partitions, and each search starts from the lower address, which undoubtedly increases search overhead.
Next Fit Algorithm (NF)#
Unlike the first fit algorithm, when allocating memory space for a process, it does not always start searching from the lower address but begins from the next free partition found during the last allocation. Therefore, a search pointer needs to be set to indicate the starting free partition for the next search. If the last free partition is still too small to meet the requirement, it should return to the first free partition. This algorithm makes the distribution of free partitions more even and appropriately reduces the overhead of searching for free partitions, but it can easily split larger free partitions.
Best Fit Algorithm (BF)#
Each time memory is allocated for a job, the smallest free partition that meets the requirement is allocated to the process, i.e., finding a free partition that is the smallest but meets the requirement. This avoids "wasting large resources on small jobs," but it can easily lead to small fragments.
Worst Fit Algorithm (WF)#
In contrast to the best fit algorithm, it always selects the largest free partition that meets the requirement for allocation to the process, i.e., finding a free partition that is the largest but meets the requirement. This minimizes the likelihood of fragmentation and is beneficial for medium and small jobs.
Non-Contiguous Allocation Management Methods#
Basic Paging Storage Management#
Basic paging storage management has no page-replacement capability (i.e., it does not implement virtual memory), so all of a program's pages must be loaded into memory before it can run. Because the program's data is stored in different pages and the pages are scattered throughout memory, a page table is needed to record the mapping between logical addresses and actual storage addresses, i.e., from page number to physical block number. Since the page table itself is stored in memory, accessing data in a paging system requires two memory accesses: the first accesses the page table in memory to find the physical block number of the target page, which combined with the page offset gives the actual physical address; the second accesses memory at that physical address to fetch the data.
To reduce the efficiency impact caused by two memory accesses, paging management introduces the Translation Lookaside Buffer (TLB) mechanism (similar to caching mechanisms)
. In memory management with TLB, when accessing memory data, the page number is first queried in the TLB. If found, it indicates that the page table entry to be accessed is in the TLB, and the corresponding physical block number is read directly from the TLB. If not found, the page table in memory is accessed to obtain the physical address, while adding that mapping table entry from the page table to the TLB, which may involve TLB replacement algorithms.
Advantages and disadvantages: No external fragmentation, high memory utilization. However, the contents of each page are not related, making programming and sharing less convenient.
Supplementary concepts:
- Page Frame = Memory Block = Physical Block = Physical Page (a partition of memory).
- Page = a partition of the logical address space (a partition of the process).
- Pages and page frames correspond one-to-one through the page table; a page table entry consists of a "page number" and a "block number", and the page number itself does not occupy memory space.
- Page Table Length: the total number of entries in the page table, i.e., the total number of pages.
- Page Table Entry Length: how much storage space each page table entry occupies.
- Page Size: how much storage space one page occupies.
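A minimal sketch of the two-step translation described above, splitting a logical address into page number and offset and looking up the frame in a page table (the page size and table contents are made-up example values):

```c
#include <stdio.h>

#define PAGE_SIZE 4096             /* example page size: 4 KiB */

/* example page table: index = page number, value = physical block (frame) number */
static unsigned page_table[] = { 5, 9, 6, 7 };

unsigned translate(unsigned logical) {
    unsigned page   = logical / PAGE_SIZE;   /* page number */
    unsigned offset = logical % PAGE_SIZE;   /* page offset */
    unsigned frame  = page_table[page];      /* first "access": look up the page table */
    return frame * PAGE_SIZE + offset;       /* physical address for the second access */
}

int main(void) {
    unsigned logical = 2 * PAGE_SIZE + 100;  /* page 2, offset 100 */
    /* page 2 -> frame 6, so the result is 6*4096 + 100 = 24676 */
    printf("logical %u -> physical %u\n", logical, translate(logical));
    return 0;
}
```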
Basic Segmentation Storage Management#
Programs are divided into segments based on content or procedural relationships, with each segment having its own name. A user job or process containing segments corresponds to a two-dimensional linear virtual space, i.e., a two-dimensional virtual memory. Segmentation management allocates memory in segments and then converts segment virtual addresses to actual physical addresses through address mapping mechanisms.
- Process Address Space: Divided into several segments according to the logical relationships of the program, with each segment having a segment name and starting from 0.
- Logical Address Structure: Segment number and segment offset (offset within the segment).
- Memory Allocation Rules: Allocated in segments, with each segment occupying contiguous space in memory, and segments not necessarily adjacent.
- Segment table entries consist of segment number (implicit, not occupying storage space), segment length, and base address.
- The difference from page table address conversion is that it also checks whether the segment offset exceeds the segment length.
Advantages and disadvantages:
- Different types of segments can have different protections.
- Segments can be shared, including through dynamic linking for code sharing.
- No internal fragmentation occurs, but external fragmentation can occur, leading to lower memory utilization compared to paging.
Differences Between Segmentation and Paging#
- A page is a physical unit of information, a discrete allocation mechanism proposed from the perspective of system memory utilization; a segment is a logical unit of information, containing a complete set of meaningful information, proposed from the user's perspective for memory management.
- The size of a page is fixed and determined by the system; the size of a segment is uncertain and determined by the user.
- The logical address structure of paging is one-dimensional (offset), while segmentation is two-dimensional (segment name + offset).
Segment-Paging Storage Management#
- After dividing the process into logical modules, each segment is then paged, which does not produce external fragmentation but only a small amount of internal fragmentation.
- Logical Address Structure (two-dimensional): segment number, page number, and page offset; each page table entry records the page number and the number of the block where the page is stored.
Address transformation: Obtain segment number, page number, and page offset from the logical address --> Compare the segment number with the segment length recorded in the segment table register to check for out-of-bounds --> Find the corresponding segment table entry based on the segment table starting address and segment number --> Check the page number against the page table length recorded in the segment table to check for out-of-bounds --> Query the page table entry based on the page table address and page number --> Obtain the physical address from the block number and page offset --> Access the target unit.
In simple terms, the process involves three memory accesses: checking the segment table --> checking the page table --> accessing the target unit (if a TLB is set up, it can achieve one memory access).
Memory Space Expansion#
Overlay Technique#
The overlay technique uses a fixed area (program segments that stay in memory for the whole execution and are not swapped in and out) and several overlay areas (program segments that are swapped in and out during execution). It is not transparent to the user and increases the programming burden, and overlaying happens within a single program or process.
Swapping Technique#
The swapping technique involves swapping out certain processes to free up memory space when memory is tight, and then swapping in certain processes. The external storage disk is divided into file areas and swap areas, with swapped-out processes placed in the swap area, occurring between different processes (or jobs).
Virtual Storage Technology (Virtual Memory)#
- Virtual memory is based on the Locality Principle (similar to the reasoning behind using B-trees as MySQL index structures):
  - Temporal locality: instructions and data accessed now are likely to be accessed again soon.
  - Spatial locality: memory locations near the currently accessed ones are likely to be accessed soon.
  - Caching: frequently used data is kept in faster storage.
Based on the locality principle, when loading a program, part of the program can be loaded into memory while leaving the rest in external storage, allowing the program to start executing. During program execution, when the accessed information is not in memory, the operating system loads the required part into memory and continues executing the program. On the other hand, the operating system swaps out temporarily unused content from memory to external storage, freeing up space for information that will be loaded into memory. This way, the system appears to provide the user with a much larger storage space than the actual memory, referred to as virtual storage.
- Definition of virtual memory: a program can run without being fully loaded; pages are loaded dynamically during execution and swapped out when memory is insufficient.
- Characteristics of virtual memory:
  - Multiplicity: a job does not need to be loaded into memory all at once; it can be loaded in several parts.
  - Swappability: a job does not need to stay resident in memory for its whole execution; parts can be swapped in and out while it runs.
  - Virtuality: memory capacity is logically expanded, so the capacity seen by the user is much larger than the physical capacity.
- Implementations of virtual memory: demand paging storage management, demand segmentation storage management, and demand segment-paging storage management.
- Costs of using virtual memory:
  - Managing virtual memory requires many data structures, which occupy additional memory.
  - Translating virtual addresses to physical addresses lengthens instruction execution time.
  - Paging in and out requires disk I/O, which takes time.
  - If only part of a page contains data, memory is wasted.
Page Replacement Algorithms#
Optimal Page Replacement Algorithm (OPT)#
An algorithm that only has theoretical significance, used to evaluate other page replacement algorithms. The replacement strategy is to replace the page that will not be accessed for the longest time in the future.
First-In, First-Out Page Replacement Algorithm (FIFO)#
The operating system maintains a linked list of all pages currently in memory, with the most recently entered page at the tail and the earliest entered page at the head. When a page fault occurs, the page at the head is evicted, and the newly loaded page is appended to the tail. This is a simple and crude replacement algorithm that does not consider page access frequency information.
Least Recently Used Algorithm (LRU)#
The algorithm assigns each page an access field to record the time t since the page was last accessed. Each time a replacement occurs, the page with the largest t value is replaced (implementation can use registers or stacks).
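A hedged sketch of LRU that records, for each resident page, the time of its last access and evicts the page with the oldest one (the reference string and frame count are made-up example values):

```c
#include <stdio.h>

#define FRAMES 3

int main(void) {
    int refs[] = { 7, 0, 1, 2, 0, 3, 0, 4 };   /* example reference string */
    int n = sizeof(refs) / sizeof(refs[0]);
    int page[FRAMES], last_used[FRAMES], used = 0, faults = 0;

    for (int t = 0; t < n; t++) {
        int hit = -1;
        for (int i = 0; i < used; i++)
            if (page[i] == refs[t]) hit = i;
        if (hit >= 0) {                        /* page already resident */
            last_used[hit] = t;
            continue;
        }
        faults++;
        int victim;
        if (used < FRAMES) {
            victim = used++;                   /* a free frame is available */
        } else {
            victim = 0;                        /* evict the least recently used page */
            for (int i = 1; i < FRAMES; i++)
                if (last_used[i] < last_used[victim]) victim = i;
        }
        page[victim] = refs[t];
        last_used[victim] = t;
    }
    printf("page faults: %d\n", faults);       /* 6 for this reference string */
    return 0;
}
```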
Clock Replacement Algorithm (also known as the Not Recently Used Algorithm, NRU)#
Each page is set with an access bit, which is set to 1 when the page is accessed. All pages are stored in a circular queue, with a pointer pointing to the oldest page. During page replacement, if the page pointed to by the pointer has an access bit of 0, it is replaced; otherwise, the access bit is set to 0, and the pointer cycles until it encounters a page with an access bit of 0.
Improved Clock Algorithm#
Based on the Clock algorithm, an additional modified bit is added, and the replacement is determined based on both the access bit and the modified bit. Pages with both access and modified bits set to 0 are prioritized for replacement, followed by pages with an access bit of 0 and a modified bit of 1.
Least Frequently Used Algorithm (LFU)#
Registers are set to record the number of times a page has been accessed. Each time a replacement occurs, the page with the least access count is replaced. The problem is that the access register does not accurately reflect the current page access count because access speed is fast, so accessing once and accessing 100 times within the time interval for updating the register are treated the same. Additionally, LFU and LRU are very similar, and the supporting hardware is also the same, but the key distinction is that one is based on time and the other on count (for example, for registers pa 001111 and pb 111000, if using LRU, pa will be evicted; if using LFU, pb will be evicted).
Page Buffering Algorithm (PBA)#
During replacement, pages, regardless of whether they have been modified, are not swapped to disk but are first temporarily held in a linked list of pages in memory (modified and unmodified page lists can be distinguished or not). When they are accessed again, they can be retrieved directly from these lists without requiring disk I/O. When the number of modified pages in the list reaches a certain threshold, write operations to disk are performed sequentially (effectively merging multiple I/O operations into one).
Page Allocation Strategies#
Resident Set: Refers to the collection of physical blocks allocated to a process in demand paging storage management (i.e., the number of physical blocks allocated to the process by the system).
Fixed Allocation Local Replacement: The resident set is allocated before the process runs, and when a page fault occurs, only a page from the process itself can be swapped out.
Variable Allocation Global Replacement: Whenever a page fault occurs, a new physical block is allocated, which may come from free physical blocks or require swapping out pages from other processes.
Variable Allocation Local Replacement: Dynamically increases or decreases the physical blocks allocated to a process based on the frequency of page faults.
When to Load Pages: Pre-load strategy (before running); demand load strategy (during running).
Where to Load Pages From (UNIX approach): pages used for the first time are loaded from the file area; swapped-out pages are written to the swap area and, when needed again, are loaded from the swap area.
Thrashing: caused by allocating too few physical blocks to a process, so a page that was just swapped out is immediately swapped back in, and a page just swapped in is immediately swapped out. If this is due to a poor page-replacement strategy, changing the replacement algorithm can fix it; if it is because too many programs are running and the frequently accessed pages cannot all fit in memory at once, the degree of multiprogramming should be reduced; otherwise, terminate the process or enlarge its resident set.
Working Set: Refers to the set of pages that a process actually accesses within a certain time interval (the size of the resident set should generally not be less than the size of the working set to prevent thrashing).
Supplement:
- Page Fault (or Page Error): when a program references a part of its address space that is not in physical memory, the operating system loads the missing part into physical memory and re-executes the failed instruction.
- Belady's Anomaly: with the FIFO algorithm, if a process has not been allocated all the pages it requires, increasing the number of allocated pages may anomalously increase the page-fault rate.
VIII. File System#
Disk Scheduling Algorithms#
- First-Come, First-Served (FCFS): services requests in the order they arrive.
- Shortest Seek Time First (SSTF): the request closest to the current track is serviced first, i.e., the job with the shortest seek time goes first regardless of arrival order, which overcomes the excessive arm movement of FCFS.
- Scan Algorithm (SCAN), also called the Elevator Scheduling Algorithm: starting from the current arm position, it services the closest request in the current direction of movement; when no requests remain in that direction, the arm reverses. Because the arm moves like an elevator, it is also called the elevator scheduling algorithm.
- Circular Scan Algorithm (C-SCAN): an improvement on SCAN in which the arm moves in only one direction, from the outside inward; starting from the current position it services the closest request in that direction, and when no requests remain it returns to the outermost position and services the request with the smallest cylinder number.
- N-Step SCAN Algorithm: the disk request queue is divided into subsequences of length N; the scheduler processes these sub-queues in FCFS order, and each sub-queue internally uses SCAN. When N is large the algorithm approaches SCAN; when N = 1 it degenerates into FCFS. N-step SCAN effectively avoids the "arm stickiness" phenomenon.
IX. I/O Models#
Reference: I/O in Operating Systems and High-Performance I/O Models
I/O Between Devices#
- Polling Mode: the CPU actively polls each device's status and performs I/O when data is present.
- Interrupt Mode: when a device has data, it raises an interrupt; the CPU decides whether to respond and then handles the device's I/O in the interrupt handler. The CPU no longer needs to poll device status constantly; it just receives interrupts passively.
- Direct Memory Access (DMA) Mode: if every byte raised an interrupt, transferring 1 KB of data would require 1024 interrupts and waste CPU time. With DMA, the CPU only supplies the start and end addresses, and the DMA controller handles the data transfer in between; the CPU's involvement drops from per-byte to per-block.
- Channel Control Mode: DMA can only control one device's data at a time, so for multiple blocks the CPU still has to intervene repeatedly. Channels were introduced to control I/O; they are more capable than DMA, able to manage multiple blocks of data and I/O for multiple devices, freeing the CPU further from the I/O process.
I/O Between User Processes#
- Blocking
  The user thread calls a system function to fetch data from the kernel and stays in the blocked state until the data has reached the kernel buffer.
- Non-Blocking
  The user thread attempts to fetch data and returns immediately whether or not the kernel buffer has data; it may or may not get data, and the thread never enters the blocked state.
- I/O Multiplexing
  Multiplexing means one thread manages multiple I/O channels. The thread still blocks in the call and returns when one or more of the channels have data; it then traverses the I/O set to find which ones are ready. For example, with a socket's select() the thread calls select() and blocks; when any watched I/O has data, the thread leaves the blocked state and continues executing.
- Signal-Driven I/O
  A signal and a callback function are registered for an I/O channel; once the signal fires, the callback reads the data. For example, registering a "readable" signal on a socket means that when data arrives and becomes readable, the signal is triggered and the callback copies the data from the kernel buffer to user space.
- Asynchronous I/O
  With asynchronous I/O, the operating system copies the data from the kernel to user space and only then notifies the user thread (with a signal) that it can proceed to the next operation, so the user thread does not need to block even for the copy.
Principles of select, epoll, kqueue#
- select: select manages multiple sockets. When it returns after an I/O interrupt from the network card, it does not know which socket fd the interrupt belongs to, so the user thread must traverse the fds to find out.
- epoll: when epoll receives an I/O interrupt, it determines which socket fd the interrupt belongs to. epoll maintains a red-black tree (a kind of balanced binary tree) for efficient lookup: the user registers interest in a socket's events by inserting the socket fd into the tree, keyed by the interrupt number (conceptually a tuple of interrupt number and socket fd); removing an event means deleting the node. When an I/O interrupt occurs, epoll copies the data from the network card into the kernel buffer, looks up the corresponding fd in the tree by interrupt number, and adds that fd to a ready list to return to the user thread, so the user obtains the ready fds directly.
- kqueue: when a socket I/O interrupt occurs, the corresponding socket fd is looked up in a hash table and added to a linked list to be returned. When the user registers interest in an event, the fd is added to the hash table.
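A hedged sketch of the Linux epoll interface (epoll_create1 / epoll_ctl / epoll_wait) watching one listening socket and echoing data back; error handling is trimmed and the port number is an arbitrary example value:

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* listening socket on an arbitrary example port */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 16);

    int epfd = epoll_create1(0);                 /* holds the registered interest set */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev);    /* register interest in lfd */

    struct epoll_event ready[16];
    for (;;) {
        int n = epoll_wait(epfd, ready, 16, -1); /* returns only the ready fds */
        for (int i = 0; i < n; i++) {
            if (ready[i].data.fd == lfd) {       /* new connection: watch it too */
                int cfd = accept(lfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &cev);
            } else {                             /* data ready on a client socket */
                char buf[512];
                ssize_t r = read(ready[i].data.fd, buf, sizeof(buf));
                if (r <= 0) close(ready[i].data.fd);   /* peer closed or error */
                else write(ready[i].data.fd, buf, r);  /* echo the data back */
            }
        }
    }
}
```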