Saturday, November 17, 2007

Memory Management

Memory Management

Author: Prasad Dabak
Milind Borate
Sandeep Phadke
Published: October 1999
Copyright: 1999
Publisher: M&T Books

Abstract
This chapter examines memory models in Microsoft operating systems, examines how Windows NT uses features of the 80386 processor's architecture, and explores the function of virtual memory.
MEMORY MANAGEMENT HAS ALWAYS been one of the most important and interesting aspects of any operating system for serious developers. It is an aspect that kernel developers ignore. Memory management, in essence, provides a thumbnail impression of any operating system.

Microsoft has introduced major changes in the memory management of each new operating system they have produced. Microsoft had to make these changes because they developed all of their operating systems for Intel microprocessors, and Intel introduced major changes in memory management support with each new microprocessor they introduced. This chapter is a journey through the various Intel microprocessors and the memory management changes each one brought along with it in the operating system that used it.


MEMORY MODELS IN MICROSOFT OPERATING SYSTEMS

Early PCs based on Intel 8086/8088 microprocessors could access only 640K of RAM and used the segmented memory model. Consequently, good old DOS allows only 640K of RAM and restricts the programmer to the segmented memory model.

In the segmented model, the address space is divided into segments. Proponents of the segmented model claim that it matches the programmer’s view of memory. They claim that a programmer views memory as different segments containing code, data, stack, and heap. Intel 8086 supports very primitive segmentation. A segment, in the 8086 memory model, has a predefined base address. The length of each segment is also fixed and is equal to 64K. Some programs find a single segment insufficient. Hence, there are a number of memory models under DOS. For example, the tiny model that supports a single segment for code, data, and stack together, or the small model that allows two segments–one for code and the other for data plus stack, and so on. This example shows how the memory management provided by an operating system directly affects the programming environment.

The Intel 80286 (which followed the Intel 8086) could support more than 640K of RAM. Hence, programmers got new interface standards for accessing extended and expanded memory from DOS. Microsoft’s second-generation operating system, Windows 3.1, could run on 80286 in standard mode and used the segmented model of 80286. The 80286 provided better segmentation than the 8086. In 80286’s model, segments can have a programmable base address and size limit. Windows 3.1 had another mode of operation, the enhanced mode, which required the Intel 80386 processor. In the enhanced mode, Windows 3.1 used the paging mechanisms of 80386 to provide additional performance. The virtual 8086 mode was also used to implement multiple DOS boxes on which DOS programs could run.

Windows 3.1 does not make full use of the 80386’s capabilities. Windows 3.1 is a 16-bit operating system, meaning that 16-bit addresses are used to access the memory and the default data size is also 16 bits. To make full use of 80386’s capabilities, a 32-bit operating system is necessary. Microsoft came up with a 32-bit operating system, Windows NT. The rest of this chapter examines the details of Windows NT memory management. Microsoft also developed Windows 95 after Windows NT. Since both these operating systems run on 80386 and compatibles, their memory management schemes have a lot in common. However, you can best appreciate the differences between Windows NT and Windows 95/98 after we review Windows NT memory management. Therefore, we defer this discussion until a later section of this chapter.


WINDOWS NT MEMORY MANAGEMENT OVERVIEW

We’ll first cover the view Windows NT memory management presents to the outside world. In the next section, we explain the special features provided by Intel microprocessors to implement memory management. Finally, we discuss how Windows NT uses these features to implement the interface provided to the outside world.

Memory Management Interface—Programmer’s View
Windows NT offers programmers a 32-bit flat address space. The memory is not segmented; rather, it is 4GB of continuous address space. (Windows NT marked the end of segmented architecture–programmers clearly preferred flat models to segmented ones.) Possibly, with languages such as COBOL where you need to declare data and code separately, programmers view memory as segments. However, with new languages such as C and C++, data variables and code can be freely mixed and the segmented memory model is no longer attractive. Whatever the reason, Microsoft decided to do away with the segmented memory model with Windows NT. The programmer need not worry whether the code/data fits in 64K segments. With the segmented memory model becoming extinct, the programmer can breathe freely. At last, there is a single memory model, the 32-bit flat address space.

Windows NT is a protected operating system; that is, the behavior (or misbehavior) of one process should not affect another process. This requires that no two processes are able to see each other’s address space. Thus, Windows NT should provide each process with a separate address space. Out of this 4GB address space available to each process, Windows NT reserves the upper 2GB as kernel address space and the lower 2GB as user address space, which holds the user-mode code and data. The entire address space is not separate for each process. The kernel code and kernel data space (the upper 2GB) is common for all processes; that is, the kernel-mode address space is shared by all processes. The kernel-mode address space is protected from being accessed by user-mode code. The system DLLs (for example, KERNEL32.DLL, USER32.DLL, and so on) and other DLLs are mapped in user-mode space. It is inefficient to have a separate copy of a DLL for each process. Hence, all processes using the DLL or executable module share the DLL code and incidentally the executable module code. Such a shared code region is protected from being modified because a process modifying shared code can adversely affect other processes using the code.

Sharing of the kernel address space and the DLL code can be called implicit sharing. Sometimes two processes need to share data explicitly. Windows NT enables explicit sharing of address space through memory-mapped files. A developer can map a named file onto some address space, and further accesses to this memory area are transparently directed to the underlying file. If two or more processes want to share some data, they can map the same file in their respective address spaces. To simply share memory between processes, no file needs to be created on the hard disk.


BELOW THE OPERATING SYSTEM

In her book Inside Windows NT, Helen Custer discusses memory management in the context of the MIPS processor. Considering that a large number of the readers would be interested in a similar discussion that focuses on Intel processors, we discuss the topic in the context of the Intel 80386 processor (whose memory management architecture is mimicked by the later 80486 and Pentium series). If you are already conversant with the memory management features of the 80386 processor, you may skip this section entirely.

We now examine the 80386’s addressing capabilities and the fit that Windows NT memory management provides for it. Intel 80386 is a 32-bit processor; this implies that the address bus is 32-bit wide, and the default data size is as well. Hence, 4GB (232 bytes) of physical RAM can be addressed by the microprocessor. The microprocessor supports segmentation as well as paging. To access a memory location, you need to specify a 16-bit segment selector and a 32-bit offset within the segment. The segmentation scheme is more advanced than that in 8086. The 8086 segments start at a fixed location and are always 64K in size. With 80386, you can specify the starting location and the segment size separately for each segment.

Segments may overlap–that is, two segments can share address space. The necessary information (the starting offset, size, and so forth) is conveyed to the processor via segment tables. A segment selector is an index into the segment table. At any time, only two segment tables can be active: a Global Descriptor Table (GDT) and a Local Descriptor Table (GDT). A bit in the selector indicates whether the processor should refer to the LDT or the GDT. Two special registers, GDTR and LDTR, point to the GDT and the LDT, respectively. The instructions to load these registers are privileged, which means that only the operating system code can execute them.

A segment table is an array of segment descriptors. A segment descriptor specifies the starting address and the size of the segment. You can also specify some access permission bits with a segment descriptor. These bits specify whether a particular segment is read-only, read-write, executable, and so on. Each segment descriptor has 2 bits specifying its privilege level, called as the descriptor privilege level (DPL).

The processor compares the DPL with the Requested Privilege Level (RPL) before granting access to a segment. The RPL is dictated by 2 bits in the segment selector while specifying the address. The Current Privilege Level (CPL) also plays an important role here. The CPL is the DPL of the code selector being executed. The processor grants access to a particular segment only if the DPL of the segment is less than or equal to the RPL as well as the CPL. This serves as a protection mechanism for the operating system. The CPL of the processor can vary between 0 and 3 (because 2 bits are assigned for CPL). The operating system code generally runs at CPL=0, also called as ring 0, while the user processes run at ring 3. In addition, all the segments belonging to the operating system are allotted DPL=0. This arrangement ensures that the user mode cannot access the operating system memory segments.

It is very damaging to performance to consult the segment tables, which are stored in main memory, for every memory access. Caching the segment descriptor in special CPU registers, namely, CS (Code Selector), DS (Data Selector), SS (Stack Selector), and two general-purpose selectors called ES and FS, solves this problem. The first three selector registers in this list–that is, CS, DS, and SS–act as default registers for code access, data access, and stack access, respectively.

To access a memory location, you specify the segment and offset within that segment. The first step in address translation is to add the base address of the segment to the offset. This 32-bit address is the physical memory address if paging is not enabled. Otherwise this address is called as the logical or linear address and is converted to a physical RAM address using the page address translation mechanism (refer to Figure 4-1).

The memory management scheme is popularly known as paging because the memory is divided into fixed-size regions called pages. On Intel processors (80386 and higher), the size of one page is 4 kilobytes. The 32-bit address bus can access up to 4GB of RAM. Hence, there are one million (4GB/4K) pages.

Page address translation is a logical to physical address mapping. Some bits in the logical/linear address are used as an index in the page table, which provides a logical to physical mapping for pages. The page translation mechanism on Intel platforms has two levels, with a structure called page table directory at the second level. As the name suggests, a page table directory is an array of pointers to page tables. Some bits in the linear address are used as an index in the page table directory to get the appropriate page table to be used for address translation.

The page address translation mechanism in the 80386 requires two important data structures to be maintained by the operating system, namely, the page table directory and the page tables. A special register, CR3, points to the current page table directory. This register is also called Page Directory Base Register (PDBR). A page table directory is a 4096-byte page with 1024 entries of 4 bytes each. Each entry in the page table directory points to a page table. A page table is a 4096-byte page with 1024 entries of 4 bytes (32 bits) each. Each Page Table Entry (PTE) points to a physical page. Since there are 1 million pages to be addressed, out of the 32 bits in a PTE, 20 bits act as upper 20 bits of physical address. The remaining 12 bits are used to maintain attributes of the page.

Some of these attributes are access permissions. For example, you can denote a page as read-write or read-only. A page also has an associated security bit called as the supervisor bit, which specifies whether a page can be accessed from the user-mode code or only from the kernel-mode code. A page can be accessed only at ring 0 if this bit is set. Two other bits, namely, the accessed bit and the dirty bit, indicate the status of the page. The processor sets the accessed bit whenever the page is accessed. The processor sets the dirty bit whenever the page is written to. Some bits are available for operating system use. For example, Windows NT uses one such bit for implementing the copy-on-write protection. You can also mark a page as invalid and need not specify the physical page address. Accessing such a page generates a page fault exception. An exception is similar to a software interrupt. The operating system can install an exception handler and service the page faults. You’ll read more about this in the following sections.

32-bit memory addresses break down as follows. The upper 10 bits of the linear address are used as the page directory index, and a pointer to the corresponding page table is obtained. The next 10 bits from the linear address are used as an index in this page table to get the base address of the required physical page. The remaining 12 bits are used as offset within the page and are added to the page base address to get the physical address.


THE INSIDE LOOK

In this section, we examine how Windows NT has selectively utilized existing features of the 80386 processor’s architecture to achieve its goals.

Flat Address Space
First, let’s see how Windows NT provides 32-bit flat address space to the processes. As we know from the previous section, Intel 80386 offers segmentation as well as paging. So how does Windows NT provide a flat memory instead of a segmented one? Turn off segmentation? You cannot turn off segmentation on 80386. However, the 80386 processor enables the operating system to load the segment register once and then specify only 32-bit offsets for subsequent instructions. This is exactly what Windows NT does. Windows NT initializes all the segment registers to point to memory locations from 0 to 4GB, that is, the base is set as 0 and the limit is set as 4GB. The CS, SS, DS, and ES are initialized with separate segment descriptors all pointing to locations from 0 to 4GB. So now the applications can use only 32-bit offset, and hence see a 32-bit flat address space. A 32-bit application running under Windows NT is not supposed to change any of its segment registers.

No comments:

Google