Thursday, November 22, 2007

Memory Management -- Part 3


Memory Management

Let’s analyze, one step at a time, why the two entries are different. The page tables themselves need to be mapped onto some linear address. When Windows NT needs to access the page tables, it uses this linear address range. To represent 4GB of memory divided into 1MB pages of 4K each, we need 1K page tables each having 1K entries. To map these 1K page tables, Windows NT reserves 4MB of linear address space in each process. As we saw earlier, each process has a different set of page tables. Whatever the process, Windows NT maps the page tables on the linear address range from 0xC0000000 to 0xC03FFFFF. Let’s call this linear address range as the page table address range. In other words, the page table address range maps to different page tables–that is, to different physical pages–for different processes. As you may have noticed, the page table addresses range falls in the kernel address space. Windows NT cannot map this crucial system data structure in the user address space and allow user-mode processes to play with the memory. Ultimately, the result is that two processes cannot share pages in the page table address range although the addresses lie in the kernel-mode address range.

Exactly one page table is required to map 4MB address space because each page table has 1K entries and each entry corresponds to a 4K page. Consequently, Windows NT cannot share the page table corresponding to the page table address range. This accounts for one of the two mysterious entries in the page table directory. However, the entry’s mystery does not end here–there is one more subtle twist to this story. The physical address specified in this entry matches the physical address of the page table directory. The obvious conclusion is that the page table directory acts also as the page table for the page table address range. This is possible because the formats of the page table directory entry and PTE are the same on 80386.

The processor carries out an interesting sequence of actions when the linear address within the page table address range is translated to a physical address. Let’s say that the CR3 register points to page X. As the first step in the address translation process, the processor treats the page X as the page table directory and finds out the page table for the given linear address. The page table happens to be page X again. The processor now treats page X as the required page table and finds out the physical address from it. A more interesting case occurs when the operating system is accessing the page table directory itself. In this case, the physical address also falls in page X!

Let’s now turn to the second mysterious entry. The 4MB area covered by this page directory entry is internally referred to as hyperspace. This area is used for mapping the physical pages belonging to other processes into virtual address space. For example, a function such as MmMapPageInHyperspace() uses the virtual addresses in this range. This area is also used during the early stages of process creation. For example, when a parent process such as PROGMAN.EXE spawns a child process such as NOTEPAD.EXE, PROGMAN.EXE has to create the address space for NOTEPAD.EXE. This is done as a part of the MmCreateProcessAddressSpace() function. For starting any process, an address space must be created for the process. Address space is nothing but page directory. Also, the upper-half entries of page directory are common for all processes except for the two entries that we have already discussed. These entries need to be created for the process being spawned. The MmCreateProcessAddressSpace() function allocates three pages of memory: the first page for the page directory, the second page for holding the hyperspace page table entries, and the third page for holding the working set information for the process being spawned.

Once these pages are allocated, the function maps the first physical page in the address space using the MmMapPageInHyperSpace() function. Note that the MmMapPageInHyperSpace() function runs in the context of PROGMAN.EXE. Now the function copies the page directory entries in the upper half of the page directory to the mapped hyperspace virtual address. In short, PROGMAN.EXE creates the page directory for the NOTEPAD.EXE.

Windows NT supports memory-mapped files. When two processes map the same file, they share the same set of physical pages. Hence, memory-mapped files can be used for sharing memory. In fact, Windows NT itself uses memory-mapped files to load DLLs and executables. If two processes map the same DLL, they automatically share the DLL pages. The memory-mapped files are implemented using the section object under Windows NT. A data structure called PROTOPTE is associated with each section object. This data structure is a variable-length structure based on the size of the section. This data structure contains a 4-byte entry for each page in the virtual address space mapped by the section object. Each 4-byte entry has the same structure as that of the PTE. When the page is not being used by any of the processes, the protopte entry is invalid and contains enough information to get the page back. In this case, the CPU PTE contains a fixed value that is 0xFFFFF480, which indicates that accessing this page will be considered a protopte fault.

Now comes the toughest of all questions: "How can Windows NT give away 4GB of memory to each process when there is far less physical RAM available on the board?" Windows NT, as well as all other operating systems that allow more address space than actual physical memory, uses a technique called virtual memory to achieve this. In the next section, we discuss virtual memory management in Windows NT.


VIRTUAL MEMORY MANAGEMENT

The basic idea behind virtual memory is very simple. For each process, the operating system maps few addresses to real physical memory because RAM is expensive and relatively rare. Remaining memory for each process is really maintained on secondary storage (usually a hard disk). That’s why it is called virtual memory. The addresses that are not mapped on physical RAM are marked as such. Whenever a process accesses such an address, the operating system brings the data into memory from secondary storage. If the operating system runs out of physical RAM, some data is thrown out to make space. We can always get back this data because a copy is maintained on secondary storage. The data to be thrown out is decided by the replacement policy. Windows NT uses First-In-First-Out (FIFO) replacement policy. According to this policy, the oldest data (that is, the data that was brought in the RAM first) is thrown out whenever there is a space crunch.

To implement virtual memory management, Windows NT needs to maintain a lot of data. First, it needs to maintain whether each address is mapped to physical RAM or the data is to be brought in from secondary storage when a request with the address comes. Maintaining this information for each byte itself takes a lot of space (actually, more space than the address space for which the information is to be maintained). So Windows NT breaks the address space into 4KB pages and maintains this information in page tables. As we saw earlier, a page table entry (PTE) consists of the address of the physical page (if the page is mapped to physical RAM) and attributes of the page. Since the processor heavily depends on PTEs for address translation, the structure of PTE is processor dependent.

If a page is not mapped onto physical RAM, Windows NT marks the page as invalid. Any access to this page causes a page fault, and the page fault handler can bring in the page from the secondary storage. To be more specific, when the page contains DLL code or executable module code, the page is brought in from the DLL or executable file. When the page contains data, it is brought in from the swap file. When the page represents a memory-mapped file area, it is brought in from the corresponding file. Windows NT needs to keep track of free physical RAM so that it can allocate space for a page brought in from secondary storage in case of a page fault. This information is maintained in a kernel data structure called the Page Frame Database (PFD). The PFD also maintains a FIFO list of in-memory pages so that it can decide on pages to throw out in case of a space crunch.

Before throwing out a page, Windows NT must ensure that the page is not dirty. Otherwise, it needs to write that page to secondary storage before throwing it out. If the page is not shared, the PFD contains the pointer to PTE so that if the operating system decides to throw out a particular page, it can then go back and mark the PTE as invalid. If the page is shared, the PFD contains a pointer to the corresponding PROTOPTE entry. In this case, the PFD also contains a reference count for the page. A page can be thrown out only if its reference count is 0. In general, the PFD maintains the status of every physical page.

The PFD is an array of 24-byte entries, one for each physical page. Hence, the size of this array is equal to the number of physical pages that are stored in a kernel variable, namely, MmNumberOfPhysicalPages. The pointer to this array is stored in a kernel variable, namely, MmpfnDatabase. A physical page can be in several states–for example, it can be in-use, free, free but dirty, and so on. A PFD entry is linked in a doubly linked list, depending on the state of the physical page represented by it. For example, the PFD entry representing a free page is linked in the free pages list. Figure 4-4 shows these lists linked through the PFD. The forward links are shown on the left side of the PFD, and the backward links are shown on the right side.

There are in all six kinds of lists. The heads of these lists are stored in following kernel variables:

MmStandbyPageListHead

MmModifiedNoWritePageListHead

MmModifiedPageListHead

MmFreePageListHead

MmBadPageListHead

MmZeroedPageListHead

All these list heads are actually structures of 16 bytes each. Here is the structure definition:

typedef struct PageListHead {

DWORD NumberOfPagesInList,

DWORD TypeOfList,

DWORD FirstPage,

DWORD LastPage

} PageListHead_t;

The FirstPage field can be used as an index into the PFD. The PFD entry contains a pointer to the next page. Using this, you can traverse any of the lists. Here is the structure definition for the PFD entry:

typedef struct PfdEntry {

DWORD NextPage,

void *PteEntry/*PpteEntry,

DWORD PrevPage,

DWORD PteReferenceCount,

void *OriginalPte,

DWORD Flags;

} PfdEntry_t;

Using this, you can easily write a program to dump the PFD. However, there is one problem: kernel variables, such as list heads, MmPfnDatabase, and MmNumberOfPhysicalPages, are not exported. Therefore, you have to deal with absolute addresses, which makes the program dependent on the Windows NT version and build type.


VIRTUAL ADDRESS DESCRIPTORS

Along with the free physical pages, Windows NT also needs to keep track of the virtual address space allocation for each process. Whenever a process allocates a memory block–for example, to load a DLL–Windows NT checks for a free block in the virtual address space, allocates virtual address space, and updates the virtual address map accordingly. The most obvious place to maintain this information is page tables. For each process, Windows NT maintains separate page tables. There are 1 million pages, and each page table entry is 4 bytes. Hence, full page tables for a single process would take 4MB of RAM! There is a solution to this: Page tables themselves can be swapped out. It is inefficient to swap in entire page tables when a process wants to allocate memory. Hence, Windows NT maintains a separate binary search tree containing the information about current virtual space allocation for each process. A node in this binary search tree is called a Virtual Address Descriptor (VAD). For each block of memory allocated to a process, Windows NT adds a VAD entry to the binary search tree. Each VAD entry contains the allocated address range–that is, the start address and the end address of the allocated block, pointers to left and right children VADs, and a pointer to the parent VAD. The process environment block (PEB) contains a pointer, namely, VadRoot, to the root of this tree.

Listing 4-5: VADDUMP.C

/* Should be compiled in release mode */

#define _X86_




#include

#include

#include




#include "undocnt.h"

#include "gate.h"




/*Define the WIN32 calls we are using, since we can not include both

NTDDK.H and

WINDOWS.H in the same ’C’ file.*/

typedef struct _OSVERSIONINFO{

ULONG dwOSVersionInfoSize;

ULONG dwMajorVersion;

ULONG dwMinorVersion;

ULONG dwBuildNumber;

ULONG dwPlatformId;

CCHAR szCSDVersion[ 128 ];

} OSVERSIONINFO, *LPOSVERSIONINFO;

BOOLEAN _stdcall GetVersionExA(LPOSVERSIONINFO);

PVOID _stdcall VirtualAlloc(PVOID, ULONG, ULONG, ULONG);




/* Max vad entries */

#define MAX_VAD_ENTRIES 0x200




/* Following variables are accessed in RING0.ASM */

ULONG NtVersion;

ULONG PebOffset;

ULONG VadRootOffset;




#pragma pack(1)

typedef struct VadInfo {

void *VadLocation;

VAD Vad;

} VADINFO, *PVADINFO;

#pragma pack()




VADINFO VadInfoArray[MAX_VAD_ENTRIES];

int VadInfoArrayIndex;

PVAD VadTreeRoot;

The initial portion of the VADDUMP.C file has a few definitions apart from the header inclusion. In this program, we use the callgate mechanism as we did in the showdir program–hence the inclusion of the GATE.H header file. After the header inclusion, the file defines the maximum number of VAD entries that we’ll process. There is no limit on the nodes in a VAD tree. We use the callgate mechanism for kernel-mode execution of a function that dumps the VAD tree in an array accessible from the user mode. This array can hold up to MAX_VAD_ENTRIES entries. Each entry in the array is of type VADINFO. The VADINFO structure has two members: the address of the VAD tree node and the actual VAD tree node. The VAD tree node structure is defined in the UNDOCNT.H file as follows:

typedef struct vad {

void *StartingAddress;

void *EndingAddress;

struct vad *ParentLink;

struct vad *LeftLink;

struct vad *RightLink;

DWORD Flags;

}VAD, *PVAD;

No comments:

Google