Address Translation

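Compaction is not really feasible, except in isolated situations.  Instead, use hardware support for relocation and noncontiguous allocation, through address translation.  The basic model:  the CPU issues logical addresses, and translation hardware maps each one to a physical address before it reaches memory.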
A simple example of how this model could be implemented is through the use of a relocation register and a bounds register.  In practice, two approaches are used:  segmentation and paging.
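The relocation-and-bounds scheme mentioned above can be sketched as follows.  This is only an illustration of the idea, not a model of any particular machine; the names translate and AddressFault and the register values are hypothetical.

```python
class AddressFault(Exception):
    """Raised when a logical address falls outside the bounds register."""


def translate(logical, relocation, bounds):
    """Map a logical address to a physical one using two registers."""
    # Valid logical addresses are 0 .. bounds-1; anything else traps.
    if not 0 <= logical < bounds:
        raise AddressFault(f"logical address {logical:#x} is out of bounds")
    # The relocation register holds the physical start of the program.
    return relocation + logical


# Hypothetical register contents for one process.
physical = translate(0x0100, relocation=0x4000, bounds=0x1000)
```

Because the check and the addition happen on every reference, the program can be moved simply by changing the relocation register.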


A program is composed of segments (data, code, stack, etc.).

A logical address is a pair consisting of a segment number and a byte offset (displacement) within the segment.

The hardware architecture defines a segment table containing one entry per segment.  It is normally implemented as an array indexed by segment number.

Each segment table entry contains the base address and limit or length of the segment.

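Using the table, address translation works like this:  the segment number indexes the segment table; the offset is compared with the limit in the selected entry, and an offset that is too large causes a trap; otherwise the offset is added to the base to form the physical address.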
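The translation through a segment table can be sketched in a few lines.  This is a simplified illustration, assuming each entry holds only a base and a limit; SegmentEntry, SegmentFault, and the table contents are hypothetical names and values.

```python
from collections import namedtuple

# One segment table entry: physical base address and segment length.
SegmentEntry = namedtuple("SegmentEntry", ["base", "limit"])


class SegmentFault(Exception):
    """Stands in for the trap to the operating system."""


def translate_segmented(segment_table, segment, offset):
    """Translate a (segment number, offset) pair to a physical address."""
    # The segment number is a direct index into the table.
    if not 0 <= segment < len(segment_table):
        raise SegmentFault(f"no segment {segment}")
    entry = segment_table[segment]
    # The offset must lie within the segment.
    if not 0 <= offset < entry.limit:
        raise SegmentFault(f"offset {offset:#x} exceeds limit {entry.limit:#x}")
    return entry.base + offset


# Hypothetical table: segment 0 is code, segment 1 is data.
table = [SegmentEntry(base=0x1000, limit=0x400),
         SegmentEntry(base=0x8000, limit=0x200)]
address = translate_segmented(table, 1, 0x10)
```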

Segments reflect the logical structure of the program.  The contents of the segment table define an address space -- a mapping from a set of addresses to a set of memory cells.

How does this relate to:

Relocation.  It's not necessary to bind code addresses to physical addresses at load time.  This is done dynamically by the address translator; "base" acts as a relocation register.  However, the linker must still perform relocation when it combines segments from different object files.

Fragmentation.  This is still a problem, because segments are of varying length.  The situation may be a little better, because a program's segments need not be loaded into adjacent memory blocks.

Protection.  Segments can be used for protection by giving each process its own segment table.  A program cannot access addresses outside its address space.  If a program generates an erroneous address (an invalid segment number, or an offset beyond the segment's limit), the hardware traps to the operating system.

Sharing.  A segment can be shared among processes by putting a reference to it in the segment table of each process.  This technique is used in the implementation of shared libraries and DLLs.
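Sharing can be illustrated with two per-process segment tables that refer to the same entry.  This is a minimal sketch; the table layout, the addresses, and the helper physical_address are assumptions made for the example.

```python
def physical_address(segment_table, segment, offset):
    """Base-plus-offset translation with a limit check."""
    entry = segment_table[segment]
    if not 0 <= offset < entry["limit"]:
        raise ValueError("offset outside segment")
    return entry["base"] + offset


# A single copy of a (hypothetical) library's code in physical memory.
shared_library = {"base": 0x20000, "limit": 0x3000}

# Each process has its own table; both tables reference the shared segment,
# though not necessarily under the same segment number.
process_a = [{"base": 0x1000, "limit": 0x800}, shared_library]
process_b = [{"base": 0x5000, "limit": 0x400},
             {"base": 0x6000, "limit": 0x400},
             shared_library]

address_in_a = physical_address(process_a, 1, 0x24)
address_in_b = physical_address(process_b, 2, 0x24)
```

Both lookups reach the same physical cell, while each process's private segments remain invisible to the other.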

Segmentation in the Intel architecture

A logical address consists of a 16-bit selector and a 32-bit offset.  The selector contains an index into a segment table, which Intel calls a descriptor table.  For a program to access a segment, the segment selector must be contained in one of the six segment registers CS, DS, ES, FS, GS, and SS.  Each segment table entry is called a descriptor -- a 64-bit structure containing base address, limit, type (code, data), etc.

A program has access to a global descriptor table (system-wide) and a local descriptor table (per-process).  The GDTR (Global Descriptor Table Register) contains the base and limit of the global descriptor table.  The LDTR contains a selector which refers to an entry in the GDT which contains the base and limit of the LDT.
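As an illustration of how a selector chooses between the two tables, the sketch below decodes the fields of a 16-bit selector.  The layout used here (a 13-bit index, a 1-bit table indicator, and a 2-bit requested privilege level) is taken from the Intel architecture manuals rather than from these notes; decode_selector is a hypothetical helper.

```python
def decode_selector(selector):
    """Split a 16-bit x86 segment selector into its fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a selector is 16 bits")
    index = selector >> 3                 # bits 15..3: descriptor index
    return {
        "index": index,
        "table": "LDT" if selector & 0x4 else "GDT",   # bit 2: table indicator
        "rpl": selector & 0x3,            # bits 1..0: requested privilege level
        "byte_offset": index * 8,         # descriptors are 8 bytes (64 bits)
    }


fields = decode_selector(0x002B)
```

For 0x002B the index is 5, the table indicator selects the GDT, and the requested privilege level is 3.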

How would these structures be applied in a multiprogramming operating system?

Problems with segmentation

Address translation by paging is similar to segmentation.  A virtual address is composed of a page number and an offset.  Page tables are used to translate the page number into a page frame number; that is, the physical location of the page in main memory.  The main difference between pages and segments is that whereas segments have varying length determined by the structure of the user program, pages have a uniform length determined by the hardware architecture.
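Translation through a single-level page table can be sketched as follows.  This is an illustration only: the 2K page size, the table contents, and the names translate_paged and PageFault are assumptions, and an unmapped page is represented by None.

```python
class PageFault(Exception):
    """Stands in for the trap taken when a page has no frame."""


def translate_paged(page_table, virtual_address, page_size=2048):
    """Translate a virtual address using a direct-indexed page table."""
    # Because the page size is fixed, page number and offset are simply
    # the quotient and remainder of the address by the page size.
    page, offset = divmod(virtual_address, page_size)
    if page >= len(page_table) or page_table[page] is None:
        raise PageFault(f"page {page} is not mapped")
    frame = page_table[page]
    return frame * page_size + offset


# Hypothetical page table: page 0 -> frame 5, page 1 -> frame 2.
page_table = [5, 2, None]
physical = translate_paged(page_table, 1 * 2048 + 100)
```

Unlike segmentation, no limit check on the offset is needed: every offset that fits in the offset field lies within the page.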

Example.  Suppose we have a logical and physical address size of 24 bits, with a page size of 2K.  How many bits are there in a page number?  In an offset?  How many entries are there in a page table?
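The questions in this example can be checked with a short calculation.  The sketch below assumes a single-level page table with one entry per page and a page size that is a power of two; paging_fields is a hypothetical helper.

```python
def paging_fields(address_bits, page_size):
    """Return (page number bits, offset bits, page table entries)."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    # log2 of the page size gives the width of the offset field.
    offset_bits = page_size.bit_length() - 1
    if offset_bits > address_bits:
        raise ValueError("page is larger than the address space")
    # Whatever is left of the address is the page number.
    page_bits = address_bits - offset_bits
    return page_bits, offset_bits, 2 ** page_bits


page_bits, offset_bits, entries = paging_fields(24, 2 * 1024)
```

For 24-bit addresses and 2K pages this gives an 11-bit offset, a 13-bit page number, and 8192 page table entries.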

In general, if a logical address is n bits and the page size is 2^m, we have an m-bit offset, an (n - m)-bit page number, and a page table of 2^(n - m) entries.
Paging greatly simplifies the management of physical memory by the operating system.  Memory can be viewed as an array of page frames.  It can be managed using a frame table; each frame is either free, or allocated to some address space.  Paging eliminates the external fragmentation problem, but is still subject to internal fragmentation.
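A frame table of the kind described above can be sketched as an array with one slot per frame.  This is a minimal illustration, not a real allocator; the class FrameTable and the address-space labels are hypothetical.

```python
class FrameTable:
    """One slot per physical frame: None if free, else the owner."""

    def __init__(self, frame_count):
        self.owner = [None] * frame_count

    def allocate(self, address_space):
        """Give any free frame to the named address space."""
        for frame, owner in enumerate(self.owner):
            if owner is None:
                self.owner[frame] = address_space
                return frame
        raise MemoryError("no free frames")

    def release(self, frame):
        """Mark a frame as free again."""
        self.owner[frame] = None


frames = FrameTable(4)
a0 = frames.allocate("A")
b0 = frames.allocate("B")
a1 = frames.allocate("A")
frames.release(b0)
b1 = frames.allocate("B")   # reuses the frame that was just released
```

Because every frame is the same size, any free frame satisfies any request, which is why external fragmentation does not arise.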

What about relocation, protection, sharing?

Both of these address translation techniques are examples of direct mapping, so named because the translation is based on direct indexing into the segment or page table.

Combining segmentation and paging

Some architectures combine segmentation and paging to try to achieve the benefits of both.

On the IBM 370 mainframe architecture, the logical address space is divided into segments.  Then each segment is divided into pages.  The resulting virtual address is composed of a segment number, a page number (within the segment), and an offset (within the page).

On the Intel architecture, the logical address space is divided into segments.  A virtual address consists of a 16-bit selector and a 32-bit offset.  Segment tables (called descriptor tables) are used to translate a logical address into a linear address (in contrast to the two-dimensional segmented virtual address).  If paging is disabled, a linear address is the same as a physical address.  If paging is enabled, linear addresses are translated into physical addresses using page tables.  A two-level paging scheme is used.  The top-level tables are called page directories; each linear address space has one page directory.  Each page directory entry contains the address of one of the second-level tables, which are called page tables.  Each page table entry contains a physical page frame number.  A linear address has three parts:  a page directory index (10 bits), a page table index (10 bits), and a page offset (12 bits).
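The 10/10/12 split of the linear address and the resulting two-level walk can be sketched as follows.  This is a simplified model: the tables are Python dictionaries rather than 4K arrays of entries, flag bits are ignored, and the names split_linear and walk and the sample mapping are assumptions.

```python
def split_linear(linear):
    """Split a 32-bit linear address into its three fields."""
    directory_index = (linear >> 22) & 0x3FF   # top 10 bits
    table_index = (linear >> 12) & 0x3FF       # middle 10 bits
    offset = linear & 0xFFF                    # low 12 bits
    return directory_index, table_index, offset


def walk(page_directory, linear):
    """Translate a linear address through a directory and a page table."""
    directory_index, table_index, offset = split_linear(linear)
    page_table = page_directory.get(directory_index)
    if page_table is None:
        raise LookupError("page table not present")
    frame = page_table.get(table_index)
    if frame is None:
        raise LookupError("page not present")
    # The frame number supplies the upper bits of the physical address.
    return (frame << 12) | offset


# Hypothetical mapping: directory entry 1 -> a table whose entry 3 -> frame 0x1234.
page_directory = {1: {3: 0x1234}}
physical = walk(page_directory, 0x00403004)
```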

64-bit addressing

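First step:  PAE (Physical Address Extension).  Processes still use 32-bit addresses (limiting a process's virtual address space to 4GB), but physical addresses are 36 bits, allowing a CPU to address up to 64GB of physical memory.  The linear address is modified to contain indexes into three levels of table:  a page directory pointer table (2 bits), a page directory (9 bits), and a page table (9 bits), followed by a 12-bit page offset.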
Second step:  long mode.  Processes may use up to 64-bit addresses (although no existing processors go that high).  Four levels of page table are used.  The linear address is divided into 4 table indexes and an offset.
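The division of a long-mode linear address into four table indexes and an offset can be sketched as follows.  The field widths are not given in these notes; the sketch assumes the common x86-64 arrangement of 48-bit linear addresses with 4K pages, in which each index is 9 bits and the offset is 12 bits.  The function name split_long_mode is hypothetical.

```python
def split_long_mode(linear):
    """Split a 48-bit linear address into four 9-bit indexes and an offset."""
    offset = linear & 0xFFF                      # low 12 bits
    # From the top level down, the indexes start at bits 39, 30, 21, and 12.
    indexes = tuple((linear >> shift) & 0x1FF for shift in (39, 30, 21, 12))
    return indexes, offset


# Build an address whose four indexes are 1, 2, 3, 4 and whose offset is 5.
sample = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
indexes, offset = split_long_mode(sample)
```

Each 9-bit index selects one of 512 entries, so with 8-byte entries every table at every level occupies exactly one 4K page.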