13.3. INSTRUCTION FORMATS
An instruction format defines the layout of the bits of an instruction, in terms of its constituent fields. An instruction format must include an opcode and, implicitly or explicitly, zero or more operands. Each explicit operand is referenced using one of the addressing modes described in Section 13.1. The format must, implicitly or explicitly, indicate the addressing mode for each operand. For most instruction sets, more than one instruction format is used.
The design of an instruction format is a complex art, and an amazing variety of designs have been implemented. We examine the key design issues, looking briefly at some designs to illustrate points, and then we examine the x86 and ARM solutions in detail.
13.3.A. Instruction Length
The most obvious trade-off in choosing an instruction length is between the desire for a powerful instruction repertoire and the need to save space. Programmers want more opcodes, more operands, more addressing modes, and greater address range. More opcodes and more operands make life easier for the programmer, because shorter programs can be written to accomplish given tasks. Similarly, more addressing modes give the programmer greater flexibility in implementing certain functions, such as table manipulations and multiple-way branching. And, of course, with the increase in main memory size and the increasing use of virtual memory, programmers want to be able to address larger memory ranges. All of these things (opcodes, operands, addressing modes, address range) require bits and push in the direction of longer instruction lengths. But longer instruction length may be wasteful. A 64-bit instruction occupies twice the space of a 32-bit instruction but is probably less than twice as useful.
Beyond this basic trade-off, there are other considerations. Either the instruction length should be equal to the memory-transfer length (in a bus system, the data-bus length) or one should be a multiple of the other. Otherwise, we will not get an integral number of instructions during a fetch cycle. A related consideration is the memory transfer rate. This rate has not kept up with increases in processor speed. Accordingly, memory can become a bottleneck if the processor can execute instructions faster than it can fetch them. One solution to this problem is to use cache memory (see Section 4.3); another is to use shorter instructions. Thus, 16-bit instructions can be fetched at twice the rate of 32-bit instructions but probably can be executed less than twice as rapidly.
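The alignment condition can be checked with simple arithmetic. The following Python sketch is illustrative only. The function names and the 32-bit bus width used in the loop are assumptions, not part of the text. It reports how many instructions one bus transfer delivers and whether the two lengths are integrally related.

```python
from fractions import Fraction


def instructions_per_fetch(instr_bits: int, bus_bits: int) -> Fraction:
    """Instructions delivered by one bus transfer, as an exact fraction."""
    return Fraction(bus_bits, instr_bits)


def integrally_related(instr_bits: int, bus_bits: int) -> bool:
    """True if one length is a whole multiple of the other."""
    return bus_bits % instr_bits == 0 or instr_bits % bus_bits == 0


# Hypothetical 32-bit data bus compared against several instruction lengths.
for instr in (16, 24, 32, 64):
    print(instr, instructions_per_fetch(instr, 32), integrally_related(instr, 32))
# 16 2 True
# 24 4/3 False
# 32 1 True
# 64 1/2 True
```

With 24-bit instructions each fetch delivers 4/3 of an instruction, so instructions straddle transfers. With 16-bit instructions two arrive per fetch, which is the doubled fetch rate described above.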
A seemingly mundane but nevertheless important feature is that the instruction length should be a multiple of the character length, which is usually 8 bits, and of the length of fixed-point numbers. To see this, we need to make use of that unfortunately ill-defined word, word [FRAI83]. The word length of memory is, in some sense, the “natural” unit of organization. The size of a word usually determines the size of fixed-point numbers (usually the two are equal). Word size is also typically equal to, or at least integrally related to, the memory transfer size. Because a common form of data is character data, we would like a word to store an integral number of characters. Otherwise, there are wasted bits in each word when storing multiple characters, or a character will have to straddle a word boundary.
The importance of this point is such that IBM, when it introduced the System/360 and wanted to employ 8-bit characters, made the wrenching decision to move from the 36-bit architecture of the scientific members of the 700/7000 series to a 32-bit architecture.
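The cost of a mismatch between word length and character length is easy to quantify. This small Python sketch packs whole characters into a word and reports the leftover bits. The helper name is hypothetical, and the default of 8-bit characters follows the text.

```python
def pack_characters(word_bits: int, char_bits: int = 8) -> tuple[int, int]:
    """Return (whole characters per word, unused bits per word)."""
    return divmod(word_bits, char_bits)


for word in (32, 36):
    chars, waste = pack_characters(word)
    print(f"{word}-bit word: {chars} characters, {waste} bits unused")
# 32-bit word: 4 characters, 0 bits unused
# 36-bit word: 4 characters, 4 bits unused
```

A 36-bit word holds four 8-bit characters with 4 bits to spare in every word, whereas a 32-bit word holds exactly four. A 6-bit code, by contrast, divides 36 evenly: `pack_characters(36, 6)` gives `(6, 0)`.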
13.3.B. Allocation of Bits
We’ve looked at some of the factors that go into deciding the length of the instruction format. An equally difficult issue is how to allocate the bits in that format. The trade-offs here are complex.
For a given instruction length, there is clearly a trade-off between the number of opcodes and the power of the addressing capability. More opcodes obviously mean more bits in the opcode field. For an instruction format of a given length, this reduces the number of bits available for addressing. There is one interesting refinement to this trade-off, and that is the use of variable-length opcodes. In this approach, there is a minimum opcode length but, for some opcodes, additional operations may be specified by using additional bits in the instruction. For a fixed-length instruction, this leaves fewer bits for addressing. Thus, this feature is used for those instructions that require fewer operands and/or less powerful addressing.
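Variable-length (often called expanding) opcodes can be illustrated with a hypothetical 16-bit format that is not one of the machines discussed in this section. The format has a 4-bit opcode followed by three 4-bit address fields. An opcode nibble of all 1s acts as an escape that borrows the next address field as additional opcode bits. The Python decoder below is a sketch under those assumptions, and `decode_expanding` is an illustrative name.

```python
def decode_expanding(instr: int) -> tuple[int, int, list[int]]:
    """Decode a hypothetical 16-bit instruction with an expanding opcode.

    Returns (opcode length in bits, opcode value, remaining address fields).
    """
    nibbles = [(instr >> shift) & 0xF for shift in (12, 8, 4, 0)]
    # Each leading 0xF nibble is an escape: the opcode grows by 4 bits
    # and one 4-bit address field is given up.
    used = 1
    while used < 4 and nibbles[used - 1] == 0xF:
        used += 1
    opcode = 0
    for n in nibbles[:used]:
        opcode = (opcode << 4) | n
    return 4 * used, opcode, nibbles[used:]


for word in (0x1234, 0xF234, 0xFF34, 0xFFF4):
    bits, op, addrs = decode_expanding(word)
    print(bits, hex(op), addrs)
# 4 0x1 [2, 3, 4]
# 8 0xf2 [3, 4]
# 12 0xff3 [4]
# 16 0xfff4 []
```

This scheme yields 15 three-address, 15 two-address, 15 one-address, and 16 zero-address operations. Each step trades an address field for more opcodes, which matches the observation that extended opcodes suit instructions that need fewer operands.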
The following interrelated factors go into determining the use of the addressing bits.
• Number of addressing modes: Sometimes an addressing mode can be indicated implicitly. For example, certain opcodes might always call for indexing. In other cases, the addressing modes must be explicit, and one or more mode bits will be needed.
• Number of operands: We have seen that fewer addresses can make for longer, more awkward programs (e.g., Figure 10.3). Typical instruction formats on today’s machines include two operands. Each operand address in the instruction might require its own mode indicator, or the use of a mode indicator could be limited to just one of the address fields.
• Register versus memory: A machine must have registers so that data can be brought into the processor for processing. With a single user-visible register (usually called the accumulator), one operand address is implicit and consumes no instruction bits. However, single-register programming is awkward and requires many instructions. Even with multiple registers, only a few bits are needed to specify the register. The more that registers can be used for operand references, the fewer bits are needed. A number of studies indicate that a total of 8 to 32 user-visible registers is desirable [LUND77, HUCK83]. Most contemporary architectures have at least 32 registers.
• Number of register sets: Most contemporary machines have one set of general-purpose registers, with typically 32 or more registers in the set. These registers can be used to store data and can be used to store addresses for displacement addressing. Some architectures, including that of the x86, have a collection of two or more specialized sets (such as data and displacement). One advantage of this latter approach is that, for a fixed number of registers, a functional split requires fewer bits to be used in the instruction. For example, with two sets of eight registers, only 3 bits are required to identify a register; the opcode or mode register will determine which set of registers is being referenced.
• Address range: For addresses that reference memory, the range of addresses that can be referenced is related to the number of address bits. Because this imposes a severe limitation, direct addressing is rarely used. With displacement addressing, the range is opened up to the length of the address register. Even so, it is still convenient to allow rather large displacements from the register address, which requires a relatively large number of address bits in the instruction.
• Address granularity: For addresses that reference memory rather than registers, another factor is the granularity of addressing. In a system with 16- or 32-bit words, an address can reference a word or a byte at the designer’s choice. Byte addressing is convenient for character manipulation but requires, for a fixed-size memory, more address bits.
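Several of these factors reduce to counting bits. The Python sketch below computes two quantities: the width of a register field, with and without a split into two register sets, and the address width under byte versus 32-bit word granularity. The function names and the 4-Gbyte memory size are assumptions chosen for the example.

```python
def bits_for(count: int) -> int:
    """Minimum number of bits needed to distinguish `count` items."""
    return (count - 1).bit_length()


def register_field_bits(total_registers: int, sets: int = 1) -> int:
    # With a functional split into equal sets, the opcode or mode selects the
    # set, so the field only has to name a register within one set.
    return bits_for(total_registers // sets)


def address_bits(memory_bytes: int, unit_bytes: int) -> int:
    """Bits needed to address every unit of memory (unit_bytes=1 is byte addressing)."""
    return bits_for(memory_bytes // unit_bytes)


print(register_field_bits(16))          # 4: one set of 16 registers
print(register_field_bits(16, sets=2))  # 3: two sets of 8 registers
print(address_bits(2**32, 1))           # 32: byte addressing of 4 Gbytes
print(address_bits(2**32, 4))           # 30: 32-bit word addressing
```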
Thus, the designer is faced with a host of factors to consider and balance. How critical the various choices are is not clear. As an example, we cite one study [CRAG79] that compared various instruction format approaches, including the use of a stack, general-purpose registers, an accumulator, and only memory-to-register approaches. Using a consistent set of assumptions, no significant difference in code space or execution time was observed.
Let us briefly look at how two historical machine designs balance these various factors.
PDP-8
One of the simplest instruction designs for a general-purpose computer was for the PDP-8 [BELL78b]. The PDP-8 uses 12-bit instructions and operates on 12-bit words. There is a single general-purpose register, the accumulator.
Despite the limitations of this design, the addressing is quite flexible. Each memory reference consists of 7 bits plus two 1-bit modifiers. The memory is divided into fixed-length pages of 2^7 = 128 words each. Address calculation is based on references to page 0 or the current page (the page containing this instruction) as determined by the page bit. The second modifier bit indicates whether direct or indirect addressing is to be used. These two modes can be used in combination, so that an indirect address is a 12-bit address contained in a word of page 0 or the current page. In addition, 8 dedicated words on page 0 are auto-index “registers.” When an indirect reference is made to one of these locations, preindexing occurs.
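The PDP-8 address calculation can be written out as a short Python sketch. The sketch relies on several layout details that the text does not give, and these are stated here as assumptions. Bit 0 is the most significant bit. The opcode occupies bits 0 to 2, the indirect bit is bit 3, the page bit is bit 4, and the 7-bit offset occupies bits 5 to 11. The eight auto-index locations are at octal 10 through 17. Memory is modeled as a plain list of 4096 twelve-bit words.

```python
WORD_MASK = 0o7777  # 12-bit words
PAGE_MASK = 0o7600  # high 5 bits select one of 32 pages of 128 words


def effective_address(instr: int, instr_addr: int, memory: list[int]) -> int:
    """Effective address of a PDP-8 memory-reference instruction (sketch)."""
    indirect = (instr >> 8) & 1          # bit 3: direct or indirect
    current_page = (instr >> 7) & 1      # bit 4: page 0 or current page
    offset = instr & 0o177               # bits 5-11: word within the page
    page_base = (instr_addr & PAGE_MASK) if current_page else 0
    addr = page_base | offset
    if not indirect:
        return addr
    if 0o10 <= addr <= 0o17:
        # Auto-index location: increment the stored pointer before using it.
        memory[addr] = (memory[addr] + 1) & WORD_MASK
    return memory[addr]                  # indirect: a full 12-bit address


memory = [0] * 4096
memory[0o10] = 0o1000
print(oct(effective_address(0o1205, 0o2345, memory)))  # 0o2205: current page, direct
print(oct(effective_address(0o1410, 0o0200, memory)))  # 0o1001: auto-indexed
```

Combining the page bit with the indirect bit is what lets a 7-bit field reach the whole 4096-word memory.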
Figure 13.5 shows the PDP-8 instruction format. There are a 3-bit opcode and three types of instructions. For opcodes 0 through 5, the format is a single-address memory reference instruction including a page bit and an indirect bit. Thus, there are only six basic operations. To enlarge the group of operations, opcode 7 defines a register reference or microinstruction. In this format, the remaining bits are used to encode additional operations. In general, each bit defines a specific operation (e.g., clear accumulator), and these bits can be combined in a single instruction. The microinstruction strategy was used as far back as the PDP-1 by DEC and is, in a sense, a forerunner of today’s microprogrammed machines, to be discussed in Part Four. Opcode 6 is the I/O operation; 6 bits are used to select one of 64 devices, and 3 bits specify a particular I/O command.
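The three instruction types can be told apart by the opcode alone. Here is a minimal Python sketch. It assumes the opcode sits in the three most significant bits of the 12-bit word, and that an I/O instruction has a 6-bit device field followed by a 3-bit command field in the low 9 bits. The function name and the tuple return format are illustrative choices.

```python
def classify(instr: int) -> tuple:
    """Sort a 12-bit PDP-8 instruction into one of the three types by opcode."""
    opcode = (instr >> 9) & 0o7        # 3-bit opcode in the high bits
    if opcode <= 5:
        # Single-address memory reference (page bit, indirect bit, 7-bit offset).
        return ("memory reference", opcode)
    if opcode == 6:
        device = (instr >> 3) & 0o77   # 6 bits select one of 64 devices
        command = instr & 0o7          # 3 bits specify the I/O command
        return ("I/O", device, command)
    # Opcode 7: the remaining 9 bits are microinstruction bits that can be combined.
    return ("operate", instr & 0o777)


print(classify(0o1410))  # ('memory reference', 1)
print(classify(0o6031))  # ('I/O', 3, 1)
print(classify(0o7200))  # ('operate', 128)
```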
The PDP-8 instruction format is remarkably efficient. It supports indirect addressing, displacement addressing, and indexing. With the use of the opcode extension, it supports a total of approximately 35 instructions. Given the constraints of a 12-bit instruction length, the designers could hardly have done better.
PDP-10
A sharp contrast to the instruction set of the PDP-8 is that of the PDP-10. The PDP-10 was designed to be a large-scale time-shared system, with an emphasis on making the system easy to program, even if additional hardware expense was involved.
Among the design principles employed in designing the instruction set were the following [BELL78c]:
• Orthogonality: Orthogonality is a principle by which two variables are independent of each other. In the context of an instruction set, the term indicates that other elements of an instruction are independent of (not determined by) the opcode. The PDP-10 designers use the term to describe the fact that an address is always computed in the same way, independent of the opcode. This is in contrast to many machines, where the address mode sometimes depends implicitly on the operator being used.
• Completeness: Each arithmetic data type (integer, fixed-point, floating-point) should have a complete and identical set of operations.
• Direct addressing: Base plus displacement addressing, which places a memory organization burden on the programmer, was avoided in favor of direct addressing.
Each of these principles advances the main goal of ease of programming.
The PDP-10 has a 36-bit word length and a 36-bit instruction length. The fixed instruction format is shown in Figure 13.6. The opcode occupies 9 bits, allowing up to 512 operations. In fact, a total of 365 different instructions are defined. Most instructions have two addresses, one of which is one of 16 general-purpose registers. Thus, this operand reference occupies 4 bits. The other operand reference starts with an 18-bit memory address field. This can be used as an immediate operand or a memory address. In the latter usage, both indexing and indirect addressing are allowed. The same general-purpose registers are also used as index registers.
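The following Python sketch shows PDP-10 field extraction and address calculation. The field layout (9-bit opcode, 4-bit register, 1-bit indirect flag, 4-bit index register, 18-bit address) is an assumption beyond the text, which names only the opcode, register, and address widths; the indirect and index fields fill the remaining 5 bits. The sketch also assumes that index register 0 means "no indexing" and that an indirect word is reinterpreted with the same indirect, index, and address fields. It ignores details such as the real machine's mapping of the 16 registers onto the lowest memory addresses.

```python
HALF = (1 << 18) - 1  # 18-bit address (half-word) mask


def fields(word: int) -> tuple[int, int, int, int, int]:
    """Split a 36-bit instruction into (opcode, ac, indirect, index, address)."""
    return ((word >> 27) & 0o777,  # 9-bit opcode: up to 512 operations
            (word >> 23) & 0o17,   # 4-bit register operand (one of 16)
            (word >> 22) & 1,      # indirect bit
            (word >> 18) & 0o17,   # 4-bit index register; 0 means none
            word & HALF)           # 18-bit memory address or immediate


def effective_address(word: int, regs: list[int], memory: dict[int, int]) -> int:
    """Apply indexing, then follow indirect words until the indirect bit is clear."""
    while True:
        _, _, indirect, index, addr = fields(word)
        if index:
            # Add the right half of the index register, wrapping at 18 bits.
            addr = (addr + (regs[index] & HALF)) & HALF
        if not indirect:
            return addr
        # The indirect word is interpreted with the same fields.
        word = memory[addr]


regs = [0] * 16
regs[3] = 100
indexed = (0o200 << 27) | (1 << 23) | (3 << 18) | 50  # arbitrary opcode, AC=1, X=3, Y=50
print(effective_address(indexed, regs, {}))  # 150
```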
A 36-bit instruction length is true luxury. There is no need to do clever things to get more opcodes; a 9-bit opcode field is more than adequate. Addressing is also straightforward. An 18-bit address field makes direct addressing desirable. For memory sizes greater than 2^18, indirection is provided. For the ease of the programmer, indexing is provided for table manipulation and iterative programs. Also, with an 18-bit operand field, immediate addressing becomes attractive.
The PDP-10 instruction set design does accomplish the objectives listed earlier [LUND77]. It eases the task of the programmer or compiler at the expense of an inefficient utilization of space. This was a conscious choice made by the designers and therefore cannot be faulted as poor design.
13.3.C. Variable-Length Instructions
The examples we have looked at so far have used a single fixed instruction length, and we have implicitly discussed trade-offs in that context. But the designer may choose instead to provide a variety of instruction formats of different lengths. This tactic makes it easy to provide a large repertoire of opcodes, with different opcode lengths. Addressing can be more flexible, with various combinations of register and memory references plus addressing modes. With variable-length instructions, these many variations can be provided efficiently and compactly.
The principal price to pay for variable-length instructions is an increase in the complexity of the processor. Falling hardware prices, the use of microprogramming (discussed in Part Four), and a general increase in understanding the principles of processor design have all contributed to making this a small price to pay. However, we will see that RISC and superscalar machines can exploit the use of fixed-length instructions to provide improved performance.
The use of variable-length instructions does not remove the desirability of making all of the instruction lengths integrally related to the word length. Because the processor does not know the length of the next instruction to be fetched, a typical strategy is to fetch a number of bytes or words equal to at least the longest possible instruction. This means that sometimes multiple instructions are fetched. However, as we shall see in Chapter 14, this is a good strategy to follow in any case.
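The fetch strategy can be sketched in Python for a hypothetical byte-oriented instruction set that is not one of the machines in this section. In this instruction set, the top two bits of the first byte give the instruction length, from 1 to 4 bytes. Each fetch brings MAX_LEN bytes, the longest possible instruction, into a buffer. The decoder then carves out as many complete instructions as the buffer holds. `MAX_LEN`, `instr_length`, and `decode_stream` are all illustrative names.

```python
MAX_LEN = 4  # longest instruction in this hypothetical ISA, in bytes


def instr_length(first_byte: int) -> int:
    """Hypothetical encoding: the top two bits of the first byte hold length - 1."""
    return (first_byte >> 6) + 1


def decode_stream(code: bytes) -> tuple[list[bytes], int]:
    """Split code into instructions, fetching MAX_LEN bytes at a time."""
    buffer = b""
    fetch_addr = 0
    fetches = 0
    instructions = []
    while True:
        # Decode every complete instruction already in the buffer.
        while buffer and len(buffer) >= instr_length(buffer[0]):
            n = instr_length(buffer[0])
            instructions.append(buffer[:n])
            buffer = buffer[n:]
        if fetch_addr >= len(code):
            break  # nothing left to fetch; any partial instruction is dropped
        # Each fetch is as large as the longest possible instruction.
        buffer += code[fetch_addr:fetch_addr + MAX_LEN]
        fetch_addr += MAX_LEN
        fetches += 1
    return instructions, fetches


program = bytes([0x01, 0x02, 0x40, 0xAA, 0xC0, 0x01, 0x02, 0x03])
instrs, fetches = decode_stream(program)
print(len(instrs), fetches)  # 4 2
```

The first fetch in this example yields three complete instructions. This illustrates how fetching the maximum instruction length sometimes brings in several instructions at once.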
PDP-11
The PDP-11 was designed to provide a powerful and flexible instruction set within the constraints of a 16-bit minicomputer [BELL70].
The PDP-11 employs a set of eight 16-bit general-purpose registers. Two of these registers have additional significance: one is used as a stack pointer for special-purpose stack operations, and one is used as the program counter, which contains the address of the next instruction.
Figure 13.7 shows the PDP-11 instruction formats. Thirteen different formats are used, encompassing zero-, one-, and two-address instruction types. The opcode can vary from 4 to 16 bits in length. Register references are 6 bits in length. Three bits identify the register, and the remaining 3 bits identify the addressing mode. The PDP-11 is endowed with a rich set of addressing modes. One advantage of linking the addressing mode to the operand rather than the opcode, as is sometimes done, is that any addressing mode can be used with any opcode. As was mentioned, this independence is referred to as orthogonality.
PDP-11 instructions are usually one word (16 bits) long. For some instructions, one or two memory addresses are appended, so that 32-bit and 48-bit instructions are part of the repertoire. This provides for further flexibility in addressing.
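This Python sketch shows how appended words arise in the PDP-11 double-operand format. It assumes the standard layout, which the text does not spell out. A 4-bit opcode is followed by 6-bit source and destination fields, and each field holds a 3-bit mode above a 3-bit register number. The sketch also assumes that the index modes (6 and 7) always take an extra word. Modes 2 and 3 also take an extra word when the register is the program counter (R7), which gives immediate and absolute operands. In the usage lines, opcode 01 is MOV and 06 is ADD.

```python
def needs_extra_word(field: int) -> bool:
    """True if a 6-bit operand field (3-bit mode, 3-bit register) appends a word."""
    mode, reg = field >> 3, field & 0o7
    if mode in (6, 7):                   # index / index deferred: word is the offset
        return True
    return reg == 7 and mode in (2, 3)   # immediate / absolute via the PC


def double_operand_length(instr: int) -> int:
    """Length in 16-bit words of a double-operand instruction."""
    src = (instr >> 6) & 0o77
    dst = instr & 0o77
    return 1 + needs_extra_word(src) + needs_extra_word(dst)


print(double_operand_length(0o010001))  # MOV R0, R1        -> 1 word
print(double_operand_length(0o012700))  # MOV #n, R0        -> 2 words
print(double_operand_length(0o066162))  # ADD X(R1), Y(R2)  -> 3 words
```

The three cases correspond to the 16-, 32-, and 48-bit instructions mentioned in the text.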
The PDP-11 instruction set and addressing capability are complex. This increases both hardware cost and programming complexity. The advantage is that more efficient or compact programs can be developed.
VAX
Most architectures provide a relatively small number of fixed instruction formats. This can cause two problems for the programmer. First, addressing mode and opcode are not orthogonal. For example, for a given operation, one operand must come from a register and another from memory, or both from registers, and so on. Second, only a limited number of operands can be accommodated: typically up to two or three. Because some operations inherently require more operands, various strategies must be used to achieve the desired result using two or more instructions.
To avoid these problems, two criteria were used in designing the VAX instruction format [STRE78]:
1. All instructions should have the “natural” number of operands.
2. All operands should have the same generality in specification.
The result is a highly variable instruction format. An instruction consists of a 1- or 2-byte opcode followed by from zero to six operand specifiers, depending on the opcode. The minimal instruction length is 1 byte, and instructions up to 37 bytes can be constructed. Figure 13.8 gives a few examples.
The VAX instruction begins with a 1-byte opcode. This suffices to handle most VAX instructions. However, as there are over 300 different instructions, 8 bits are not enough. The hexadecimal codes FD and FF indicate an extended opcode, with the actual opcode being specified in the second byte.
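Reading the opcode is the first step in decoding a VAX instruction. Below is a minimal Python sketch of the escape rule just described. Representing a two-byte opcode as a single 16-bit integer is an arbitrary choice made for this example, and the names are illustrative.

```python
EXTENDED_OPCODE_BYTES = {0xFD, 0xFF}  # escape values named in the text


def read_opcode(code: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (opcode, bytes consumed) for a 1- or 2-byte opcode at code[pos]."""
    first = code[pos]
    if first in EXTENDED_OPCODE_BYTES:
        # The actual opcode is in the second byte; keep both so values stay unique.
        return (first << 8) | code[pos + 1], 2
    return first, 1


print(read_opcode(bytes([0xC1, 0x51])))           # (193, 1)
print(hex(read_opcode(bytes([0xFD, 0x40]))[0]))   # 0xfd40
```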
The remainder of the instruction consists of up to six operand specifiers. An operand specifier is, at minimum, a 1-byte format in which the leftmost 4 bits are the address mode specifier. The only exception to this rule is the literal mode, which is signaled by the pattern 00 in the leftmost 2 bits, leaving space for a 6-bit literal. Because of this exception, a total of 12 different addressing modes can be specified.
An operand specifier often consists of just one byte, with the rightmost 4 bits specifying one of 16 general-purpose registers. The length of the operand specifier can be extended in one of two ways. First, a constant value of one or more bytes may immediately follow the first byte of the operand specifier. An example of this is the displacement mode, in which an 8-, 16-, or 32-bit displacement is used. Second, an index mode of addressing may be used. In this case, the first byte of the operand specifier consists of the 4-bit addressing mode code of 0100 and a 4-bit index register identifier. The remainder of the operand specifier consists of the base address specifier, which may itself be one or more bytes in length.
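Because each operand specifier's length depends on its first byte, a decoder must walk the specifiers one at a time. The Python sketch below computes the length of one specifier. Beyond the literal and index rules given in the text, it assumes the usual VAX mode assignments:

- Modes 5 to 7 (register, register deferred, autodecrement) occupy a single byte.
- Modes A/B, C/D, and E/F are byte, word, and longword displacement, together with their deferred forms.
- Autoincrement (8) and autoincrement deferred (9) act as immediate and absolute modes when the register is the PC (R15).

The `operand_size` parameter stands in for the data type implied by the opcode. The sketch does not reject illegal combinations, such as an index mode whose base is a register.

```python
def specifier_length(code: bytes, pos: int, operand_size: int) -> int:
    """Bytes occupied by one VAX operand specifier starting at code[pos].

    operand_size is the size in bytes of the operand's data type, which the
    opcode implies; it only matters for immediate mode.
    """
    first = code[pos]
    mode, reg = first >> 4, first & 0xF
    if mode <= 3:                  # leftmost 2 bits 00: 6-bit short literal
        return 1
    if mode == 4:                  # index mode: a base specifier follows
        return 1 + specifier_length(code, pos + 1, operand_size)
    if mode in (5, 6, 7):          # register, register deferred, autodecrement
        return 1
    if mode == 8:                  # autoincrement; with the PC it is immediate
        return 1 + (operand_size if reg == 15 else 0)
    if mode == 9:                  # autoincrement deferred; with the PC it is absolute
        return 1 + (4 if reg == 15 else 0)
    # A/B, C/D, E/F: byte, word, longword displacement (and deferred forms)
    return 1 + {0xA: 1, 0xB: 1, 0xC: 2, 0xD: 2, 0xE: 4, 0xF: 4}[mode]


print(specifier_length(bytes([0x05]), 0, 4))                    # 1: literal 5
print(specifier_length(bytes([0xA1, 0x10]), 0, 4))              # 2: byte displacement off R1
print(specifier_length(bytes([0x41, 0xC2, 0x00, 0x10]), 0, 4))  # 4: indexed by R1, word displacement off R2
```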
The reader may be wondering, as the author did, what kind of instruction requires six operands. Surprisingly, the VAX has a number of such instructions. Consider ADDP6 OP1, OP2, OP3, OP4, OP5, OP6.
This instruction adds two packed decimal numbers. OP1 and OP2 specify the length and starting address of one decimal string; OP3 and OP4 specify a second string. These two strings are added and the result is stored in the decimal string whose length and starting location are specified by OP5 and OP6.
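The semantics of ADDP6 can be modeled in a few lines of Python. This sketch assumes the usual VAX packed decimal layout, which the text does not describe:

- Each byte holds two digits, most significant first.
- The low half of the last byte holds the sign: C for plus, D for minus, with B also read as minus.
- When the length is even, a zero pad digit comes first, so a string of n digits occupies n // 2 + 1 bytes.

Memory is a `bytearray`, and the function names are hypothetical. Real hardware also sets condition codes and signals decimal overflow. The sketch simply drops any excess high-order digits.

```python
def read_packed(mem: bytearray, length: int, addr: int) -> int:
    """Decode a packed decimal string of `length` digits starting at mem[addr]."""
    nbytes = length // 2 + 1
    nibbles = []
    for b in mem[addr:addr + nbytes]:
        nibbles += [b >> 4, b & 0xF]
    *digits, sign = nibbles                 # last nibble is the sign
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value


def write_packed(mem: bytearray, length: int, addr: int, value: int) -> None:
    """Store value as a packed decimal string of `length` digits at mem[addr]."""
    nbytes = length // 2 + 1
    magnitude = abs(value) % 10 ** length   # simplification: drop overflow digits
    sign = 0xD if value < 0 and magnitude else 0xC
    digits = [int(c) for c in str(magnitude).zfill(2 * nbytes - 1)]
    nibbles = digits + [sign]
    for i in range(nbytes):
        mem[addr + i] = (nibbles[2 * i] << 4) | nibbles[2 * i + 1]


def addp6(mem, len1, addr1, len2, addr2, len3, addr3):
    """OP1/OP2 and OP3/OP4 describe the addends; OP5/OP6 receive the sum."""
    total = read_packed(mem, len1, addr1) + read_packed(mem, len2, addr2)
    write_packed(mem, len3, addr3, total)


mem = bytearray(16)
mem[0:2] = bytes([0x12, 0x3C])      # +123, 3 digits
mem[4:6] = bytes([0x04, 0x5C])      # +45, 2 digits (leading pad digit)
addp6(mem, 3, 0, 2, 4, 4, 8)
print(read_packed(mem, 4, 8), mem[8:11].hex())  # 168 00168c
```

Each string needs both a length and an address, so two strings in and one string out account for all six operands.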
The VAX instruction set provides for a wide variety of operations and addressing modes. This gives a programmer, such as a compiler writer, a very powerful and flexible tool for developing programs. In theory, this should lead to efficient machine-language compilations of high-level language programs and, in general, to effective and efficient use of processor resources. The penalty to be paid for these benefits is the increased complexity of the processor compared with a processor with a simpler instruction set and format.
We return to these matters in Chapter 15, where we examine the case for very simple instruction sets.