Difference between revisions of "Code parsing"

From gem5
(Split out from original monolithic page)
 
(Instruction operands)
Line 37: Line 37:
  
 
Most of the automation provided by the parser is based on its recognition of the operands used in the instruction definition code. Most relevant instruction characteristics can be inferred from the operands: floating-point vs. integer instructions can be recognized by the registers used, an instruction that reads from a memory location is a load, etc. In combination with the bitfield operators and type qualifiers described above, most instructions can be described in a single line of code. In addition, most of the differences between simulator CPU models lie in the operand access mechanisms; by generating the code for these accesses automatically, a single description suffices for a variety of situations.
The ISA description provides a list of recognized instruction operands and their characteristics via the <code>def operands</code> statement. This statement specifies a Python dictionary that maps operand strings to operand traits objects based on classes provided by the parser. The parser supports five classes of operands: integer registers, floating-point registers, memory locations, the next program counter (NPC), and control registers. The constructor for each class takes four arguments:
 
  
# the default type of the operand (an extension string from operandTypeMap),
+
The ISA description provides a list of recognized instruction operands and their characteristics via the <code>def operands</code> statement. This statement specifies a Python dictionary that maps operand strings to a five-element tuple.  The elements of the tuple specify the operand as follows:
 +
 
 +
# the operand class, which must be one of the strings "IntReg", "FloatReg", "Mem", "NPC", or "ControlReg", indicating an integer register, floating-point register, memory location, the next program counter (NPC), or a control register, respectively.
 +
# the default type of the operand (an extension string defined in the <code>def operand_types</code> block),
 
# a specifier indicating how specific instances of the operand are decoded (e.g., a bitfield name),
 
# a structure indicating the instruction flags that can be inferred when the operand is used, and
+
# a string or triple of strings indicating the instruction flags that can be inferred when the operand is used, and
# a sort priority used to make operand list order deterministic.
+
# a sort priority used to control the order of operands in disassembly.
  
For example, a subset of the Alpha ISA operand traits map is as follows:
+
For example, a simplified subset of the Alpha ISA operand traits map is as follows:
  
 
<pre>
 
<pre>
 
def operands {{
 
def operands {{
     'Ra': IntRegOperandTraits('uq', 'RA', 'IsInteger', 1),
+
     'Ra': ('IntReg', 'uq', 'RA', 'IsInteger', 1),
     'Rb': IntRegOperandTraits('uq', 'RB', 'IsInteger', 2),
+
     'Rb': ('IntReg', 'uq', 'RB', 'IsInteger', 2),
     'Rc': IntRegOperandTraits('uq', 'RC', 'IsInteger', 3),
+
     'Rc': ('IntReg', 'uq', 'RC', 'IsInteger', 3),
     'Fa': FloatRegOperandTraits('df', 'FA', 'IsFloating', 1),
+
     'Fa': ('FloatReg', 'df', 'FA', 'IsFloating', 1),
     'Fb': FloatRegOperandTraits('df', 'FB', 'IsFloating', 2),
+
     'Fb': ('FloatReg', 'df', 'FB', 'IsFloating', 2),
     'Fc': FloatRegOperandTraits('df', 'FC', 'IsFloating', 3),
+
     'Fc': ('FloatReg', 'df', 'FC', 'IsFloating', 3),
     'Mem': MemOperandTraits('uq', None,
+
     'Mem': ('Mem', 'uq', None, ('IsMemRef', 'IsLoad', 'IsStore'), 4),
                            ('IsMemRef', 'IsLoad', 'IsStore'), 4),
+
     'NPC': ('NPC', 'uq', None, (None, None, 'IsControl'), 4)
     'NPC': NPCOperandTraits('uq', None, ( None, None, 'IsControl' ), 4),
 
 
}};
 
}};
 
</pre>
 
</pre>
  
The operand named <code>Ra</code> is an integer register, default type unsigned quadword, uses the <code>RA</code> bitfield from the instruction, implies no flags, and has a sort priority of 1 (placing it first in any list of operands). A single flag argument implies an unconditionally inferred instruction flag. Thus any instruction using a floating-point register operand can infer the <code>IsFloating</code> flag. If the flag operand is a triple, the first element is unconditional, the second is inferred when the operand is a source, and the third when it is a destination. Thus any description with a memory operand is marked as a memory reference. If the operand is a source, it's a load, while if it's a destination, it's a store. Also, any instruction that writes to the NPC is a control instruction.
+
The operand named <code>Ra</code> is an integer register, default type <code>uq</code> (unsigned quadword), uses the <code>RA</code> bitfield from the instruction, implies the <code>IsInteger</code> instruction flag, and has a sort priority of 1 (placing it first in any list of operands).
 +
 
 +
For the instruction flag element, a single string (such as <code>'IsInteger'</code>) implies an unconditionally inferred instruction flag. If the flag element is a triple, the first element is unconditional, the second is inferred when the operand is a source, and the third when it is a destination. Thus the <code>('IsMemRef', 'IsLoad', 'IsStore')</code> element for memory references indicates that any instruction with a memory operand is marked as a memory reference. In addition, if the memory operand is a source, the instruction is marked as a load, while if the operand is a destination, the instruction is marked as a store. Similarly, the <code>(None, None, 'IsControl')</code> tuple for the NPC operand indicates that any instruction that writes to the NPC is a control instruction, but instructions which merely reference NPC as a source do not receive any default flags.
 +
 
 +
Note that description code parsing uses regular expressions, which limits the ability of the parser to infer the nature of a particular operand.  In particular, destination operands are distinguished from source operands solely by testing whether the operand appears on the left-hand side of an assignment operator (<code>=</code>). Destination operands that are assigned to in a different fashion, e.g. by being passed by reference to other functions, must still appear on the left-hand side of an assignment to be properly recognized as destinations.  The parser also does not recognize C compound assignments, e.g., <code>+=</code>.  If an operand is both a source and a destination, it must appear on both the left- and right-hand sides of <code>=</code>.
  
Because description code parsing uses regular expressions, destination operands are distinguished solely by testing the code after the operand for an assignment operator (=). Destination operands that are assigned to in a different fashion, e.g. by being passed by reference to other functions, must still appear on the left-hand side of an assignment to be properly recognized.
+
Another limitation of regular-expression-based code parsing is that control flow in the code block is not recognized.  Combined with the details of how register updates are performed in the CPU models, this means that destinations cannot be updated conditionally. If a particular register is recognized as a destination register, that register will always be updated at the end of the <code>execute()</code> method, and thus the code must assign a valid value to that register along each possible code path within the block.
  
 
===The CodeBlock class===
 
===The CodeBlock class===

Revision as of 18:20, 22 June 2006

To a large extent, the power and flexibility of the ISA description mechanism stem from the fact that the mapping from a brief instruction definition provided in the decode block to the resulting C++ code is performed in a general-purpose programming language (Python). (This function is performed by the "instruction format" definition described above in Format definitions.) Technically, the ISA description language allows any arbitrary Python code to perform this mapping. However, the parser provides a library of Python classes and functions designed to automate the process of deducing an instruction's characteristics from a brief description of its operation, and generating the strings required to populate declaration and decode templates. This library represents roughly half of the code in isa_parser.py.

Instruction behaviors are described using C++ with two extensions: bitfield operators and operand type qualifiers. To avoid building a full C++ parser into the ISA description system (or conversely constraining the C++ that could be used for instruction descriptions), these extensions are implemented using regular expression matching and substitution. As a result, there are some syntactic constraints on their usage. The following two sections discuss these extensions in turn. The third section discusses operand parsing, the technique by which the parser automatically infers most instruction characteristics. The final two sections discuss the Python classes through which instruction formats interact with the library: CodeBlock, which analyzes and encapsulates instruction description code; and the instruction object parameter class, InstObjParams, which encapsulates the full set of parameters to be substituted into a template.

Bitfield operators

Simple bitfield extraction can be performed on rvalues using the <:> postfix operator. Bit numbering matches that used in global bitfield definitions (see Bitfield definitions). For example, Ra<7:0> extracts the low 8 bits of register Ra. Single-bit fields can be specified by eliminating the latter operand, e.g. Rb<31:>. Unlike in global bitfield definitions, the colon cannot be eliminated, as it becomes too difficult to distinguish bitfield operators from template arguments. In addition, the bit index parameters must be either identifiers or integer constants; expressions are not allowed. The bit operator will apply either to the syntactic token on its left, or, if that token is a closing parenthesis, to the parenthesized expression.
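The regular-expression approach to the <:> operator can be illustrated with a minimal Python sketch. This is not the parser's actual code: the function name expand_bitfields and the emitted bits(value, hi, lo) call are assumptions made for illustration, and the sketch handles only the case where the operator follows a single token, not a parenthesized expression.

```python
import re

# Illustrative sketch only; names and output form are assumptions.
# Matches: token '<' hi ':' [lo] '>' where hi/lo are identifiers or integers.
# The mandatory colon keeps template arguments such as vector<int> from matching.
BITFIELD_RE = re.compile(
    r'([A-Za-z_]\w*(?:\.\w+)?)<\s*(\w+)\s*:\s*(\w*)\s*>')

def expand_bitfields(code):
    """Rewrite token<hi:lo> into a hypothetical bits(token, hi, lo) call."""
    def repl(match):
        operand, hi, lo = match.group(1), match.group(2), match.group(3)
        if lo == '':
            # Single-bit form such as Rb<31:> selects bit 31 only.
            lo = hi
        return 'bits(%s, %s, %s)' % (operand, hi, lo)
    return BITFIELD_RE.sub(repl, code)

print(expand_bitfields('Rc = Ra<7:0> + Rb<31:>;'))
```

Because the pattern requires a colon between the angle brackets, ordinary comparisons and template argument lists pass through unchanged in this sketch.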

Operand type qualifiers

The effective type of an instruction operand (e.g., a register) may be specified by appending a period and a type qualifier to the operand name. The list of type qualifiers is architecture-specific; the def operand_types statement in the ISA description is used to specify it. The specification is in the form of a Python dictionary which maps a type extension to a tuple containing a type description ("signed int", "unsigned int", or "float") and the operand size in bits. For example, the Alpha ISA definition is as follows:

def operand_types {{
    'sb' : ('signed int', 8),
    'ub' : ('unsigned int', 8),
    'sw' : ('signed int', 16),
    'uw' : ('unsigned int', 16),
    'sl' : ('signed int', 32),
    'ul' : ('unsigned int', 32),
    'sq' : ('signed int', 64),
    'uq' : ('unsigned int', 64),
    'sf' : ('float', 32),
    'df' : ('float', 64)
}};

Thus the Alpha 32-bit add instruction addl could be defined as:

Rc.sl = Ra.sl + Rb.sl;

The operations are performed using the types specified; the result will be converted from the specified type to the appropriate register value (in this case by sign-extending the 32-bit result to 64 bits, since Alpha integer registers are 64 bits in size).

Type qualifiers are allowed only on recognized instruction operands (see Instruction operands).
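As a rough illustration of how such a type map might be consumed, the following Python sketch derives a C++ type name from each extension and rewrites qualified operands into plain identifiers by replacing the period with an underscore. The helper names cxx_type and munge_qualifiers, and the choice of fixed-width type names such as int32_t, are assumptions for this example rather than the parser's actual output.

```python
import re

# Same shape as the Alpha operand_types dictionary shown above.
operand_types = {
    'sb': ('signed int', 8),  'ub': ('unsigned int', 8),
    'sw': ('signed int', 16), 'uw': ('unsigned int', 16),
    'sl': ('signed int', 32), 'ul': ('unsigned int', 32),
    'sq': ('signed int', 64), 'uq': ('unsigned int', 64),
    'sf': ('float', 32),      'df': ('float', 64),
}

def cxx_type(ext):
    """Map a type extension to a C++ type name (naming is an assumption)."""
    kind, size = operand_types[ext]
    if kind == 'float':
        return {32: 'float', 64: 'double'}[size]
    prefix = 'u' if kind == 'unsigned int' else ''
    return '%sint%d_t' % (prefix, size)

def munge_qualifiers(code, operand_names):
    """Turn 'Ra.sl' into 'Ra_sl' for known operands and known extensions."""
    names = '|'.join(re.escape(n) for n in operand_names)
    exts = '|'.join(re.escape(e) for e in operand_types)
    pattern = re.compile(r'\b(%s)\.(%s)\b' % (names, exts))
    return pattern.sub(r'\1_\2', code)

print(cxx_type('sl'))
print(munge_qualifiers('Rc.sl = Ra.sl + Rb.sl;', ['Ra', 'Rb', 'Rc']))
```

Restricting the substitution to recognized operand names mirrors the rule that type qualifiers are allowed only on recognized instruction operands.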

Instruction operands

Most of the automation provided by the parser is based on its recognition of the operands used in the instruction definition code. Most relevant instruction characteristics can be inferred from the operands: floating-point vs. integer instructions can be recognized by the registers used, an instruction that reads from a memory location is a load, etc. In combination with the bitfield operators and type qualifiers described above, most instructions can be described in a single line of code. In addition, most of the differences between simulator CPU models lie in the operand access mechanisms; by generating the code for these accesses automatically, a single description suffices for a variety of situations.

The ISA description provides a list of recognized instruction operands and their characteristics via the def operands statement. This statement specifies a Python dictionary that maps operand strings to a five-element tuple. The elements of the tuple specify the operand as follows:

  1. the operand class, which must be one of the strings "IntReg", "FloatReg", "Mem", "NPC", or "ControlReg", indicating an integer register, floating-point register, memory location, the next program counter (NPC), or a control register, respectively.
  2. the default type of the operand (an extension string defined in the def operand_types block),
  3. a specifier indicating how specific instances of the operand are decoded (e.g., a bitfield name),
  4. a string or triple of strings indicating the instruction flags that can be inferred when the operand is used, and
  5. a sort priority used to control the order of operands in disassembly.

For example, a simplified subset of the Alpha ISA operand traits map is as follows:

def operands {{
    'Ra': ('IntReg', 'uq', 'RA', 'IsInteger', 1),
    'Rb': ('IntReg', 'uq', 'RB', 'IsInteger', 2),
    'Rc': ('IntReg', 'uq', 'RC', 'IsInteger', 3),
    'Fa': ('FloatReg', 'df', 'FA', 'IsFloating', 1),
    'Fb': ('FloatReg', 'df', 'FB', 'IsFloating', 2),
    'Fc': ('FloatReg', 'df', 'FC', 'IsFloating', 3),
    'Mem': ('Mem', 'uq', None, ('IsMemRef', 'IsLoad', 'IsStore'), 4),
    'NPC': ('NPC', 'uq', None, (None, None, 'IsControl'), 4)
}};

The operand named Ra is an integer register, default type uq (unsigned quadword), uses the RA bitfield from the instruction, implies the IsInteger instruction flag, and has a sort priority of 1 (placing it first in any list of operands).

For the instruction flag element, a single string (such as 'IsInteger') implies an unconditionally inferred instruction flag. If the flag element is a triple, the first element is unconditional, the second is inferred when the operand is a source, and the third when it is a destination. Thus the ('IsMemRef', 'IsLoad', 'IsStore') element for memory references indicates that any instruction with a memory operand is marked as a memory reference. In addition, if the memory operand is a source, the instruction is marked as a load, while if the operand is a destination, the instruction is marked as a store. Similarly, the (None, None, 'IsControl') tuple for the NPC operand indicates that any instruction that writes to the NPC is a control instruction, but instructions which merely reference NPC as a source do not receive any default flags.
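The flag inference rule described here can be sketched in Python as follows. The function infer_flags is a hypothetical helper written for this illustration, not part of the parser; it assumes the source and destination operand names have already been determined, and it uses the sort priority (with the operand name as a tie-breaker) only to make the output order deterministic.

```python
# Operand table in the five-element tuple form described above.
operands = {
    'Ra':  ('IntReg', 'uq', 'RA', 'IsInteger', 1),
    'Rb':  ('IntReg', 'uq', 'RB', 'IsInteger', 2),
    'Rc':  ('IntReg', 'uq', 'RC', 'IsInteger', 3),
    'Mem': ('Mem', 'uq', None, ('IsMemRef', 'IsLoad', 'IsStore'), 4),
    'NPC': ('NPC', 'uq', None, (None, None, 'IsControl'), 4),
}

def infer_flags(src_names, dest_names):
    """Collect the instruction flags implied by the operands used (sketch)."""
    flags = []
    names = set(src_names) | set(dest_names)
    for name in sorted(names, key=lambda n: (operands[n][4], n)):
        spec = operands[name][3]
        if isinstance(spec, str):
            # A single string is an unconditional flag.
            spec = (spec, None, None)
        unconditional, if_source, if_dest = spec
        candidates = [
            unconditional,
            if_source if name in src_names else None,
            if_dest if name in dest_names else None,
        ]
        for flag in candidates:
            if flag is not None and flag not in flags:
                flags.append(flag)
    return flags

print(infer_flags(['Mem'], ['Ra']))   # memory operand read: a load
print(infer_flags(['Ra'], ['Mem']))   # memory operand written: a store
print(infer_flags(['Ra'], ['NPC']))   # NPC written: a control instruction
```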

Note that description code parsing uses regular expressions, which limits the ability of the parser to infer the nature of a particular operand. In particular, destination operands are distinguished from source operands solely by testing whether the operand appears on the left-hand side of an assignment operator (=). Destination operands that are assigned to in a different fashion, e.g. by being passed by reference to other functions, must still appear on the left-hand side of an assignment to be properly recognized as destinations. The parser also does not recognize C compound assignments, e.g., +=. If an operand is both a source and a destination, it must appear on both the left- and right-hand sides of =.
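A minimal Python sketch of this kind of classification is shown below. It is an approximation written for illustration: the names OPERAND_NAMES and classify_operands are hypothetical, and the real parser's regular expressions may differ. The sketch treats an operand as a destination only when the text immediately following it is a plain = (not ==), which reproduces the limitation with compound assignments.

```python
import re

# Hypothetical operand list for the example.
OPERAND_NAMES = ['Ra', 'Rb', 'Rc']

# An operand name, optionally followed by a type qualifier such as .sl
OPERAND_RE = re.compile(r'\b(%s)(?:\.\w+)?\b' % '|'.join(OPERAND_NAMES))
# A plain assignment operator directly after the operand; '==' is excluded.
ASSIGN_RE = re.compile(r'\s*=(?!=)')

def classify_operands(code):
    """Return (sources, destinations) in order of first appearance."""
    sources, dests = [], []
    for match in OPERAND_RE.finditer(code):
        name = match.group(1)
        is_dest = ASSIGN_RE.match(code, match.end()) is not None
        target = dests if is_dest else sources
        if name not in target:
            target.append(name)
    return sources, dests

print(classify_operands('Rc.sl = Ra.sl + Rb.sl;'))
print(classify_operands('Rc += Ra;'))        # Rc is NOT seen as a destination
print(classify_operands('Rc = Rc + Ra;'))    # Rc is both source and destination
```

In the second call the operand before += is classified only as a source, which is why the text requires the operand to appear on both sides of a plain = instead.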

Another limitation of regular-expression-based code parsing is that control flow in the code block is not recognized. Combined with the details of how register updates are performed in the CPU models, this means that destinations cannot be updated conditionally. If a particular register is recognized as a destination register, that register will always be updated at the end of the execute() method, and thus the code must assign a valid value to that register along each possible code path within the block.

The CodeBlock class

An instruction format requests processing of a string containing instruction description code by passing the string to the CodeBlock constructor. The constructor performs all of the needed analysis and processing, storing the results in the returned object. Among the CodeBlock fields are:

  • orig_code: the original code string.
  • code: a processed string containing legal C++ code, derived from the original code by substituting in the bitfield operators and munging operand type qualifiers (s/\./_/) to make valid C++ identifiers.
  • constructor: code for the constructor of an instruction object, initializing various C++ object fields including the number of operands and the register indices of the operands.
  • exec_decl: code to declare the C++ variables corresponding to the operands, for use in an execution emulation function.
  • *_rd: code to read the actual operand values into the corresponding C++ variables for source operands. The first part of the name indicates the relevant CPU model (currently simple and dtld are supported).
  • *_wb: code to write the C++ variable contents back to the appropriate register or memory location. Again, the first part of the name reflects the CPU model.
  • *_mem_rd, *_nonmem_rd, *_mem_wb, *_nonmem_wb: as above, but with memory and non-memory operands segregated.
  • flags: the set of instruction flags implied by the operands.
  • op_class: a basic guess at the instruction's operation class (see OpClass) based on the operand types alone.

The InstObjParams class

Instances of the InstObjParams class encapsulate all of the parameters needed to substitute into a code template, to be used as the argument to a template's subst() method (see Template definitions). The first three constructor arguments populate the object's mnemonic, class_name, and (optionally) base_class members. The fourth (optional) argument is a CodeBlock object; all of the members of the provided CodeBlock object are copied to the new object, making them accessible for template substitution. Any remaining arguments are interpreted as either additional instruction flags (appended to the flags list inherited from the CodeBlock argument, if any), or as an operation class (overriding any op_class from the CodeBlock).
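The constructor behavior described above can be approximated by the following Python sketch. Both classes are simplified stand-ins with hypothetical names (SketchCodeBlock, SketchInstObjParams), not the parser's real classes. In particular, the rule used here to tell flags from operation classes (flags start with "Is", operation classes end in "Op") is an assumption for illustration, as are the example values 'IntAluOp', 'IntMultOp', and 'IsSerializing'.

```python
class SketchCodeBlock:
    """Stand-in holding a few of the CodeBlock fields listed earlier."""
    def __init__(self, orig_code, flags, op_class):
        self.orig_code = orig_code
        self.flags = list(flags)
        self.op_class = op_class

class SketchInstObjParams:
    """Simplified illustration of the parameter object passed to subst()."""
    def __init__(self, mnemonic, class_name, base_class='',
                 code_block=None, *opt_args):
        self.mnemonic = mnemonic
        self.class_name = class_name
        self.base_class = base_class
        self.flags = []
        if code_block is not None:
            # Copy every member of the code block onto this object;
            # lists are copied so the code block itself is not modified.
            for key, value in vars(code_block).items():
                copied = list(value) if isinstance(value, list) else value
                setattr(self, key, copied)
        for arg in opt_args:
            if arg.startswith('Is'):        # assumed flag naming convention
                self.flags.append(arg)
            elif arg.endswith('Op'):        # assumed op class naming convention
                self.op_class = arg
            else:
                raise ValueError('unrecognized optional argument: %r' % arg)

cb = SketchCodeBlock('Rc = Ra + Rb;', ['IsInteger'], 'IntAluOp')
iop = SketchInstObjParams('addq', 'Addq', 'AlphaStaticInst', cb,
                          'IsSerializing', 'IntMultOp')
print(iop.flags, iop.op_class)
```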