Cache Coherence Protocols

From gem5
Revision as of 00:23, 9 July 2013 by Nilayvaish (talk | contribs) (MOESI_CMP_token: moved moesi cmp token to a separate page.)

Common Notations and Data Structures

Coherence Messages

These are described in the <protocol-name>-msg.sm file for each protocol.

Message       Description
ACK/NACK      positive/negative acknowledgement for requests that wait for the direction of resolution before deciding on the next action. Examples are writeback requests, exclusive requests.
GETS          request for shared permissions to satisfy a CPU's load or IFetch.
GETX          request for exclusive access.
INV           invalidation request. This can be triggered by the coherence protocol itself, or by the next cache level/directory to enforce inclusion or to trigger a writeback for a DMA access so that the latest copy of data is obtained.
PUTX          request for writeback of cache block. Some protocols (e.g. MOESI_CMP_directory) may use this only for writeback requests of exclusive data.
PUTS          request for writeback of cache block in shared state.
PUTO          request for writeback of cache block in owned state.
PUTO_Sharers  request for writeback of cache block in owned state but other sharers of the block exist.
UNBLOCK       message to unblock next cache level/directory for blocking protocols.

AccessPermissions

These are associated with each cache block and determine what operations are permitted on that block. They are closely correlated with coherence protocol states.

Permissions  Description
Invalid      The cache block is invalid. The block must first be obtained (from elsewhere in the memory hierarchy) before loads/stores can be performed. No action on invalidates (except maybe sending an ACK). No action on replacements. The associated coherence protocol states are I or NP and are stable states in every protocol.
Busy         TODO
Read_Only    Only operations permitted are loads, writebacks, invalidates. Stores cannot be performed before transitioning to some other state.
Read_Write   Loads, stores, writebacks, invalidations are allowed. Usually indicates that the block is dirty.
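
As a rough illustration of the table above, the permissions can be modelled as a lookup from permission to the set of operations it allows. This is a minimal Python sketch, not gem5 code: the Op enum and the is_permitted helper are hypothetical names, Busy is omitted because it is not described here, and Invalid is modelled as permitting nothing (invalidates and replacements need no action in that state).

```python
from enum import Enum, auto


class Op(Enum):
    LOAD = auto()
    STORE = auto()
    WRITEBACK = auto()
    INVALIDATE = auto()


class AccessPermission(Enum):
    Invalid = auto()
    Read_Only = auto()
    Read_Write = auto()


# Operations each permission allows, transcribed from the table.
ALLOWED = {
    # The block must be obtained before any load/store can be performed.
    AccessPermission.Invalid: frozenset(),
    AccessPermission.Read_Only: frozenset(
        {Op.LOAD, Op.WRITEBACK, Op.INVALIDATE}
    ),
    AccessPermission.Read_Write: frozenset(Op),
}


def is_permitted(permission, op):
    """Return True if `op` may be performed under `permission`."""
    return op in ALLOWED[permission]
```

With this table, a store under Read_Only is rejected, which corresponds to the rule that the block must first transition to some other state.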

Data Structures

  • Message Buffers: TODO
  • TBE Table: TODO
  • Timer Table: This maintains a map of address-based timers. A timeout value can be associated with each target address and added to the Timer Table. The L1 cache controller of the MOESI_CMP_directory protocol, for example, uses this data structure to trigger separate timeouts for cache blocks. Internally, the Timer Table uses the event queue to schedule the timeouts. It supports a polling-based interface: isReady() checks whether a timeout has occurred. Timeouts on addresses are set with the set() method and removed with the unset() method.
Related Files:
src/mem/ruby/system/TimerTable.hh: Declares the TimerTable class.
src/mem/ruby/system/TimerTable.cc: Implements the methods of the TimerTable class, which set addresses and timeouts and schedule events using the event queue.
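
The set/unset/isReady interface described above can be pictured with a small toy model. The sketch below is written in Python purely for illustration and is not the gem5 class: the real TimerTable is C++ and schedules timeouts on the event queue, whereas this version takes the current time as an explicit argument. The class name, the method signatures, and the earliest_ready helper are all assumptions of the sketch.

```python
class TimerTableSketch:
    """Toy address-keyed timer map (illustrative only, not gem5 code)."""

    def __init__(self):
        self._expiry = {}  # address -> absolute expiry time

    def set(self, address, now, timeout):
        """Associate a timeout with an address, starting from `now`."""
        self._expiry[address] = now + timeout

    def unset(self, address):
        """Remove the timer for an address, if one exists."""
        self._expiry.pop(address, None)

    def is_ready(self, now):
        """Polling check: has any timer expired by time `now`?"""
        return any(expiry <= now for expiry in self._expiry.values())

    def earliest_ready(self, now):
        """Hypothetical helper: address whose timer expired first."""
        ready = [(t, addr) for addr, t in self._expiry.items() if t <= now]
        return min(ready)[1] if ready else None
```

A controller using such a table would poll is_ready() each cycle, handle the expired address, and then call unset() on it.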

Coherence controller FSM Diagrams

  • The Finite State Machines show only the stable states
  • Transitions are annotated using the notation "Event list" or "Event list : Action list" or "Event list : Action list : Event list". For example, Store : GETX indicates that on a Store event, a GETX message was sent whereas GETX : Mem Read indicates that on receiving a GETX message, a memory read request was sent. Only the main triggers and actions are listed.
  • Optional actions (e.g. writebacks depending on whether or not the block is dirty) are enclosed within [ ]
  • In the diagrams, the transition labels are associated with the arc that cuts across the transition label or the closest arc.
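
To make the label notation concrete, the following Python sketch splits a transition label into its event list, action list, and trailing event list, and flags actions enclosed in [ ] as optional. It is only an illustration of the notation: the function names are hypothetical, and the use of commas to separate items within a list is an assumption, since the separator is not specified here.

```python
def _split_items(text):
    # Assumption: items within one list are comma-separated.
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_label(label):
    """Parse 'Event list [: Action list [: Event list]]' into a dict."""
    fields = [field.strip() for field in label.split(":")]
    if len(fields) > 3:
        raise ValueError("a label has at most three ':'-separated parts")
    fields += [""] * (3 - len(fields))  # pad the missing optional parts
    triggers, actions, follow_ups = (_split_items(f) for f in fields)
    return {
        "triggers": triggers,
        # Each action is (name, is_optional); [ ] marks an optional action.
        "actions": [
            (a.strip("[] "), a.startswith("[") and a.endswith("]"))
            for a in actions
        ],
        "follow_ups": follow_ups,
    }
```

For example, parse_label("Store : GETX") yields the trigger list ["Store"] and the single non-optional action ("GETX", False).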

MOESI_hammer

This is an implementation of AMD's Hammer protocol, which is used in AMD's Hammer chip (also known as the Opteron or Athlon 64). The protocol implements both the original HyperTransport protocol and the more recent ProbeFilter protocol. It also includes a full-bit directory mode.

Related Files

  • src/mem/protocols
    • MOESI_hammer-cache.sm: cache controller specification
    • MOESI_hammer-dir.sm: directory controller specification
    • MOESI_hammer-dma.sm: dma controller specification
    • MOESI_hammer-msg.sm: message type specification
    • MOESI_hammer.slicc: container file

Cache Hierarchy

This protocol implements a 2-level private cache hierarchy. Each core is assigned separate L1 Instruction and Data caches and a unified L2 cache. These caches are private to each core and are controlled by one shared cache controller. The protocol enforces exclusion between the L1 and L2 caches.

Stable States and Invariants

States  Invariants
MM      The cache block is held exclusively by this node and is potentially locally modified (similar to conventional "M" state).
O       The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
M       The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
S       The cache line holds the most recent, correct copy of the data. Other processors in the system may hold copies of the data in the shared state, as well. The cache line can be read, but not written in this state.
I       The cache line is invalid and does not hold a valid copy of the data.
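
The invariants in this table can be read as a check over the states that all nodes hold for one block. The Python sketch below is a simplified illustration, not part of the protocol specification: the function name is hypothetical, and the rule that at most one node may be in O is an inference from "owned by this node" rather than something stated explicitly above.

```python
STABLE_STATES = {"MM", "O", "M", "S", "I"}
EXCLUSIVE_STATES = {"MM", "M"}  # no other node may hold a copy


def invariants_hold(node_states):
    """Check the per-block invariants for a list of per-node states."""
    unknown = set(node_states) - STABLE_STATES
    if unknown:
        raise ValueError(f"not a stable state: {sorted(unknown)}")
    holders = [s for s in node_states if s != "I"]
    if any(s in EXCLUSIVE_STATES for s in holders):
        # MM and M are exclusive: every other node must be in I.
        return len(holders) == 1
    # Otherwise sharers may coexist; assume at most one owner (O).
    return holders.count("O") <= 1
```

Under this reading, one node in O alongside several in S is legal, while a node in M alongside a node in S is not.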

Cache controller

The notation used in the controller FSM diagrams is described in the "Coherence controller FSM Diagrams" section above.

MOESI_hammer supports cache flushing. To flush a cache line, the cache controller first issues a GETF request to the directory to block the line until the flushing is completed. It then issues a PUTF and writes back the cache line.
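
The ordering of the two flush messages can be sketched with a toy directory. This Python model is only an illustration of the sequence described above and does not reflect the SLICC implementation: the class and function names are hypothetical, and two details are assumptions of the sketch, namely that the written-back data travels with the PUTF and that the directory unblocks the line when it processes the PUTF.

```python
class ToyDirectory:
    """Toy stand-in for the directory (not the SLICC controller)."""

    def __init__(self):
        self.blocked = set()  # addresses with a flush in progress
        self.memory = {}      # address -> last written-back data

    def receive(self, msg, addr, data=None):
        if msg == "GETF":
            if addr in self.blocked:
                raise RuntimeError("line is already blocked")
            self.blocked.add(addr)
        elif msg == "PUTF":
            if addr not in self.blocked:
                raise RuntimeError("PUTF without a preceding GETF")
            self.memory[addr] = data    # write back the flushed line
            self.blocked.discard(addr)  # assumption: flush completes here
        else:
            raise ValueError(f"unexpected message {msg!r}")


def flush_line(cache, directory, addr):
    """Flush one line: GETF first, then PUTF with the writeback."""
    data = cache[addr]
    directory.receive("GETF", addr)        # block the line
    directory.receive("PUTF", addr, data)  # write back and finish
    del cache[addr]
    return ["GETF", "PUTF"]
```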

[Figure: MOESI_hammer cache controller FSM (MOESI hammer cache FSM.jpg)]

Directory controller

Unlike a typical directory protocol, the MOESI_hammer memory module does not contain any directory state; instead, it broadcasts requests to all the processors in the system. In parallel, it fetches the data from DRAM and forwards the response to the requesters.
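
The broadcast behaviour can be illustrated by listing the messages such a memory module would generate for one request. The Python sketch below is a simplification under stated assumptions: the function name and message fields are hypothetical, the requester is skipped in the broadcast (an assumption of this sketch), and the DRAM read is shown sequentially even though it conceptually proceeds in parallel with the broadcast.

```python
def handle_request(requester, addr, num_processors, dram):
    """Return the messages a directory-less memory module would send."""
    # Broadcast the request to the processors. Skipping the requester
    # is an assumption of this sketch.
    probes = [
        {"dest": node, "type": "probe", "addr": addr}
        for node in range(num_processors)
        if node != requester
    ]
    # Conceptually in parallel with the broadcast: fetch the data from
    # DRAM and forward it to the requester.
    response = {
        "dest": requester,
        "type": "data",
        "addr": addr,
        "data": dram[addr],
    }
    return probes, response
```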

probe filter: TODO

  • Stable States and Invariants
States  Invariants
NX      Not Owner, probe filter entry exists, block in O at Owner.
NO      Not Owner, probe filter entry exists, block in E/M at Owner.
S       Data clean, probe filter entry exists pointing to the current owner.
O       Data clean, probe filter entry exists.
E       Exclusive Owner, no probe filter entry.
  • Controller


The notation used in the controller FSM diagrams is described in the "Coherence controller FSM Diagrams" section above.

[Figure: MOESI_hammer directory controller FSM (MOESI hammer dir FSM.jpg)]

Network_test

This is a dummy cache coherence protocol that is used to operate the ruby network tester. The details about running the network tester can be found here.

Related Files

  • src/mem/protocols
    • Network_test-cache.sm: cache controller specification
    • Network_test-dir.sm: directory controller specification
    • Network_test-msg.sm: message type specification
    • Network_test.slicc: container file

Cache Hierarchy

This protocol assumes a 1-level cache hierarchy. The role of the cache is simply to send messages from the CPU to the appropriate directory (based on the address) in the appropriate virtual network (based on the message type). It does not track any state. In fact, unlike other protocols, no CacheMemory is created. The directory receives the messages from the caches but does not send any back. The goal of this protocol is to enable simulation/testing of just the interconnection network.

Stable States and Invariants

States Invariants
I Default state of all cache blocks

Cache controller

  • Requests, Responses, Triggers:
    • Load, Instruction fetch, Store from the core.

The network tester (in src/cpu/testers/networktest/networktest.cc) generates packets of the type ReadReq, INST_FETCH, and WriteReq, which are converted into RubyRequestType:LD, RubyRequestType:IFETCH, and RubyRequestType:ST, respectively, by the RubyPort (in src/mem/ruby/system/RubyPort.hh/cc). These messages reach the cache controller via the Sequencer. The destination for these messages is determined by the traffic type, and embedded in the address. More details can be found here.

  • Main Operation:
    • The goal of the cache is only to act as a source node in the underlying interconnection network. It does not track any states.
    • On a LD from the core:
      • it returns a hit, and
      • maps the address to a directory, and issues a message for it of type MSG, and size Control (8 bytes) in the request vnet (0).
      • Note: vnet 0 could also be made to broadcast, instead of sending a directed message to a particular directory, by uncommenting the appropriate line in the a_issueRequest action in Network_test-cache.sm
    • On an IFETCH from the core:
      • it returns a hit, and
      • maps the address to a directory, and issues a message for it of type MSG, and size Control (8 bytes) in the forward vnet (1).
    • On a ST from the core:
      • it returns a hit, and
      • maps the address to a directory, and issues a message for it of type MSG, and size Data (72 bytes) in the response vnet (2).
    • Note: request, forward and response are just used to differentiate the vnets, but do not have any physical significance in this protocol.
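
The mapping from tester packet types to vnets and message sizes listed above can be summarised in a short Python sketch. It is illustrative only and is not the SLICC code in Network_test-cache.sm: the function name and the returned fields are hypothetical, and the address-to-directory mapping (block number modulo the number of directories, with a 64-byte block) is a placeholder assumption.

```python
# Packet type from the tester -> Ruby request type.
PACKET_TO_RUBY = {"ReadReq": "LD", "INST_FETCH": "IFETCH", "WriteReq": "ST"}

# Ruby request type -> (vnet, size class, size in bytes).
ROUTES = {
    "LD": (0, "Control", 8),      # request vnet
    "IFETCH": (1, "Control", 8),  # forward vnet
    "ST": (2, "Data", 72),        # response vnet
}


def issue_request(packet_type, address, num_directories, block_bytes=64):
    """Describe the MSG the cache would issue for one core request."""
    ruby_type = PACKET_TO_RUBY[packet_type]
    vnet, size_class, size_bytes = ROUTES[ruby_type]
    # Placeholder mapping; the real address-to-directory mapping differs.
    directory = (address // block_bytes) % num_directories
    return {
        "hit": True,  # every request is reported to the core as a hit
        "type": "MSG",
        "directory": directory,
        "vnet": vnet,
        "size_class": size_class,
        "size_bytes": size_bytes,
    }
```

Only the vnet and the message size differ between the three request types; the message type is MSG in every case.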

Directory controller

  • Requests, Responses, Triggers:
    • MSG from the cores
  • Main Operation:
    • The goal of the directory is only to act as a destination node in the underlying interconnection network. It does not track any states.
    • The directory simply pops its incoming queue upon receiving the message.

Other features

    • This protocol assumes only 3 vnets.
    • It should only be used when running the ruby network test.