Difference between revisions of "Cache Coherence Protocols"

(MESI_CMP_directory: Moved the protocol to a separate page.)
(MOESI_CMP_directory: Moved the protocol to separate page.)
 
   
 
   
 
  |}
 
  |}
 
== MOESI_CMP_directory ==
 
 
'''Editing in progress.'''
 
 
=== Protocol Overview ===
 
 
* TODO: cache hierarchy
 
 
* In contrast with the MESI protocol, the MOESI protocol introduces an additional '''Owned''' state.
 
* The MOESI protocol also includes many coalescing optimizations not available in the MESI protocol.
 
 
=== Related Files ===
 
 
* '''src/mem/protocols'''
 
** '''MOESI_CMP_directory-L1cache.sm''': L1 cache controller specification
 
** '''MOESI_CMP_directory-L2cache.sm''': L2 cache controller specification
 
** '''MOESI_CMP_directory-dir.sm''': directory controller specification
 
** '''MOESI_CMP_directory-dma.sm''': DMA controller specification
 
** '''MOESI_CMP_directory-msg.sm''': message type specification
 
** '''MOESI_CMP_directory.slicc''': container file
 
 
=== L1 Cache Controller ===
 
 
* '''Stable States and Invariants'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''MM''' || The cache block is held exclusively by this node and is potentially modified (similar to conventional "M" state).
 
|-
 
| '''MM_W''' || The cache block is held exclusively by this node and is potentially modified (similar to conventional "M" state). Replacements and DMA accesses are not allowed in this state. The block automatically transitions to MM state after a timeout.
 
|-
 
| '''O''' || The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
 
|-
 
| '''M''' || The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
 
|-
 
| '''M_W''' || The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Only loads and stores are allowed. Silent upgrade happens to MM_W state on store. Replacements and DMA accesses are not allowed in this state. The block automatically transitions to M state after a timeout.
 
|-
 
| '''S''' || The cache block is held in shared state by 1 or more nodes. Stores are not allowed in this state.
 
|-
 
| '''I''' || The cache block is invalid.
 
|}
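The timeout and silent-upgrade behaviour described in the table can be sketched as a small lookup. This is only an illustrative model, not the SLICC specification: the event labels "Store" and "Timeout" and the helper names are assumptions, and only the transitions that the table states explicitly are included.

```python
from enum import Enum


class L1State(Enum):
    MM = "MM"
    MM_W = "MM_W"
    O = "O"
    M = "M"
    M_W = "M_W"
    S = "S"
    I = "I"


# Only the transitions named in the table are modelled here.
# The event labels are hypothetical names chosen for this sketch.
TRANSITIONS = {
    (L1State.M_W, "Store"): L1State.MM_W,   # silent upgrade on a store
    (L1State.M_W, "Timeout"): L1State.M,    # automatic transition after timeout
    (L1State.MM_W, "Timeout"): L1State.MM,  # automatic transition after timeout
}

# States in which the table says replacements and DMA accesses are not allowed.
LOCKED_STATES = {L1State.M_W, L1State.MM_W}


def next_state(state, event):
    """Return the next state, or None if the table does not describe this case."""
    return TRANSITIONS.get((state, event))


def allows_replacement(state):
    """True if the table does not forbid replacements in this state."""
    return state not in LOCKED_STATES
```

For example, `next_state(L1State.M_W, "Store")` gives `L1State.MM_W`, and a later "Timeout" event moves the block on to `L1State.MM`.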
 
 
* '''FSM Abstraction'''
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
[[File:MOESI_CMP_directory_L1cache_FSM.jpg|center]]
 
 
** '''Optimizations'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Description
 
|-
 
| '''SM''' || A GETX has been issued to get exclusive permissions for an impending store to the cache block, but an old copy of the block is still present. Stores and Replacements are not allowed in this state.
 
|-
 
| '''OM''' || A GETX has been issued to get exclusive permissions for an impending store to the cache block. The data has been received, but not all expected acknowledgments have arrived yet. Stores and Replacements are not allowed in this state.
 
|}
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
[[File:MOESI_CMP_directory_L1cache_optim_FSM.jpg|center]]
 
 
=== L2 Cache Controller ===
 
 
* '''Stable States and Invariants'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! Intra-chip Inclusion !! Inter-chip Exclusion !! States !! Description
 
|-
 
| '''<span style="color:#808080">Not in any L1 or L2 at this chip</span>''' || '''May be present at other chips''' || '''NP/I''' || The cache block at this chip is invalid.
 
|-
 
| rowspan="6"| '''<span style="color:#00CC99">Not in L2, but in 1 or more L1s at this chip</span>''' || rowspan="3"|'''May be present at other chips''' || '''ILS''' || The cache block is not present at L2 on this chip. It is shared locally by L1 nodes in this chip.
 
|-
 
| '''ILO''' || The cache block is not present at L2 on this chip. Some L1 node in this chip is an owner of this cache block.
 
|-
 
| '''ILOS''' || The cache block is not present at L2 on this chip. Some L1 node in this chip is an owner of this cache block. There are also L1 sharers of this cache block in this chip.
 
|-
 
| rowspan="3"|'''Not present at any other chip''' || '''ILX''' || The cache block is not present at L2 on this chip. It is held in exclusive mode by some L1 node in this chip.
 
|-
 
| '''ILOX''' || The cache block is not present at L2 on this chip. It is held exclusively by this chip and some L1 node in this chip is an owner of the block.
 
|-
 
| '''ILOSX''' || The cache block is not present at L2 on this chip. It is held exclusively by this chip. Some L1 node in this chip is an owner of the block. There are also L1 sharers of this cache block in this chip.
 
|-
 
| rowspan="3"| '''<span style="color:#99CCFF">In L2, but not in any L1 at this chip</span>''' || rowspan="2"|'''May be present at other chips''' || '''S''' || The cache block is not present at L1 on this chip. It is held in shared mode at L2 on this chip and is also potentially shared across chips.
 
|-
 
| '''O''' || The cache block is not present at L1 on this chip. It is held in owned mode at L2 on this chip. It is also potentially shared across chips.
 
|-
 
| '''Not present at any other chip''' || '''M''' || The cache block is not present at L1 on this chip. It is present at L2 on this chip and is potentially modified.
 
|- 
 
| rowspan="3"| '''<span style="color:#CC99FF">Both in L2, and 1 or more L1s at this chip</span>''' || rowspan="2"|'''May be present at other chips''' || '''SLS''' || The cache block is present at L2 in shared mode on this chip. There exist local L1 sharers of the block on this chip. It is also potentially shared across chips.
 
|-
 
| '''OLS''' || The cache block is present at L2 in owned mode on this chip. There exist local L1 sharers of the block on this chip. It is also potentially shared across chips.
 
|-
 
| '''Not present at any other chip''' || '''OLSX''' || The cache block is present at L2 in owned mode on this chip. There exist local L1 sharers of the block on this chip. It is held exclusively by this chip.
 
|}
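The two classification axes of the table above can be written down as a plain lookup, which makes it easy to query which L2 states fall into a given category. This is a sketch for illustration only: the category numbers 1 to 4 follow the row order of the table, and the helper names are assumptions rather than anything defined in the protocol files.

```python
# state -> (intra-chip inclusion category 1-4, "not present at any other chip")
# Category 1: not in any L1 or L2 at this chip
# Category 2: not in L2, but in 1 or more L1s at this chip
# Category 3: in L2, but not in any L1 at this chip
# Category 4: both in L2 and in 1 or more L1s at this chip
L2_STATES = {
    "NP/I":  (1, False),
    "ILS":   (2, False),
    "ILO":   (2, False),
    "ILOS":  (2, False),
    "ILX":   (2, True),
    "ILOX":  (2, True),
    "ILOSX": (2, True),
    "S":     (3, False),
    "O":     (3, False),
    "M":     (3, True),
    "SLS":   (4, False),
    "OLS":   (4, False),
    "OLSX":  (4, True),
}


def states_in_category(category):
    """List the L2 states in one intra-chip inclusion category, in table order."""
    return [s for s, (c, _) in L2_STATES.items() if c == category]


def chip_exclusive_states():
    """List the L2 states in which no other chip holds the block."""
    return [s for s, (_, exclusive) in L2_STATES.items() if exclusive]
```

Calling `states_in_category(2)` returns the six IL* states, which are the ones expanded in the second FSM picture.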
 
 
 
* '''FSM Abstraction'''
 
 
The controller is described in 2 parts. The first picture shows transitions between all "intra-chip inclusion" categories and within categories 1, 3, 4. Transitions within category 2 (Not in L2, but in 1 or more L1s at this chip) are shown in the second picture.
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]]. Transitions involving other chips are annotated in <span style="color:#CC3300">brown</span>.'''
 
 
[[File:MOESI_CMP_directory_L2cache_FSM_part_1.jpg|center]]
 
 
The second picture below expands the central hexagonal portion of the above picture to show transitions within category 2 (Not in L2, but in 1 or more L1s at this chip).
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]]. Transitions involving other chips are annotated in <span style="color:#CC3300">brown</span>.'''
 
 
[[File:MOESI_CMP_directory_L2cache_FSM_part_2.jpg|center]]
 
 
=== Directory Controller ===
 
 
* '''Stable States and Invariants'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''M''' || The cache block is held in exclusive state by only 1 node (which is also the owner). There are no sharers of this block. The data is potentially different from that in memory.
 
|-
 
| '''O''' || The cache block is owned by exactly 1 node. There may be sharers of this block. The data is potentially different from that in memory.
 
|-
 
| '''S''' || The cache block is held in shared state by 1 or more nodes. No node has ownership of the block. The data is consistent with that in memory (Check).
 
|-
 
| '''I''' || The cache block is invalid.
 
|}
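The directory invariants above can be expressed as a simple consistency check over an owner count and a sharer count. This is a hedged sketch: the function name and its parameters are hypothetical, and the rule used for state I (no owner and no sharers) is an assumption, since the table only says that the block is invalid.

```python
def directory_state_ok(state, owners, sharers):
    """Check owner/sharer counts against the invariants listed in the table.

    `owners` and `sharers` are counts of nodes; both names are illustrative.
    """
    if state == "M":
        # Exactly one node holds the block, it is the owner, no sharers.
        return owners == 1 and sharers == 0
    if state == "O":
        # Exactly one owner; sharers may or may not exist.
        return owners == 1 and sharers >= 0
    if state == "S":
        # One or more sharers and no owner.
        return owners == 0 and sharers >= 1
    if state == "I":
        # Assumption: an invalid block has neither an owner nor sharers.
        return owners == 0 and sharers == 0
    raise ValueError(f"unknown directory state: {state}")
```

For example, `directory_state_ok("O", owners=1, sharers=3)` holds, while `directory_state_ok("S", owners=1, sharers=2)` does not.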
 
 
* '''FSM Abstraction'''
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
[[File:MOESI_CMP_directory_dir_FSM.jpg|center]]
 
 
=== Other features ===
 
 
* '''Timeouts''':
 
 
''Rathijit will do it''
 
  
 
== Network_test ==
 

Revision as of 00:21, 9 July 2013

Common Notations and Data Structures

Coherence Messages

These are described in the <protocol-name>-msg.sm file for each protocol.

Message Description
ACK/NACK positive/negative acknowledgement for requests that wait for the direction of resolution before deciding on the next action. Examples are writeback requests, exclusive requests.
GETS request for shared permissions to satisfy a CPU's load or IFetch.
GETX request for exclusive access.
INV invalidation request. This can be triggered by the coherence protocol itself, or by the next cache level/directory to enforce inclusion or to trigger a writeback for a DMA access so that the latest copy of data is obtained.
PUTX request for writeback of cache block. Some protocols (e.g. MOESI_CMP_directory) may use this only for writeback requests of exclusive data.
PUTS request for writeback of cache block in shared state.
PUTO request for writeback of cache block in owned state.
PUTO_Sharers request for writeback of cache block in owned state but other sharers of the block exist.
UNBLOCK message to unblock next cache level/directory for blocking protocols.

AccessPermissions

These are associated with each cache block and determine what operations are permitted on that block. They are closely correlated with coherence protocol states.

Permissions Description
Invalid The cache block is invalid. The block must first be obtained (from elsewhere in the memory hierarchy) before loads/stores can be performed. No action on invalidates (except maybe sending an ACK). No action on replacements. The associated coherence protocol states are I or NP and are stable states in every protocol.
Busy TODO
Read_Only Only operations permitted are loads, writebacks, invalidates. Stores cannot be performed before transitioning to some other state.
Read_Write Loads, stores, writebacks, invalidations are allowed. Usually indicates that the block is dirty.
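
The permission table can be read as a mapping from permission to the set of operations it admits. The sketch below is illustrative only and is not the gem5 AccessPermission type: the operation names and the helper are assumptions, Busy is left out because it is still marked TODO, and the Invalid entry is modelled as admitting nothing (invalidates and replacements are described as no-ops there).

```python
# Operations admitted by each permission, as listed in the table.
# "Busy" is omitted because the table does not describe it yet.
ALLOWED_OPS = {
    # Loads/stores need the block to be obtained first; invalidates and
    # replacements are no-ops in this state and are not modelled here.
    "Invalid": set(),
    "Read_Only": {"load", "writeback", "invalidate"},
    "Read_Write": {"load", "store", "writeback", "invalidate"},
}


def is_permitted(permission, operation):
    """Return True if the table lists `operation` under `permission`."""
    return operation in ALLOWED_OPS[permission]
```

With this mapping, `is_permitted("Read_Only", "store")` is false, matching the rule that a store requires a transition to some other state first.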

Data Structures

  • Message Buffers: TODO
  • TBE Table: TODO
  • Timer Table: This maintains a map of address-based timers. For each target address, a timeout value can be associated and added to the Timer Table. This data structure is used, for example, by the L1 cache controller implementation of the MOESI_CMP_directory protocol to trigger separate timeouts for cache blocks. Internally, the Timer Table uses the event queue to schedule the timeouts. The TimerTable supports a polling-based interface, isReady(), to check whether a timeout has occurred. Timeouts on addresses can be set using the set() method and removed using the unset() method.
Related Files:
src/mem/ruby/system/TimerTable.hh: Declares the TimerTable class
src/mem/ruby/system/TimerTable.cc: Implementation of the methods of the TimerTable class, which deal with setting addresses and timeouts and with scheduling events using the event queue.
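
The set/unset/isReady interface described above can be illustrated with a minimal Python model. This is not the gem5 TimerTable: the class name, the `ready_address` helper, and the explicit `now` argument (which stands in for the event queue) are all assumptions made to keep the sketch self-contained.

```python
class TimerTableSketch:
    """Toy model of an address-keyed timer table with a polling interface."""

    def __init__(self):
        self._deadlines = {}  # address -> absolute expiry time

    def set(self, address, now, timeout):
        """Arm a timer for `address` that expires `timeout` ticks after `now`."""
        self._deadlines[address] = now + timeout

    def unset(self, address):
        """Remove the timer for `address`, if one exists."""
        self._deadlines.pop(address, None)

    def is_ready(self, now):
        """Polling check: has any timer expired at time `now`?"""
        return any(deadline <= now for deadline in self._deadlines.values())

    def ready_address(self, now):
        """Return the expired address with the earliest deadline, or None."""
        expired = [(d, a) for a, d in self._deadlines.items() if d <= now]
        return min(expired)[1] if expired else None
```

A controller modelled this way would poll `is_ready(now)` and, on a hit, handle `ready_address(now)` and then call `unset()` for that address.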

Coherence controller FSM Diagrams

  • The Finite State Machines show only the stable states
  • Transitions are annotated using the notation "Event list" or "Event list : Action list" or "Event list : Action list : Event list". For example, Store : GETX indicates that on a Store event, a GETX message was sent whereas GETX : Mem Read indicates that on receiving a GETX message, a memory read request was sent. Only the main triggers and actions are listed.
  • Optional actions (e.g. writebacks depending on whether or not the block is dirty) are enclosed within [ ]
  • In the diagrams, the transition labels are associated with the arc that cuts across the transition label or the closest arc.
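
The transition-label notation can be made concrete with a small parser. This is a sketch under stated assumptions: the field names, the function name, and the use of commas to separate items within a list are hypothetical, since the notation above only fixes the colon-separated structure.

```python
FIELD_NAMES = ("events", "actions", "followup_events")


def parse_label(label):
    """Split "Event list [: Action list [: Event list]]" into named lists."""
    parts = [part.strip() for part in label.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 colon-separated fields: {label!r}")

    def items(field):
        # Assumption: items inside one list are separated by commas.
        return [item.strip() for item in field.split(",") if item.strip()]

    return {name: items(part) for name, part in zip(FIELD_NAMES, parts)}
```

For the examples in the text, `parse_label("Store : GETX")` yields the event `Store` with the action `GETX`, and `parse_label("GETX : Mem Read")` yields the event `GETX` with the action `Mem Read`.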

MOESI_hammer

This is an implementation of AMD's Hammer protocol, which is used in AMD's Hammer chip (also known as the Opteron or Athlon 64). The protocol implements both the original HyperTransport protocol and the more recent ProbeFilter protocol. The protocol also includes a full-bit directory mode.

Related Files

  • src/mem/protocols
    • MOESI_hammer-cache.sm: cache controller specification
    • MOESI_hammer-dir.sm: directory controller specification
    • MOESI_hammer-dma.sm: DMA controller specification
    • MOESI_hammer-msg.sm: message type specification
    • MOESI_hammer.slicc: container file

Cache Hierarchy

This protocol implements a 2-level private cache hierarchy. It assigns separate Instruction and Data L1 caches, and a unified L2 cache, to each core. These caches are private to each core and are controlled by one shared cache controller. This protocol enforces exclusion between the L1 and L2 caches.

Stable States and Invariants

States Invariants
MM The cache block is held exclusively by this node and is potentially locally modified (similar to conventional "M" state).
O The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
M The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
S The cache line holds the most recent, correct copy of the data. Other processors in the system may hold copies of the data in the shared state, as well. The cache line can be read, but not written in this state.
I The cache line is invalid and does not hold a valid copy of the data.

Cache controller

The notation used in the controller FSM diagrams is described here.

MOESI_hammer supports cache flushing. To flush a cache line, the cache controller first issues a GETF request to the directory to block the line until the flushing is completed. It then issues a PUTF and writes back the cache line.

MOESI hammer cache FSM.jpg

Directory controller

The MOESI_hammer memory module, unlike a typical directory protocol, does not contain any directory state and instead broadcasts requests to all the processors in the system. In parallel, it fetches the data from DRAM and forwards the response to the requesters.

probe filter: TODO

  • Stable States and Invariants
States Invariants
NX Not Owner, probe filter entry exists, block in O at Owner.
NO Not Owner, probe filter entry exists, block in E/M at Owner.
S Data clean, probe filter entry exists pointing to the current owner.
O Data clean, probe filter entry exists.
E Exclusive Owner, no probe filter entry.
  • Controller


The notation used in the controller FSM diagrams is described here.

MOESI hammer dir FSM.jpg

MOESI_CMP_token

Protocol Overview

  • This protocol also models a 2-level cache hierarchy.
  • It maintains coherence permission by explicitly exchanging and counting tokens.
  • A fixed number of tokens is assigned to each cache block at the beginning, and this number remains unchanged.
  • To write a block, the processor must hold all the tokens for that block. To read a block, at least one token is required.
  • The protocol also supports persistent messages to avoid starvation.
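
The token-counting rules above can be sketched in a few lines. This is an illustrative model, not the MOESI_CMP_token implementation: the total of 4 tokens per block, the node names, and the helper functions are all assumptions.

```python
TOTAL_TOKENS = 4  # hypothetical fixed token count for one cache block


def can_read(tokens_held):
    """Reading requires at least one token."""
    return tokens_held >= 1


def can_write(tokens_held, total=TOTAL_TOKENS):
    """Writing requires all the tokens for the block."""
    return tokens_held == total


def transfer(holders, src, dst, count):
    """Move `count` tokens from `src` to `dst`; the total never changes."""
    if count <= 0 or holders.get(src, 0) < count:
        raise ValueError("cannot transfer more tokens than the source holds")
    updated = dict(holders)
    updated[src] -= count
    updated[dst] = updated.get(dst, 0) + count
    assert sum(updated.values()) == sum(holders.values())  # conservation
    return updated
```

Starting from `{"L1-0": 4, "L1-1": 0}`, node `L1-0` can write. After `transfer(holders, "L1-0", "L1-1", 1)` both nodes can read, but neither can write until one of them collects all four tokens again.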

Related Files

  • src/mem/protocols
    • MOESI_CMP_token-L1cache.sm: L1 cache controller specification
    • MOESI_CMP_token-L2cache.sm: L2 cache controller specification
    • MOESI_CMP_token-dir.sm: directory controller specification
    • MOESI_CMP_token-dma.sm: DMA controller specification
    • MOESI_CMP_token-msg.sm: message type specification
    • MOESI_CMP_token.slicc: container file

Controller Description

  • L1 Cache
States Invariants
MM The cache block is held exclusively by this node and is potentially modified (similar to conventional "M" state).
MM_W The cache block is held exclusively by this node and is potentially modified (similar to conventional "M" state). Replacements and DMA accesses are not allowed in this state. The block automatically transitions to MM state after a timeout.
O The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
M The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
M_W The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Only loads and stores are allowed. Silent upgrade happens to MM_W state on store. Replacements and DMA accesses are not allowed in this state. The block automatically transitions to M state after a timeout.
S The cache block is held in shared state by 1 or more nodes. Stores are not allowed in this state.
I The cache block is invalid.
  • L2 cache
States Invariants
NP The cache block is not present in this cache.
O The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
M The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
S The cache line holds the most recent, correct copy of the data. Other processors in the system may hold copies of the data in the shared state, as well. The cache line can be read, but not written in this state.
I The cache line is invalid and does not hold a valid copy of the data.
  • Directory controller
States Invariants
O Owner.
NO Not Owner.
L Locked.

Network_test

This is a dummy cache coherence protocol that is used to operate the ruby network tester. The details about running the network tester can be found here.

Related Files

  • src/mem/protocols
    • Network_test-cache.sm: cache controller specification
    • Network_test-dir.sm: directory controller specification
    • Network_test-msg.sm: message type specification
    • Network_test.slicc: container file

Cache Hierarchy

This protocol assumes a 1-level cache hierarchy. The role of the cache is simply to send messages from the CPU to the appropriate directory (based on the address), in the appropriate virtual network (based on the message type). It does not track any state. In fact, unlike in other protocols, no CacheMemory is created. The directory receives the messages from the caches, but does not send any back. The goal of this protocol is to enable simulation/testing of just the interconnection network.

Stable States and Invariants

States Invariants
I Default state of all cache blocks

Cache controller

  • Requests, Responses, Triggers:
    • Load, Instruction fetch, Store from the core.

The network tester (in src/cpu/testers/networktest/networktest.cc) generates packets of the type ReadReq, INST_FETCH, and WriteReq, which are converted into RubyRequestType:LD, RubyRequestType:IFETCH, and RubyRequestType:ST, respectively, by the RubyPort (in src/mem/ruby/system/RubyPort.hh/cc). These messages reach the cache controller via the Sequencer. The destination for these messages is determined by the traffic type, and embedded in the address. More details can be found here.

  • Main Operation:
    • The goal of the cache is only to act as a source node in the underlying interconnection network. It does not track any states.
    • On a LD from the core:
      • it returns a hit, and
      • maps the address to a directory, and issues a message for it of type MSG, and size Control (8 bytes) in the request vnet (0).
      • Note: vnet 0 could also be made to broadcast, instead of sending a directed message to a particular directory, by uncommenting the appropriate line in the a_issueRequest action in Network_test-cache.sm
    • On an IFETCH from the core:
      • it returns a hit, and
      • maps the address to a directory, and issues a message for it of type MSG, and size Control (8 bytes) in the forward vnet (1).
    • On a ST from the core:
      • it returns a hit, and
      • maps the address to a directory, and issues a message for it of type MSG, and size Data (72 bytes) in the response vnet (2).
    • Note: request, forward and response are just used to differentiate the vnets, but do not have any physical significance in this protocol.
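
The request handling listed above amounts to a fixed mapping from request type to virtual network and message size. The sketch below only illustrates that mapping: the function name, the 64-byte block size, and the modulo-based choice of directory are assumptions, not the mapping used by Network_test-cache.sm.

```python
# request type -> (vnet number, message size class, size in bytes)
REQUEST_ROUTING = {
    "LD": (0, "Control", 8),      # request vnet
    "IFETCH": (1, "Control", 8),  # forward vnet
    "ST": (2, "Data", 72),        # response vnet
}

BLOCK_SIZE_BYTES = 64  # hypothetical cache block size


def issue_message(request_type, address, num_directories):
    """Describe the MSG that the cache would send for one core request."""
    vnet, size_class, size_bytes = REQUEST_ROUTING[request_type]
    # Assumption: directories are interleaved by block number.
    directory = (address // BLOCK_SIZE_BYTES) % num_directories
    return {
        "type": "MSG",
        "vnet": vnet,
        "size_class": size_class,
        "size_bytes": size_bytes,
        "directory": directory,
    }
```

For instance, a store always produces a 72-byte Data message on vnet 2, regardless of which directory the address maps to.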

Directory controller

  • Requests, Responses, Triggers:
    • MSG from the cores
  • Main Operation:
    • The goal of the directory is only to act as a destination node in the underlying interconnection network. It does not track any states.
    • The directory simply pops its incoming queue upon receiving the message.

Other features

    • This protocol assumes only 3 vnets.
    • It should only be used when running the ruby network test.