Difference between revisions of "Ruby"

From gem5
Jump to: navigation, search
(SLICC + Coherence protocols:)
m
Line 21: Line 21:
 
The following cache coherence protocols are supported:
 
The following cache coherence protocols are supported:
  
# MI_example
+
# '''MI_example'''
# MESI_CMP_directory
+
# '''MESI_CMP_directory'''
# MOESI_CMP_directory
+
# '''MOESI_CMP_directory'''
# MOESI_CMP_token
+
# '''MOESI_CMP_token'''
# MOESI_hammer
+
# '''MOESI_hammer'''
# Network_test
+
# '''Network_test'''
  
 
Each protocol is described in detail [[Cache Coherence Protocols|here]].
 
Each protocol is described in detail [[Cache Coherence Protocols|here]].
Line 32: Line 32:
 
=== Protocol independent memory components ===
 
=== Protocol independent memory components ===
  
# Sequencer
+
# '''Sequencer'''
# Cache Memory
+
# '''Cache Memory'''
# Replacement Policies
+
# '''Replacement Policies'''
# Memory Controller
+
# '''Memory Controller'''
  
 
Implementation details are described [[Coherence-Protocol-Independent Memory Components|here]].
 
Implementation details are described [[Coherence-Protocol-Independent Memory Components|here]].

Revision as of 07:15, 25 April 2011

High level components of Ruby

Ruby implements a detailed simulation model for the memory subsystem. It models inclusive/exclusive cache hierarchies with various replacement policies, coherence protocol implementations, interconnection networks, DMA and memory controllers, various sequencers that initiate memory requests and handle responses. The models are modular, flexible and highly configurable. Three key aspects of these models are:

  1. Separation of concerns -- for example, the coherence protocol specifications are separate from the replacement policies and cache index mapping, the network topology is specified separately from the implementation.
  2. Rich configurability -- almost any aspect affecting the memory hierarchy functionality and timing can be controlled.
  3. Rapid prototyping -- a high-level specification language, SLICC, is used to specify functionality of various controllers.

The following picture, taken from the GEMS tutorial in ISCA 2005, shows a high-level view of the main components in Ruby.

Ruby overview.jpg

SLICC + Coherence protocols:

   Need to say what is SLICC and whats its purpose. 
   Talk about high level strcture of a typical coherence protocol file, that SLICC uses to generate code. 
   A simple example structure from protocol like MI_example can help here.

SLICC stands for Specification Language for Implementing Cache Coherence. It is a domain specific language that is used for specifying cache coherence protocols. In essence, a cache coherence protocol behaves like a state machine. SLICC is used for specifying the behavior of the state machine. Since the aim is to model the hardware as close as possible, SLICC imposes constraints on the state machines that can be specified. For example, SLICC can impose restrictions on the number of transitions that can take place in a single cycle. Apart from protocol specification, SLICC also combines together some of the components in the memory model. As can be seen in the following picture, the state machine takes its input from the input ports of the inter-connection network and queues the output at the output ports of the network, thus tying together the cache / memory controllers with the inter-connection network itself.

Slicc overview.jpg

The following cache coherence protocols are supported:

  1. MI_example
  2. MESI_CMP_directory
  3. MOESI_CMP_directory
  4. MOESI_CMP_token
  5. MOESI_hammer
  6. Network_test

Each protocol is described in detail here.

Protocol independent memory components

  1. Sequencer
  2. Cache Memory
  3. Replacement Policies
  4. Memory Controller

Implementation details are described here.

Arka will do it

Interconnection Network

The interconnection network connects the various components of the memory hierarchy (cache, memory, dma controllers) together.

Interconnection network.jpg

The key components of an interconnection network are:

  1. Topology
  2. Routing
  3. Flow Control
  4. Router Microarchitecture

More details about the network model implementation are described here.

Life of a memory request in Ruby

In this section we will provide a high level overview of how a memory request is serviced by Ruby as a whole and what components in Ruby it goes through. For detailed operations within each components though, refer to previous sections describing each component in isolation.

  1. A memory request from a core or hardware context of M5 enters the jurisdiction of Ruby through the RubyPort::recvTiming interface (in src/mem/ruby/system/RubyPort.hh/cc). The number of Rubyport instantiation in the simulated system is equal to the number of hardware thread context or cores (in case of non-multithreaded cores). A port from the side of each core is tied to a corresponding RubyPort.
  2. The memory request arrives as a M5 packet and RubyPort is responsible for converting it to a RubyRequest object that is understood by various components of Ruby. It also finds out if the request is for some PIO or not and maneuvers the packet to correct PIO. Finally once it has generated the corresponding RubyRequest object and ascertained that the request is a normal memory request (not PIO access), it passes the request to the Sequencer::makeRequest interface of the attached Sequencer object with the port (variable ruby_port holds the pointer to it). Observe that Sequencer class itself is a derived class from the RubyPort class.
  3. As mentioned in the section describing Sequencer class of Ruby, there are as many objects of Sequencer in a simulated system as the number of hardware thread context (which is also equal to the number of RubyPort object in the system) and there is one-to-one mapping between the Sequencer objects and the hardware thread context. Once a memory request arrives at the Sequencer::makeRequest, it does various accounting and resource allocation for the request and finally pushes the request to the Ruby's coherent cache hierarchy for satisfying the request while accounting for the delay in servicing the same. The request is pushed to the Cache hierarchy by enqueueing the request to the mandatory queue after accounting for L1 cache access latency. The mandatory queue (variable name m_mandatory_q_ptr) effectively acts as the interface between the Sequencer and the SLICC generated cache coherence files.
  4. L1 cache controllers (generated by SLICC according to the coherence protocol specifications) dequeues request from the mandatory queue and looks up the cache, makes necessary coherence state transitions and/or pushes the request to the next level of cache hierarchy as per the requirements. Different controller and components of SLICC generated Ruby code communicates among themselves through instantiations of MessageBuffer class of Ruby (src/mem/ruby/buffers/MessageBuffer.cc/hh) , which can act as ordered or unordered buffer or queues. Also the delays in servicing different steps for satisfying a memory request gets accounted for scheduling enqueue-ing and dequeue-ing operations accordingly. If the requested cache block may be found in L1 caches and with required coherence permissions then the request is satisfied and immediately returned. Otherwise the request is pushed to the next level of cache hierarchy through MessageBuffer. A request can go all the way up to the Ruby's Memory Controller (also called Directory in many protocols). Once the request get satisfied it is pushed upwards in the hierarchy through MessageBuffers.
  5. Once the requested cache block is available at L1 cache with desired coherence permissions, the L1 cache controller informs the corresponding Sequencer object by calling its readCallback or 'writeCallback method depending upon the type of the request. Note that by the time these methods on Sequencer are called the latency of servicing the request has been implicitly accounted for.
  6. The Sequencer then clears up the accounting information for the corresponding request and then calls the RubyPort::ruby_hit_callback method. This ultimately returns the result of the request to the corresponding port of the core/ hardware context of the frontend (M5).

Directory Structure

  • src/mem/
    • protocols: SLICC specification for coherence protocols
    • slicc: implementation for SLICC parser and code generator
    • ruby
      • buffers: implementation for message buffers that are used for exchanging information between the cache, directory, memory controllers and the interconnect
      • common: frequently used data structures, e.g. Address (with bit-manipulation methods), histogram, data block, basic types (int32, uint64, etc.)
      • eventqueue: Ruby’s event queue API for scheduling events on the gem5 event queue
      • filters: various Bloom filters (stale code from GEMS)
      • network: Interconnect implementation, sample topology specification, network power calculations
      • profiler: Profiling for cache events, memory controller events
      • recorder: Cache warmup and access trace recording
      • slicc_interface: Message data structure, various mappings (e.g. address to directory node), utility functions (e.g. conversion between address & int, convert address to cache line address)
      • system: Protocol independent memory components – CacheMemory, DirectoryMemory, Sequencer, RubyPort