Google Summer of Code

From gem5
Revision as of 05:23, 12 March 2008 by Ksewell (Heterogeneous ISA Systems)

Introduction

The Google Summer of Code (SoC) is a great opportunity for students to contribute to open source software projects. The open source projects gain additional contributions and active developers, while the students receive a stipend and gain experience in large, distributed software development.

About M5

M5 is a modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. At its core, M5 provides a generic, object-oriented, discrete-event simulation framework, including a foundation for defining, parameterizing, configuring, and marshaling simulation objects. This foundation, together with various pre-made object models, allows M5 to simulate both single systems and multiple networked systems deterministically. Simulations can either run a single application binary (syscall emulation) or boot an entire operating system such as Linux or Solaris (full system) on most major ISAs (SPARC, MIPS, ALPHA, ARM, x86/64). The simulator is written in a combination of C++ and Python and is pervasively object-oriented: Python is used for configuration and non-performance-critical parts, while C++ is used for the core of the simulation framework. Using M5, computer architecture researchers around the world have successfully modeled their systems and published their work in magazines, conference proceedings, and academic journals. The Publications list already has more than 50 entries, and it grows every year.
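The discrete-event approach mentioned above can be illustrated with a minimal C++ sketch: simulated time jumps directly to the next scheduled event, and event handlers may schedule further events. The names `EventQueue`, `schedule`, and `runUntil` are hypothetical and do not reflect M5's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Tick = std::uint64_t;

// Hypothetical minimal discrete-event queue (not M5's real API).
class EventQueue {
  public:
    // Schedule a callback to run at absolute simulated time `when`.
    void schedule(Tick when, std::function<void()> callback) {
        assert(when >= curTick_);  // events cannot be scheduled in the past
        events_.push(Event{when, nextSeq_++, std::move(callback)});
    }

    // Process events in time order until the queue is empty or the next
    // event lies beyond `limit`. Returns the number of events processed.
    std::size_t runUntil(Tick limit) {
        std::size_t processed = 0;
        while (!events_.empty() && events_.top().when <= limit) {
            Event ev = events_.top();
            events_.pop();
            curTick_ = ev.when;  // time advances directly to the event
            ev.callback();       // the handler may schedule new events
            ++processed;
        }
        return processed;
    }

    Tick curTick() const { return curTick_; }

  private:
    struct Event {
        Tick when;
        std::uint64_t seq;  // tie-breaker: same-tick events run in FIFO order
        std::function<void()> callback;
    };
    // Puts the earliest (when, seq) pair at the top of the priority queue.
    struct Later {
        bool operator()(const Event &a, const Event &b) const {
            return a.when != b.when ? a.when > b.when : a.seq > b.seq;
        }
    };
    std::priority_queue<Event, std::vector<Event>, Later> events_;
    Tick curTick_ = 0;
    std::uint64_t nextSeq_ = 0;
};
```

A full simulator also needs event descheduling, priorities, and checkpoint support; this sketch only shows time-ordered dispatch.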

Project Ideas

Below is a list of possible project ideas and starting points; however, we're open to other ideas students may have. All the ideas listed here require some familiarity with Python and a good grasp of advanced C++ concepts.

Direct Execution model

Direct execution is a well-known technique, employed by a number of simulators, for speeding up simulation. A direct-execution simulator uses the native machine to execute guest instructions without interpretation. Methods of direct execution include static code instrumentation, dynamic code instrumentation, full OS virtualization, and application virtualization. Several mechanisms could be used to implement direct execution in M5, each with different pros and cons:

  1. The Linux Kernel-based Virtual Machine (KVM)
    • PRO: Could be brought up quickly and can leverage an existing virtualization system
    • CON: Can only be used for fast-forwarding, since instructions cannot be trapped
    • http://kvm.qumranet.com/kvmwiki
  2. Pin-based application virtualization
  3. Custom implementation
    • PRO: Can do exactly what we want
    • CON: Significant effort

Parallelization

As the industry moves toward multicore systems, software will need to become parallel if it is to benefit from successive generations of chips. Because there are a limited number of ways objects can interact with each other in M5, the scope of this problem is not as vast as it might seem at first. Objects schedule their own events, so reasonably long chains of independent events are nearly "ready-made" to be parallelized. Previous simulators such as the Wisconsin Wind Tunnel (which one of our mentors co-authored) have been parallel. This task can largely be divided into three parts:

  1. Identify blocks of code that share global caching structures and make them per-thread (using either __thread or pthread_setspecific()/pthread_getspecific()).
  2. Assign each SimObject to a thread, and make sure that any events that SimObject generates are bound to that thread.
  3. Identify the objects that cannot be bound to a single thread. This is the only place where inter-thread scheduling is required.
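Parts (1) and (2) can be sketched in C++ as follows, assuming C++11 `thread_local` (the standard counterpart of GCC's `__thread`) and a hypothetical `SimObject` struct that records which worker thread owns it. A simple loop stands in for each object's event processing; because each worker touches only the objects bound to it, per-object state needs no locking. Objects shared across threads (part 3) are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Part (1): a structure that used to be a process-wide global is now
// per-thread. thread_local is the standard C++11 form of GCC's __thread.
thread_local std::uint64_t lookupCacheHits = 0;  // hypothetical cached statistic

// Part (2): each hypothetical SimObject is owned by exactly one worker thread.
struct SimObject {
    int threadId = -1;            // worker that executes this object's events
    std::uint64_t eventsRun = 0;  // touched only by the owning worker
};

// A worker processes only its own objects, so no locks are needed.
void runWorker(int id, std::vector<SimObject> &objects, int eventsPerObject,
               std::vector<std::uint64_t> &perThreadHits) {
    for (auto &obj : objects) {
        if (obj.threadId != id)
            continue;
        for (int i = 0; i < eventsPerObject; ++i) {
            ++obj.eventsRun;    // stand-in for processing one event
            ++lookupCacheHits;  // updates this thread's private copy only
        }
    }
    perThreadHits[id] = lookupCacheHits;  // each worker writes a distinct slot
}

// Bind objects to threads round-robin, run all workers, and return the
// per-thread cache counts.
std::vector<std::uint64_t> runParallel(std::vector<SimObject> &objects,
                                       int numThreads, int eventsPerObject) {
    assert(numThreads > 0);
    for (std::size_t i = 0; i < objects.size(); ++i)
        objects[i].threadId = static_cast<int>(i % numThreads);
    std::vector<std::uint64_t> perThreadHits(numThreads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t)
        workers.emplace_back(runWorker, t, std::ref(objects), eventsPerObject,
                             std::ref(perThreadHits));
    for (auto &w : workers)
        w.join();
    return perThreadHits;
}
```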

Memory Network Models

Interconnection networks are becoming increasingly important for multicore research. A variety of on-chip network models would be very useful, for example:

  • Mesh models
  • Crossbar models
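As one example of what a mesh model might compute, the sketch below implements dimension-order (XY) routing on a 2D mesh: a packet first travels along X, then along Y. The row-major node numbering and the names `Mesh`, `xyRoute`, and `hopCount` are assumptions for illustration; a real network model would also model link latency, buffering, and contention.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical 2D mesh of width x height routers, numbered row-major:
// node id = y * width + x.
struct Mesh {
    int width;
    int height;
    bool contains(int id) const { return id >= 0 && id < width * height; }
};

// Dimension-order (XY) routing: correct the X coordinate first, then Y.
// Returns every router visited, including source and destination.
std::vector<int> xyRoute(const Mesh &m, int src, int dst) {
    assert(m.contains(src) && m.contains(dst));
    int x = src % m.width, y = src / m.width;
    const int dx = dst % m.width, dy = dst / m.width;
    std::vector<int> path{src};
    while (x != dx) {  // move along the row
        x += (dx > x) ? 1 : -1;
        path.push_back(y * m.width + x);
    }
    while (y != dy) {  // then along the column
        y += (dy > y) ? 1 : -1;
        path.push_back(y * m.width + x);
    }
    return path;
}

// With minimal routing, the hop count equals the Manhattan distance.
int hopCount(const Mesh &m, int src, int dst) {
    return std::abs(src % m.width - dst % m.width) +
           std::abs(src / m.width - dst / m.width);
}
```

XY routing is deadlock-free on a mesh because packets never turn from the Y dimension back to X; a crossbar model, by contrast, would connect every input to every output in a single hop.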

Directory Coherence Protocol

As the number of cores increases, coherence traffic will consume an increasing proportion of system resources. Directory-based cache coherence protocols drastically reduce the resources required to maintain coherence across a large number of cores, because they send coherence messages only to the caches that actually hold a copy of a block instead of broadcasting to every core. This project can go hand-in-hand with an effort to implement a new network.
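A directory keeps, for each block, a record of which caches hold a copy. A minimal sketch of such an entry for an MSI-style protocol follows; the three-state protocol, the 64-core limit, and all names are assumptions, and transient states, message races, and timing are omitted.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxCores = 64;  // assumed upper bound on core count

// Hypothetical directory entry for one cache block: a state plus a bit
// vector of the cores holding a copy.
struct DirEntry {
    enum class State { Uncached, Shared, Modified };
    State state = State::Uncached;
    std::bitset<kMaxCores> sharers;  // in Modified, exactly one bit (the owner)
};

// What the directory must do in response to a request (counts only).
struct DirResponse {
    std::size_t invalidations = 0;  // invalidations sent to other holders
    bool fetchFromOwner = false;    // data must come from the current owner
};

// Read request from `core`.
DirResponse handleRead(DirEntry &e, std::size_t core) {
    assert(core < kMaxCores);
    DirResponse r;
    if (e.state == DirEntry::State::Modified) {
        if (e.sharers.test(core))
            return r;             // requester already owns the block
        r.fetchFromOwner = true;  // owner supplies data and keeps a Shared copy
    }
    e.sharers.set(core);
    e.state = DirEntry::State::Shared;
    return r;
}

// Write (ownership) request from `core`: only cores recorded in the sharer
// vector are invalidated, rather than broadcasting to every core.
DirResponse handleWrite(DirEntry &e, std::size_t core) {
    assert(core < kMaxCores);
    DirResponse r;
    if (e.state == DirEntry::State::Modified && !e.sharers.test(core))
        r.fetchFromOwner = true;  // previous owner forwards its dirty data
    std::bitset<kMaxCores> others = e.sharers;
    others.reset(core);
    r.invalidations = others.count();
    e.sharers.reset();
    e.sharers.set(core);
    e.state = DirEntry::State::Modified;
    return r;
}
```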

Detailed In-Order core model

There is currently no detailed in-order CPU model in M5; there is code to start with, but nothing that is fully fleshed out. Future cores will likely trend toward lower power and reduced complexity, which makes in-order designs attractive for such systems.
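A detailed in-order model must capture how one stalled instruction blocks everything behind it. The sketch below is a simplification assuming single issue, a register scoreboard, and a fixed result latency per instruction; it ignores structural hazards, branches, and WAW ordering. `Inst` and `simulateInOrder` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction record: one destination, up to two sources, and
// the number of cycles until its result is available to dependents.
struct Inst {
    int dest;                 // destination register (-1 if none)
    std::array<int, 2> srcs;  // source registers (-1 if unused)
    int latency;              // result latency in cycles
};

// Single-issue, in-order timing with a register scoreboard. Returns the
// cycle at which the last result becomes available.
std::uint64_t simulateInOrder(const std::vector<Inst> &program, int numRegs = 32) {
    std::vector<std::uint64_t> readyAt(static_cast<std::size_t>(numRegs), 0);
    std::uint64_t nextIssue = 0;  // earliest cycle the next instruction may issue
    std::uint64_t finish = 0;
    for (const Inst &in : program) {
        std::uint64_t issue = nextIssue;
        for (int s : in.srcs) {
            if (s < 0)
                continue;
            assert(s < numRegs);
            issue = std::max(issue, readyAt[s]);  // RAW hazard: wait for operand
        }
        const std::uint64_t done = issue + static_cast<std::uint64_t>(in.latency);
        if (in.dest >= 0) {
            assert(in.dest < numRegs);
            readyAt[in.dest] = done;
        }
        nextIssue = issue + 1;  // in order: later instructions cannot issue earlier
        finish = std::max(finish, done);
    }
    return finish;
}
```

For example, a 3-cycle load into r1, followed by an add that reads r1 and then an independent add, finishes at cycle 5: the independent add cannot issue until cycle 4 because it is stuck behind the dependent one, which is exactly the behavior an out-of-order core avoids.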

Graphics Processing Unit (GPU) model

Graphics Processing Units interface with general-purpose CPUs to accelerate common graphics operations such as texture mapping and shading. Recently, multi-core GPUs such as Intel's Larrabee project have been proposed in an attempt to leverage the power of many-core systems for graphics processing. Creating a flexible GPU model would allow researchers to more realistically build systems that include one or more GPUs in their simulations.

Interface to an HDL

Hardware designers often write Verilog (or another HDL) code for future chip designs and want to simulate those designs before sending them to production. By adding a Verilog PLI (Programming Language Interface) to M5, designers could use all of M5's features to test their Verilog code. In addition, M5 could supply functionality that is missing from a chip design, allowing testing while the unit is still being built. For example, a test chip might omit support for OS code or paging, and M5 could provide those features.

Interface to a Power Analysis Tool

Power consumption has become a critical concern for computer architects in recent years. Interfacing M5 with a power analysis tool such as Wattch would allow researchers to simulate with the flexibility of M5 while simultaneously tracking the power consumption of important system components.
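Activity-based architectural power tools in the spirit of Wattch estimate dynamic energy from activity counts: each component's access count is multiplied by an estimated energy per access. A minimal sketch of that calculation follows, assuming the access counts would come from M5 statistics; the per-access energies are placeholders, the names are hypothetical, and static (leakage) power and clock gating are ignored.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-component activity record.
struct ComponentActivity {
    double accesses;           // e.g. taken from simulator statistics
    double energyPerAccessNJ;  // assumed energy per access, in nanojoules
};

// Average dynamic power in watts over `seconds` of simulated time:
// sum over components of (accesses x energy per access), divided by time.
double averageDynamicPowerW(const std::map<std::string, ComponentActivity> &components,
                            double seconds) {
    assert(seconds > 0.0);
    double totalNJ = 0.0;
    for (const auto &entry : components)
        totalNJ += entry.second.accesses * entry.second.energyPerAccessNJ;
    return totalNJ * 1e-9 / seconds;  // convert nJ to J, then divide by time
}
```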

Sampling/fast-forwarding techniques

Because detailed simulation is very slow, fast-forwarding to an interesting point in an execution and then simulating sampled portions of the execution stream in detail can significantly reduce simulation time.

  • The techniques from the SMARTS work would be a good guide.
  • Coupling this with direct execution would provide the greatest benefit.
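SMARTS-style sampling decides how many detailed samples to take from the measured variability of the metric of interest. The sketch below computes the standard sampling-theory relation that approach builds on, n ≥ (z·V/ε)², where V is the coefficient of variation of per-sample measurements (for example CPI), z is the constant for the chosen confidence level, and ε is the target relative error. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary statistics for sampled measurements (e.g. CPI per sampling unit).
struct SampleStats {
    double mean;
    double stddev;  // sample standard deviation (n - 1 denominator)
    double cov;     // coefficient of variation V = stddev / mean
};

SampleStats summarize(const std::vector<double> &samples) {
    assert(samples.size() >= 2);
    double sum = 0.0;
    for (double x : samples) sum += x;
    const double mean = sum / samples.size();
    assert(mean > 0.0);
    double sq = 0.0;
    for (double x : samples) sq += (x - mean) * (x - mean);
    const double stddev = std::sqrt(sq / (samples.size() - 1));
    return {mean, stddev, stddev / mean};
}

// Relative half-width of the confidence interval for n samples: z * V / sqrt(n).
double relativeError(const SampleStats &s, std::size_t n, double z) {
    return z * s.cov / std::sqrt(static_cast<double>(n));
}

// Samples needed for relative error `eps` at confidence constant `z`:
// n >= (z * V / eps)^2, rounded up.
std::size_t requiredSamples(const SampleStats &s, double z, double eps) {
    const double root = z * s.cov / eps;
    return static_cast<std::size_t>(std::ceil(root * root));
}
```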

Flash Memory Model

Flash memory is becoming increasingly popular and is seeing serious consideration as another level in the memory hierarchy between DRAM and disks, or, as in laptops, as a potential replacement for disks. Adding performance models for flash memory parts would significantly aid research in this area.
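Unlike DRAM, NAND flash reads and programs data in pages but erases in much larger blocks, and a page cannot be rewritten until its block has been erased. The sketch below captures only those constraints; the latencies are placeholders rather than data from any specific part, and `FlashTiming` and `FlashModel` are hypothetical names. For simplicity, overwriting a page erases its whole block in place; a real device would use a flash translation layer to remap the write and preserve the block's other valid pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder latencies in microseconds; not taken from any datasheet.
struct FlashTiming {
    std::uint64_t readUs = 25;
    std::uint64_t programUs = 200;
    std::uint64_t eraseUs = 1500;
};

class FlashModel {
  public:
    FlashModel(int blocks, int pagesPerBlock, FlashTiming t = FlashTiming{})
        : pagesPerBlock_(pagesPerBlock), timing_(t),
          erased_(static_cast<std::size_t>(blocks) * pagesPerBlock, true),
          eraseCount_(static_cast<std::size_t>(blocks), 0) {}

    // Page reads have a fixed latency.
    std::uint64_t read(int page) const {
        assert(valid(page));
        return timing_.readUs;
    }

    // Programming a page that is not erased first erases its whole block.
    std::uint64_t program(int page) {
        assert(valid(page));
        std::uint64_t latency = timing_.programUs;
        if (!erased_[page])
            latency += erase(page / pagesPerBlock_);
        erased_[page] = false;
        return latency;
    }

    // Erase works on whole blocks and counts toward the block's wear.
    std::uint64_t erase(int block) {
        for (int p = 0; p < pagesPerBlock_; ++p)
            erased_[static_cast<std::size_t>(block) * pagesPerBlock_ + p] = true;
        ++eraseCount_[block];
        return timing_.eraseUs;
    }

    std::uint32_t eraseCount(int block) const { return eraseCount_[block]; }

  private:
    bool valid(int page) const {
        return page >= 0 && static_cast<std::size_t>(page) < erased_.size();
    }
    int pagesPerBlock_;
    FlashTiming timing_;
    std::vector<bool> erased_;               // per-page erased flag
    std::vector<std::uint32_t> eraseCount_;  // per-block erase (wear) count
};
```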

Heterogeneous ISA Systems

Multi-ISA, multicore systems (such as the Cell Broadband Engine) give designers the advantage of connecting specialized units (such as DSPs or GPUs) to general-purpose cores that use a different ISA. Implementing this capability in M5 would allow researchers to study these systems more realistically. M5 already does a good job of compartmentalizing ISA-specific code, but since no one has yet tried to simulate a heterogeneous system with it, M5's ISA independence has not been fully stressed. This project would require understanding how ISAs are "dropped" into M5 CPU models and figuring out how to build M5 with multiple ISAs active.
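One possible way to let several ISAs coexist in a single build is to keep each ISA's definitions in its own namespace and instantiate a CPU model template once per ISA, exposing an ISA-independent base class to the rest of the system. The sketch below shows the idea only; `IsaA`, `IsaB`, `BaseCpu`, and `SimpleCpu` are hypothetical names, not M5's actual classes, and a real design would also need per-ISA decoders, memory interfaces, and build-system support.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical per-ISA traits, each in its own namespace so that both can be
// compiled into the same binary without name clashes.
namespace IsaA {
struct Traits {
    static constexpr const char *name = "isa-a";
    static constexpr int numIntRegs = 32;
    using MachInst = std::uint32_t;  // 32-bit instruction word
};
}  // namespace IsaA

namespace IsaB {
struct Traits {
    static constexpr const char *name = "isa-b";
    static constexpr int numIntRegs = 16;
    using MachInst = std::uint16_t;  // 16-bit instruction word
};
}  // namespace IsaB

// ISA-independent interface seen by the rest of the system.
class BaseCpu {
  public:
    virtual ~BaseCpu() = default;
    virtual std::string isaName() const = 0;
    virtual int numIntRegs() const = 0;
};

// One CPU model source, instantiated once per ISA; ISA-specific types and
// sizes come from the Traits parameter.
template <class Isa>
class SimpleCpu : public BaseCpu {
  public:
    std::string isaName() const override { return Isa::name; }
    int numIntRegs() const override { return static_cast<int>(regs_.size()); }
    std::size_t instBytes() const { return sizeof(typename Isa::MachInst); }

  private:
    std::array<std::uint64_t, Isa::numIntRegs> regs_{};  // register file sized per ISA
};

// A heterogeneous system holds cores of different ISAs side by side.
std::vector<std::unique_ptr<BaseCpu>> buildHeterogeneousSystem() {
    std::vector<std::unique_ptr<BaseCpu>> cpus;
    cpus.push_back(std::make_unique<SimpleCpu<IsaA::Traits>>());
    cpus.push_back(std::make_unique<SimpleCpu<IsaB::Traits>>());
    return cpus;
}
```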

Regression Testing & Benchmarking

As new benchmark suites and workloads become available, the M5 community must ensure that M5 can successfully execute the most relevant workloads. For example, creating the appropriate configuration files and scripts to run an important benchmark suite such as SPEC CPU2006 would be useful not only for continuing to verify M5's correctness but also for future researchers deciding which simulation platform to use. A project like this would involve finding benchmarks in a particular domain (databases, networking, scientific computing, etc.), configuring them and running them to completion in M5, and then adding them to the M5 regression testing suite.

Other Information

The most successful projects are ones that are interesting to you. We've suggested some projects above, but the suggestions are just that. If there is something related that you would rather do, please put that in your proposal.

Please describe who you are and what you've done in your application. In particular, we would like to know about other projects you've worked on and your familiarity with Python and C++. The M5 code base exercises most of the C++ standard (and some nonstandard features as well). Good familiarity with C++ and object-oriented programming is necessary for a successful M5 project.

Additionally, we would like to see a set of goals/milestones in your proposal. We don't expect the list to be etched in stone; however, stepping back and figuring out how you plan to get from point A to point B is a good way for you and your mentors to track your progress and evaluate how reasonable your goals are. Finally, we expect that working on M5 would be your main summer activity.

Mentors / M5 Simulation Team

  • Steve Reinhardt - Simulator Infrastructure; Parallel Simulation; ISA description; Full System Simulation; Memory Modeling
  • Nate Binkert - Simulator Infrastructure; Parallel Simulation; Python Integration; Full System Simulation; Networking Models; Configuration Scripts
  • Ali Saidi - Networking Models; Device Modeling; Full System Simulation; Memory Modeling (not including caches)
  • Lisa Hsu - Full System Workloads; Memory Modeling; Checkpointing Simulations
  • Kevin Lim - CPU Modeling (Out-of-Order, SimpleCPU); Full System Simulation
  • Gabe Black - ISA description (SPARC, x86); Full System Simulation
  • Korey Sewell - ISA description (MIPS); Out-of-Order CPU Modeling; SMT; Syscall-Emulation Simulation
  • Ron Dreslinski - Memory Modeling