
ECE 576 - Homework Assignment 2

Due TBD via D2L

THIS ASSIGNMENT IS UNDER CONSTRUCTION. PLEASE CHECK BACK LATER FOR FINAL DETAILS


Announcements and Clarifications:

URGENT, February 19: As discussed in lecture, the bus implementation for this assignment should support both single read and write operations and burst read and write transactions.

Q: If a bus request's address is out of bounds, can we immediately stop simulation and report an error?

A: Although a *correct* implementation would ensure that all memory locations are valid (i.e., all requested addresses map to some slave/servant component), you can't make this assumption when modeling the individual components within the system. In other words, if this were a model of a final system implementation, a bus request for an invalid address would be reported to the requesting device. Similarly, your implementation should report such errors to the master device, and the master device should check for this condition. If the master detects this condition, it can print a descriptive error message and stop simulation. However, only the master device that made the invalid bus request should do this.
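The following is a minimal plain C++ sketch of the reporting path described in this answer. The bus returns an error status to the requesting master rather than ending the simulation, and the master decides whether to stop. `BusStatus`, `bus_read`, `master_read`, and the address range are hypothetical names and values chosen for illustration. In a SystemC model, the stop would typically be `sc_stop()`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical status the bus hands back to the requesting master along
// with (or instead of) the acknowledge; the exact mechanism is up to you.
enum class BusStatus { Ok, AddressError };

// Hypothetical map: a single memory servant covering word addresses
// [kMemBase, kMemBase + kMemWords).
constexpr std::uint32_t kMemBase = 0x0000;
constexpr std::uint32_t kMemWords = 0x0100;

// Bus-side stand-in: when no servant claims the address, report the error
// to the master instead of ending the simulation.
BusStatus bus_read(std::uint32_t addr, std::uint32_t& data) {
    if (addr - kMemBase >= kMemWords) {  // unsigned wrap also rejects addr < base
        return BusStatus::AddressError;
    }
    data = addr - kMemBase;  // placeholder contents: the word offset
    return BusStatus::Ok;
}

// Master-side check: only the master that made the bad request stops.
std::uint32_t master_read(std::uint32_t addr) {
    std::uint32_t data = 0;
    if (bus_read(addr, data) == BusStatus::AddressError) {
        std::fprintf(stderr, "master: no servant at address 0x%04x\n",
                     static_cast<unsigned>(addr));
        std::exit(EXIT_FAILURE);  // a SystemC model could call sc_stop() here
    }
    return data;
}
```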

Grading Rubric Updated


1. (25 points) System-level Design of Hardware/Software Partitioned Application.

Using SystemC and transaction-level modeling, implement the matrix multiplication application as a hardware/software partitioned design consisting of a software component (SW), a hardware coprocessor (HW), and memory communicating over a shared bus (Bus). Using the profile information annotated within the C code below, determine which of the two innermost loops will yield the best performance increase when partitioned to a hardware coprocessor.



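#define LOOPS 1000
#define SIZE 5

// The loops index from 1 to SIZE, so each array is declared (SIZE+1) x (SIZE+1)
// and row 0 / column 0 go unused. In the partitioned design these arrays are
// stored in the memory component rather than in the software component.
int a[SIZE+1][SIZE+1], b[SIZE+1][SIZE+1], c[SIZE+1][SIZE+1];
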
int main() // Total Cycles: 8193437
{
  int n;
  int i,j,k;

  for (n = 0 ; n < LOOPS ; n++) // Total Cycles: 8186006, Execs: 1, Iters: 1000
  {
    for(i=1;i<=SIZE;i++) // Total Cycles: 579000, Execs: 1000, Iters: 5
      for(j=1;j<=SIZE;j++) // Total Cycles: 520000, Execs: 5000, Iters: 5
        c[i][j] = 0;

    for(i=1;i<=SIZE;i++) // Total Cycles: 7579000, Execs: 1000, Iters: 5
      for(j=1;j<=SIZE;j++) // Total Cycles: 7520000, Execs: 5000, Iters: 5
        for(k=1;k<=SIZE;k++) // Total Cycles: 7225000, Execs: 25000, Iters: 5
          c[i][j] += a[i][k] * b[k][j];
  }

  return 0;
}
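As a starting point for reading the profile, the sketch below computes the share of total cycles spent in the j and k loops of the multiplication nest. It also applies Amdahl's law, a standard formula that this assignment does not name. The result is only an upper bound because it ignores the bus transfers that Problem 2 asks you to model, so it is not a substitute for your own analysis. The constant and function names are illustrative.

```cpp
#include <cassert>

// Cycle counts copied from the profile annotations above.
constexpr double kTotalCycles = 8193437.0;  // main()
constexpr double kJLoopCycles = 7520000.0;  // j loop of the multiply nest
constexpr double kKLoopCycles = 7225000.0;  // k loop of the multiply nest

// Fraction of the total runtime spent inside a loop, per the profile.
double runtime_fraction(double loop_cycles) {
    assert(loop_cycles >= 0.0 && loop_cycles <= kTotalCycles);
    return loop_cycles / kTotalCycles;
}

// Amdahl's law: overall speedup when a fraction f of the runtime is made
// s times faster. Bus communication cost is NOT included.
double amdahl_speedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For example, `runtime_fraction(kKLoopCycles)` is about 0.88 and `runtime_fraction(kJLoopCycles)` is about 0.92. The realized gain will be lower once every access to a, b, and c has to cross the shared bus.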

Shared Bus

Your implementation should consist of two interfaces, a bus master interface (bus_master_if) and a bus servant interface (bus_servant_if), that directly model the various stages of the bus communication protocol. The names of all required transactions (or functions) for each interface are provided, but the exact parameters and return types are left open-ended.

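#include <systemc.h>

// Bus Master Interface
// Return types and parameter lists are intentionally left to you; the void
// return types and empty parameter lists below are placeholders only.
class bus_master_if : virtual public sc_interface
{
  public:
    virtual void Request() = 0;
    virtual void WaitForAcknowledge() = 0;
    virtual void ReadData() = 0;
    virtual void WriteData() = 0;
};


// Bus Servant Interface
// Same note: choose your own parameters and return types.
class bus_servant_if : virtual public sc_interface
{
  public:
    virtual void Listen() = 0;
    virtual void Acknowledge() = 0;
    virtual void SendReadData() = 0;
    virtual void ReceiveWriteData() = 0;
};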

For this assignment, your bus implementation should support single-word read and write operations as well as burst transactions. The shared bus component should use a round-robin arbitration scheme.
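A round-robin arbiter can be modeled independently of SystemC, as in the plain C++ sketch below. It assumes masters are numbered 0..N-1 and that each master has at most one outstanding request. After master m is granted the bus, the search for the next grant starts at m+1, so every requesting master is served within N grants. The class name and methods are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Round-robin arbiter sketch. num_masters must be positive.
class RoundRobinArbiter {
public:
    explicit RoundRobinArbiter(int num_masters)
        : pending_(num_masters, false), last_(num_masters - 1) {}

    // Called when a master issues Request().
    void request(int master) {
        assert(master >= 0 && master < static_cast<int>(pending_.size()));
        pending_[master] = true;
    }

    // Returns the granted master, or -1 if nobody is requesting.
    int grant() {
        const int n = static_cast<int>(pending_.size());
        for (int offset = 1; offset <= n; ++offset) {
            const int candidate = (last_ + offset) % n;  // start after last winner
            if (pending_[candidate]) {
                pending_[candidate] = false;  // request consumed by this grant
                last_ = candidate;
                return candidate;
            }
        }
        return -1;
    }

private:
    std::vector<bool> pending_;  // one outstanding-request flag per master
    int last_;                   // most recently granted master
};
```

In the SW/HW design there are two masters (the software component and the coprocessor's master port), so `RoundRobinArbiter arb(2)` would alternate between them whenever both are requesting.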

The bus Request() should provide all information necessary to indicate what transaction is requested on the bus, similar to the two bus protocols discussed in lecture. The Request() should carry enough information that any servant component listening to the bus receives what it needs to respond to the request (or to determine whether it should respond). For example, consider an implementation in which only one bus master is present. In this scenario, the bus master's Request() would need to include all information for the request so that the servants listening to the bus (i.e., waiting within the Listen() function) can respond with Acknowledge() if the request is for them. Here the acknowledge comes directly from the servant component, since no arbiter is present and only one master is waiting for the acknowledge.
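One possible shape for the request information, and for the check a servant performs inside Listen(), is sketched below in plain C++. The fields of `BusRequest`, the `AddressRange` type, and `request_is_for_me` are assumptions for illustration, since the assignment leaves parameters open. Requiring the whole burst to fall inside the servant's range is one design choice among several.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical contents of a bus request.
enum class BusOp { Read, Write };

struct BusRequest {
    int master_id;          // who issued the request (used by the arbiter)
    BusOp op;               // read or write
    std::uint32_t address;  // first word address
    unsigned length;        // number of words: 1 = single, >1 = burst
};

// Each servant owns a contiguous range of word addresses (hypothetical).
struct AddressRange {
    std::uint32_t base;
    std::uint32_t words;
};

// Inside Listen(): acknowledge only if every word of the (possibly burst)
// request falls inside this servant's range.
bool request_is_for_me(const BusRequest& req, const AddressRange& mine) {
    assert(req.length > 0);
    const std::uint64_t first = req.address;
    const std::uint64_t last = first + req.length - 1;  // 64-bit avoids overflow
    return first >= mine.base &&
           last < static_cast<std::uint64_t>(mine.base) + mine.words;
}
```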

The bus implementation should support burst transactions. However, the ReadData() and WriteData() transactions should read/write one data word during each invocation.
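The call order for a burst under this rule is one Request(), one acknowledge, then one ReadData() per word. The single-threaded plain C++ mock below illustrates this sequence. `MockBus` and `burst_read` are hypothetical, and the method signatures are placeholders because the real interface parameters are left open. A SystemC version would also block and consume simulated time.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Single-threaded stand-in for a bus with one memory servant. Method names
// follow bus_master_if; parameters and return types are hypothetical.
class MockBus {
public:
    explicit MockBus(std::vector<std::uint32_t> memory) : mem_(std::move(memory)) {}

    void Request(std::uint32_t address, unsigned length) {
        next_ = address;
        remaining_ = length;
    }
    // The memory servant acknowledges only if the whole burst is in range.
    bool WaitForAcknowledge() { return next_ + remaining_ <= mem_.size(); }

    // Transfers exactly one word per call, as the assignment requires.
    std::uint32_t ReadData() {
        assert(remaining_ > 0);
        --remaining_;
        return mem_[next_++];
    }

private:
    std::vector<std::uint32_t> mem_;
    std::uint32_t next_ = 0;
    unsigned remaining_ = 0;
};

// Master side of a burst read: returns an empty vector on error.
std::vector<std::uint32_t> burst_read(MockBus& bus, std::uint32_t addr, unsigned n) {
    std::vector<std::uint32_t> out;
    bus.Request(addr, n);
    if (!bus.WaitForAcknowledge()) return out;  // no servant acknowledged
    for (unsigned i = 0; i < n; ++i) out.push_back(bus.ReadData());
    return out;
}
```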

In a TLM implementation, components acting as bus masters should therefore not be aware of the arbitration method. Instead, all bus masters are provided the same protocol, which hides the details of the arbitration: WaitForAcknowledge() waits until a servant component has made an Acknowledge() in response to the bus master's request. While the bus arbitration is responsible for delivering the acknowledge to the correct master component, the acknowledge itself comes from the servant component.

The WaitForAcknowledge() called by a master component should not return until an Acknowledge() has been provided by a servant component in response to the current bus request. In other words, the bus component is not responsible for generating the acknowledge, but rather simply forwarding an acknowledge from a servant component.
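The blocking relationship between WaitForAcknowledge() and Acknowledge() can be illustrated with standard C++ threads. A SystemC model would instead use `sc_event` and `SC_THREAD` processes, so this is an analogy rather than the required implementation. The `AckChannel` class is hypothetical. The bus never creates an acknowledge itself: it only stores one delivered by a servant and wakes the waiting master.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class AckChannel {
public:
    // Called on behalf of the servant in response to the current request.
    void Acknowledge() {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!acked_ && "previous acknowledge not yet consumed");
            acked_ = true;
        }
        cv_.notify_one();
    }

    // Called by the master; blocks until a servant has acknowledged.
    void WaitForAcknowledge() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return acked_; });
        acked_ = false;  // consume the acknowledge for the next request
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool acked_ = false;
};
```

Usage: the master thread calls `WaitForAcknowledge()` and only proceeds after a servant thread has called `Acknowledge()`. With several masters, the bus would keep one such channel per master and route the servant's acknowledge to the channel of the currently granted master.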

All communication between the software and hardware components must take place using memory mapped addresses.
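Memory-mapped communication means every servant owns an address range and each request is routed by decoding its address. The plain C++ sketch below uses a purely hypothetical map: memory at word address 0x0000 with a placeholder size, and coprocessor registers at 0x1000. You are expected to choose your own addresses and sizes. The `decode` function requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical memory map; all addresses are word addresses.
enum class Target { Memory, Coprocessor };

struct MapEntry {
    Target target;
    std::uint32_t base;
    std::uint32_t words;
};

constexpr MapEntry kMemoryMap[] = {
    {Target::Memory,      0x0000, 0x0100},  // arrays a, b, c (size is a placeholder)
    {Target::Coprocessor, 0x1000, 0x0004},  // HW coprocessor registers
};

// Address decode as the bus or a servant's Listen() might perform it.
// Unsigned wrap-around makes addr - base huge when addr < base, so one
// comparison checks both ends of the range.
std::optional<Target> decode(std::uint32_t addr) {
    for (const MapEntry& e : kMemoryMap) {
        if (addr - e.base < e.words) return e.target;
    }
    return std::nullopt;  // unmapped: report an error to the master
}
```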

Software Component

The application software that is not partitioned to hardware can be directly modeled as C/C++ code within the software component. All array data (specifically arrays a, b, and c) are stored within the memory component and all reads from or writes to these arrays must be accessed through the shared bus.
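Because the arrays live in the memory component, every software access such as `a[i][k]` becomes a bus transfer to a computed word address. The sketch below assumes a hypothetical layout with a, b, and c stored back to back, each as a row-major (SIZE+1) x (SIZE+1) block to match the 1-based indexing in the code. It illustrates the address arithmetic only and does not claim to be the minimum memory size.

```cpp
#include <cassert>
#include <cstdint>

constexpr int SIZE = 5;
constexpr int DIM = SIZE + 1;  // code indexes 1..SIZE, so row/col 0 unused

// Hypothetical base word addresses: a, b, c stored back to back.
constexpr std::uint32_t kBaseA = 0x0000;
constexpr std::uint32_t kBaseB = kBaseA + DIM * DIM;
constexpr std::uint32_t kBaseC = kBaseB + DIM * DIM;

// Word address of element [i][j] of a row-major DIM x DIM array.
constexpr std::uint32_t element_addr(std::uint32_t base, int i, int j) {
    return base + static_cast<std::uint32_t>(i * DIM + j);
}
```

For example, the software would read `a[i][k]` by issuing a bus read to `element_addr(kBaseA, i, k)`, and would write `c[i][j]` to `element_addr(kBaseC, i, j)`.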

Hardware Component

The hardware component can act as both a master and a servant (slave) of the shared bus. Thus, the hardware component should have two ports: one connecting to the bus_master_if of the shared bus and one connecting to its bus_servant_if. All communication between the software and hardware components over the bus must be implemented using memory-mapped addressing. You must also define the specific memory-mapped addresses to which this component is mapped.
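One common way to use the coprocessor's servant port is a small block of memory-mapped registers. Software writes arguments and a start command, then polls a status register, while the coprocessor's master port moves the array data. The register names, offsets, and the `CoprocessorRegs` class below are hypothetical and shown in plain C++. What the arguments mean depends on which loop you partition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register offsets relative to the coprocessor's base address.
enum HwReg : std::uint32_t {
    REG_CTRL   = 0,  // SW writes 1 to start a computation
    REG_ARG0   = 1,  // example argument from SW (e.g., a loop index)
    REG_ARG1   = 2,  // example argument from SW
    REG_STATUS = 3,  // HW sets to 1 when done; SW polls it over the bus
};

// Servant-side handling of reads/writes that target the coprocessor.
class CoprocessorRegs {
public:
    void write(std::uint32_t offset, std::uint32_t value) {
        assert(offset < kNumRegs);
        regs_[offset] = value;
        if (offset == REG_CTRL && value == 1) regs_[REG_STATUS] = 0;  // now busy
    }
    std::uint32_t read(std::uint32_t offset) const {
        assert(offset < kNumRegs);
        return regs_[offset];
    }
    // Checked by the coprocessor's own thread to see if work was requested.
    bool start_requested() const { return regs_[REG_CTRL] == 1; }
    // Called by the coprocessor once its bus transfers and work are done.
    void finish() { regs_[REG_CTRL] = 0; regs_[REG_STATUS] = 1; }

private:
    static constexpr std::uint32_t kNumRegs = 4;
    std::uint32_t regs_[kNumRegs] = {};
};
```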

Memory

The arrays a, b, and c are located within a single memory component. You will need to determine the minimum required size of the memory as well as define the specific addresses at which the memory is located.

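The a and b data values within the memory should be initialized at the beginning of simulation from the memory component's constructor. Each list holds 36 values, i.e., one (SIZE+1) x (SIZE+1) array in row-major order, matching the 1-based indexing in the code. The contents of these arrays are as follows: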

a = { 0,0,0,0,0,0,0,0,9,4,7,9,0,12,14,15,16,11,0,2,3,4,5,6,0,4,3,2,1,2,0,2,7,6,4,9 };
b = { 0,0,0,0,0,0,0,0,9,4,7,9,0,12,14,15,16,11,0,2,3,4,5,6,0,4,3,2,1,2,0,2,7,6,4,9 };
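The plain C++ class below stands in for the memory component's constructor. A SystemC module would perform the same initialization inside its constructor, for example one declared with `SC_CTOR`. It assumes a hypothetical layout: a at words [0, 36), b at [36, 72), and c at [72, 108). The values are the lists given above, and a and b share the same contents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class Memory {
public:
    static constexpr std::size_t kDim = 6;              // SIZE + 1
    static constexpr std::size_t kBlock = kDim * kDim;  // 36 words per array

    explicit Memory(std::size_t words) : data_(words, 0) {
        static const int init[kBlock] = {
            0,0,0,0,0,0, 0,0,9,4,7,9, 0,12,14,15,16,11,
            0,2,3,4,5,6, 0,4,3,2,1,2, 0,2,7,6,4,9 };
        assert(words >= 3 * kBlock);  // room for a, b, and c
        for (std::size_t n = 0; n < kBlock; ++n) {
            data_[n] = init[n];           // array a
            data_[kBlock + n] = init[n];  // array b (same contents as a)
        }                                 // array c stays zero
    }

    std::uint32_t read(std::size_t addr) const { return data_.at(addr); }
    void write(std::size_t addr, std::uint32_t v) { data_.at(addr) = v; }

private:
    std::vector<std::uint32_t> data_;
};
```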

2. (20 points) Performance Modeling of Software, Hardware, and Bus Communication.

Extend your system-level implementation to incorporate approximate time/cycle accurate performance data using the following information, assuming a system clock operating at 150 MHz.

  • A bus Request requires at least two clock cycles
  • A bus Acknowledge requires one clock cycle
  • A bus Write operation requires one cycle
  • A bus Read operation requires two cycles
  • The performance of the hardware component is limited only by the speed of the data transfers
  • All software instructions require 1.5 cycles (i.e., software execution delays can be estimated using the provided profile data)
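The bus costs above reduce to simple cycle arithmetic. At 150 MHz one cycle is 1/150 MHz, about 6.667 ns. The sketch below assumes one Request and one Acknowledge per transaction, with each word of a burst paying the Read or Write cost. The optional `arb_wait` parameter represents extra Request cycles spent waiting for arbitration. In a SystemC thread the resulting delay could be applied with `wait(sc_time(ns, SC_NS))`. The function names are hypothetical.

```cpp
#include <cassert>

constexpr double kClockHz = 150e6;
constexpr double kPeriodNs = 1e9 / kClockHz;  // about 6.667 ns per cycle

// Cycle costs from the list above.
constexpr unsigned kRequestCycles = 2;  // minimum; arbitration may add more
constexpr unsigned kAckCycles = 1;
constexpr unsigned kWriteCycles = 1;
constexpr unsigned kReadCycles = 2;

// Cycles for one transaction of n words (n == 1 is a single transfer).
unsigned transaction_cycles(bool is_read, unsigned n, unsigned arb_wait = 0) {
    assert(n > 0);
    return kRequestCycles + arb_wait + kAckCycles +
           n * (is_read ? kReadCycles : kWriteCycles);
}

double cycles_to_ns(unsigned cycles) { return cycles * kPeriodNs; }
```

Under these assumptions, a single read costs 5 cycles and a 5-word burst read costs 13 cycles. The burst figure is lower than five separate single reads (25 cycles) because the Request and Acknowledge are paid only once.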

The provided profile data will be needed to add the performance estimation information for the software component, specifically for estimating the performance of any loop(s) that are not partitioned to hardware. Please note that the provided profile is in cycles not instructions.
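A loop that stays in software can be charged as one delay per execution. That delay is derived by dividing the loop's profiled Total Cycles by its Execs count, or further by Iters for a per-iteration cost. The sketch below uses the profile cycle counts as given. Scaling them for the 1.5-cycle instruction cost is a separate step that depends on how you relate profile cycles to instructions. `LoopProfile` and the helper functions are hypothetical.

```cpp
#include <cassert>

// One profiled loop, with numbers copied from the annotations in Problem 1.
struct LoopProfile {
    double total_cycles;  // "Total Cycles"
    double execs;         // "Execs": how many times the loop was entered
    double iters;         // "Iters": iterations per execution
};

// Average cycles for one execution of the loop (all of its iterations).
double cycles_per_exec(const LoopProfile& p) {
    assert(p.execs > 0);
    return p.total_cycles / p.execs;
}

// Average cycles for a single iteration of the loop body.
double cycles_per_iter(const LoopProfile& p) {
    assert(p.iters > 0);
    return cycles_per_exec(p) / p.iters;
}
```

For example, the k loop annotation `{7225000, 25000, 5}` gives 289 cycles per execution. The software thread would then wait for the corresponding time each time it runs that loop.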

3. (5 points) Report

You must submit a Word or PDF document providing the following details regarding your implementation:

  • Provide complete details and a brief description of the bus_master_if and bus_servant_if utilized within your design.
  • Briefly provide an overview of how you integrated the performance modeling within your SystemC TLM implementation. (Note: you may include code snippets in your report, but please keep these to only what is minimally necessary.)
  • Provide a breakdown of the resulting hardware/software performance for the matrix multiplication application.

Submission Requirements:

You must submit your SystemC files and Report via D2L as a single ZIP or TAR/GZIPPED archive. Note: Do not submit executables, Makefiles, or Visual Studio project files.