Memory API

## Kernel Memory Interface

We offer various memory controller components (load-store units, arbiters, and a cache) to facilitate communication with main memory. We suggest splitting the accelerator unit into a computational kernel and a memory network, as shown here. The compute kernel accesses memory through high-level, FIFO-like ports. Each port carries a payload type that holds information such as the memory address to access, the burst count, and tags. We strongly recommend using the Python API to create memory ports and to automatically generate the code that integrates the kernel with the memory subsystem.

```cpp
ga::tlm_fifo_out<MemTypedReadReqType<MyType> > req_out;
ga::tlm_fifo_in<MemTypedReadRespType<MyType> > resp_in;
```

Here, `ga::tlm_fifo_out` is an output port used to send requests, and `ga::tlm_fifo_in` is an input port used to receive data from the memory load-store units. `MemTypedReadReqType<MyType>` and `MemTypedReadRespType<MyType>` are the payload types for these ports. Payload types are usually parameterized by an application type (`MyType` in this example).

Here is an example of how the ports can be used in kernel code:

```cpp
// request 10 elements of type MyType starting at address addr_offset
req_out.put(MemTypedReadReqType<MyType>(addr_offset, 10));
// receive the 10 corresponding responses, one MyType value each
for (size_t i = 0; i < 10; ++i) {
    MyType next_value = resp_in.get().data;
    // ... use next_value ...
}
```

We provide several memory interface types, each served by a specialized memory controller optimized for a particular access pattern; for example, the memory components are optimized differently for streaming access than for random access. There are read-only ports, write-only ports, and a read/write port. Most memory interfaces come as port bundles: the read interface in the example above, for instance, uses one request port and one response port.

## Memory Read Port Bundles

1. **Streaming Read Port Bundle.** This interface can be used for both streaming and random read access to memory.

```cpp
ga::tlm_fifo_out<MemTypedReadReqType<USER_TYPE> > req_out;
ga::tlm_fifo_in<MemTypedReadRespType<USER_TYPE> > resp_in;
```

The user constructs a `MemTypedReadReqType<USER_TYPE>` object to request N (`size`) elements of type `USER_TYPE` at some address (`addr`). Multiple requests can be issued by sending multiple `MemTypedReadReqType<USER_TYPE>` objects to the `req_out` port.
`MemTypedReadReqType<USER_TYPE>` has the following fields:
* `addr` - memory address
* `size` - burst size
 
Responses arrive in the same order as the corresponding requests, one element of type `USER_TYPE` at a time.
`MemTypedReadRespType<USER_TYPE>` has the following fields:
* `data` - data of type `USER_TYPE`

*Note: if the application requires both read and write accesses to the same location (or cacheline) in memory, then the read/write interface should be used instead because the order between read and write requests cannot be guaranteed using this interface*
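The ordering contract of the streaming read bundle can be illustrated with a small, host-only C++ model. This is a sketch under stated assumptions, not the library: the payload structs mirror only the documented fields (`addr`, `size`, `data`), `serve_in_order` is a hypothetical stand-in for the streaming load unit, `std::deque` replaces the `ga::tlm_fifo_*` ports, and `addr` is treated as an element index for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-ins for the documented payloads (fields addr, size, data).
template <typename T>
struct MemTypedReadReqType {
    std::uint64_t addr;  // modeled as an element index, not a byte address
    std::size_t size;    // burst size in elements
};

template <typename T>
struct MemTypedReadRespType {
    T data;  // one element per response
};

// Hypothetical in-order load unit: each request yields `size` responses,
// and responses keep the order of the requests that produced them.
template <typename T>
std::deque<MemTypedReadRespType<T>> serve_in_order(
    const std::vector<T>& memory,
    const std::deque<MemTypedReadReqType<T>>& requests) {
    std::deque<MemTypedReadRespType<T>> responses;
    for (const auto& req : requests) {
        for (std::size_t i = 0; i < req.size; ++i) {
            responses.push_back({memory.at(req.addr + i)});
        }
    }
    return responses;
}

// Kernel side: issue two bursts, then drain the responses in arrival order.
std::vector<int> read_two_bursts(const std::vector<int>& memory) {
    std::deque<MemTypedReadReqType<int>> req_out;  // plays the role of ga::tlm_fifo_out
    req_out.push_back({0, 3});                     // elements 0..2
    req_out.push_back({8, 2});                     // elements 8..9

    std::deque<MemTypedReadRespType<int>> resp_in =  // plays the role of ga::tlm_fifo_in
        serve_in_order(memory, req_out);

    std::vector<int> values;
    while (!resp_in.empty()) {
        values.push_back(resp_in.front().data);  // like resp_in.get().data
        resp_in.pop_front();
    }
    return values;
}
```

Because responses follow request order, the kernel only needs to count responses per request; no tag is involved.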

2. **Random Read Port Bundle.** This read interface is intended for random accesses and does not support burst (streaming) mode. It may perform better than the streaming interface for random accesses because it does not restore request order: responses can arrive out of order. For that reason, the user must provide a unique tag with each request so that each response can be matched to its request.
```cpp
ga::tlm_fifo_out<MemSingleReadReqType<USER_TYPE, USER_TAG_TYPE> > req_out;
ga::tlm_fifo_in<MemSingleReadRespType<USER_TYPE, USER_TAG_TYPE> > resp_in;
```

`MemSingleReadReqType<USER_TYPE, USER_TAG_TYPE>` has the following fields:
* `addr` - memory address
* `utag` - tag value that is returned with the corresponding response

`MemSingleReadRespType<USER_TYPE, USER_TAG_TYPE>` has the following fields:
* `data` - data of type `USER_TYPE`
* `utag` - tag value that identifies the corresponding request

*Note: if the application requires both read and write accesses to the same location (or cacheline) in memory, then the read/write interface should be used instead because the order between read and write requests cannot be guaranteed using this interface*
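A similar host-only model shows how tags let the kernel tolerate out-of-order responses from the random read bundle. The structs mirror only the documented fields (`addr`, `utag`, `data`), `serve_out_of_order` is a hypothetical memory model that simply replies in reverse order, and `addr` is an element index; the real load unit's reordering behavior is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the documented payloads (fields addr, utag, data).
template <typename T, typename UTAG>
struct MemSingleReadReqType {
    std::uint64_t addr;  // modeled as an element index
    UTAG utag;           // returned unchanged with the response
};

template <typename T, typename UTAG>
struct MemSingleReadRespType {
    T data;
    UTAG utag;  // identifies the request this response belongs to
};

// Hypothetical memory model that answers in reverse request order, to mimic
// responses that do not follow request order.
template <typename T, typename UTAG>
std::vector<MemSingleReadRespType<T, UTAG>> serve_out_of_order(
    const std::vector<T>& memory,
    const std::vector<MemSingleReadReqType<T, UTAG>>& requests) {
    std::vector<MemSingleReadRespType<T, UTAG>> responses;
    for (auto it = requests.rbegin(); it != requests.rend(); ++it) {
        responses.push_back({memory.at(it->addr), it->utag});
    }
    return responses;
}

// Kernel side: gather memory[idx[k]] into out[k], using k as the tag.
std::vector<int> gather(const std::vector<int>& memory,
                        const std::vector<std::uint64_t>& idx) {
    using Tag = std::uint16_t;  // must distinguish every request in flight
    std::vector<MemSingleReadReqType<int, Tag>> req_out;
    for (std::size_t k = 0; k < idx.size(); ++k) {
        req_out.push_back({idx[k], static_cast<Tag>(k)});
    }

    std::vector<int> out(idx.size());
    for (const auto& resp : serve_out_of_order(memory, req_out)) {
        out[resp.utag] = resp.data;  // the tag, not arrival order, picks the slot
    }
    return out;
}
```

Using the request's position in the batch as `utag` is one convenient choice; it only works if `USER_TAG_TYPE` is wide enough to give every outstanding request a distinct value.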

## Memory Write Port Bundles

1. **Streaming Write Port Bundle.** This write interface is used for streaming writes to memory. It is a bundle of two output ports.

```cpp
ga::tlm_fifo_out<MemTypedWriteReqType<USER_TYPE> > req_out;
ga::tlm_fifo_out<MemTypedWriteDataType<USER_TYPE> > data_out;
```


Port `req_out` is used to send the address and the number of elements to be written. Port `data_out` is used to send the streaming data - one element at a time:

`MemTypedWriteReqType<USER_TYPE>` has the following fields:
  - `addr` - the base address of the write request
  - `size` - number of elements to be written

`MemTypedWriteDataType<USER_TYPE>` has the following fields:
  - `data` - the next element to be written 
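The two-port streaming write protocol (one request on `req_out` announcing `addr` and `size`, followed by `size` elements on `data_out`) can be modeled the same way. In this sketch `serve_writes` is a hypothetical stand-in for the store unit, queues replace the `ga::tlm_fifo_out` ports, and `addr` is an element index; cache-line packing is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-ins for the documented payloads.
template <typename T>
struct MemTypedWriteReqType {
    std::uint64_t addr;  // base address, modeled as an element index
    std::size_t size;    // number of elements that will follow on data_out
};

template <typename T>
struct MemTypedWriteDataType {
    T data;  // the next element to be written
};

// Hypothetical store-unit model: for each request, consume exactly `size`
// data elements and write them to consecutive addresses.
template <typename T>
void serve_writes(std::vector<T>& memory,
                  std::deque<MemTypedWriteReqType<T>>& req_in,
                  std::deque<MemTypedWriteDataType<T>>& data_in) {
    while (!req_in.empty()) {
        const MemTypedWriteReqType<T> req = req_in.front();
        req_in.pop_front();
        for (std::size_t i = 0; i < req.size; ++i) {
            assert(!data_in.empty() && "kernel sent fewer elements than announced");
            memory.at(req.addr + i) = data_in.front().data;
            data_in.pop_front();
        }
    }
}

// Kernel side: one request announcing the block, then one data element per value.
void write_block(std::vector<int>& memory, std::uint64_t base,
                 const std::vector<int>& values) {
    std::deque<MemTypedWriteReqType<int>> req_out;    // plays the role of ga::tlm_fifo_out
    std::deque<MemTypedWriteDataType<int>> data_out;  // plays the role of ga::tlm_fifo_out
    req_out.push_back({base, values.size()});
    for (int v : values) {
        data_out.push_back({v});
    }
    serve_writes(memory, req_out, data_out);
}
```

For example, writing `{5, 6, 7}` at base 2 into an 8-element zeroed array leaves `{0, 0, 5, 6, 7, 0, 0, 0}`.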

2. **Random Write Port.** This write port is designed for random write requests and supports partial writes (writes to only part of a cache line). `MemSingleWriteReqType<USER_TYPE,USER_TAG_TYPE>` has the following fields:
  - `addr` - the base address of the write request
  - `data` -  data of `USER_TYPE`

  *Note: if the application requires both read and write accesses to the same location (or cacheline) in memory, then the read/write interface should be used instead because the order between read and write requests cannot be guaranteed using this interface*

## Memory Read/Write Port Bundle
1. **Random Read/Write Port Bundle.** This memory interface supports reads and writes to the same cache line or the same address during execution. It also supports partial writes to a cache line.

```cpp
ga::tlm_fifo_in<MemSingleReadReqType<T,UTAG> > rd_req_in;
ga::tlm_fifo_out<MemSingleReadRespType<T,UTAG> > rd_resp_out;
ga::tlm_fifo_in<MemSingleWriteReqType<T,UTAG> > wr_req_in;
```
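The notes on the read-only and write-only bundles say that the order between reads and writes to the same location cannot be guaranteed across separate interfaces. The following hypothetical model contrasts a single in-order path (the property the read/write bundle is meant to provide) with split read and write paths that happen to serve the read first. `MemOp`, `run_ordered`, and `run_split_reads_first` are illustrative names, not part of the library, and the split model shows only one of the interleavings that separate ports permit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical operation record for the illustration.
struct MemOp {
    bool is_write;
    std::uint64_t addr;  // modeled as an element index
    int data;            // value to store; ignored for reads
};

// Single in-order path: operations are applied in issue order, so a read
// sees every earlier write to the same address.
std::vector<int> run_ordered(std::vector<int> memory, const std::vector<MemOp>& ops) {
    std::vector<int> reads;
    for (const MemOp& op : ops) {
        if (op.is_write) {
            memory.at(op.addr) = op.data;
        } else {
            reads.push_back(memory.at(op.addr));
        }
    }
    return reads;
}

// Split paths: reads and writes are handled by independent units. Here the
// read unit drains first, one of the interleavings separate ports allow.
std::vector<int> run_split_reads_first(std::vector<int> memory, const std::vector<MemOp>& ops) {
    std::vector<int> reads;
    for (const MemOp& op : ops) {
        if (!op.is_write) reads.push_back(memory.at(op.addr));
    }
    for (const MemOp& op : ops) {
        if (op.is_write) memory.at(op.addr) = op.data;
    }
    return reads;
}
```

With the sequence "write 42 to address 3, then read address 3", the in-order path returns 42, while the split model returns the stale value 0.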

## Load Store Units Declaration and Parameters

Load-store units are SystemC modules that implement the high-level memory API described above on behalf of your kernel. They are connected to the corresponding memory port bundles at the integration stage. One application can use multiple ports, and different ports may have different internal parameters that affect the performance, area, and power of the accelerator. Below is a list of the load-store units and the parameters the user has to select for each.

* **Streaming Access Load Unit** (services the Streaming Read Port Bundle):
  `AccIn<LoadUnitParams<USER_TYPE, BUF_SIZE_IN_CL, MAX_BURST_COUNT, BUF_SIZE_IN_BURSTD_REQS> >` - this load unit can be used for both streaming and random accesses and supports burst requests. Responses arrive in request order.
* **Random Access Load Unit** (services the Random Read Port Bundle):
  `AccIn<LoadUnitSingleReqParams<USER_TYPE, USER_TAG_TYPE, BUF_SIZE_IN_CL> >` - this load unit should be used for random accesses and supports only non-burst requests. When multiple requests are outstanding, responses may arrive out of order; the tag sent with a request is returned with the corresponding response so the two can be matched. For random access, this unit is more efficient than the burst-capable load unit in both performance and area.
* **Streaming Access Store Unit** (services the Streaming Write Port Bundle):
  `AccOut<StoreUnitParams<USER_TYPE> >` - this store unit can be used for both streaming and random writes to memory. For streaming writes, it packs multiple `USER_TYPE` elements into the same cache line before sending a write request to lower-level memory. Take care when using it for random writes: this unit does not support partial writes into a cache line, so the cache line may be corrupted if `USER_TYPE` is not the same size as a cache line.
* **Random Access Load-Store Unit** (services the Random Read/Write Port Bundle):
  `AccInOut<LoadStoreUnitSingleReqParams<USER_TYPE, USER_TAG_TYPE, NUM_OF_SETS, ASSOCIATIVITY, MSHR_BUF_SIZE, SAME_MISS_REQ_BUF_SIZE> >` - this load-store unit should be used for random accesses with both reads and writes. It supports partial writes to a cache line and also provides caching.
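The partial-write caveat on the Streaming Access Store Unit can be illustrated with a single 64-byte cache line holding 4-byte elements. The sketch assumes, for illustration only, that a unit without partial-write support sends a whole line in which only the new element is valid (the remaining bytes come from a zero-filled staging buffer), while a unit with partial-write support changes only the bytes covered by the element. `write_full_line`, `write_partial`, and `read_u32` are hypothetical helpers, not library functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kLineBytes = 64;  // cache line size used in the cache-size formula
using CacheLine = std::array<std::uint8_t, kLineBytes>;

// Assumed behaviour without partial writes: the whole line is overwritten by a
// staging buffer in which only the new element is meaningful.
void write_full_line(CacheLine& line, std::size_t offset, std::uint32_t value) {
    CacheLine buffer{};  // zero-initialised staging buffer
    std::memcpy(buffer.data() + offset, &value, sizeof value);
    line = buffer;       // neighbouring bytes are clobbered
}

// Partial write: only the bytes covered by the element change
// (equivalent to applying a byte-enable mask).
void write_partial(CacheLine& line, std::size_t offset, std::uint32_t value) {
    std::memcpy(line.data() + offset, &value, sizeof value);
}

std::uint32_t read_u32(const CacheLine& line, std::size_t offset) {
    std::uint32_t v;
    std::memcpy(&v, line.data() + offset, sizeof v);
    return v;
}
```

Writing 111 at offset 0 and then 222 at offset 4 leaves only 222 in the line under full-line writes, while both values survive under partial writes, which is the corruption risk described for `USER_TYPE` values smaller than a cache line.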

The following are the parameters used with the load-store units:

* `USER_TYPE` - the type of data being read from or written to memory. The type is defined by the user and is a C structure that must follow several implementation rules (see the "Preparing data types for memory communication" section).
* `BUF_SIZE_IN_CL` - maximum number of cache lines requested in flight (usually large for random accesses).
* `MAX_BURST_COUNT` - maximum number of elements requested per request (e.g. 1 for random accesses, or a very large value for streaming accesses).
* `BUF_SIZE_IN_BURSTD_REQS` (defaults to the value of `BUF_SIZE_IN_CL`) - maximum number of user requests in flight. Leave it unassigned for random accesses; it can be 1 for streaming applications that request one large chunk of data only once.
* `USER_TAG_TYPE` - the type of the tag value that the user provides with a request and that is returned with the response. Because responses from tagged memory units can arrive out of order, the tag lets the user match each response to its request.
* `NUM_OF_SETS`, `ASSOCIATIVITY` - used by the load-store unit with a cache; they define the primary cache organization (cache size is `NUM_OF_SETS * ASSOCIATIVITY * 64B`).
* `MSHR_BUF_SIZE` - used by the load-store unit with a cache; the maximum number of outstanding miss requests to memory in flight.
* `SAME_MISS_REQ_BUF_SIZE` - used by the load-store unit with a cache; the maximum number of requests to the same cache line that can be held while a miss on that line is being handled, before the cache stalls.
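The cache capacity formula for the load-store unit with a cache can be checked with a small `constexpr` helper. The function name and the example values (256 sets, 4 ways) are arbitrary; only the `NUM_OF_SETS * ASSOCIATIVITY * 64B` formula comes from the parameter list above.

```cpp
#include <cassert>
#include <cstddef>

// Cache line size implied by the documented formula.
constexpr std::size_t kCacheLineBytes = 64;

// Capacity of the load-store unit's cache: NUM_OF_SETS * ASSOCIATIVITY * 64 B.
constexpr std::size_t cache_size_bytes(std::size_t num_of_sets, std::size_t associativity) {
    return num_of_sets * associativity * kCacheLineBytes;
}

// Example configuration (arbitrary values): 256 sets, 4-way associative.
static_assert(cache_size_bytes(256, 4) == 64 * 1024, "256 x 4 x 64 B should be 64 KiB");
```

For example, `NUM_OF_SETS = 256` and `ASSOCIATIVITY = 4` give 256 × 4 × 64 B = 65,536 B (64 KiB).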
