
This project implements a small-scale Tensor Processing Unit (TPU) that performs 2x2 matrix multiplications using a systolic array of Multiply-Accumulate (MAC) units. It is designed for the Tiny Tapeout ASIC flow.

Two optional features are selected with control pins: a fused transpose of the second operand (`uio_in[1]`) and ReLU activation on the output (`uio_in[2]`).

| Pin | Direction | Signal |
|---|---|---|
| `ui_in[7:0]` | Input | 8-bit matrix data |
| `uo_out[7:0]` | Output | 8-bit result data (half of a 16-bit element) |
| `uio_in[0]` | Input | LOAD_EN: enable loading elements into memory |
| `uio_in[1]` | Input | TRANSPOSE: enable fused transpose of second operand |
| `uio_in[2]` | Input | ACTIVATION: enable ReLU activation on output |
| `uio_out[5]` | Output | STATE0: FSM state bit 0 |
| `uio_out[6]` | Output | STATE1: FSM state bit 1 |
| `uio_out[7]` | Output | DONE: high when results are ready |
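
The behaviour described above can be checked against a small software reference model. The sketch below is not taken from the project's sources. It assumes that elements are signed 8-bit two's-complement values, that the product is computed as (first-loaded matrix) x (second-loaded matrix), and that TRANSPOSE swaps the off-diagonal elements of the second matrix before the multiply. The function names `to_int8` and `matmul_2x2` are hypothetical.

```python
def to_int8(byte):
    """Interpret an 8-bit value as a signed two's-complement integer (assumed encoding)."""
    byte &= 0xFF
    return byte - 256 if byte >= 128 else byte


def matmul_2x2(a, b, transpose=False, relu=False):
    """Reference 2x2 product of two row-major lists of 4 signed integers.

    Assumption: the result is a @ b, with the optional transpose applied to b.
    """
    if transpose:
        # Row-major transpose of a 2x2 matrix swaps the two off-diagonal elements.
        b = [b[0], b[2], b[1], b[3]]
    out = []
    for i in range(2):          # output row
        for j in range(2):      # output column
            acc = 0
            for k in range(2):  # one multiply-accumulate step per k
                acc += a[2 * i + k] * b[2 * k + j]
            if relu:
                acc = max(acc, 0)  # ReLU clamps negative results to zero
            out.append(acc)
    return out
```

For example, `matmul_2x2([1, 2, 3, 4], [5, 6, 7, 8])` models a general multiply, and passing `relu=True` models the ACTIVATION pin being held high.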

1. Hold `rst_n` low for several clock cycles, then release.
2. Drive `uio_in[0]` (LOAD_EN) high and send 8 bytes via `ui_in`, one per clock cycle: 4 weight elements followed by 4 input elements (row-major order).
3. Wait until the DONE signal (`uio_out[7]`) goes high, which indicates that results are available.
4. Read the results from `uo_out` (high byte first). Reading all 4 elements takes 8 clock cycles.

To run the test suite:

    cd test
    pip install -r requirements.txt
    make

This runs cocotb tests via Icarus Verilog covering identity multiply, general multiply, signed values, and ReLU activation.
No external hardware is required for basic operation. Connect `ui_in` and `uio_in` to a microcontroller or FPGA for data input and control, and connect `uo_out` and `uio_out` to read results and status.

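
A host microcontroller or FPGA test harness needs to serialize the operands for loading and reassemble the result bytes. The following is a minimal sketch of that bookkeeping only and does not drive any pins. It assumes signed two's-complement elements and that the four results come back in row-major order. The helper names `pack_operands` and `unpack_results` are hypothetical.

```python
def pack_operands(weights, inputs):
    """Return the 8 bytes to present on ui_in, one per clock, while LOAD_EN is high.

    Weights come first, then inputs, each as 4 row-major elements.
    """
    if len(weights) != 4 or len(inputs) != 4:
        raise ValueError("each matrix needs exactly 4 row-major elements")
    # Mask to 8 bits so negative values become their two's-complement byte.
    return [v & 0xFF for v in list(weights) + list(inputs)]


def unpack_results(raw):
    """Combine 8 bytes read from uo_out (high byte first) into 4 signed 16-bit values."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes: 4 elements x 2 bytes each")
    results = []
    for i in range(0, 8, 2):
        value = ((raw[i] & 0xFF) << 8) | (raw[i + 1] & 0xFF)
        if value >= 0x8000:
            value -= 0x10000  # sign-extend (assumes signed results)
        results.append(value)
    return results
```

`pack_operands` yields the byte sequence for the load phase, and `unpack_results` turns the 8 bytes captured over the 8 read cycles back into 4 matrix elements.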
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | DATA[0] matrix element bit 0 | OUT[0] result data bit 0 | LOAD_EN (input) enable matrix element loading |
| 1 | DATA[1] matrix element bit 1 | OUT[1] result data bit 1 | TRANSPOSE (input) fused transpose of second operand |
| 2 | DATA[2] matrix element bit 2 | OUT[2] result data bit 2 | ACTIVATION (input) enable ReLU on output |
| 3 | DATA[3] matrix element bit 3 | OUT[3] result data bit 3 | (unused) |
| 4 | DATA[4] matrix element bit 4 | OUT[4] result data bit 4 | (unused) |
| 5 | DATA[5] matrix element bit 5 | OUT[5] result data bit 5 | STATE0 (output) FSM state bit 0 |
| 6 | DATA[6] matrix element bit 6 | OUT[6] result data bit 6 | STATE1 (output) FSM state bit 1 |
| 7 | DATA[7] matrix element bit 7 | OUT[7] result data bit 7 | DONE (output) high when results are ready |
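
The bidirectional pin assignments in the table above map directly onto bit masks. As an illustration, the sketch below builds the control value for `uio_in` and decodes a status value read from `uio_out`. The function names are hypothetical, and the numeric FSM state is returned as a raw 2-bit value because the meaning of each state is not specified here.

```python
# Bit positions taken from the pinout table.
LOAD_EN = 1 << 0
TRANSPOSE = 1 << 1
ACTIVATION = 1 << 2
DONE = 1 << 7


def control_byte(load_en=False, transpose=False, activation=False):
    """Build the value to drive on uio_in from the three control flags."""
    value = 0
    if load_en:
        value |= LOAD_EN
    if transpose:
        value |= TRANSPOSE
    if activation:
        value |= ACTIVATION
    return value


def decode_status(uio_out):
    """Split a uio_out reading into the 2-bit FSM state and the DONE flag."""
    state = (uio_out >> 5) & 0b11  # STATE1 is bit 6, STATE0 is bit 5
    return {"state": state, "done": bool(uio_out & DONE)}
```

For instance, `control_byte(load_en=True, activation=True)` sets bits 0 and 2, and `decode_status` can be polled until its `"done"` entry becomes true.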