
A 4x4 matrix multiply accelerator controlled via SPI:
- A matrix elements are unsigned 8-bit.
- B matrix elements are ternary weights encoded in 2 bits (00=0, 01=+1, 10=-1, 11 treated as 0).
- Two processing elements (PE=2) process two output elements at once.
- Latency for one full 4x4 result is 32 compute cycles ((16 outputs / 2 lanes) * 4 k-steps), not including SPI transfer overhead.
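The arithmetic described above can be checked against a small software model. The following is a minimal sketch, not the hardware implementation: the function names are hypothetical, and masking a weight code to its low 2 bits is an assumption made for convenience.

```python
# Ternary codes from the text: 00=0, 01=+1, 10=-1, 11 treated as 0.
TERNARY = {0b00: 0, 0b01: 1, 0b10: -1, 0b11: 0}


def decode_weight(code):
    """Map a 2-bit ternary code to its weight (upper bits ignored: an assumption)."""
    return TERNARY[code & 0b11]


def matmul_ternary(a, b_codes):
    """Reference C = A x B for 4x4 inputs.

    a       -- 4 rows of 4 unsigned 8-bit integers
    b_codes -- 4 rows of 4 ternary codes
    """
    n = 4
    for row in a:
        for value in row:
            if not 0 <= value <= 255:
                raise ValueError(f"A element {value} is not unsigned 8-bit")
    return [
        [sum(a[i][k] * decode_weight(b_codes[k][j]) for k in range(n))
         for j in range(n)]
        for i in range(n)
    ]


def compute_cycles(n=4, lanes=2):
    """(n*n outputs / lanes) * n k-steps, excluding SPI transfer overhead."""
    return (n * n // lanes) * n


if __name__ == "__main__":
    a = [[255] * 4 for _ in range(4)]
    b = [[0b01] * 4 for _ in range(4)]
    print(matmul_ternary(a, b)[0][0])  # 1020
    print(compute_cycles())            # 32
```

A model like this is mainly useful for generating expected results to compare against values read back over SPI.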
The SPI command byte format is: {R/W[7], SEL[6:5], ROW[4:3], COL[2:1], 0}
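As an illustration of the command byte layout above, the sketch below packs and unpacks the four fields. The helper names are hypothetical, and the fields are passed as raw bit values because the meaning of each R/W and SEL value is not assumed here.

```python
def pack_command(rw, sel, row, col):
    """Build the command byte {R/W[7], SEL[6:5], ROW[4:3], COL[2:1], 0}."""
    fields = (("rw", rw, 1), ("sel", sel, 2), ("row", row, 2), ("col", col, 2))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
    # Bit 0 is always 0 in this format.
    return (rw << 7) | (sel << 5) | (row << 3) | (col << 1)


def unpack_command(byte):
    """Split a command byte back into its raw fields."""
    return {
        "rw": (byte >> 7) & 0b1,
        "sel": (byte >> 5) & 0b11,
        "row": (byte >> 3) & 0b11,
        "col": (byte >> 1) & 0b11,
    }


if __name__ == "__main__":
    cmd = pack_command(rw=1, sel=2, row=3, col=1)
    print(hex(cmd))             # 0xda
    print(unpack_command(cmd))
```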
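For B elements, only bits [1:0] are used as the ternary code. The status byte is {6'b0, done, busy}.

The accumulator is 20 bits wide. For ternary weights, the dot-product range per element is [-1020, +1020], since each result sums 4 products of at most 255 in magnitude.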
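The range figure and the {6'b0, done, busy} layout can be sanity-checked with a few lines of Python. This is a sketch under assumptions: the accumulator is taken to be a signed two's-complement value, {6'b0, done, busy} is taken to be a status byte (busy in bit 0 and done in bit 1, following Verilog concatenation order), and all function names are hypothetical.

```python
def dot_product_range(k_steps=4, a_max=255):
    """Extreme dot-product values when every weight is +1 or every weight is -1."""
    bound = k_steps * a_max
    return -bound, bound


def fits_signed(value, bits):
    """True if value is representable in a signed two's-complement field."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def decode_status(byte):
    """Decode {6'b0, done, busy}: busy is bit 0, done is bit 1."""
    return {"busy": bool(byte & 0x01), "done": bool(byte & 0x02)}


if __name__ == "__main__":
    low, high = dot_product_range()
    print(low, high)                                   # -1020 1020
    print(fits_signed(low, 20), fits_signed(high, 20))  # True True
    print(decode_status(0b00000010))
```

Under these assumptions the 20-bit accumulator has ample headroom: the ternary range already fits in 11 signed bits.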
Connect an SPI master (e.g. a microcontroller or the RP2040 on the TT demo board) to the SPI pins listed in the table below. Ternary codes for B: 00=0, 01=+1, 10=-1.

External hardware: an SPI master (directly from the RP2040 on the TT demo board, or any external microcontroller).

| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | spi_sclk | spi_miso | |
| 1 | spi_cs_n | busy | |
| 2 | spi_mosi | done | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
| 6 | | | |
| 7 | | | |