
This is a programmable, sequential hardware neural network: a single layer of 3 neurons, each processing 3 inputs, built around one shared Multiply-Accumulate (MAC) unit so that it fits within a single Tiny Tapeout tile.

The design allows loading custom weights, biases, and inputs at runtime. It then sequentially multiplies the input features by the programmed 3x3 weight matrix, accumulates the products together with the programmed biases, and passes each result through a programmable ReLU/PReLU activation function. Because all calculations share a single MAC engine, it behaves like a very small Tensor Processing Unit (TPU).

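
The computation described above can be sketched as a small software reference model. This is only an illustration, not the hardware implementation: the function names `prelu` and `infer` are hypothetical, Python integers are unbounded (the real register and accumulator widths are not stated here), and the negative branch is assumed to be an arithmetic right shift that rounds toward negative infinity.

```python
def prelu(acc: int, mode: int) -> int:
    """Activation selected by a 2-bit mode (0 = ReLU, 1..3 = x/2, x/4, x/8).

    ASSUMPTION: the scaled branch applies only to negative values and is an
    arithmetic right shift; the hardware's rounding may differ.
    """
    if acc >= 0:
        return acc
    if mode == 0:
        return 0
    return acc >> mode


def infer(x, w, b, mode=0):
    """Compute y[i] = act(b[i] + sum_j w[3*i + j] * x[j]) for 3 neurons.

    `w` is a flat list of 9 weights in row-major order (W[neuron][input]).
    A single accumulator is reused for every neuron, mirroring the shared MAC.
    """
    y = []
    for i in range(3):
        acc = b[i]
        for j in range(3):
            acc += w[3 * i + j] * x[j]  # one multiply-accumulate step
        y.append(prelu(acc, mode))
    return y
```

For example, `infer([1, 2, 3], [1, 0, 0, 0, 1, 0, -1, -1, -1], [0, 1, 2])` accumulates 1, 3, and -4 for the three neurons, and the last value is clamped to 0 in the standard ReLU mode.
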
The design exposes a multiplexed memory-mapped interface using the uio_in pins. Memory addresses (0-15) are routed via uio_in[3:0], write enable is on uio_in[4], and read enable is on uio_in[5].
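
As a rough illustration of this bit layout, the helpers below pack and unpack the value driven onto uio_in. The names `pack_uio` and `unpack_uio` are hypothetical and exist only for this sketch; the bit positions are the ones given above.

```python
ADDR_MASK = 0x0F    # uio_in[3:0]: memory address
WRITE_BIT = 1 << 4  # uio_in[4]: io_write
READ_BIT = 1 << 5   # uio_in[5]: io_read


def pack_uio(addr: int, write: bool = False, read: bool = False) -> int:
    """Build the uio_in value for a given address and control flags."""
    if not 0 <= addr <= 15:
        raise ValueError("address must fit in 4 bits (0-15)")
    value = addr & ADDR_MASK
    if write:
        value |= WRITE_BIT
    if read:
        value |= READ_BIT
    return value


def unpack_uio(value: int):
    """Split a uio_in value into (address, io_write, io_read)."""
    return value & ADDR_MASK, bool(value & WRITE_BIT), bool(value & READ_BIT)
```

For instance, `pack_uio(3, write=True)` gives `0x13`, and `pack_uio(15)` gives `0x0F`, the address-15 value with both enables low.
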
To write to the internal memory, set uio_in[4] (io_write) to 1, supply the data on ui_in, and select the memory address on uio_in[3:0]. The writable values are:

- Inputs: X[0], X[1], X[2]
- Weights: W[0..8] (row-major format: W[Neuron_i][Input_j])
- Biases: B[0], B[1], B[2]

Clock the design to latch each value into its respective register.

To run the calculation:

- Set uio_in[4] (io_write) to 0.
- Set uio_in[3:0] (address) to 15 (4'b1111) to trigger the calculation sequence.
- Set ui_in[1:0] to configure the programmable PReLU activation function:
  - 00: Standard ReLU
  - 01: $x/2$
  - 10: $x/4$
  - 11: $x/8$
- Wait for the FSM to return to the STATE_IDLE state. The FSM computes one neuron's output every 3 clock cycles by overlapping the PReLU activation combinationally with the final Multiply-Accumulate step, so the activation adds no extra cycles.

To read the results:

- Set uio_in[5] (io_read) to 1.
- Set uio_in[3:0] to 0, 1, or 2 to map the corresponding network output Y[0], Y[1], or Y[2] onto the uo_out pins.

No external hardware is required. The testbench or an external microcontroller can sequentially write the parameters across the bus and trigger calculations.

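
The sequence a testbench or microcontroller would drive can be sketched as a list of pin values, one entry per clock. This is a hedged illustration only: the base addresses `X_BASE`, `W_BASE`, and `B_BASE` are assumptions (the per-register addresses are not given in this text), the function names are hypothetical, and values are assumed to be presented on ui_in as 8-bit two's complement. The number of clocks to wait after the trigger is left to the caller.

```python
# ASSUMPTION: placeholder base addresses; check them against the actual design.
X_BASE, W_BASE, B_BASE = 0, 3, 12
TRIGGER_ADDR = 15      # address 15 with io_write = 0 starts the calculation
WRITE_BIT = 1 << 4     # uio_in[4]: io_write
READ_BIT = 1 << 5      # uio_in[5]: io_read


def build_transactions(x, w, b, mode):
    """Return (ui_in, uio_in) pairs: 15 register writes, then the trigger."""
    txns = []
    for base, values in ((X_BASE, x), (W_BASE, w), (B_BASE, b)):
        for offset, value in enumerate(values):
            # ASSUMPTION: values are sent as 8-bit two's complement.
            txns.append((value & 0xFF, (base + offset) | WRITE_BIT))
    # io_write = 0, address = 15, ui_in[1:0] selects the activation mode.
    txns.append((mode & 0x3, TRIGGER_ADDR))
    return txns


def read_requests():
    """Return the uio_in values that map Y[0], Y[1], Y[2] onto uo_out."""
    return [READ_BIT | index for index in range(3)]
```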
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | Data In[0] | Data Out[0] | io_addr[0] |
| 1 | Data In[1] | Data Out[1] | io_addr[1] |
| 2 | Data In[2] | Data Out[2] | io_addr[2] |
| 3 | Data In[3] | Data Out[3] | io_addr[3] |
| 4 | Data In[4] | Data Out[4] | io_write (1=Write) |
| 5 | Data In[5] | Data Out[5] | io_read (1=Read) |
| 6 | Data In[6] | Data Out[6] | |
| 7 | Data In[7] | Data Out[7] | |