In computing, the multiply-accumulate (MAC) operation is a common function that calculates the product of two numbers and adds that product to an accumulator. This operation can be represented by the following equation:
$$a_{\text{next}} = a + (b \times c) \qquad (1)$$
Modern computers often include a dedicated MAC unit, which consists of a multiplier implemented in combinational logic, followed by an adder and an accumulator register that stores the result. The output of the register is fed back into one input of the adder. As a result, on each clock cycle, the output of the multiplier is added to the register.
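The feedback loop described above can be modelled in a few lines. This is a minimal sketch of equation (1), not the project's RTL: `MacUnit`, `clock` and `reset` are hypothetical names, and the accumulator is an unbounded Python integer, whereas a real hardware register has a fixed width.

```python
class MacUnit:
    """Cycle-level model of a MAC unit: a multiplier feeding an adder whose
    output is stored in an accumulator register (equation (1))."""

    def __init__(self) -> None:
        self.acc = 0  # accumulator register "a"

    def clock(self, b: int, c: int) -> int:
        # One clock edge: the multiplier output b*c is added to the register,
        # and the adder result is written back into the register.
        self.acc = self.acc + b * c
        return self.acc

    def reset(self) -> None:
        # Clear the accumulator register.
        self.acc = 0


mac = MacUnit()
for b, c in [(1, 2), (3, 4), (5, 6)]:
    mac.clock(b, c)
print(mac.acc)  # 2 + 12 + 30 = 44
```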
Inspired by Aleksandar Kostovic's Matrix-MAC Unit [1], we developed our own unit, intended for future neural-network applications, that multiplies 2x2 matrices whose elements are 4 bits wide.
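To make the link between the MAC operation and the matrix product concrete, the sketch below computes each element of C = A × B as a chain of two MAC steps, c_ij = a_i0·b_0j + a_i1·b_1j, so a full 2x2 product needs eight multiply-accumulates. It is an illustration only: `matmul_2x2` and `check_4bit` are hypothetical helpers, and treating the 4-bit elements as unsigned (0 to 15) is an assumption.

```python
Matrix = list[list[int]]


def check_4bit(m: Matrix) -> None:
    # Assumption: elements are unsigned 4-bit values (0..15).
    for row in m:
        for x in row:
            if not 0 <= x <= 15:
                raise ValueError(f"element {x} does not fit in 4 bits")


def matmul_2x2(a: Matrix, b: Matrix) -> Matrix:
    check_4bit(a)
    check_4bit(b)
    c = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            acc = 0  # accumulator for element c[i][j]
            for k in range(2):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate step
            c[i][j] = acc
    return c


print(matmul_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```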
Use a 10 MHz clock signal to iterate over 4-bit binary input values (i_I_cor) combined with 4-bit selection signals (select), which are defined as follows:
With this in mind, we will first send two matrices:
The expected behavior of TensorFlowE is to multiply two 2x2 matrices, producing the product C = A × B. When enable_accu is set, the result is accumulated; the clear signal resets the accumulator. Asserting Ena_read reads the output in two parts: each part is 8 bits wide and holds two 4-bit numbers, and together the four numbers reconstruct the 2x2 result matrix.
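A behavioural reference model can help when writing or checking the tests. The sketch below is hypothetical and not derived from the RTL: the class `TensorFlowEModel`, the helper `unpack`, the 4-bit wrap-around of each stored element, and the read-out layout (first byte = row 0, second byte = row 1, column 0 in the high nibble) are all assumptions chosen for illustration.

```python
Matrix = list[list[int]]


class TensorFlowEModel:
    """Hypothetical behavioural model of TensorFlowE, not derived from the RTL."""

    def __init__(self) -> None:
        self.result: Matrix = [[0, 0], [0, 0]]

    def clear(self) -> None:
        # The clear signal resets the accumulator to zero.
        self.result = [[0, 0], [0, 0]]

    def multiply(self, a: Matrix, b: Matrix, enable_accu: bool = False) -> None:
        # C = A x B for 2x2 matrices.
        prod = [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
        for i in range(2):
            for j in range(2):
                # With enable_accu the product is added to the stored result;
                # otherwise it replaces it.
                base = self.result[i][j] if enable_accu else 0
                # Assumption: each stored element wraps to 4 bits.
                self.result[i][j] = (base + prod[i][j]) & 0xF

    def read(self) -> tuple[int, int]:
        # Assumed layout: first byte = row 0, second byte = row 1,
        # column 0 in the high nibble and column 1 in the low nibble.
        r = self.result
        return (r[0][0] << 4) | r[0][1], (r[1][0] << 4) | r[1][1]


def unpack(first: int, second: int) -> Matrix:
    # Rebuild the 2x2 matrix from the two 8-bit words returned by read().
    return [[first >> 4, first & 0xF], [second >> 4, second & 0xF]]


dut = TensorFlowEModel()
dut.multiply([[1, 0], [0, 1]], [[4, 1], [2, 5]])
print([hex(w) for w in dut.read()])  # ['0x41', '0x25']
print(unpack(*dut.read()))           # [[4, 1], [2, 5]]
```

If the real design orders the nibbles differently, only `read` and `unpack` need to change.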
The testbench includes the following tests:

- `test_tensorflow_e`: multiplies the identity matrix A by matrix B and reads the result.
- `test_tensorflow_e2`: multiplies the scaled identity matrix A (factor 2) by matrix B and reads the result.
- `test_tensorflow_e3`: pulses enable_accu (asserts it, then returns it to 0) and performs two multiplications, first identity × B and then scaled identity (factor 2) × B, then reads the result.
- `test_tensorflow_e4`: pulses enable_accu, performs two multiplications (identity × B and scaled identity × B), pulses clear, performs a multiplication with a different A ([[2,0],[0,0]]) and B, then reads the result.
- `test_tensorflow_e5`: pulses enable_accu, performs six multiplications (mostly identity and scaled identity matrices with all-ones and scaled all-ones matrices), then reads the result. Because enable_accu is pulsed only at the start and then returned to 0, the result reflects the accumulation of all six products only if the DUT latches enable_accu; the test itself does not establish which behavior is expected.
- `test_tensorflow_e6`: same as `test_tensorflow_e5`, but after the six multiplications it pulses clear and performs one more multiplication (A=[[2,0],[0,0]], B=[[4,1],[2,5]]), then reads the result.
- `test_tensorflow_e7`: same as `test_tensorflow_e6`, but after the clear and the following multiplication it performs one additional multiplication (A=[[0,0],[0,1]], B=[[4,1],[2,5]]), then reads the result.
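The last two tests use explicit matrices, so their expected read-out can be worked out by hand. The sketch below does that for both possible accumulator behaviors, since whether accumulation is still active after the clear is not stated. The helper names are hypothetical, and the 4-bit wrap in `add_4bit` is an assumption (it does not affect these particular values).

```python
Matrix = list[list[int]]


def matmul_2x2(a: Matrix, b: Matrix) -> Matrix:
    # Plain 2x2 matrix product C = A x B.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add_4bit(x: Matrix, y: Matrix) -> Matrix:
    # Element-wise sum, kept to 4 bits (assumed accumulator width).
    return [[(x[i][j] + y[i][j]) & 0xF for j in range(2)] for i in range(2)]


B = [[4, 1], [2, 5]]
A_e6 = [[2, 0], [0, 0]]
A_e7 = [[0, 0], [0, 1]]

after_clear = matmul_2x2(A_e6, B)  # accumulator was cleared, so this is the e6 result
e7_product = matmul_2x2(A_e7, B)

print("e6:", after_clear)                                        # [[8, 2], [0, 0]]
print("e7 if accumulating:", add_4bit(after_clear, e7_product))  # [[8, 2], [2, 5]]
print("e7 if not accumulating:", e7_product)                     # [[0, 0], [2, 5]]
```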
External hardware: a personal computer.
[1] AleksandarKostovic. (2019). Matrix-MAC-Unit: Matrix Multiply and Accumulate unit written in System Verilog [Computer software]. GitHub. Retrieved September 8, 2025, from https://github.com/AleksandarKostovic/Matrix-MAC-Unit
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | Datos_in_0 | Datos_out_0 | Ena_write |
| 1 | Datos_in_1 | Datos_out_1 | Ena_read |
| 2 | Datos_in_2 | Datos_out_2 | clear |
| 3 | Datos_in_3 | Datos_out_3 | enable_accu |
| 4 | Datos_in_4 | Datos_out_4 | Ena_out |
| 5 | Datos_in_5 | Datos_out_5 | |
| 6 | Datos_in_6 | Datos_out_6 | |
| 7 | Datos_in_7 | Datos_out_7 | |
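When driving the bidirectional pins from a test script, it can help to build the control byte from signal names rather than magic numbers. This sketch takes the bit positions from the pin table; the pin directions are an assumption (bits 0 to 3 driven by the tester, Ena_out on bit 4 read back from the design), and `uio_byte` and `ena_out` are hypothetical helper names.

```python
# Bit positions from the pin table. Directions are an assumption:
# bits 0-3 are driven by the tester, bit 4 (Ena_out) is read from the design.
UIO_INPUT_BITS = {"Ena_write": 0, "Ena_read": 1, "clear": 2, "enable_accu": 3}
ENA_OUT_BIT = 4


def uio_byte(**signals: bool) -> int:
    """Pack named control signals into an 8-bit bidirectional-port value."""
    value = 0
    for name, asserted in signals.items():
        if name not in UIO_INPUT_BITS:
            raise KeyError(f"unknown control signal {name!r}")
        if asserted:
            value |= 1 << UIO_INPUT_BITS[name]
    return value


def ena_out(uio_value: int) -> bool:
    """Extract the Ena_out bit from a value read on the bidirectional port."""
    return bool((uio_value >> ENA_OUT_BIT) & 1)


print(bin(uio_byte(Ena_write=True, enable_accu=True)))  # 0b1001
```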