The AtomNPU (Neural Processing Unit) is a compact 4-bit processing module that performs a basic multiply-accumulate (MAC) operation, the core primitive of neural network computation. It multiplies a 4-bit input activation by a 4-bit weight and produces a quantized 4-bit output.

Inputs:
- `input_data [3:0]` (`ui_in[3:0]`): Represents the 4-bit activation vector input to the NPU.
- `weight [3:0]` (`uio_in[3:0]`): Represents the 4-bit weight vector applied to the activation vector.
- `start` (`uio_in[4]`): A control signal that initiates the MAC operation.

Outputs:
- `output_data [3:0]` (`uo_out[3:0]`): The 4-bit result of the multiply-accumulate operation.
- `done` (`uo_out[4]`): A status signal indicating the completion of the MAC operation.

Control Signals:
- `clk` (Clock): Synchronizes the operations within the NPU.
- `rst_n` (Reset): An active-low signal that resets the NPU to its initial state.

Initialization (IDLE State):
- When reset is asserted (`rst_n` low), the NPU enters the IDLE state.
- Internal registers are cleared to `0`.
- The `done` signal is deasserted (`0`).

Start Operation (CALC State):
- When the `start` signal is asserted (`1`), the NPU transitions from IDLE to CALC.
- The `input_data` and `weight` vectors are loaded into their respective registers.
- The accumulator is cleared to `0`, and the bit counter is reset to `0`.

Multiply-Accumulate Process:
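- The NPU multiplies `input_data` by the `weight` vector in a shift-add manner over 4 clock cycles (one for each bit).
- For each bit (`bit_count` from `0` to `3`):
  - The least significant bit of `weight` is examined:
    - If the LSB (`weight[0]`) is `1`, the `input_data` is left-shifted by the current `bit_count` and added to the accumulator.
    - This effectively multiplies `input_data` by the corresponding bit weight. For example, `weight = 4'b0101` adds `input_data << 0` and `input_data << 2`, which equals `input_data * 5`.
  - `weight` is shifted right by `1` bit to process the next bit in the subsequent cycle.

Completion (DONE State):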
- If the accumulator exceeds `15` (`8'd15`), the `output_data` is clamped to `15` to maintain the 4-bit width.
- Otherwise, the accumulator value is assigned to `output_data`.
- The `done` signal is asserted (`1`) to indicate the operation's completion.

To prevent overflow and ensure the output remains within the 4-bit constraint, the NPU incorporates a clamping mechanism:
- If the accumulator exceeds `15`, `output_data` is set to `15` (`4'd15`).
- Otherwise, `output_data` reflects the accumulator's value.

Below is a step-by-step guide to facilitate thorough testing.

Initialization:
- Assert reset (`rst_n` low) to initialize the NPU.

Setting Inputs:
- Activation (`input_data [3:0]`): Use `ui_in[3:0]` to set the 4-bit activation vector.
- Weight (`weight [3:0]`): Use `uio_in[3:0]` to set the 4-bit weight vector.

Initiating Operation:
- Press the `start` button (connected to `uio_in[4]`) to begin the MAC operation.
- The `start` signal is internally connected to initiate the NPU's state machine.

Observing Outputs:
- `output_data [3:0]` (`uo_out[3:0]`): Monitor `uo_out[3:0]` to view the resulting 4-bit output.
- `done` signal (`uo_out[4]`): The indicator on `uo_out[4]` will illuminate (`1`) once the operation is complete.

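
When checking observed outputs, it helps to know what value to expect for a given pair of inputs. The following is a minimal software sketch of the shift-add multiply and clamping behaviour described above, not the project's actual RTL. The function name `npu_reference` is hypothetical, operands are assumed to be unsigned, and the accumulator is assumed to be 8 bits wide (suggested by the `8'd15` comparison).

```python
def npu_reference(input_data: int, weight: int) -> int:
    """Hypothetical reference model: 4-bit shift-add multiply, clamped to 15."""
    if not (0 <= input_data <= 15 and 0 <= weight <= 15):
        raise ValueError("input_data and weight must be 4-bit values (0..15)")

    accumulator = 0          # assumption: 8-bit accumulator, cleared at start
    remaining = weight       # copy of weight that is shifted right each cycle

    for bit_count in range(4):                   # one iteration per clock cycle
        if remaining & 1:                        # LSB of weight is 1
            accumulator += input_data << bit_count
        remaining >>= 1                          # expose the next weight bit

    accumulator &= 0xFF      # largest product is 15 * 15 = 225, so this never truncates

    # Clamp to keep the result within 4 bits.
    return 15 if accumulator > 15 else accumulator


if __name__ == "__main__":
    for activation, w in [(2, 3), (3, 5), (4, 4), (0, 9)]:
        result = npu_reference(activation, w)
        print(f"input_data={activation} weight={w} -> output_data={result}")
```

Under these assumptions the model is equivalent to `min(input_data * weight, 15)`, so any product above 15 (for example 4 × 4) should read back as 15 on `uo_out[3:0]`.
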
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | input_data[0] | output_data[0] | weight[0] |
| 1 | input_data[1] | output_data[1] | weight[1] |
| 2 | input_data[2] | output_data[2] | weight[2] |
| 3 | input_data[3] | output_data[3] | weight[3] |
| 4 | | done | start |
| 5 | | | |
| 6 | | | |
| 7 | | | |
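
When driving the design from a script or testbench, the pin assignments can be expressed as simple bit packing. The sketch below assumes the mapping stated in this document (`input_data` on `ui_in[3:0]`, `weight` on `uio_in[3:0]`, `start` on `uio_in[4]`, `output_data` on `uo_out[3:0]`, `done` on `uo_out[4]`). The helper names `pack_inputs` and `unpack_outputs` are hypothetical and are not part of the project.

```python
from typing import Tuple


def pack_inputs(input_data: int, weight: int, start: bool) -> Tuple[int, int]:
    """Return (ui_in, uio_in) byte values for the given operands and start flag."""
    ui_in = input_data & 0x0F                      # ui_in[3:0] = input_data
    uio_in = (weight & 0x0F) | (int(start) << 4)   # uio_in[3:0] = weight, uio_in[4] = start
    return ui_in, uio_in


def unpack_outputs(uo_out: int) -> Tuple[int, bool]:
    """Split a uo_out byte into (output_data, done)."""
    output_data = uo_out & 0x0F                    # uo_out[3:0] = output_data
    done = bool((uo_out >> 4) & 1)                 # uo_out[4] = done
    return output_data, done


if __name__ == "__main__":
    print(pack_inputs(3, 5, start=True))
    print(unpack_outputs(0x1F))
```

Unused upper bits are left at zero when packing and ignored when unpacking.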