
The AtomNPU (Neural Processing Unit) is a compact 4-bit processing module that performs basic multiply-accumulate (MAC) operations, a core building block of neural network computations. It combines a 4-bit input activation with a 4-bit weight and produces a quantized 4-bit result that is clamped to a maximum of 15.
Inputs:
- input_data [3:0] (ui_in[3:0]): The 4-bit activation vector input to the NPU.
- weight [3:0] (uio_in[3:0]): The 4-bit weight vector applied to the activation vector.
- start (uio_in[4]): A control signal that initiates the MAC operation.

Outputs:
- output_data [3:0] (uo_out[3:0]): The 4-bit result of the multiply-accumulate operation.
- done (uo_out[4]): A status signal indicating the completion of the MAC operation.

Control Signals:
- clk (Clock): Synchronizes the operations within the NPU.
- rst_n (Reset): An active-low signal that resets the NPU to its initial state.

Initialization (IDLE State):
- On reset (rst_n low), the NPU enters the IDLE state.
- Internal registers are cleared to 0.
- The done signal is deasserted (0).

Start Operation (CALC State):
- When the start signal is asserted (1), the NPU transitions from IDLE to CALC.
- The input_data and weight vectors are loaded into their respective registers.
- The accumulator is cleared to 0, and the bit counter is reset to 0.

Multiply-Accumulate Process:
- The NPU processes the weight vector in a shift-add manner over 4 clock cycles (one for each bit).
- For each bit (bit_count from 0 to 3):
  - Check the LSB of weight:
    - If the LSB (weight[0]) is 1, the input_data is left-shifted by the current bit_count and added to the accumulator.
    - This is equivalent to multiplying input_data by the corresponding bit weight.
  - The weight is shifted right by 1 bit to process the next bit in the subsequent cycle.

Completion (DONE State):
- If the accumulator exceeds 15 (8'd15), the output_data is clamped to 15 to maintain the 4-bit width.
- Otherwise, the accumulator value is assigned to output_data.
- The done signal is asserted (1) to indicate the operation's completion.

To prevent overflow and ensure the output remains within the 4-bit constraint, the NPU incorporates a clamping mechanism:
- Condition: the accumulator is greater than 15.
- If true: output_data is set to 15 (4'd15).
- Otherwise: output_data reflects the accumulator's value.

Below is a step-by-step guide to facilitate thorough testing.
Initialization:
- Apply reset (rst_n low) to initialize the NPU.

Setting Inputs:
- Activation (input_data [3:0]):
  - Drive ui_in[3:0] to set the 4-bit activation vector.
- Weight (weight [3:0]):
  - Drive uio_in[3:0] to set the 4-bit weight vector.

Initiating Operation:
- Press the start button (connected to uio_in[4]) to begin the MAC operation.
- The start signal is internally connected to initiate the NPU's state machine.

Observing Outputs:
- output_data [3:0] (uo_out[3:0]):
  - Read uo_out[3:0] to view the resulting 4-bit output.
- done Signal (uo_out[4]):
  - The indicator on uo_out[4] will illuminate (1) once the operation is complete.
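
When checking results from the procedure above, it can help to predict the expected output_data with a small software reference model. The following is a minimal Python sketch of the shift-add loop and clamp as this document describes them, not the actual RTL. The function name `atom_npu_mac` is hypothetical, and the 8-bit accumulator width is an assumption inferred from the `8'd15` comparison.

```python
def atom_npu_mac(input_data: int, weight: int) -> int:
    """Behavioral sketch of one AtomNPU operation (hypothetical helper)."""
    input_data &= 0xF  # 4-bit activation
    weight &= 0xF      # 4-bit weight
    acc = 0            # accumulator cleared when the operation starts

    # One iteration per clock cycle in the CALC state.
    for bit_count in range(4):
        if weight & 1:                    # LSB of weight is 1
            acc += input_data << bit_count
        weight >>= 1                      # move on to the next weight bit

    acc &= 0xFF  # assumed 8-bit accumulator; 15 * 15 = 225 still fits

    # DONE state: clamp to keep the result within 4 bits.
    return 15 if acc > 15 else acc


if __name__ == "__main__":
    print(atom_npu_mac(3, 4))  # 3 * 4 = 12, below the clamp
    print(atom_npu_mac(5, 4))  # 5 * 4 = 20, clamped to 15
```

Because the accumulator starts at 0 for every operation, the unclamped value is simply the product of the two 4-bit inputs, so any pair whose product exceeds 15 should read back as 15.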
| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | input_data[0] | output_data[0] | weight[0] |
| 1 | input_data[1] | output_data[1] | weight[1] |
| 2 | input_data[2] | output_data[2] | weight[2] |
| 3 | input_data[3] | output_data[3] | weight[3] |
| 4 | | done | start |
| 5 | | | |
| 6 | | | |
| 7 | | | |
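
The pin assignments in the table can be turned into bit masks when driving the design from a script or testbench. The sketch below assumes the three 8-bit ports are handled as plain integers. The helper names `pack_inputs` and `unpack_outputs` are hypothetical and serve only as illustration.

```python
def pack_inputs(input_data: int, weight: int, start: int) -> tuple[int, int]:
    """Build the ui_in and uio_in port values (hypothetical helper)."""
    ui_in = input_data & 0xF                       # ui_in[3:0]  = input_data
    uio_in = (weight & 0xF) | ((start & 1) << 4)   # uio_in[3:0] = weight, uio_in[4] = start
    return ui_in, uio_in


def unpack_outputs(uo_out: int) -> tuple[int, int]:
    """Split the uo_out port value into (output_data, done)."""
    output_data = uo_out & 0xF   # uo_out[3:0] = output_data
    done = (uo_out >> 4) & 1     # uo_out[4]   = done
    return output_data, done


if __name__ == "__main__":
    ui_in, uio_in = pack_inputs(0b1010, 0b0011, 1)
    print(f"ui_in=0x{ui_in:02X} uio_in=0x{uio_in:02X}")  # ui_in=0x0A uio_in=0x13
    print(unpack_outputs(0x1F))                           # (15, 1)
```

Bits 5 to 7 of each port are unused in the table, so the helpers leave them at 0 on the inputs and ignore them on the output.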