Matrix multiplication is implemented using a systolic array architecture.
On every clock cycle, drive the packed weight data (two FP4 weights per byte) onto the Input pins and the activation data onto the Bidirectional pins, which are used as inputs. Strobe the Enable pin to start receiving the results of the matrix multiplication on the Output pins.
An MCU is required to feed the weights and input data into the accelerator and to fetch the results.
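
The feed sequence described above might look roughly like the following on the MCU side. This is only a sketch under stated assumptions: the function `stream_matmul` and all of its callback parameters (`write_input`, `write_bidir`, `tick`, `strobe_enable`, `read_output`) are hypothetical stand-ins for whatever GPIO routines your MCU provides, and the exact timing of the Enable strobe relative to the clock is assumed, not taken from the design.

```python
from typing import Callable, List, Sequence


def stream_matmul(
    weight_bytes: Sequence[int],      # packed weights, one byte per cycle
    activation_bytes: Sequence[int],  # activations, one byte per cycle
    write_input: Callable[[int], None],   # hypothetical: drive the 8 Input pins
    write_bidir: Callable[[int], None],   # hypothetical: drive the 8 Bidirectional pins
    tick: Callable[[], None],             # hypothetical: advance the clock one cycle
    strobe_enable: Callable[[], None],    # hypothetical: pulse the Enable pin
    read_output: Callable[[], int],       # hypothetical: sample the 8 Output pins
    n_results: int,
) -> List[int]:
    """Feed one weight byte and one activation byte per cycle, then read results."""
    if len(weight_bytes) != len(activation_bytes):
        raise ValueError("need one activation byte for every weight byte")

    # Feed phase: present both bytes, then clock them in.
    for w, a in zip(weight_bytes, activation_bytes):
        write_input(w & 0xFF)
        write_bidir(a & 0xFF)
        tick()

    # Assumption: a single Enable strobe after feeding starts the result stream.
    strobe_enable()

    # Read phase: assumed to deliver one result byte per clock cycle.
    results = []
    for _ in range(n_results):
        tick()
        results.append(read_output() & 0xFF)
    return results
```

Passing the pin accessors in as callbacks keeps the sketch independent of any particular MCU SDK and makes it easy to exercise against a software model before touching hardware.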

| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | 2nd FP4 weight bit 0 (LSB) | result bit 0 (LSB) | (in) activations bit 0 (LSB) |
| 1 | 2nd FP4 weight bit 1 | result bit 1 | (in) activations bit 1 |
| 2 | 2nd FP4 weight bit 2 | result bit 2 | (in) activations bit 2 |
| 3 | 2nd FP4 weight bit 3 (MSB) | result bit 3 | (in) activations bit 3 |
| 4 | 1st FP4 weight bit 0 (LSB) | result bit 4 | (in) activations bit 4 |
| 5 | 1st FP4 weight bit 1 | result bit 5 | (in) activations bit 5 |
| 6 | 1st FP4 weight bit 2 | result bit 6 | (in) activations bit 6 |
| 7 | 1st FP4 weight bit 3 (MSB) | result bit 7 (MSB) | (in) activations bit 7 (MSB) |
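
The Input column of the table implies that each byte carries two 4-bit weights, with the first weight in the upper nibble (pins 7..4) and the second in the lower nibble (pins 3..0). A minimal sketch of that packing is shown below. The helper names `pack_fp4_pair` and `unpack_fp4_pair` are hypothetical, and the code treats each weight as a raw 4-bit code without assuming any particular FP4 encoding.

```python
from typing import Tuple


def pack_fp4_pair(first: int, second: int) -> int:
    """Pack two raw 4-bit weight codes into one byte for the Input pins.

    first  -> pins 7..4 (pin 7 is its MSB, pin 4 its LSB)
    second -> pins 3..0 (pin 3 is its MSB, pin 0 its LSB)
    """
    if not (0 <= first <= 0xF and 0 <= second <= 0xF):
        raise ValueError("each FP4 code must fit in 4 bits (0..15)")
    return (first << 4) | second


def unpack_fp4_pair(byte: int) -> Tuple[int, int]:
    """Inverse of pack_fp4_pair: return (first, second) from a packed byte."""
    return (byte >> 4) & 0xF, byte & 0xF


# Example: first weight code 0xA, second weight code 0x3.
packed = pack_fp4_pair(0xA, 0x3)
print(f"{packed:#04x}")  # prints 0xa3
```

Keeping the weights as opaque 4-bit codes means the same helper works regardless of how the accelerator interprets the sign, exponent, and mantissa bits.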