Reduced-precision matrix multiplication based on a systolic array architecture. The left-hand matrix is compressed to 2.6 bits per element.
On every cycle, feed packed weight data to the Input pins and input data (activations) to the Bidirectional pins. Strobe the Enable pin to start receiving the results of the matrix multiplication on the Output pins.
An external processor (an RP2040, for example) is needed to feed weights and input data into the accelerator and to fetch the results.
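
The host-side sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's driver: `run_accelerator`, `drive_pins`, `strobe_enable`, `read_output` and `result_bytes` are hypothetical names, and the sketch assumes that all data is fed before Enable is strobed, that one byte per bus is transferred per cycle, and that results are read back one byte at a time. The real timing and result length depend on the design.

```python
from typing import Callable, List, Sequence


def run_accelerator(
    packed_weights: Sequence[int],
    activations: Sequence[int],
    drive_pins: Callable[[int, int], None],  # hypothetical: drives both buses and clocks once
    strobe_enable: Callable[[], None],       # hypothetical: pulses the Enable pin
    read_output: Callable[[], int],          # hypothetical: samples the 8 Output pins
    result_bytes: int,                       # assumed number of result bytes to fetch
) -> List[int]:
    """Feed one weight byte and one activation byte per cycle, then read results."""
    if len(packed_weights) != len(activations):
        raise ValueError("need one activation byte for every packed weight byte")
    for w, a in zip(packed_weights, activations):
        if not (0 <= w <= 0xFF and 0 <= a <= 0xFF):
            raise ValueError("each bus carries exactly one byte per cycle")
        # Input pins receive the packed weights, Bidirectional pins the activations.
        drive_pins(w, a)
    # Assumption: Enable is strobed once, after all data has been fed.
    strobe_enable()
    return [read_output() & 0xFF for _ in range(result_bytes)]
```

On an RP2040 the three callables would wrap GPIO accesses; passing them in as parameters keeps the sequencing logic testable on a desktop without hardware.
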

| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | packed weights LSB | result LSB | (in) activations LSB |
| 1 | packed weights | result | (in) activations |
| 2 | packed weights | result | (in) activations |
| 3 | packed weights | result | (in) activations |
| 4 | packed weights | result | (in) activations |
| 5 | packed weights | result | (in) activations |
| 6 | packed weights | result | (in) activations |
| 7 | packed weights MSB | result MSB | (in) activations MSB |
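
The table places the least significant bit on pin 0 and the most significant bit on pin 7 for all three buses. A minimal sketch of that mapping is shown below; the helper names `byte_to_pins` and `pins_to_byte` are hypothetical and only illustrate the bit order, they are not part of the project.

```python
from typing import List, Sequence


def byte_to_pins(value: int) -> List[int]:
    """Return the logic level for pins 0..7, with pin 0 carrying the LSB."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    return [(value >> pin) & 1 for pin in range(8)]


def pins_to_byte(levels: Sequence[int]) -> int:
    """Reassemble a byte from pin levels listed in pin order 0..7."""
    if len(levels) != 8:
        raise ValueError("expected exactly 8 pin levels")
    value = 0
    for pin, level in enumerate(levels):
        value |= (level & 1) << pin
    return value


# 0x81 has only bit 7 (MSB) and bit 0 (LSB) set.
print(byte_to_pins(0x81))  # [1, 0, 0, 0, 0, 0, 0, 1]
```

The same order applies when sampling the Output pins: `pins_to_byte` turns the eight sampled levels back into one result byte.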