7 2x2 Systolic Array Matrix Multiplier

7 : 2x2 Systolic Array Matrix Multiplier

Design render

How it works

This project implements a 2×2 output-stationary systolic array that computes signed 4-bit matrix multiplication, producing 8-bit results.

The design has 5 modules:

  • PE (Processing Element) — bit-serial multiplier with accumulator. Each PE computes acc += a_in × weight_in every cycle when en=1.
  • array_module — 2×2 grid of 4 PEs with input skew registers. a_row1 and weight_col1 are delayed by 1 cycle to align data correctly across the array.
  • FSM — controls the computation sequence: IDLE → COMPUTE (2 cycles) → DRAIN (2 cycles) → READ (4 cycles) → IDLE. Generates en, clear, and valid signals automatically after start is pulsed.
  • output_serializer — serializes the 4 accumulator results (acc_00, acc_01, acc_10, acc_11) onto data_out one per cycle when valid=1.
  • tt_um_systolic_mac_2x2 — TT wrapper with streaming counter. Loads two sets of input registers (cycle 1 and cycle 2 values) via ui_in and uio_in, then auto-streams them to the array after start.

Input loading protocol

Load all 4 register pairs in 4 cycles using load=1 and sel:

sel ui_in[7:4] (data_c1) uio_in[7:4] (data_c2)
00 A[0][0] A[0][1]
01 A[1][0] A[1][1]
10 B[0][0] B[1][0]
11 B[0][1] B[1][1]

After loading, pulse start=1 for 1 cycle. The wrapper streams cycle 1 values, then cycle 2 values, then zeros automatically. Results appear on uo_out one per cycle when uio_out[0] (valid) goes high.

Matrix multiply

Computes C = A × B where:

  • A and B are 2×2 matrices of signed 4-bit integers (-8 to 7)
  • C elements are signed 8-bit integers (-128 to 127)
  • Output order: C[0][0], C[0][1], C[1][0], C[1][1]

How to test

  1. Reset the design (rst_n=0 for 5 cycles, then rst_n=1)
  2. Load matrix A and B values using load=1, sel, ui_in[7:4], uio_in[7:4]
  3. Pulse start=1 for 1 cycle
  4. Wait for uio_out[0] (valid) to go high
  5. Read 4 consecutive values from uo_out

Example — identity matrix test

IO

#InputOutputBidirectional
0startdata_out[0]valid
1loaddata_out[1]
2sel[0]data_out[2]
3sel[1]data_out[3]
4data_c1[0]data_out[4]data_c2[0]
5data_c1[1]data_out[5]data_c2[1]
6data_c1[2]data_out[6]data_c2[2]
7data_c1[3]data_out[7]data_c2[3]

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (Tiny Tapeout Factory Test) tt_um_utoss_riscv (UTOSS RISC-V core) tt_um_memory_game_top (Number Memory Game) tt_um_danielpenas42 (Ball Display) tt_um_machinelearning (7-Segment Neural Predictor) tt_um_microlane_demo (microlane demo project) tt_um_pixel_processor (Tiny Pixel Processor) tt_um_jpigdon_gps_accelerator_top (GPS_Accelerator) tt_um_rgb_mixer (rgb_mixer) tt_um_bgao43 (Tiny TPU Systolic Array) tt_um_main (Pong in Verilog) tt_um_joannec34_teenytpu (teenytpu) tt_um_apa102_ws2812_squidgeefish (APA102 to WS2812 Translator) tt_um_uacj_bouncing_DVD_screensaver (Custom DVD Screensaver for VGA) tt_um_logoUACJ_MOGA (VGA_screensaver_UACJ) tt_um_grace_spi_led_driver (SPI-Controlled 8-Channel LED Driver) tt_um_rebeccargb_universal_decoder (Universal Binary to Segment Decoder) tt_um_rebeccargb_hardware_utf8 (Hardware UTF Encoder/Decoder) tt_um_happyhop_deadcast2 (happyhop) tt_um_dino7 (Dino-7: 7-Segment Runner Game) tt_um_arty3_mac_engine (Simple MAC Engine w/ Postproc) tt_um_uacj (Custom DVD Screensaver for VGA) tt_um_algofoogle_dottee (DOTTEE VGA demo (TTGF26a)) tt_um_mattvenn_signal_generator (Simple Signal Generator) tt_um_urish_simon (Simon Says memory game) tt_um_tpu (Tensor Processing Unit For GF) tt_um_gojimmypi_ttgf_UART_FSM_TRNG_Lab (Hardware Entropy Explorer: UART/SPI TRNG and PUF) tt_um_wokwi_465483277165299713 (First Tinytapeout) tt_um_prem_pipeline_test (Programmable_Pipeline-RISC-V) tt_um_wokwi_467219410242853889 (Tiny Tapeout testtest 111233) tt_um_wokwi_465549494272929793 (Pacos first design) tt_um_wokwi_465731371445677057 (Arturo's first Wokwi design) tt_um_wokwi_465732744934845441 (Tiny Tapeout Template_1234) tt_um_wokwi_465736492859711489 (Tiny Tapeout Workshop JuanF) tt_um_wokwi_465731430225727489 (Rafa’s first Wokwi design) tt_um_wokwi_465731458365332481 (7 segment Display Fli-Flop Try-out) tt_um_wokwi_465732744245929985 (DiseñoCursoTiny) tt_um_wokwi_465731490568160257 (Matt’s first Wokwi design) tt_um_wokwi_465736691688630273 (test1) tt_um_wokwi_465731458628527105 (Mi copia del Tiny Tapeout) tt_um_wokwi_465731520738845697 (El primer diseño) tt_um_wokwi_465731521356457985 (Tiny Tapeout Template Copy) tt_um_gen1_digital_companion_tile (Gen1 Digital Companion Tile) tt_um_wokwi_465732827753495553 (Tiny Tapeout Template Ayman) tt_um_wokwi_465731394728267777 (Julian_Proyecto) tt_um_wokwi_465731458535202817 (Tiny Tapeout Template Copy) tt_um_wokwi_465732847401723905 (Basic Circuit) tt_um_wokwi_465731452481768449 (El primer diseño de Matt para Wokwi) tt_um_wokwi_465731502018614273 (Tiny Tapeout Template flip flop) tt_um_wokwi_465732616714924033 (Tiny Tapeout RJAP) tt_um_wokwi_465731575275296769 (ocxpkeWokwiDesign) tt_um_wokwi_465732880722332673 (Pedro Template) tt_um_wokwi_465731858252480513 (Paula's first Wokwi design) tt_um_wokwi_465731455677830145 (Tiny Tapeout JMCG) tt_um_wokwi_465737601403996161 (Tiny Number Simon) tt_um_ttmul (Balanced Ternary Multiplier) tt_um_wokwi_465731466664816641 (Tiny Tapeout Workshop Malaga 2jun2026) tt_um_8bit_risc_cpu (8-bit RISC CPU) tt_um_wokwi_451184391728659457 (Simple Sprinkler) tt_um_fhw_appel_spiPWMio (spiPWMio) tt_um_divadnauj_GB_serv_soc_wb (serv_soc_wb) tt_um_8bitcustomcomputer (SAP 8 Bit Computer) tt_um_bioimpedance (Very Low Resource Digital Implementation of Bioimpedance Analysis) tt_um_mgj_bist8 (BIST-8: Built-In Self-Test for 8-bit CLA Adder) tt_um_roberto_tiny_radar_tile (BioPulse Tile) tt_um_systolic_mac_2x2 (2x2 Systolic Array Matrix Multiplier) tt_um_peg_top (2x2 CNN Accelerator PE Grid with UART) tt_um_AlvaroRub_ringcounter (Counter16Outputs) tt_um_wokwi_465731440267947009 (Antonio's first Wokwi design) tt_um_wokwi_465732706576877569 (Guille's first Wokwi design.) tt_um_wokwi_465731481873367041 (MIPS-Lite 8-bit Processor) tt_um_wokwi_465736612213902337 (Juan`s first Worki design) tt_um_wokwi_465731439156454401 (Rhyloo’s first Wokwi design) tt_um_wokwi_465732536551273473 (Tiny Tapeout Marcos Fernandez) tt_um_wokwi_465737290543084545 (Tiny Tapeout Template) tt_um_wokwi_465630130495825921 (ram 1 bit Copy) tt_um_wokwi_465731403724006401 (sdft wokwi 1) tt_um_top (RHD2164-MCU-SPI Bridge) tt_um_line_follower_arvaloez (Line Follower Robot controller) tt_um_xoroshiro64plus_v2 (xoroshiro64) tt_um_ohuettenhofer_tiny_qsim (Tiny Quantum Circuit Simulator) tt_um_santhosh_ring_osc_gf (Ring Oscillator PVT Sensor & TRNG (GF180)) tt_um_santhosh_stoch_stdp_pair_gf (Stochastic neuron + STDP controller (merged, GF180)) tt_um_santhosh_rsd_char_gf (RRAM Characterization Platform (DC sweep + endurance + retention + histogram, GF180)) tt_um_santhosh_xbar_ctrl_gf (Memristive Crossbar Peripheral Controller (GF180)) tt_um_joseph_bf (BF) tt_um_hydrocomms (FSK Modem) tt_um_systolic_array (2x2 MAC Systolic array with DFT) tt_um_kluterirv_rv32e_core (Minimal RV32E SoC with UART Loader) tt_um_algofoogle_ttgf26a_vco (VCO driven by DAC) tt_um_fer_logo_music_vga (UNIZG-FER VGA project) tt_um_maqsudbek_dyadic_pwm (Dyadic PWM) tt_um_waferspace_vga_screensaver (Wafer.space Logo VGA Screensaver) tt_um_htfab_vga_tester (Video mode tester)