388 Tiny TPU Systolic Array

388 : Tiny TPU Systolic Array

Design render
  • Author: Yu Xia,Bohong Gao, Junjie Yan, Yufeng Wang, Jinyao Sun, Arthur Stanson
  • Description: A small systolic-array-based matrix multiplication accelerator
  • GitHub repository
  • Open in 3D viewer
  • Clock: 100000000 Hz

Tiny TPU Systolic Array

This project implements a small TPU-style matrix multiplication accelerator using a systolic array architecture. The design is intended for TinyTapeout and uses a simple SPI-style serial interface to configure the design, load data, start computation, and read results.

How it works

The design is built around a small systolic array made of processing elements. Each processing element performs multiply-accumulate operations and passes data through the array. The controller receives commands through the SPI interface, manages configuration and execution, and sends results back through the serial output.

The TinyTapeout wrapper maps the external signals as follows:

  • ui_in[0]: SCLK
  • ui_in[1]: MOSI
  • ui_in[2]: CS_N
  • uo_out[0]: MISO

The system clock and reset use the standard TinyTapeout clk and rst_n pins.

How to test

The project can be tested using the TinyTapeout cocotb testbench flow. The testbench drives the TinyTapeout wrapper interface, applies reset, sends SPI command sequences through ui_in, and observes the serial output through uo_out[0].

To run the test locally:

cd test
make -B

The GitHub Actions workflow also runs the RTL test automatically after pushing changes to the repository.

External hardware

No external hardware is required. The design only uses the TinyTapeout digital input, output, clock, and reset pins.

IO

#InputOutputBidirectional
0SCLKMISO
1MOSI
2CS_N
3
4
5
6
7

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (Tiny Tapeout Factory Test) tt_um_utoss_riscv (UTOSS RISC-V core) tt_um_memory_game_top (Number Memory Game) tt_um_danielpenas42 (Ball Display) tt_um_machinelearning (7-Segment Neural Predictor) tt_um_microlane_demo (microlane demo project) tt_um_pixel_processor (Tiny Pixel Processor) tt_um_jpigdon_gps_accelerator_top (GPS_Accelerator) tt_um_rgb_mixer (rgb_mixer) tt_um_bgao43 (Tiny TPU Systolic Array) tt_um_main (Pong in Verilog) tt_um_joannec34_teenytpu (teenytpu) tt_um_apa102_ws2812_squidgeefish (APA102 to WS2812 Translator) tt_um_uacj_bouncing_DVD_screensaver (Custom DVD Screensaver for VGA) tt_um_logoUACJ_MOGA (VGA_screensaver_UACJ) tt_um_grace_spi_led_driver (SPI-Controlled 8-Channel LED Driver) tt_um_rebeccargb_universal_decoder (Universal Binary to Segment Decoder) tt_um_rebeccargb_hardware_utf8 (Hardware UTF Encoder/Decoder) tt_um_happyhop_deadcast2 (happyhop) tt_um_dino7 (Dino-7: 7-Segment Runner Game) tt_um_arty3_mac_engine (Simple MAC Engine w/ Postproc) tt_um_uacj (Custom DVD Screensaver for VGA) tt_um_algofoogle_dottee (DOTTEE VGA demo (TTGF26a)) tt_um_mattvenn_signal_generator (Simple Signal Generator) tt_um_urish_simon (Simon Says memory game) tt_um_tpu (Tensor Processing Unit For GF) tt_um_gojimmypi_ttgf_UART_FSM_TRNG_Lab (Hardware Entropy Explorer: UART/SPI TRNG and PUF) tt_um_wokwi_465483277165299713 (First Tinytapeout) tt_um_prem_pipeline_test (Programmable_Pipeline-RISC-V) tt_um_wokwi_467219410242853889 (Tiny Tapeout testtest 111233) tt_um_wokwi_465549494272929793 (Pacos first design) tt_um_wokwi_465731371445677057 (Arturo's first Wokwi design) tt_um_wokwi_465732744934845441 (Tiny Tapeout Template_1234) tt_um_wokwi_465736492859711489 (Tiny Tapeout Workshop JuanF) tt_um_wokwi_465731430225727489 (Rafa’s first Wokwi design) tt_um_wokwi_465731458365332481 (7 segment Display Fli-Flop Try-out) tt_um_wokwi_465732744245929985 (DiseñoCursoTiny) tt_um_wokwi_465731490568160257 (Matt’s first Wokwi design) tt_um_wokwi_465736691688630273 (test1) tt_um_wokwi_465731458628527105 (Mi copia del Tiny Tapeout) tt_um_wokwi_465731520738845697 (El primer diseño) tt_um_wokwi_465731521356457985 (Tiny Tapeout Template Copy) tt_um_gen1_digital_companion_tile (Gen1 Digital Companion Tile) tt_um_wokwi_465732827753495553 (Tiny Tapeout Template Ayman) tt_um_wokwi_465731394728267777 (Julian_Proyecto) tt_um_wokwi_465731458535202817 (Tiny Tapeout Template Copy) tt_um_wokwi_465732847401723905 (Basic Circuit) tt_um_wokwi_465731452481768449 (El primer diseño de Matt para Wokwi) tt_um_wokwi_465731502018614273 (Tiny Tapeout Template flip flop) tt_um_wokwi_465732616714924033 (Tiny Tapeout RJAP) tt_um_wokwi_465731575275296769 (ocxpkeWokwiDesign) tt_um_wokwi_465732880722332673 (Pedro Template) tt_um_wokwi_465731858252480513 (Paula's first Wokwi design) tt_um_wokwi_465731455677830145 (Tiny Tapeout JMCG) tt_um_wokwi_465737601403996161 (Tiny Number Simon) tt_um_ttmul (Balanced Ternary Multiplier) tt_um_wokwi_465731466664816641 (Tiny Tapeout Workshop Malaga 2jun2026) tt_um_8bit_risc_cpu (8-bit RISC CPU) tt_um_wokwi_451184391728659457 (Simple Sprinkler) tt_um_fhw_appel_spiPWMio (spiPWMio) tt_um_divadnauj_GB_serv_soc_wb (serv_soc_wb) tt_um_8bitcustomcomputer (SAP 8 Bit Computer) tt_um_bioimpedance (Very Low Resource Digital Implementation of Bioimpedance Analysis) tt_um_mgj_bist8 (BIST-8: Built-In Self-Test for 8-bit CLA Adder) tt_um_roberto_tiny_radar_tile (BioPulse Tile) tt_um_systolic_mac_2x2 (2x2 Systolic Array Matrix Multiplier) tt_um_peg_top (2x2 CNN Accelerator PE Grid with UART) tt_um_AlvaroRub_ringcounter (Counter16Outputs) tt_um_wokwi_465731440267947009 (Antonio's first Wokwi design) tt_um_wokwi_465732706576877569 (Guille's first Wokwi design.) tt_um_wokwi_465731481873367041 (MIPS-Lite 8-bit Processor) tt_um_wokwi_465736612213902337 (Juan`s first Worki design) tt_um_wokwi_465731439156454401 (Rhyloo’s first Wokwi design) tt_um_wokwi_465732536551273473 (Tiny Tapeout Marcos Fernandez) tt_um_wokwi_465737290543084545 (Tiny Tapeout Template) tt_um_wokwi_465630130495825921 (ram 1 bit Copy) tt_um_wokwi_465731403724006401 (sdft wokwi 1) tt_um_top (RHD2164-MCU-SPI Bridge) tt_um_line_follower_arvaloez (Line Follower Robot controller) tt_um_xoroshiro64plus_v2 (xoroshiro64) tt_um_ohuettenhofer_tiny_qsim (Tiny Quantum Circuit Simulator) tt_um_santhosh_ring_osc_gf (Ring Oscillator PVT Sensor & TRNG (GF180)) tt_um_santhosh_stoch_stdp_pair_gf (Stochastic neuron + STDP controller (merged, GF180)) tt_um_santhosh_rsd_char_gf (RRAM Characterization Platform (DC sweep + endurance + retention + histogram, GF180)) tt_um_santhosh_xbar_ctrl_gf (Memristive Crossbar Peripheral Controller (GF180)) tt_um_joseph_bf (BF) tt_um_hydrocomms (FSK Modem) tt_um_systolic_array (2x2 MAC Systolic array with DFT) tt_um_kluterirv_rv32e_core (Minimal RV32E SoC with UART Loader) tt_um_algofoogle_ttgf26a_vco (VCO driven by DAC) tt_um_fer_logo_music_vga (UNIZG-FER VGA project) tt_um_maqsudbek_dyadic_pwm (Dyadic PWM) tt_um_waferspace_vga_screensaver (Wafer.space Logo VGA Screensaver) tt_um_htfab_vga_tester (Video mode tester)