497 2x2 Systolic array with DFT and bfloat16 - v2

497 : 2x2 Systolic array with DFT and bfloat16 - v2

Design render
  • Author: Julia Desmazes
  • Description: Second version of the 2x2 systollic array with DFT intrastructure, including bfloat16 floating point arithmetic and DFT scan chain.
  • GitHub repository
  • Open in 3D viewer
  • Clock: 100000000 Hz

Multiply and accumulate matrix multiplier ASIC with design for test infrastructure

ASIC design for a 2x2 systolic matrix multiplier supporting multiply and accumulate operations on bfloat16 data alongside a design for test infrastructure to help debug both usage and diagnose design issues in silicon.

Pinout

This accelerator uses the following pinout:

ui (Inputs) uo (Outputs) uio (Bidirectional)
ui[0] = tck uo[0] = result_o uio[0] = data_i[7]
ui[1] = data_i[0] uo[1] = result_o uio[1] = data_valid_i
ui[2] = data_i[1] uo[2] = result_o uio[2] = data_mode_i[1]
ui[3] = data_i[2] uo[3] = result_o uio[3] = data_mode_i[0]
ui[4] = data_i[3] uo[4] = result_o uio[4] = tdi
ui[5] = data_i[4] uo[5] = result_o uio[5] = tms
ui[6] = data_i[5] uo[6] = result_o uio[6] = tdo
ui[7] = data_i[6] uo[7] = result_o uio[7] = result_v_o

Chip pinout

MAC

This MAC accelerator operates at up to 100MHz and is capable of reaching up to 100 MMAC/s or 200 MFLOPS/s.

:warning: The outgoing data path on IHP sg13g2 chips has only been proven upwards of 75MHz, so although we will using the maximum theoretical frequency in this discussion it might be necessary to drive this ASIC as a lower clock frequency in practice.

Background

The goal of the MAC accelerator is to perform a matrix matrix multiplication between the input data matrix $I$ and the weight matrix $W$.

\begin{gather}
I \times W = R \\
\begin{pmatrix} 
i_{0,0} & i_{1,0} \\
 i_{0,1} & i_{1,1} 
\end{pmatrix} 

\times 

\begin{pmatrix} 
w_{0,0} & w_{1,0} \\ 
w_{0,1} & w_{1,1} 
\end{pmatrix} = 

\begin{pmatrix} 
i_{0,0}w_{0,0}+i_{1,0}w_{0,1} & i_{0,0}w_{1,0}+i_{1,0}w_{1,1}\\ 
i_{0,1}w_{0,0}+i_{1,1}w_{0,1} & i_{0,1}w_{1,0}+i_{1,1}w_{1,1}\end{pmatrix}

=

\begin{pmatrix} 
r_{0,0} & r_{1,0} \\ 
r_{0,1} & r_{1,1} 
\end{pmatrix}
\end{gather}

This MAC accelerator has 4 units and from this point on, we will refer to each MAC unit according to their unique $(x,y)$ coordinates.

Each MAC unit calculates the MAC operation $c_{(t,x,y)}$, where :

  • $w_{(x,y)}$ is the fixed weight configured for this unit; this value is fixed throughout a set of $I$ and $W$ input matrices.
  • $i_{(t,y)}$ is a value from the $y$ row of the $I$ matrix that is circulated per timestep $t$ through a row of the matrix.
  • $c_{(t-1,x,y-1)}$ is the result at the previous timestep $t-1$ of the MAC unit above this MAC unit, circulated downwards per column.
c_{(t,x,y)} = i_{(t,y)} \times w_{(x,y)} + c_{(t-1,x,y-1)}

Given this accelerator was designed to operate on 16-bit floating point numbers, there is no need for an additional clamping step.

Our final full MAC operation is as follows :

c_{(t,x,y)} = i_{(t,y)} \times w_{(x,y)} + c_{(t-1,x,y-1)}

At each MAC timestep $t+1$ :

  • the result of a MAC unit $c_{(t,x,y)}$ is shifted downwards on the same column and becomes the input of the MAC unit $(x,y+1)$ below.
  • $i_{(t,x)}$ is shifted rightwards and used as input to MAC unit $(x+1,y)$.

This data streaming allows such designs to make more efficient use of data, re-using it multiple times as the data circulates through the array, contributing to the final results without spending time on expensive data accesses, allowing us to dedicate more of our silicon area and cycles to compute.

Throughput

Assuming a pre-configured $W$ weight matrix is being reused and the accelerator is receiving a gapless stream of multiple $I$ input matrices, this MAC accelerator is capable of computing up to 100 MMAC/s or 200 MFLOPS/s.

IO Bottleneck

Accelerator operations are stalled if a MAC operation has a data dependency on data that has yet to arrive. For example, calculating $r_{(0,0)}$ depends on both $i_{(0,0)}$​ and $i_{(1,0)}$​. In practice, each operation depends on two pieces of input data, yet our input interface being only 8 bits wide allows us to transfer only a half of $i_{(x,y)}$​ per cycle.

This limitation means our accelerator is actually operating at a quarter of the maximum capacity due to this IO bottleneck. If the IO interface were either (a) at least 32 bits wide, or (b) 8 bits wide but operating at 400 MHz, resolving this bottleneck, our maximum throughput would be 400 MMAC/s or 800 MFLOPS/s.

Usage

The typical sequence to offload matrix operations to the accelerator would go as follows:

  1. Reset the accelerator (necessary on init)
  2. Configure the weights $W$ (can be re-used once configured)
  3. Send the input data $I$
  4. Read the result $R$

This design doesn't feature on-chip SRAM and has limited on-chip memory. Given weights have high spatial and temporal locality, this design allows each weight to be configured per MAC unit. This configuration can be reused across multiple matrices. The input matrix, on the other hand, is expected to be provided on each usage.

Given our input and output data buses are only 8 bits wide, for data transfers to and from the chip the matrices are flattened in the following order, with bytes transfered in little endian:

flat

Notes:

  • All references to cycles below are clocked according to the clk pin.
  • Empty cycles, as in one or more cycles where data_v_i would go low in the middle of the transfer of both the input matrix and the weights, are supported.
Resetting MAC

Given we are not sending an index alongside each data transfer to indicate which weight/data coordinates ( index ) each data corresponds to, the MAC accelerator keeps track of the next index internally. As such, if due to external reasons a partial transfer occurs, it becomes necessary to reset this index using the reset sequence described below.

The weights streaming indexes and the data streaming indexes can be reset independently, each requires a single data transfer cycle during which :

  • data_v_i is set to 1
  • data_mode_i[1:0] is set to 0x3 if we are resetting both the weight and the data indexes
  • data_i[7:0] is ignored
Example

In this example we are resetting both the data streaming index and the weight index back to back.

TODO rst_waves.png

Configure weights

Configuring the weights takes 8 data transfer cycles, during which :

  • data_v_i is set to 1
  • data_mode_i[1:0] is set to 0x0 indicating we are sending weights
  • data_i[7:0] contains the weights
Example

In this example we are configuring the weight matrix $W$ to :

W = 
\begin{pmatrix} 
0 & 1 \\ 
2 & 3 
\end{pmatrix} 

TODO wr_weights_waves.png

Debug

The implemented JTAG TAP can be used to easily debug the weight matrix configuration sequence as it allows the user using the USER_REG instruction to read the currently configured weights for each MAC unit.

In the existing openocd helper scripts located at jtag/openocd.cfg the read_user_reg can be used to read the weights using openocd when used as follows :

set r 0
for {set u 0} {$u <= $USER_REG_UNIT_MAX} {incr u} {
    puts "read internal register $u : 0x[read_user_reg $_CHIPNAME $u $r] - [print_reg_id $r]"  
}

For the $W$ weight matrix configured in the example above, the expected output should be :

read internal register 0:0 : 0x0000 - weight
read internal register 0:1 : 0x0000 - multiplicand ( input data )
read internal register 0:2 : 0x0000 - summand ( input data )
read internal register 0:3 : 0x0000 - multiplication result (internal computation)
Sending the input matrix

Sending the input matrix takes 8 data transfer cycles, during which :

  • data_v_i is set to 1
  • data_mode_i[1:0] is set to 0x1 indicating we are sending the input matrix
  • data_i[7:0] contains the input data
Example

In this example we are sending the input data matrix $I$ :

I = 
\begin{pmatrix} 
4 & 5 \\ 
6 & 7 
\end{pmatrix} 

TODO wr_data_waves.png

Receiving result

When receiving a result the asic will drive the following pins during 8 data transfer cycles. The transfer is guarantied to be gapeless :

  • res_v_o is set to 1
  • res_o[7:0] contains the result of the MAC operation for a single matrix coordinate

In order to start capture by the pio hardware on the raspberry pi silicon, res_v_o is asserted a cycle before the data transfer starts. The two result streams occure back-to-back this will not occur.

Simple example

In this example the $W$ MAC weight matrix is being configured and the $I$ data is being streamed in, following which, the $R$ result starts being sent out.

R = I \times W = 
\begin{pmatrix} 
4 & 5 \\ 
6 & 7 
\end{pmatrix}
\times
\begin{pmatrix} 
0 & 1 \\ 
2 & 3 
\end{pmatrix}
=
\begin{pmatrix} 
10 & 19 \\ 
14 & 27
\end{pmatrix}

TODO rd_res_waves.png

Complex example

Internally, the accelerator takes at most 8 cycles to produce a result from incoming data. This accounts for incoming data latching, circulating the data through the entire systolic array, and output streaming. The accelerator moves the incoming data through the array as soon as it is available. Because of this, and since this accelerator supports gaps in the incoming data stream, if, for example, the last data transfer of $i_{(1,1)}$ ​is delayed by at least 2 cycles, then the accelerator result will start streaming out before all of the input matrix has finished streaming in.

This is why, in the firmware (firmware/main.c), we set up the DMA stream to receive the data before we start sending the input matrix, as the gap between sending and getting the result is too small for the controlling MCU to perform any type of compute.

TODO rd_res_complex_waves.png

DFT

This design embeds a JTAG for debugging the accelerator's usage by probing into internal registers and helping identify PCB issues using a boundary scan.

This JTAG TAP was designed to operate at 2 MHz, has idcode 0x2beef0d7.

Its instruction register length is 3, and implements the following instructions:

Instruction Opcode Description
EXTEST 0x0 Boundary scan
IDCODE 0x1 Reads JTAG TAP identifier
SAMPLE_PRELOAD 0x2 Boundary scan
USER_REG 0x3 Probe internal registers
SCAN_CHAIN 0x4 Internal logic scan chain
BYPASS 0x7 Set the TAP in bypass mode

All four standard instructions EXTEST, IDCODE, SAMPLE_PRELOAD, BYPASS conform to the standard behavior.

SCAN_CHAIN is a private JTAG instruction used for observing the systolic array's flops state. The order of the flop chain can be found at the end of the .def file in the definition of the chain_0 scan chain.

USER_REG

The USER_REG state was designed to probe into the data currently used by each of the 4 MAC units. The data to be read is specified by loading its address in the data register during a previous DR_SHIFT stage. As such, two sequences of DR_SHIFTS might be necessary:

  1. Load the address of the next data
  2. Read the data off TDI

The address and data are both 16 bits wide, though only the bottom 4 bits of the address are used.

Address format

The address uses the following format:

[ unused 15:4 ][ mac unit 3:2 ][ register id 1:0 ] 

Register id mapping for this MAC unit gives us the current:

Register ID Description
0x0 Weight (multiplier)
0x1 Multiplicand (circulated data)
0x2 Summand (circulated data)
0x3 Multiplication result (internal MAC unit data)

Important considerations for usage

When using the USER_REG custom JTAG TAP instruction, the MAC logic is expected to be temporarily halted, as in no weight or data update operations and no matrix compute is expected to be ongoing. To this effect, there is no CDC protection when transferring data between the JTAG clock domain and the MAC domain. If the MAC isn't halted, the resulting metastability risks corrupting the sampled data.

This also applies when doing a boundary scan.

Quickstart

For quickly getting started, use the utilities provided in jtag/openocd.cfg.

Given this default config assumes you are using a jlink, and this might not be the adapter you are using, you may need to update the adapter sourcing your current probe:

source [find interface/jlink.cfg]
Usage

Run using :

openocd -f jtag/openocd.cfg

Expected output:

Open On-Chip Debugger 0.12.0+dev-02429-ge4c49d860 (2026-03-17-19:44)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
Info : J-Link V10 compiled Jan 30 2023 11:28:07
Info : Hardware version: 10.10
Info : VTarget = 3.348 V
Info : clock speed 2000 kHz
Info : JTAG tap: tpu.tap tap/device found: 0x2beef0d7 (mfg: 0x06b (Transwitch), part: 0xbeef, ver: 0x2)
Warn : gdb services need one or more targets defined
idcode : 2beef0d7
read internal register 0:0 : 0x0000 - weight
read internal register 0:1 : 0x0000 - multiplicand ( input data )
read internal register 0:2 : 0x0000 - summand ( input data )
read internal register 0:3 : 0x0000 - multiplication result (internal computation)
read internal register 1:0 : 0x0000 - weight
read internal register 1:1 : 0x0000 - multiplicand ( input data )
read internal register 1:2 : 0x0000 - summand ( input data )
read internal register 1:3 : 0x0000 - multiplication result (internal computation)
read internal register 2:0 : 0x0000 - weight
read internal register 2:1 : 0x0000 - multiplicand ( input data )
read internal register 2:2 : 0x0000 - summand ( input data )
read internal register 2:3 : 0x0000 - multiplication result (internal computation)
read internal register 3:0 : 0x0000 - weight
read internal register 3:1 : 0x0000 - multiplicand ( input data )
read internal register 3:2 : 0x0000 - summand ( input data )
read internal register 3:3 : 0x0000 - multiplication result (internal computation)
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
...

IO

#InputOutputBidirectional
0tckresult_odata_i[7]
1data_i[0]result_odata_valid_i
2data_i[1]result_odata_mode_i[1]
3data_i[2]result_odata_mode_i[0]
4data_i[3]result_otdi
5data_i[4]result_otms
6data_i[5]result_otdo
7data_i[6]result_oresult_v_o

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (Tiny Tapeout Factory Test) tt_um_oscillating_bones (Oscillating Bones) tt_um_mlyoung_wedgetail (Wedgetail TCDE REV01) tt_um_adex_neuron_ncs (AdEx Neuron NCS) tt_um_opamp_gfcwfzkm (Operational Amplifier Test Circuits) tt_um_analog_genesis (Genesis) tt_um_techhu_analog_trial (SoilZ v1 Lock-In Impedance Analyzer) tt_um_rejunity_vga_logo (VGA Tiny Logo Roto Zoomer) tt_um_rebeccargb_universal_decoder (Universal Binary to Segment Decoder) tt_um_rebeccargb_hardware_utf8 (Hardware UTF Encoder/Decoder) tt_um_rebeccargb_intercal_alu (INTERCAL ALU) tt_um_rebeccargb_vga_pride (VGA Pride) tt_um_urish_simon (Simon Says memory game) tt_um_silicon_art (Silicon Art - Pixel Pig + Canary Token) tt_um_aksp_mbist_mbisr (MBIST + MBISR Built-In Memory Test & Repair) tt_um_wokwi_450492230413445121 (RandomNum) tt_um_wokwi_450492214548484097 (test) tt_um_wokwi_450491696806684673 (Snake) tt_um_wokwi_450491302960427009 (TinyTapeout test) tt_um_wokwi_450492208120711169 (Tadder) tt_um_wokwi_450492222691728385 (test_prj) tt_um_not_a_dinosaur (Not a Dinosaur) tt_um_vga_example (Silly Dog) tt_um_mo_module (vga test project) tt_um_snake_game (SnakeGame) tt_um_cic_filter_demo (Discrete-to-ASIC Delta-Sigma Acquisition System) tt_um_spi_aggregator (Quad SPI Aggregator) tt_um_herald (Herald) tt_um_wokwi_453664332125344769 (Digital Lock with Easter Eggs) tt_um_floAfentaki_top (tinyTapeVerilog_out) tt_um_chrismoos_6502_mcu (m6502 Microcontroller) tt_um_piggy_top (Piggybag) tt_um_wokwi_454491386271657985 (6 Bit Roulette) tt_um_ro_puf_trng (RO-based security primitives) tt_um_wokwi_453674671092670465 (My first Wokwi design) tt_um_ezelioli_blockvga (VGABlock) tt_um_SotaSoC (SotaSoC) tt_um_wokwi_455292199922428929 (Second TT experiment) tt_um_wokwi_455301361268988929 (7-Segment-Wokwi-Design) tt_um_hexcnt_elfuchso ((Hexa)Decimal Counter) tt_um_tobiasgreiser_move_vga_square (move VGA square) tt_um_wokwi_455291603750154241 (Temporary Title) tt_um_wokwi_455297270449581057 (RGB PWM) tt_um_wokwi_455291640579325953 (Primitive clock divider) tt_um_wokwi_455291654653337601 (Tiny Ape Out) tt_um_wokwi_455291618175430657 (adder) tt_um_wokwi_455291689699908609 (test) tt_um_wokwi_455291713774178305 (Just logic) tt_um_wokwi_455291560479595521 (title) tt_um_neb_top (neb tt26a first asic) tt_um_wokwi_455291650915156993 (tiny-tapeout-workshop-result) tt_um_joh1x_prng (8-bit PRNG) tt_um_wokwi_455293127747668993 (Tiny Tapeout) tt_um_wokwi_455291727376311297 (custom_lol) tt_um_wokwi_455291594023558145 (Tiny Tapeout) tt_um_wokwi_455291650157032449 (Tiny Tapeout N) tt_um_tomolt_rasterizer (Tiny Triangle Rasterizer) tt_um_wokwi_455291872587345921 (Test) tt_um_wokwi_455299783033986049 (Example) tt_um_wokwi_455299761916711937 (Try1) tt_um_wokwi_455291701422995457 (Clock Divider Test Project) tt_um_wokwi_455291654560013313 (Test) tt_um_hopfield (Hopfield Associative Memory β€” Odd Digit Recall on 7-Segment) tt_um_wokwi_455291651646015489 (Workshop Day) tt_um_wokwi_455291845594899457 (83rk: Tiny Tapeout) tt_um_wokwi_455301826476070913 (FirstTapeOut2) tt_um_dranoel06_SAP1 (Programmable 8-BIT CPU) tt_um_wokwi_455300379278483457 (test) tt_um_wokwi_455291837898350593 (Custom_ASIC) tt_um_wokwi_455300931094822913 (Tiny Tapeout Accumulator) tt_um_nampukk_top (Miniproc) tt_um_krimmel_mini_synth (Mini Synth) tt_um_wokwi_455291688143820801 (My first tapeout) tt_um_wokwi_455299760551464961 (Freddys tapeout) tt_um_fir_filter (UART-Programmable 2-Tap FIR Filter) tt_um_arthfink_ddmtd (DDMTD) tt_um_wokwi_455291792579934209 (Test) tt_um_wokwi_455291642603080705 (Test) tt_um_wokwi_455303220350374913 (Simon Says) tt_um_wokwi_455292153909854209 (simple XOR cipher) tt_um_wokwi_455300425088680961 (Hello tinyTapout) tt_um_wokwi_455291692145189889 (^My first design) tt_um_wokwi_455290751669808129 (Tiny Tapeout - Riddle Implementation) tt_um_wokwi_455291724163472385 (Tobias first Wokwi design) tt_um_wokwi_455291807082792961 (My Tiny Tapeout) tt_um_wokwi_455303592417686529 (Count Upwards) tt_um_wokwi_455291645628225537 (Nielss first failure) tt_um_wokwi_455291698669433857 (Tiny_Tapeout_Test) tt_um_wokwi_455293379343017985 (Switch Puzzle) tt_um_yniklas_ma (Multiply-Add) tt_um_yjulian_alu (uCore) tt_um_wokwi_455291641368904705 (WIP Title) tt_um_wokwi_455303526551376897 (just copy 4 not gates) tt_um_wokwi_455303914893650945 (WIP) tt_um_fpga_can_lehmann (FPGA) tt_um_wokwi_455291728585320449 (gatekeeping the gates) tt_um_wokwi_455291738750219265 (RTX 8090) tt_um_ct4111_buzzer (Two Song Buzzer Player) tt_um_wokwi_455291631430488065 (Tudor BCD Test) tt_um_wokwi_455300137923517441 (Tiny Tapeout 1) tt_um_wokwi_455300818767153153 (little frequency divider) tt_um_wokwi_455291748212573185 (Test - clock divider) tt_um_wokwi_455291611094391809 (Counter) tt_um_wokwi_455331155298538497 (WIP) tt_um_mattvenn_vgatest (VGA demo) tt_um_matth_fischer_vgaTT (VGA Squares) tt_um_phsauter_vga_maze (VGA Maze Runner) tt_um_themightyduckofdoom_bitserial_collatz_checker (Bit-Serial Collatz Conjecture Checker) tt_um_tt_tinyQV (Borg - Tiny GPU) tt_um_pgfarley_tophat_top (tophat) tt_um_jamesbuchanan_silly (Silly demo) tt_um_sid (SID Voice Synthesizer) tt_um_tinymoa_ihp26a (TinyMOA: RISC-V CPU with Compute-in-Memory Accelerator) tt_um_embeddedinn_vga (Cyber EMBEDDEDINN) tt_um_urish_rings (VGA Rings) tt_um_swenson_cqs (quad-sieve) tt_um_brmurrell3_m31_accel (M31 Mersenne-31 Arithmetic Accelerator) tt_um_corey (Bernstein-Yang Modular Inverse (secp256k1)) tt_um_wokwi_456019228852926465 (Johnson counter) tt_um_posit_mac_stream (8Bit Posit MAC Unit) tt_um_wokwi_453110263532536833 (Tiny tapeout MAC unit) tt_um_wokwi_456131795093444609 (Test) tt_um_calonso88_spi_i2c_reg_bank (Register bank accessible through SPI and I2C) tt_um_tinyperceptron_karlmose (Tiny Perceptron) tt_um_wokwi_456571536458697729 (Full Adder) tt_um_wokwi_456571856158098433 (MJ Wokwi project) tt_um_pong (TinyPong) tt_um_wokwi_456574262189506561 (Simple counter) tt_um_wokwi_456571610504973313 (4ish bit adder) tt_um_wokwi_456571875721383937 (Hello World) tt_um_tsetlin_machine (Tsetlin Machine for low-power AI) tt_um_wokwi_456574528376856577 (Cremedelcreme) tt_um_wokwi_456571730084640769 (tiny tapeout half adder) tt_um_romultra_top (SPI RAM Driver) tt_um_wokwi_456575247946496001 (4-bit full adder) tt_um_wokwi_456576478636229633 (Alex first circuit) tt_um_wokwi_456571639638628353 (Malthes First Template) tt_um_wokwi_456573048893551617 (And_Or) tt_um_wokwi_456571585305580545 (WIP Bin to Dec) tt_um_wokwi_456571687437989889 (TinyTapeNkTest) tt_um_wokwi_456573098517424129 (2 Digit Display) tt_um_wokwi_456571798679348225 (nand_gate) tt_um_luke_meta (TTIHP26a_Luke_Meta) tt_um_wokwi_456576238411624449 (Tiny Tapeout Amaury Basic test) tt_um_wokwi_456571702278493185 (JayF-HA) tt_um_wokwi_456571626036491265 (Tiny Tapeout First Design) tt_um_wokwi_456571568465442817 (First tinytapeout 234) tt_um_wokwi_456577873995405313 (Switch deBounce for Rotary Encoder) tt_um_wokwi_456571625108498433 (Scott's first Wokwi design) tt_um_wokwi_456578179564131329 (idk) tt_um_wokwi_456573141570913281 (Tiny Tapeout chip) tt_um_wokwi_456578608558661633 (Hidden combination) tt_um_carlhyldborglundstroem_code (8_cool_modes) tt_um_wokwi_456572140961867777 (TestWorkShop) tt_um_wokwi_456575744901356545 (Design_test_workshop) tt_um_wokwi_456571628648495105 (Tiny Tapeout Test Gates) tt_um_wokwi_456578790697395201 (vis_3) tt_um_ISC77x8_HansAdam2077 (ISC77x16) tt_um_wokwi_456571702213480449 (Tiny Tapeout Test Gates) tt_um_wokwi_456575028603245569 (Test) tt_um_wokwi_456571746867102721 (fullAdder) tt_um_wokwi_456576548374933505 (My first design) tt_um_wokwi_456571983499280385 (Tiny Tapeout Test) tt_um_wokwi_456579003210233857 (Tiny Tapeout placeholder) tt_um_wokwi_456572032892464129 (Workshop) tt_um_hoene_smart_led_digital (Smart LED digital) tt_um_wokwi_456572126761001985 (test project) tt_um_schoeberl_undecided (Undecided) tt_um_flummer_ltc (Linear Timecode (LTC) generator with I2C control) tt_um_jalcim (tiny_tester) tt_um_ECM24_serv_soc_top (FH Joanneum TinyTapeout) tt_um_fabulous_ihp_26a (Tiny FABulous FPGA) tt_um_LnL_SoC (Lab and Lectures SoC) tt_um_ygdes_hdsiso8_dlhq (ttihp-HDSISO8) tt_um_algofoogle_fomo (FOMO) tt_um_chrbirks_top (ADPLL) tt_um_MichaelBell_photo_frame (Photo Frame) tt_um_miniMAC (miniMAC) tt_um_crockpotveggies_neuron (Neuromorphic Tile) tt_um_coastalwhite_canright_sbox (Canright SBOX) tt_um_ebeam_pixel_core (E-Beam Inspection Pixel Core) tt_um_DelosReyesJordan_SEM (8-bit SEM Floating-Point Multiplier) tt_um_chisel_template (One One) tt_um_vga_tetris (VGA Tetris) tt_um_wokwi_457062377137305601 (Count To Ten) tt_um_techhu_rv32_trial (LoRa Edge SoC) tt_um_risc_v_wg_swc1 (ttihp-26a-risc-v-wg-swc1) tt_um_andreasp00 (TT6581) tt_um_pakesson_glitcher (Glitcher) tt_um_intv0id_kalman (Kalman Filter for IMU) tt_um_neuromurf_seq_mac_inf (SEQ_MAC_INF_16H3 - Neural Network Inference Accelerator) tt_um_obrhubr (8-bit Prime Number Detector) tt_um_anujic_rng (True(er) Random Number Generator (TRNG)) tt_um_async_test (Chisel Async Test) tt_um_wokwi_458140717611045889 (O2ELHd 7segment display) tt_um_libormiller_SIMON_SPI (SIMON) tt_um_tschai_yim_mill (Tschai's Tic-Tac-Toe) tt_um_microlane_demo (microlane demo project) tt_um_vga_leonllrmc (LLR simple VGA GPU) tt_um_mchiriac (TinyTapeout-Processor2) tt_um_RongGi_tiny_dino (tiny_dino) tt_um_kianv_sv32_soc (KianV SV32 TT Linux SoC) tt_um_chatelao_fp8_multiplier (OCP MXFP8 Streaming MAC Unit) tt_um_filterednoise_infinity_core (Infinity Core) tt_um_alessio8132 (4-bit processor) tt_um_pwm_controller_atudoroi (UART interfaced 8ch PWM controller) tt_um_uart_alu (UART-ALU Processor) tt_um_ALU_t_rick (A fully functional ALU (Arithmetic logic unit)) tt_um_TscherterJunior_top (smolCPU) tt_um_lkhanh_vga_trng (VGA multiplex with TRNG) tt_um_wokwi_456572315745884161 (Tiny tape out test) tt_um_tmr_voter (Triple Modular Redundancy) tt_um_adriantrummer_checker (TinyTapeout VGA Checker) tt_um_jamesbuchanan_silly_mixer (Silly Mixer) tt_um_wokwi_456576419565744129 (tinytapeout_henningp_2bin_to_4bit_decoder) tt_um_ztimer_top (Tiny Tapeout Factory Test for ttihp-timer) tt_um_michaelstambach_vogal (VoGAl) tt_um_teenyspu (TeenySPU) tt_um_wokwi_458752568884674561 (7 segmant ihp resistcode) tt_um_Xelef2000 (RNG) tt_um_gschultz_bouncingcheckers (Bouncing Checkers) tt_um_ygdes_hdsiso8_rs (ttihp-HDSISO8RS) tt_um_malik_tiny_npu (Tiny NPU: 4-Way Parallel INT8 Inference Engine) tt_um_malik_mac_ripple (Gate-Level 8-bit MAC with Ripple-Carry Accumulator) tt_um_faaaa (Demoscreen full of RICH) tt_um_sat_add_blanluc (8 bit saturated adder) tt_um_wokwi_455303279136701441 (Spell. My. Name.) tt_um_thomasherzog_plasma (Plasma) tt_um_YannGuidon_TinyScanChain (TinyScanChain) tt_um_moss_display (moss_display) tt_um_delta (Delta Wing Flight Control Mixer with PWM Output) tt_um_maluei_badstripes (badstripes) tt_um_float_synth_nikleberg (float_synth) tt_um_catalinlazar_ihp_osc_array (IHP Gate Delay Characterizer (3-Flavor)) tt_um_essen (2x2 Systolic array with DFT and bfloat16 - v2) tt_um_ecc_gf2_8 (Tiny_ECC) tt_um_wokwi_459117403524075521 (4-bit ALU) tt_um_gfcwfzkm_scope_bfh_mht1_3 (Basic Oszilloscope and Signal Generator) tt_um_recursivetree_tmmu_top (Tiny MMU) tt_um_prime (8-bit Prime Number Detector) tt_um_gian_alu (tt_gian_alu) tt_um_multitool_soc_mauro_ciccone (Multi-Tool SoC) tt_um_FTEVE_FISH (Flying Fish) tt_um_wscore (8-bit RISC-V Lite CPU) tt_um_wokwi_459234034322375681 (TinyTapeout Signal Box) tt_um_anna_vee (2 digit minute timer) tt_um_zettpe_mini_psg (Mini PSG) tt_um_pmiotti_squares_hypnosis (hypnotic squares) tt_um_mzollin_glitch_detector (Glitch Detector) tt_um_spongent88 (Spongent-88 Hash Accelerator) tt_um_8bit_mac (8bit-mac-unit) tt_um_wokwi_459285910800527361 (4-Bit Counter and Registers Demo) tt_um_tobisma_random_snake (Random Snake) tt_um_maze_game (Maze Explorer Game) tt_um_ihp26a_ring_osc (Verilog ring oscillator) tt_um_vga_ca (vga_ca) tt_um_wokwi_459299619699169281 (TinySRAM) tt_um_jmkr_ece_git_code_lock (Code Lock) tt_um_wokwi_459303685175910401 (Bday Candle Chip) tt_um_wokwi_455293203542942721 (1-4 Counter) tt_um_wokwi_454935456504261633 (2 Bit Adder) tt_um_wokwi_455291660779120641 (74LS138) tt_um_wokwi_455291682546516993 (Mein Hund Gniesbert) tt_um_wokwi_455291649462874113 (Tiny Tapeout Full Adder) tt_um_wokwi_455293410637770753 (Yturkeri_Mytinytapeout) tt_um_wokwi_455291642978471937 (2-Bit Adder) tt_um_wokwi_456118923713667073 (4-Bit Adder) tt_um_wokwi_456571724337390593 (7 Segment Binary Viewer) tt_um_wokwi_456578784059908097 (7 segment number viewer) tt_um_wokwi_456571605697249281 (Hello) tt_um_wokwi_456576571487651841 (7 Segment BCD) tt_um_wokwi_456571686260436993 (Tiny Tapeout Workshop Test) tt_um_wokwi_455291787137823745 (TinyTapeout logic gate test) tt_um_wokwi_455291649222749185 (lriglooCs-first-Wokwi-design) tt_um_wokwi_456578694921494529 (sree) tt_um_wokwi_456571638794523649 (GDS Test) tt_um_2048_vga_game (2048 sliding tile puzzle game (VGA)) tt_um_urish_usb_cdc (USB CDC (Serial) Device) tt_um_tippfehlr_nyan_cat (NYAN CAT) tt_um_wokwi_459210187582694401 (Simple Counter) tt_um_zouzias (Yet another VGA tinytapeout) tt_um_Jan_three_body_solution (Three Body Solution) Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available