678 Tiny Neural Network Accelerator

678 : Tiny Neural Network Accelerator

Design render

How it works

This project is a neural network accelerator designed for use with convolutional neural networks. The verilog is generated from system verilog source which lives in a separate repository: https://github.com/GregAC/tiny-nn which also contains the full DV environment, model, documentation and related utilities and software.

Internally it contains a number of 16-bit floating point add and multiply units (using something approximating the BF16 floating point encoding) that can be configured to work in different ways for different operations. Operations available are

  • Convolve - Computes a 4x2 convolution kernel across an image. The parameters are loaded in and held in flops then the image is streamed in one pixel at a time. The most recent 4x2 image pixels are also held in flops so every 2 new pixels giving a new image column computes a new convolution (internally the last image column is dropped and the new column shifted in).
  • Accumulate - Sum groups of N input numbers with a fixed bias added and an optional RELU operation (0 if accumulation less than 0 otherwise leaves accumulation untouched). N is provided by the operation word and the bias is only loaded in once. E.g. if you set N = 4 an bias of 1.0 for each 4 numbers input it would sum them together then add the bias and do the optional RELU. Numbers keep streaming in until the operation is terminated

The interface is a fixed 16 bits in and 8 bits out synchronous to the clock. Each operation has a special operand code that starts it that needs to be sent on the 16 input bits. Once started the 16 input bits provide the numbers used for the operation.

The 16-bit numbers output are split over 2 clock cycles for the 8 bit output. With the lower byte output first. The user needs to know when the output is relevant (some cycles the output should be ignored and some it should be captured).

  • ui_in, uio_in - 16-bit input, ui_in is top byte
  • uo_out - 8-bit output

https://github.com/GregAC/tiny-nn should contain full documentation with the details and software to use the accelerator (both a work in progress at tapeout time!).

How to test

There are 3 test modes to test basic input output.

ASCII test

Place 16'hFFFF on the input {ui_in, uio_in} and hold it and on the output you will observe a repeating pattern:

  • 8'h54
  • 8'h2d
  • 8'h4e
  • 8'h4e

This is 'T-NN' in ASCII

Pulse Test

Place 16'hF000 on the input {ui_in, uio_in} and hold it and on the output you will observe a repeating pattern:

  • 8'haa
  • 8'h55

Count Test

Place 16'hF1XX on the input {ui_in, uio_in} where XX is any 8-bit number and on the output you will observe a count down from that number.

Accumulate Operation

The simplest operation is the accumulate one. We'll configure it to add two numbers at a time with a -3.5 bias and RELU. Then we'll add 1.0 + 2.0 and 3.0 + 4.0. Put the following on the input over successive clocks

  • 16'h2101 # Command word for accumulate operation
  • 16'hc060 # -3.5 bias
  • 16'h3f80 # 1.0
  • 16'h4000 # 2.0
  • 16'h4040 # 3.0
  • 16'h4080 # 4.0
  • 16'hffff # NaN - terminates operation

On the output you should observe:

  • 16'hX
  • 16'hX
  • 16'hX
  • 16'hX
  • 16'hX
  • 16'hX
  • 16'hX
  • 16'h00
  • 16'h00
  • 16'h60
  • 16'h40

The 16'hX outputs could be anything and should be ignored, the first number output is 0000 representing 0.0 RELU(1.0 + 2.0 - 3.5) = 0.0, the second number output is 4060 representing 3.5 RELU(3.0 + 4.0 - 3.5) = 3.5.

External hardware

No specific external hardware required but it does need some external part to drive the desired sequences, this can be handled by the RP2040 on the demo board.

IO

#InputOutputBidirectional
0ui[0]uo[0]uio[0]
1ui[1]uo[1]uio[1]
2ui[2]uo[2]uio[2]
3ui[3]uo[3]uio[3]
4ui[4]uo[4]uio[4]
5ui[5]uo[5]uio[5]
6ui[6]uo[6]uio[6]
7ui[7]uo[7]uio[7]

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (Tiny Tapeout Factory Test) tt_um_tommythorn_maxbw (Asynchronous Multiplier) tt_um_mattvenn_rgb_mixer (RGB Mixer demo5) tt_um_find_the_damn_issue (Find The Damn Issue) tt_um_brandonramos_VGA_Pong_with_NES_Controllers (VGA Pong with NES Controllers) tt_um_kb2ghz_xalu (4-bit minicomputer ALU) tt_um_a1k0n_demo (Demo by a1k0n) tt_um_zec_square1 ("SQUARE-1": VGA/audio demo) tt_um_jmack2201 (Sprite Bouncer with Looping Background Options) tt_um_ran_DanielZhu (Dice) tt_um_gfg_development_tinymandelbrot (TinyMandelbrot) tt_um_toivoh_demo_tt08 (Sequential Shadows [TT08 demo competition]) tt_um_quarren42_demoscene_top (asic design is my passion) tt_um_crispy_vga (Crispy VGA) tt_um_MichaelBell_canon (TT08 Pachelbel's Canon demo) tt_um_shuangyu_top (Calculator) tt_um_wokwi_407306064811090945 (DDR throughput and flop aperature test) tt_um_favoritohjs_scroller (VGA Scroller) tt_um_tt08_wirecube (Wirecube) tt_um_vga_glyph_mode (Glyph Mode) tt_um_a1k0n_vgadonut (VGA donut) tt_um_roy1707018 (RO) tt_um_sign_addsub (CMOS design of 4-bit Signed Adder Subtractor) tt_um_patater_demokit (Patater Demo Kit Waggling Rainbow on a Chip) tt_um_simon_cipher (simon_cipher) tt_um_thexeno_rgbw_controller (RGBW Color Processor) tt_um_demosiine_sda (DemoSiine) tt_um_bytex64_munch (Munch) tt_um_cfib_demo (cfib Demoscene Entry) tt_um_Richard28277 (4-bit ALU) tt_um_betz_morse_keyer (Morse Code Keyer) tt_um_nvious_graphics (nVious Graphics) tt_um_ezchips_calc (8-Bit Calculator) tt_um_hack_cpu (HACK CPU) tt_um_ring_divider (Divided Ring Oscillator) tt_um_ephrenm_tsal (TSAL_TT) tt_um_kapilan_alarm (Alarm Clock) tt_um_stochastic_addmultiply_CL123abc (Stochastic Multiplier, Adder and Self-Multiplier) tt_um_dlfloatmac (DL float MAC) tt_um_faramire_rotary_ring_wrapper (Rotary Encoder WS2812B Control) tt_um_i2c_peripheral_stevej (i2c peripherals: leading zero count and fnv-1a hash) tt_um_yuri_panchul_schoolriscv_cpu_with_fibonacci_program (schoolRISCV CPU with Fibonacci program) tt_um_yuri_panchul_adder_with_flow_control (Adder with Flow Control) tt_um_brailliance (Brailliance) tt_um_nyan (nyan) tt_um_MichaelBell_mandelbrot (VGA Mandelbrot) tt_um_fountaincoder_top_ad (pulse_add) tt_um_edwintorok (Rounding error) tt_um_mac (MAC) tt_um_dpmu (DPMU) tt_um_JAC_EE_segdecode (7 Segment Decode) tt_um_yuri_panchul_sea_battle_vga_game (Sea Battle) tt_um_benpayne_ps2_decoder (PS2 Decoder) tt_um_meriac_play_tune (Super Mario Tune on A Piezo Speaker) tt_um_comm_ic_bhavuk (Comm_IC) tt_um_daosvik_aesinvsbox (AES Inverse S-box) tt_um_cattuto_sr_latch (TT08 - experiments with latch-based shift registers) tt_um_silice (Warp) tt_um_jayjaywong12 (mulmul) tt_um_emmyxu_obstacle_detection (Obstacle Detection) tt_um_neural_navigators (Neural Net ASIC) tt_um_resfuzzy (resfuzzy) tt_um_cejmu (CEJMU Beers and Adders) tt_um_16_mic_beamformer_arghunter (16 Mic Beamformer) tt_um_pdm_pitch_filter_arghunter (PDM Pitch Filter) tt_um_pdm_correlator_arghunter (PDM Correlator) tt_um_ddc_arghunter (DDC) tt_um_i2s_to_pwm_arghunter (I2S to PWM ) tt_um_supermic_arghunter (Supermic ) tt_um_dmtd_arghunter (DMTD ) tt_um_htfab_bouncy_capsule (Bouncy Capsule) tt_um_samuelm_pwm_generator (PWM generator) tt_um_toivoh_demo_deluxe (Sequential Shadows Deluxe [TT08 demo competition]) tt_um_faramire_stopwatch (Simple Stopwatch) tt_um_johshoff_metaballs (Metaballs) tt_um_top (Flame demo) tt_um_NicklausThompson_SkyKing (SkyKing Demo) tt_um_Electom_cla_4bits (4-bit CLA) tt_um_vga_cbtest (Generate VGA output for Color Blindness Test) tt_um_zoom_zoom (Zoom Zoom) tt_um_dpmunit (DPM_Unit) tt_um_clock_divider_arghunter (Clock Divider ) tt_um_dlmiles_poc_fskmodem_hdlctrx (FSK Modem +HDLC +UART (PoC)) tt_um_emilian_muxpga (TinyFPGA resubmit for TT08) tt_um_pyamnihc_dummy_counter (Dummy Counter) tt_um_whynot (Why not?) tt_um_dlmiles_tt08_poc_uart (UART) tt_um_dendraws_donut (donut) tt_um_tmkong_rgb_mixer (RGB Mixer) tt_um_led_matrix_ayla_lin (32x8 LED Matrix Animation) tt_um_rebeccargb_tt09ball_screensaver (TT09Ball VGA Screensaver) tt_um_rebeccargb_vga_pride (VGA Pride) tt_um_levenshtein (Fuzzy Search Engine) tt_um_rebeccargb_colorbars (Color Bars) tt_um_jamesrosssharp_1bitam (1bit_am_sdr) tt_um_rebeccargb_hardware_utf8 (Hardware UTF Encoder/Decoder) tt_um_rebeccargb_styler (Styler) tt_um_rebeccargb_vga_timing_experiments (VGA Timing Experiments) tt_um_rebeccargb_universal_decoder (Universal Binary to Segment Decoder) tt_um_rebeccargb_intercal_alu (INTERCAL ALU) tt_um_toivoh_pio_ram_emu_example (pio-ram-emulator example: Julia fractal) tt_um_tobimckellar_top (Simple PWM Module) tt_um_JesusMinguillon_freqSweep (freqSweep) tt_um_led_cipher (LED Bitserial Cipher) tt_um_my_elevator (Elevator Design) tt_um_wokwi_413387065339458561 (APA102 to WS2812 Translator) tt_um_wokwi_413386991502909441 (SPI Logic Analyzer with Charlieplexed Display) tt_um_alf19185_ALU (4 bit ALU ) tt_um_rtfb_collatz (Collatz conjecture brute-forcer) tt_um_senolgulgonul (Senol Gulgonul tt09) tt_um_Esteban_Oman_Mendoza_maze_2024_top (Space Detective Maze Explorer) tt_um_sebastienparadis_hamming_top (Hamming Code (7,4)) tt_um_prefix8 (tiny-tapeout-8bit-GPTPrefixCircuit) tt_um_lif_tk (LIF on a Ring Topology) tt_um_asheldon44_dsm_decimation_filter (Delta-Sigma ADC Decimation Filter) tt_um_juarez_jimenez (an lfsr with synaptic neurons (excitatory or inhibitatory)) tt_um_lif_clarencechan28 (Perceptron) tt_um_uart_mvm (Matmul System) tt_um_algofoogle_tt09_ring_osc (Verilog ring oscillator) tt_um_pid_controller (PID Controller) tt_um_frequency_counter (Frequency Counter SSD1306 OLED) tt_um_delta_liafn (Delta RNN and Leaky Integrate-and-Fire Nueron Circuit) tt_um_devinatkin_basys3_uart (Basys 3 Over UART Link) tt_um_pwm_top (Generador PWM multiproposito con frecuencia y ciclo de trabajo modulable) tt_um_lfsr_stevej (Linear Feedback Shift Register) tt_um_jamesrosssharp_tiny1bitam (Tiny 1-bit AM Radio) tt_um_instrumented_ring_oscillator (instrumented_ring_oscillator) tt_um_lif1 (STDP Circuit) tt_um_alif (3 Neuron ALIF) tt_um_tiny_ternary_tapeout (T3 (Tiny Ternary Tapeout)) tt_um_snn_with_delays_paolaunisa (ChatGPT-generated Spiking Neural Network with Delays) tt_um_arandomdev_fir_engine_top (FIREngine) tt_um_carryskip_adder8 (8-bit carry-skip) tt_um_riscv_mini (RISC-V Mini) tt_um_CLA8 (8-bit Carry Look-Ahead Adder) tt_um_hybrid_adder (Hybrid_Adder_8bit) tt_um_uart_mvm_sys (Matmul System) tt_um_MichaelBell_hd_8b10b (8b10b decoder and multiplier) tt_um_program_counter_top_level (Test Design 1) tt_um_murmann_group (Decimation Filter for Incremental and Regular Delta-Sigma Modulators) tt_um_adder_accumulator_sathworld (adder-accumulator) tt_um_control_block (ECE 298A 8-Bit CPU Control Block) tt_um_LFSR_Encrypt (LFSR Encrypter) tt_um_cdc_test (SkyKing Demo) tt_um_two_lif_stdp (Two LIF Neurons with STDP Learning) tt_um_underserved (ITS-RISCV) tt_um_znah_vga_ca (znah_vga_ca) tt_um_mikegoelzer_7segmentbyte (7-Segment Byte Display) tt_um_idann (Forward Pass Network for Simple ANN) tt_um_carryskip_adder9 (carry skip adder) tt_um_mroblesh (Frequency Encoder and Decoder) tt_um_wokwi_411379488132926465 (Semana UCU Verilog) tt_um_rejunity_atari2600 (Atari 2600) tt_um_rejunity_z80 (Zilog Z80) tt_um_couchand_cora16 (CORA-16) tt_um_kashmaster_carryskip (8-bit-CARRY_SKIP) tt_um_tiny_ternary_tapeout_csa (T3 (Tiny Ternary Tapeout) CSA ) tt_um_array_secD7 (Tiny Tapeout Group 7 Lab D) tt_um_chip4lyfe (Leaky Integrate Fire Neuron) tt_um_ronikant_jeremykam_tinyregisters (Tiny Registers) tt_um_VanceWiberg_top (Team 17's 8 bit DAC) tt_um_claudiotalarico_counter (4-bit up/down binary counter) tt_um_gmejiamtz (Configurable Logic Block) tt_um_I2C (I2C and SPI) tt_um_perceptron_mtchun (Perceptron Neuron) tt_um_histogramming (Histogramming) tt_um_gfcwfzkm_scope_bfh_mht1_3 (Basic Oszilloscope and Signal Generator) tt_um_MichaelBell_rle_vga (RLE Video Player) tt_um_ece298a_8_bit_cpu_top (8-Bit CPU) tt_um_Coline3003_top (15 channels emission counter) tt_um_dlmiles_dffram32x8_2r1w (Tiny RAM DFF 2r1w) tt_um_urish_sic1 (SIC-1 8-bit SUBLEQ Single Instruction Computer) tt_um_Coline3003_spect_top (Spectrogram extractor, 2 channels) tt_um_CarrySelect8bit (carry_select) tt_um_koggestone_adder8 (test_friday2) tt_um_Rapoport (Perceptron) tt_um_cellular_alchemist (Hopfield Network with Izhikevich-type RS and FS Neurons) tt_um_tinysynth (Tinysynth) tt_um_wokwi_414120248222232577 (A Tale of Two NCOs) tt_um_a1k0n_nyancat (VGA Nyan Cat) tt_um_tommythorn_workshop (Workshop demo) tt_um_lrc_stevej (LRC - Longitudinal Redundancy Check generator) tt_um_shifter (Shifter) tt_um_schoeberl_test (tinydsp-lol) tt_um_anislam (Leaky integrate and fire spiking neural network) tt_um_systolicLif (Basic model for Systollic array implementation of LIF) tt_um_algofoogle_tt09_ring_osc2 (Verilog ring oscillator V2) tt_um_dff_mem (dff_mem) tt_um_nomuwill (16 Bit Izhikevich Neuron) tt_um_digital_clock_example (7-Segment Digital Desk Clock) tt_um_udxs (Basic Perceptron + ReLU) tt_um_matrix_mult (Basic Matrix-Vector Multiplication) tt_um_db_MAC (8 bit MAC Unit) tt_um_anas_7193 (Programmable PWM Generator) tt_um_flyingfish800 (Verilog test project) tt_um_project_tt09 (Basic LIF Neuron) tt_um_lifn (Integrate-and-Fire Neuron Circuit) tt_um_rejunity_e2m0_x_i8_matmul (E2M0 x INT8 Systolic Array) tt_um_michaelmcculloch_alu (Michaels Tiny Tapeout ALU) tt_um_dog_BILBO (8-bit CBILBO) tt_um_stochastic_integrator_tt9_CL123abc (Stochastic Integrator) tt_um_samkho_two_channel_square_wave_generator (TwoChannelSquareWaveGenerator) tt_um_urish_giant_ringosc (Giant Ring Oscillator (3853 inverters)) tt_um_htfab_caterpillar (Simon's Caterpillar) tt_um_purdue_socet_uart (SoCET UART with FIFO buffers) tt_um_rejunity_sn76489 (Classic 8-bit era Programmable Sound Generator SN76489) tt_um_rejunity_ay8913 (Classic 8-bit era Programmable Sound Generator AY-3-8913) tt_um_tommythorn_cgates (Cgates) tt_um_09eksdee (eksdee) tt_um_rejunity_decoder (ternary, E1M0, E2M0 decoders) tt_um_kailinsley (Dynamic Threshold Leaky Integrate-and-Fire) tt_um_rejunity_vga_test01 (VGA Drop (audio/visual demo)) tt_um_wallento_4bit_toycpu (4-Bit Toy CPU) tt_um_warp (Warp) tt_um_algofoogle_tt09_ring_osc3 (Verilog ring oscillator V3) tt_um_kev_ma_matmult222 (2-bit 2x2 Matrix Multiplier) tt_um_rejunity_vga_logo (VGA Tiny Logo (1 tile)) tt_um_liaf (A simple leaky integrate and fire neuron) tt_um_lif_network_MR (Leaky Neuron Network) tt_um_lsnn_hschweig (Neuromorphic Hardware for SNN LSTM) tt_um_Nishanth_RISCV (RISCV Processor Design) tt_um_KoushikCSN_RISCV (RISCV Processor Design) tt_um_ccu_goatgate (tiny cipher 4 bit key) tt_um_lif_ZB (Tutorial: Simple LIF Neuron) tt_um_z2a_rgb_mixer (RGB Mixer demo) tt_um_vga_clock (VGA clock) tt_um_synth_simple_mm (synth_simple) tt_um_gus16 (GUS16 CPU) tt_um_rejunity_ternary_dot (Ternary 128-element Dot Product) tt_um_virantha_enigma (Enigma - 52-bit Key Length) tt_um_atomNPU (AtomNPU) tt_um_alphaonesoc (AlphaOneSoC) tt_um_gxrii_spi_sevenseg (SPI 7-segment display) tt_um_urish_simon (Simon Says memory game) tt_um_branch_pred (TinyTapeout Minimal Branch Predictor) tt_um_xor_encryption (Xor-Logic) tt_um_MAC_Accelerator_OnSachinSharma (MAC Operation) tt_um_moody_mimosa (Moody-mimosa) tt_um_wrapper (6Digit7SegClock) tt_um_MichaelBell_tinyQV (TinyQV Risc-V SoC) tt_um_devmonk_ay8913 (Classic 8-bit era Programmable Sound Generator AY-3-8913) tt_um_toivoh_demo_tt10 (Orion Iron Ion [TT08 demo competition]) tt_um_2048_vga_game (2048 sliding tile puzzle game (VGA)) tt_um_gamepad_pmod_demo (Gamepad Pmod Demo) tt_um_tinytapeout_logo_screensaver (VGA Screensaver with Tiny Tapeout Logo) tt_um_mattvenn_spi_test (SPI test) tt_um_huffman_coder (Huffmann_Coder) tt_um_multiplier_tt10 (Vedic multiplier) tt_um_schoeberl_wildcat (Wildcat RISC-V) tt_um_kentrane_tinyspectrum (Tiny piano) tt_um_i2c_regf (Asynchronous I2C Registerfile Interface) tt_um_tappu_tobias1012 (Tappu) tt_um_mp_lif_schor (mp_LIF_neuron) tt_um_asgerwenneb (Custom SRAM) tt_um_Strider93 (digital LIF Neuron) tt_um_wokwi_422960078645704705 (Hero on Tape) tt_um_keszocze_ssmcl (SSMCl) tt_um_luke_clock (TT10_Luke_Clock) tt_um_enjens (Verilog based clock to 7-segment counter) tt_um_UartMain (XOR Cipher) tt_um_torurstrom_async_lock (Asynchronous Locking Unit) tt_um_larva (LaRVa CPU) tt_um_zhouzhouthezhou_adder (tt10_zhouzhouthezhou_adder) tt_um_jp_cd101_saw (KCH CD101 Saw Synth) tt_um_hpdl1414_uart_atudoroi (TT10 HPDL 1414 Uart) tt_um_jun1okamura_test0 (7-segment with LFSR) tt_um_strau0106_simple_viii (simple-viii) tt_um_obriensp_jtag (JTAG TAP) tt_um_10_vga_crossyroad (Crossyroad) tt_um_bilal_trng (TRNG) tt_um_space_invaders_game (Space Invaders ASIC) tt_um_sushi_demo (zc-sushi-demo) tt_um_kch_cd101 (kch cd101) tt_um_uart_bgdtanasa (ttUART) tt_um_zedulo_spitest1 (SimpleSPIdev) tt_um_daobaanh_rng (RNG_test) tt_um_gcd_stephan (15bit GCD) tt_um_spacewar (XY Spacewar) tt_um_gregac_tiny_nn (Tiny Neural Network Accelerator) tt_um_log_afpm (16-bit Logarithmic Approximate Floating Point Multiplier) tt_um_rkarl_Spiral (TT_spiralPattern) tt_um_led_jellyant (ledtest) tt_um_project_tt10 (Simple shift Reg) tt_um_DaDDS (DaDDS) tt_um_nithishreddykvs (Pulse Width Modulation) tt_um_monishvr_fifo (Synchronous FIFO) tt_um_reemashivva_fifo (Asynchronous FIFO) tt_um_save_buffer_hash_table (Tiny Hash Table) tt_um_drum_goekce (DRUM) tt_um_rte_sine_synth (Sine Synth) tt_um_tiny_shader_mole99 (Tiny Shader) tt_um_flummer_ltc (Linear Timecode (LTC) generator) tt_um_bitty (Bitty) tt_um_ole_moller_priority_encoder_to_7_segment_decoder (Priority-encoder) tt_um_algofoogle_vga (IHP VGA demo) tt_um_ultra_tiny_cpu (UltraTiny-CPU) tt_um_uwasic_dinogame (UW ASIC - Optimized Dino) tt_um_Qwendu_spi_fpu (SPI FPU) tt_um_aditya_patra (Priority-Encoded Arbiter) tt_um_4_bit_ALU (ALU) tt_um_htfab_checkers (Overengineered Checkers) tt_um_brukstus_tdc_with_spi (TDC with SPI) tt_um_toniklippeo (toni_clk_gen) tt_um_spi_pwm_djuara (spi_pwm) tt_um_iitbbs (CYCLIPSONIC) tt_um_wokwi_411783629732984833 (BINCounterAndGates) tt_um_wokwi_412635532198550529 (tt09-pettit-wokproc-trainer) tt_um_wokwi_413385294512575489 (Duffy) tt_um_wokwi_413387014781302785 (L display) tt_um_wokwi_413387093939376129 (sphereinabox hello) tt_um_wokwi_413387190167208961 (Will It NAND?) tt_um_wokwi_group_1 tt_um_wokwi_group_2 tt_um_wokwi_group_3 tt_um_wokwi_group_4 tt_um_wokwi_group_5 tt_um_wokwi_group_6 tt_um_wokwi_group_7 tt_um_wokwi_group_8 tt_um_wokwi_group_9 tt_um_wokwi_group_10 tt_um_wokwi_group_11 tt_um_wokwi_group_12 tt_um_tetrap_triggerer (triggerer) tt_um_wokwi_group_13 tt_um_multiplier_group_1 tt_um_multiplier_group_2 tt_um_multiplier_group_3