910 Tiniest GPU

910 : Tiniest GPU

Design render
  • Author: Matt Pongsagon
  • Description: A GPU that can render only a triangle
  • GitHub repository
  • Clock: 50000000 Hz

What is it

  • This is a tiniest ASIC GPU. It can render a quad using two triangles with texture mapped.
  • The chip comes with two texture ROM images. (My schools' logo)
  • The transformation, lighting and rasterization are done in the GPU.
  • It support solid shading with one directional light source and affine texture mapping.
  • All 3D data (coordinates, transformation, render mode) are sent from the PC each frame via a COM port.
  • The output is sent to the VGA monitor using TinyVGA. The output resolution is 640x480 pixels, 6-bit RGB.
  • The clock fequency is 50 Mhz.

Folders

https://github.com/pongsagon/tt07-tiniest-gpu

  1. src: ASIC Verilog version
  2. Basys3: Verilog version targeted Basys3 FPGA board
  3. Verilator_sim: Verilog simulation version using Verilator and SDL
  4. test_software: PC app used to sending data to the GPU

How to use

Plugin a TinyVGA PMOD, connect at the port uio.
Send UART command to control the GPU via serial console, ui_in[3] - RX

Please go to https://github.com/pongsagon/tt07-tiniest-gpu/tree/main/test_software to get the testing app.
ASIC GPU testing app written in C run on Windows.
To be able to run on different OS, you may need to use different UART library.
The app will send these data, 60 bytes, each frame @11520 baud rate to the GPU

  • 4 vertices world coordinate that form a quad
  • 1 normalize normal
  • 1 normalize light direction
  • 3x4 ModelViewProjection matrix. (third row is not used)
  • 1 byte render mode
    • solid shading, texture, alpha masked

The data are in the format of fixed point Q8.8 except for the 1-byte render mode.
The code has been successfully tested with the Basys3 board, sending data at 60fps.

Run/Edit the code

  1. The code rely on SFML library for the input and windows. https://www.sfml-dev.org/tutorials/2.6/start-vc.php.
    Please install SFML first.
  2. Change the COM port number in C code to match with the ASIC/FPGA port (main_serial.cpp, line number 330)
  3. short cut keys
    • arrow key: yaw pitch
    • as: zoom
    • df: change model size
    • er: X translation
    • 012: render mode
    • 34: change texture
    • 6789: change triangle 1 color
    • uiop: change triangle 2 color
  4. You can also changes the vertices coordinate, quad normal and light direction using code. I have not write short cut keys for setting them.

How it work

  • Fixed point
    • All of the calculation are done in fixed point.
    • The format of the fixed point is depend on the type of variables to save register space as much as possible. Thus, they are a lot of bit operation in the code to transiton between fixed point formats.
  • Modules
    • ia.v (input assembly)
      • manage reading data (60 bytes each frame) from UART and save to the registers
    • vs.v (vertex stage)
      • transform vertices from world space to screen space
      • compute lighting intensity color for each triangle
      • compute triangles' edge parameters and barycentric coordinates
    • raster.v
      • rasterization two triangles, interpolate color, texture mapping
  • No framebuffer or linebuffer
    • Each pixel color has to be computed in 2 clock cycles.
    • the rasterization is running in parallel with the vertex stage.
    • Using incremental edge function to do pixel-triangle inside test.
  • Computation steps
    1. read data from the PC via UART (in project.v, ia.v)
    2. for each frame (in vs.v)
      • transform vertices to screen space and compute lighting
      • (done during VBlank) compute triangles' edge parameters and barycentric coordinates
      • all of these calculation are done in 82 states and use around 2,000 clock cycles.
    3. for each scanline (in vs.v, raster.v)
      • done in 1 clock cycle
      • increment edge functions parameters of each scanline
      • increment barycentric parameters of each scanline
    4. for each pixel (in raster.v)
      • 1st clock cycle:
        • check pixel inside/outside of which triangles
        • compute interpolated (u,v) of this pixel, get texel color from this (u,v)
      • 2nd clock cycle:
        • color the pixel using texel color or light intensity color
        • actually the texture ROM is monochorme, the color is hardcoded using (u,v) coordinate.

IO

#InputOutputBidirectional
0R1
1G1
2B1
3RXvsync
4R0
5G0
6B0
7hsync

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (TinyTapeout 7 Factory Test) tt_um_analog_factory_test (TT07 Analog Factory Test) tt_um_urish_charge_pump (Dickson Charge Pump) tt_um_adennen_inverter (Aron's analog buffer test) tt_um_rejunity_z80 (Zilog Z80) tt_um_kianv_bare_metal (KianV RISC-V RV32E Baremetal SoC) tt_um_macros77_subneg (SUBNEG CPU) tt_um_eater_8bit (Tiny Eater 8 Bit) tt_um_ender_clock (clock) tt_um_wokwi_397140982440144897 (7-Seg 'Tiny Tapeout' Display) tt_um_wokwi_397142450561071105 (Padlock) tt_um_Burrows_Katie (QIF Neuron) tt_um_vga_clock (VGA clock) tt_um_aidenfoxivey (CRC-8 CCITT) tt_um_PUF (Reversible logic based Ring-Oscillator Physically Unclonable Function (RO-PUF)) tt_um_devinatkin_dual_oscillator (dual oscillator) tt_um_urish_simon (Simon Says memory game) tt_um_ajstein_stopwatch (Stopwatch Project) tt_um_rnunes2311_12bit_sar_adc (12 bit SAR ADC) tt_um_DanielZhu123 (calculator) tt_um_wokwi_397268065185737729 (Mini Light Up Game) tt_um_toivoh_basilisc_2816 (Basilisc-2816) tt_um_MichaelBell_rle_vga (RLE Video Player) tt_um_wokwi_397774697322214401 (secret L) tt_um_ccattuto_charmatrix (Serial Character LED Matrix) tt_um_The_Chairman_send_receive (Send Receive) tt_um_mini_aie_2x2 (mini-aie-cgra) tt_um_twin_tee_opamp_osc (Twin Tee Sine Wave Generator) tt_um_brucemack_sb_mixer (Single Balanced Mixer) tt_um_revenantx86_tinytpu (TinyTPU) tt_um_chess (Chess) tt_um_vga_perlin (VGA Perlin Noise) tt_um_calonso88_74181 (ALU 74181) tt_um_tinytapeout_dvd_screensaver (DVD Screensaver with Tiny Tapeout Logo (Tiny VGA)) tt_um_TD4_Assy_KosugiSubaru (4bit_CPU_td4) tt_um_drburke3_top (FastMagnitudeComparator) tt_um_pongsagon_tiniest_gpu (Tiniest GPU) tt_um_jorga20j_prng (8 bit PRNG) tt_um_ejfogleman_smsdac8 (8-bit DEM R2R DAC) tt_um_ccattuto_conway (Conway's Terminal) tt_um_fp_mac (FP-8 MAC Module) tt_um_router (router) tt_um_serdes (SerDes) tt_um_rejunity_analog_dac_ay8913 (AY-8193 single channel DAC) tt_um_riscv_spi_wrapper (RISCV32I with spi wrapper) tt_um_mos_bandgap (MOS Bandgap) tt_um_shadow1229_vga_player (VGA player) tt_um_explorer (Explorer) tt_um_rtmc_top_jrpetrus (Real Time Motor Controller) tt_um_28add11_QOAdecode (QOA Decoder) tt_um_toivoh_basilisc_2816_cpu_OL2 (Basilisc-2816) tt_um_afasolino (integer to posit converter and adder ) tt_um_8bit_vector_compute_in_SRAM (8-bit Vector Compute-in-SRAM) tt_um_lfsr (LFSR) tt_um_tnt_diff_rx (TT07 Differential Receiver test) tt_um_urish_spell (SPELL) tt_um_underserved (underserved) tt_um_dpetrisko_ttdll (TTDLL) tt_um_mitssdd (co processor for precision farming) tt_um_wokwi_399192124046955521 (ECC_test1) tt_um_dusterthefirst_project (Communicate 433) tt_um_xeniarose_sha256 (tiny sha256) tt_um_njp_micro (MicroCode Multiplier) tt_um_VishalBingi_r2r_4b (4-bit R2R DAC) tt_um_lisa (LISA Microcontroller with TTLC) tt_um_template (TT7 Simple Clock) tt_um_seanyen0_SIMON (SIMON) tt_um_agurrier_mastermind (Mastermind) tt_um_KolosKoblasz_mixer (Gilbert Mixer) tt_um_Saitama225_comp (Analog comparator) tt_um_tt7_meonwara (TBD) tt_um_multiplier_mbm (Modified Booth Multiplier) tt_um_delay_line_tmng (Delay Line Time Multiplexed NAND Gate) tt_um_mandelbrot_accel (Mandelbrot Set Accelerator (32-bit IEEE 754)) tt_um_dvxf_dj8v_dac (DJ8 8-bit CPU w/ DAC) tt_um_obriensp_pll (PLL Playground) tt_um_unisnano (unisnano) tt_um_alfiero88_CurrentTrigger (Current Mode Trigger) tt_um_CktA_InstAmp (Instrumentation Amplifier for Electrocardiogram Signal Adquisition) tt_um_lcasimon_tdc (Analog TDC) tt_um_neural_network (Neural Network dinamic) tt_um_PS_PWM (Phase Shifted PWM Modulator) tt_um_litneet64_ro_puf (RO-based Physically Unclonable Function (PUF)) tt_um_wokwi_399163158804194305 (Digital Timer) tt_um_algofoogle_raybox_zero (raybox-zero TT07 edition) tt_um_wokwi_399336892246401025 (UART) tt_um_wokwi_399169514887574529 (Gaussian Blur) tt_um_maheredia (GPS signal generator) tt_um_pwm_elded (UACJ_PWM) tt_um_wokwi_399447152724198401 (8-Bit Register) tt_um_adonairc_dda (DDA solver for van der Pol oscillator) tt_um_6bitaddr (6 bit addr) tt_um_btflv_subleq (Subleq CPU with FRAM and UART) tt_um_emern_top (badGPU) tt_um_wokwi_399469995038350337 (dEFAULt 2hAC) tt_um_mixed_signal_pulse_gen (mixed_signal_pulse_gen) tt_um_maxluppe_digital_analog (All Digital DAC and Analog Comparators) tt_um_pyamnihc_dummy_counter (Dummy Counter) tt_um_wokwi_399488550855755777 (My 9-year-old son made an 8-bit counter chip) tt_um_toivoh_basilisc_2816_cpu_exp (Basilisc-2816 Experimental) tt_um_htfab_fprn (Field Programmable Resistor Network) tt_um_rejunity_ay8913 (Classic 8-bit era Programmable Sound Generator AY-3-8913) tt_um_maxluppe_NIST (Four NIST SP 800-22 tests implementation) tt_um_vga_snake (VGA Snake Game) tt_um_cm_1 (GDS counter-measures experiment 1) tt_um_nurirfansyah_alits02 (Analog Test Circuit ITS 2) tt_um_analog_rf_readout_circuit (RF_peripheral_circuits) tt_um_jleightcap (fractran-tt) tt_um_wokwi_399518371950068737 (Full-adder out of a kmap) tt_um_davidparent_hdl (PRBS Generator) tt_um_adia_psu_seq_test (Adiabatic PSU sequencer test) tt_um_spacecat_chan_john_pong_the_second (John Pong The Second) tt_um_rajum_iterativeMAC (Iterative MAC) tt_um_asinghani_tinywspr (TinyWSPR) tt_um_thatoddmailbox (DuckCPU) tt_um_rejunity_vga (VGA Checkers) tt_um_8bitadder (Ripple Carry Adder 8 bit) tt_um_vzayakov_top (Pong-VGA) tt_um_pa1mantri_cdc_fifo (Clock Domain Crossing FIFO) Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available