229 Rounding error

229 : Rounding error

Design render
  • Author: Edwin Török
  • Description: Competition entry
  • GitHub repository
  • Clock: 25250000 Hz

How it works

Idea

This started out as an attempt to implement a ray tracer in 2 TT tiles. However, there isn't enough room for a proper one, precision has to be limited, which leads to unavoidable rounding errors.

So embrace rounding errors, and make them the primary feature!

The end result doesn't resemble a 3D scene, or a sphere, or in fact not even a properly rounded circle, but it has rounding errors! And that is the goal of this project now!

HardCaml

The RTL was written using HardCaml, an OCaml DSL that emits Verilog. For convenience the generated Verilog is committed into the source tree, so no additional tools are needed.

I used registers with asynchronous reset, in theory it should be better for an area constrained design.

VGA signal generation

ModeLine

VGA signal timing is described in "3. DMT Video Timing Parameter Definitions" in "VESA Display Monitor Timing Standard Version 1.0, Rev. 13", and is implemented in src/generator/modeline.ml. Examples on how to implement them on an FPGA are available in several places.

The code supports several resolutions, however to conserve area for the demo I've chosen only [email protected], which has negative hsync/vsync polarities. This resolution would need a 25.175 MHz pixel clock, however that can't be produced exactly by the TT08 board, it can only approximate it using a PWM. Therefore, the design is configured to run at the nearest frequency that can be exactly generated: 25.25 MHz, which should be within the 0.5% acceptable by the standard. The ModeLine implemented is: ModeLine "640x480_59.94" 25.175 640 656 752 800 480 490 492 525 -hsync -vsync. (This has 59.94 refresh rate and not 60Hz due to the standard preferring NTSC and its 1.001 adjustment).

The design itself runs off the VGA pixel clock, as I didn't want to deal with potential clock domain crossing issues.

Counters

There are 2 counters: one for H, and one for V synchronization pulses. When the H counter overflows it enables and increments the V counter for 1 cycle. This is implemented in generator/vga.ml, together with waveform expectation tests.

Both H and V counters start out in the visible area for convenience (we can directly use these counters as x/y coordinates, without needing to perform arithmetic in the circuit), then blank the colour signals for the duration of the front porch, synchronization signal and back porch. Although the monitor would recognize the hsync+vsync low as the start of a frame, this is equivalent, but offset by a few clocks.

R, G, B colours

The demo supports 2-bit colours, and as usual these would be sRGB colours, not a linear scale. So we define an internal table indexed by 3 bits representing a linear RGB value, mapping to the sRGB bits.

A register is used for the output, both to avoid logic glitches becoming visible to the monitor, and to provide a reg to reg path that OpenSTA can use to compute setup/hold times.

Generating the colours

When test mode is used (pin ui[0] set to 1) the design outputs vertical colour bars with a white-black-white border. This doesn't have rounding errors, everything is sharp.

In normal mode (pin ui[0] is 0) the "rounding error graphics" is rendered, see below.

Ray marching

For an explanation of how ray marching works, see this ray marching tutorial. The "scene" is represented using signed distance functions. The "eye" Z coordinate is animated between 3.5 and 4.5 in 256 steps, where each frame is one step.

CORDIC

Fixed point arithmetic with 9 bits of precision is used in the HDL, with the exponent tracked by the generator code to reduce register width (though this is not as good as tracking it in hardware, but that'd require more area). Vector normalization is implemented using the CORDIC implementation provided by HardCaml, configured to use 10 bits, and a limited number of iterations (4) to fit into the desired area. This works by rotating the vector until its angle is 0, and then rotating a second unit vector to match the rotation of the original. Or equivalently transform the original from rectangular to polar coordinates, overwrite the length with 1, and convert back from polar to rectangular. CORDIC is defined for 2D in the library, and I define a 3D wrapper based on rectangular to spheric coordinate conversions, although there would be ways to directly compute a 3D version of CORDIC, that is not implemented here.

This is implemented in src/vecmath.

GLSL ES "emulation"

The low level operations are wrapped by a higher level embedded DSL that allows writing code quite similar to GLSL ES, with a very small number of operators: arithmetic (+, -, *, /), comparison (==, <>), abs, min, max, clamp, length, distance, dot, normalize, reflect.

Unfortunately the full renderer didn't fit into 2 tiles, so had to comment out quite a lot of the "GLSL" code (only 1 step of ray marching, no clamping, very simple gradient approximation), what is remaining does not resemble a sphere, or in fact it doesn't even look 3D.

OpenLane configuration

The target density had to be increased to 98% to fit, and the setup slack margin setting had to be increased, see config.json. There are max slew and max fanout violations at 100C and 1.6V, but that shouldn't prevent the design from working at 25C and 1.8V.

The design was simulated using both tt-vgaviz and vgasim, although had to adjust the modeline for vgasim to recognize the standard one. A simple cocotb test which checked vsync/hsync generation was added post submission.

Simulating

There is a src/sim/vgasim.ml, which generates a demo.v compatible with vgasim, this uses a different resolution though. vgasim has to be called with -g 640x480, and videomode.h needs to be edited to use 480 490 492 525 (don't know why it wants 521, that doesn't seem to be the standard timing).

Alternatively the cocotb test in test/ can be run with make -B WAVES=1, and then tt-vgaviz can be used: tt-vgaviz tb.vcd (actually in FST format).

How to test

Configuration

  • Provide a 25.25 MHz clock on the clk pin (RP2040 should be able to provide this with no jitter). Or if you can try 25.175 MHz instead, but this will have some jitter. YMMV.

  • Power the design with at least 1.8V

Main demo

  • Set pin ui[0] to 0 to run the default demo.

  • Reset the design

  • You should see circles moving slowly and large rounding errors:

circles

Test mode

  • Set pin ui[0] to 1 to show a test image with color bars.

  • Reset the design again if desired

  • You should see:

color bars.

External hardware

Connect according to the Demoscene rules

  • VGA output using Leo's VGA PMOD on pins uo[0-7], connected to a monitor supporting 640x480 resolution.

IO

#InputOutputBidirectional
0test mode (0=no, 1=yes)r1
1g1
2b1
3vsync
4r0
5g0
6b0
7hsyncPWM output

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (TinyTapeout Factory Test) tt_um_MichaelBell_tinyQV (TinyQV Risc-V SoC) tt_um_urish_silife_max (Game of Life 8x32 (siLife)) tt_um_vc32_cpu (VC 16-bit CPU) tt_um_tinytapeout_logo_screensaver (VGA Screensaver with Tiny Tapeout Logo) tt_um_htfab_rotfpga2 (ROTFPGA v2a) tt_um_htfab_latch_test (Latch test) tt_um_no_time_for_squares_tommythorn (No Time For Squares, IHP edition) tt_um_tommythorn_maxbw (Asynchronous Multiplier) tt_um_urish_simon (Simon Says memory game) tt_um_htfab_rotfpga2_ff (ROTFPGA v2b) tt_um_meiniKi_ttihp_fazyrv_exotiny (FazyRV-ExoTiny) tt_um_rebeccargb_hardware_utf8 (Hardware UTF Encoder/Decoder) tt_um_rebeccargb_universal_decoder (Universal Binary to Segment Decoder) tt_um_algofoogle_raybox_zero (raybox-zero TTIHP0p2 edition) tt_um_z2a_rgb_mixer (RGB Mixer demo) tt_um_vga_clock (VGA clock) tt_um_frequency_counter (Frequency counter) tt_um_brandonramos_VGA_Pong_with_NES_Controllers (VGA Pong with NES Controllers) tt_um_demosiine_sda (DemoSiine) tt_um_toivoh_demo_deluxe (Sequential Shadows Deluxe [TT08 demo competition]) tt_um_edwintorok (Rounding error) tt_um_2048_vga_game (2048 sliding tile puzzle game (VGA)) tt_um_hpretl_spi (SPI Test) tt_um_top_mole99 (One Sprite Pony) tt_um_urish_spell (SPELL) tt_um_rebeccargb_vga_pride (VGA Pride) tt_um_autosel (I2C EEPROM Project Selection) tt_um_a1k0n_nyancat (VGA Nyan Cat) tt_um_a1k0n_vgadonut (VGA donut) tt_um_rebeccargb_colorbars (Color Bars) tt_um_crispy_vga (Crispy VGA) tt_um_kbeckmann_flame (Flame demo) tt_um_jamesrosssharp_1bitam (1bit_am_sdr) tt_um_simon_cipher (simon_cipher) tt_um_htfab_bouncy_capsule (Bouncy Capsule) tt_um_phansel_laplace_lut (Experiment Number Six: Laplace LUT) tt_um_kianv_bare_metal (KianV RISC-V RV32E Baremetal SoC) tt_um_calonso88_rsa (8 bit RSA encryption) tt_um_silice (Warp) tt_um_rejunity_vga_test01 (VGA Drop (audio/visual demo)) tt_um_a1k0n_demo (Demo by a1k0n) tt_um_MichaelBell_canon (TT08 Pachelbel's Canon demo) tt_um_htfab_caterpillar (Simon's Caterpillar) tt_um_ravenslofty_chess (Chess) tt_um_fountaincoder_top_V2 (maddihp) tt_um_tomkeddie_a (VGA Experiments in Tennis) tt_um_MichaelBell_mandelbrot (VGA Mandelbrot) tt_um_MichaelBell_rle_vga (RLE Video Player) tt_um_jayjaywong12 (mulmul) tt_um_wokwi_392873974467527681 (PILIPINASLASALLE) tt_um_froith_goldcrest (Goldcrest RISC-V) tt_um_dvxf_dj8v (DJ8 8-bit CPU) tt_um_hpretl_minilogix (Minilogix) tt_um_tomkeddie_b (Transmit UART) tt_um_joerdsonsilva_modem (Multimode Modem) tt_um_oled_frequency_counter (Frequency Counter SSD1306 OLED) tt_um_stochastic_addmultiply_CL123abc (Stochastic Multiplier, Adder and Self-Multiplier) tt_um_QIF_8bit (8 Bit Digital QIF) tt_um_toivoh_retro_console (Retro Console) tt_um_cejmu (CEJMU Beers and Adders) tt_um_rejunity_sn76489 (Classic 8-bit era Programmable Sound Generator SN76489) tt_um_dlmiles_tt05_i2c_bert (I2C BERT) tt_um_dlmiles_muldiv8 (MULDIV unit (8-bit signed/unsigned)) tt_um_dlmiles_loopback (IHP loopback tile with input skew measurement) tt_um_dlmiles_bad_synchronizer (Example of Bad Synchronizer) tt_um_wokwi_407306064811090945 (DDR throughput and flop aperature test) tt_um_urish_giant_ringosc (Giant Ring Oscillator (3853 inverters)) tt_um_digital_clock_example (Digital Desk Clock v2.0) tt_um_rejunity_z80 (Zilog Z80) tt_um_rejunity_ay8913 (Classic 8-bit era Programmable Sound Generator AY-3-8913) tt_um_rtfb_collatz (Collatz conjecture brute-forcer) tt_um_ccattuto_conway (Conway's Game of Life on UART and VGA) tt_um_snow (Snow) tt_um_calonso88_74181 (8-bit ALU based on 2x 74181) tt_um_rejunity_vga_logo (VGA Tiny Logo (1 tile)) tt_um_NicklausThompson_SkyKing (SkyKing Demo) tt_um_htfab_cells (Cell mux) tt_um_htfab_pg_1x1 (Power gating test (1x1)) tt_um_htfab_pg_1x2 (Power gating test (1x2)) tt_um_dlmiles_ringosc_5inv (Ring Oscillator (5 inverter)) tt_um_devinatkin_pulse_width_counter (Pulse Width Counter) tt_um_algofoogle_vga_fun_wrapper (TTIHP VGA FUN!) tt_um_cfib_demo (cfib Demoscene Entry) tt_um_vga_glyph_mode (Glyph Mode) tt_um_favoritohjs_scroller (VGA Scroller) tt_um_pulse_generator (TTL Pulse Generator) tt_um_rajum_iterativeMAC (Iterative MAC) tt_um_algofoogle_tinyvga_fun_wrapper (TTIHP TinyVGA FUN!) tt_um_urish_sram_test (SRAM (1024x8) test) tt_um_one_bit_puf_wrapper (One Bit PUF) tt_um_multi_bit_puf_wrapper (One Bit PUF) tt_um_gray_sobel (Gray scale and Sobel filter) tt_um_rebeccargb_intercal_alu (INTERCAL ALU)