489 MULDIV unit (8-bit signed/unsigned)


  • Author: Darryl Miles
  • Description: Combinational Multiply and Divide Unit (signed and unsigned)
  • GitHub repository
  • Clock: 0 Hz

Background

Combinational multiply/divide unit (8-bit + 8-bit inputs)

This is an updated version of the original project that was submitted and manufactured in TT04 (https://github.com/dlmiles/tt04-muldiv4). The previous project was hand-crafted in Logisim-Evolution, then exported as Verilog and integrated into a TT04 project.

This version is the same design, extended to 8-bit wide inputs, but instead of hand-crafting the logic gates in a GUI, we convert functional blocks into SpinalHDL language constructs. Part of the purpose of this design is to understand the area and timing changes introduced by adding more bits, and then to explore alternative topologies.

The goal of the next iteration of this design may be to introduce an FMA (Fused Multiply Add/Accumulate) function and an ALU function, to explore whether there is some useful composition of these functions (one that might be useful in an 8-bit CPU/MCU design, or scale to something bigger). The iteration after that could explore drawing the transistors directly (instead of using a standard cell library) for such an arrangement; this may result in non-rectangular cells that interlock to improve both area density and timing performance. Or it might go up in smoke... who knows.

How It Works

Due to the limited total number of IOs available at the external TT interface, it is necessary to clock the project and set up uio_in[0] (the address bit) to load each of the two 8-bit input registers.

The input side captures data with latches. During the appropriate phase (CLK high) and ADDR state, the selected latch is open; it closes at the CLK negedge, and the data present at that moment is captured. Whether open or closed, each latch drives its value into the corresponding input of both the MUL and DIV units (which are separate logic modules).
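As a rough illustration of the latch behaviour just described, here is a hypothetical Python behavioural model (not derived from the SpinalHDL source). It assumes ADDR=0 opens the low input latch and ADDR=1 the high one, following the HI=1/lo=0 convention of the address bit; which operand (A/B or Dend/Dsor) sits in which latch is not stated here and is left abstract. The class and method names are made up for this sketch.

```python
class InputLatches:
    """Behavioural sketch of the two 8-bit transparent input latches.

    Assumption: ADDR=0 selects the low latch, ADDR=1 the high latch.
    """

    def __init__(self) -> None:
        self.regs = [0, 0]  # [low, high]; both feed the MUL and DIV units

    def step(self, clk: int, addr: int, data: int) -> tuple:
        """Apply one sample of (CLK, ADDR, data) and return (low, high)."""
        if clk:
            # CLK high: the selected latch is open and follows the data pins
            self.regs[addr & 1] = data & 0xFF
        # CLK low: both latches are closed, so the value seen just before
        # the CLK negedge stays captured
        return tuple(self.regs)


latches = InputLatches()
latches.step(clk=1, addr=0, data=0x12)   # low latch open, follows 0x12
latches.step(clk=1, addr=0, data=0x34)   # still open, now follows 0x34
latches.step(clk=0, addr=0, data=0x56)   # closed: 0x34 remains captured
print(latches.step(clk=1, addr=1, data=0x78))  # (52, 120)
```

While a latch is open its output already reaches both arithmetic units, which is why the combinational result can be observed before the negedge.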

The result becomes available at the outputs as soon as it settles (after propagation and ripple settling time). While a latch is open, possibly held open artificially by extending the CLK duty cycle, you should also be able to experiment by changing the input and observing the output (in immediate result mode).

The result output is also multiplexed and has an immediate mode and a registered mode. The immediate mode provides direct visibility of the MUL/DIV combinational output and should allow the timing between inputs and outputs to be observed (you need to account for the address multiplexing of the high/low 8-bit halves of the result). The registered mode captures the full result at the CLK posedge when the last ADDR is selected. This allows you to change the input values during the next few cycles, while the module keeps the result of the last computation on the output. With an appropriate pipeline, requests and results can be interleaved to achieve higher throughput.
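A minimal sketch of the immediate/registered result mux, under stated assumptions: ADDR=1 selects the high byte of the 16-bit result and ADDR=0 the low byte (HI=1/lo=0), `registered` stands for the uio_in[3] mode bit, and the "last ADDR" is assumed to be ADDR=1. The names `ResultMux`, `posedge` and `output` are hypothetical.

```python
LAST_ADDR = 1  # assumption: with two addresses, ADDR=1 is the "last" one


class ResultMux:
    """Behavioural sketch of the immediate/registered 16-bit result mux."""

    def __init__(self) -> None:
        self.held = 0  # 16-bit copy captured for registered mode

    def posedge(self, live_result: int, addr: int) -> None:
        # Registered mode captures the whole 16-bit result on the CLK
        # posedge that occurs while the last address is selected
        if addr == LAST_ADDR:
            self.held = live_result & 0xFFFF

    def output(self, live_result: int, addr: int, registered: int) -> int:
        """Return the 8-bit value on the Result pins.

        ADDR=1 selects the high byte, ADDR=0 the low byte (HI=1/lo=0).
        """
        value = self.held if registered else live_result & 0xFFFF
        return (value >> (8 * (addr & 1))) & 0xFF


mux = ResultMux()
mux.posedge(live_result=0x1234, addr=1)   # capture 0x1234
# New operands change the combinational result, but the registered
# output still shows the previous computation
print(hex(mux.output(0xBEEF, addr=1, registered=1)))  # 0x12
print(hex(mux.output(0xBEEF, addr=1, registered=0)))  # 0xbe
```

The last two lines show why registered mode helps pipelining: new operands can be loaded while the previous result stays on the pins.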


FIXME

FIXME please check the original GitHub repository for any enhanced documentation for this project, potentially with improved information nearer the PCB+IC delivery (to customer) schedule, and also post-production results and information from physical testing. I hope to produce some kind of graphs showing timing capture and reliability, to demonstrate the cascade effect. This assumes I have the design correct to allow this to happen, but there are some tricks (like extending the CLK on-duty cycle while the latches are open) that should be enough to see the result capture output.

FIXME provide WaveDrom diagram (MULU, MULS, DIVU, DIVS)

FIXME explain IMMediate mode and REGistered mode (to pipeline)

FIXME provide block diagram of functional units
//    D
//   MUX
//   X Y registers (loaded from multiplexed D)
//    OP -> res flags
//   P P registers
//  DEMUX
//    R

FIXME explain the architectural differences from the previous example and the considerations behind the change.

FIXME explain the addressing mode that allows much wider units and potentially uneven input sizes.


Multiplier (signed/unsigned)
  • Method: uses a Ripple Carry Array as a 'high speed multiplier'.
  • Set up operation mode bits MULDIV=0 and OPSIGNED (unsigned=0/signed=1).
  • Set up A (multiplier, 8-bit) * B (multiplicand, 8-bit).
  • Expect result P (product, 16-bit).
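To make 'Ripple Carry Array' concrete, the following bit-level Python sketch builds an unsigned array multiplier from AND gates and full adders. The signed path simply sign-extends both operands to 16 bits and keeps the low 16 bits of the product. That is a correct but deliberately naive choice for illustration and an assumption, not necessarily how the real netlist handles OPSIGNED. The function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(x: int, y: int, width: int) -> int:
    """Unsigned ripple-carry array multiply of two `width`-bit values.

    Each multiplier bit y[j] gates a row of AND partial products, which a
    row of full adders adds into the running sum with a rippling carry.
    """
    acc = [0] * (2 * width)  # running sum, LSB first
    for j in range(width):
        yj = (y >> j) & 1
        carry = 0
        for i in range(2 * width - j):
            pp = ((x >> i) & 1) & yj if i < width else 0  # AND gate
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
    return sum(bit << k for k, bit in enumerate(acc))


def mul8(a: int, b: int, opsigned: int) -> int:
    """16-bit product P of 8-bit A * B (MULDIV=0)."""
    if opsigned:
        # Assumed scheme: sign-extend both operands to 16 bits; the low
        # 16 bits of that product are the two's-complement signed product
        a16 = (a & 0xFF) | (0xFF00 if a & 0x80 else 0)
        b16 = (b & 0xFF) | (0xFF00 if b & 0x80 else 0)
        return array_multiply(a16, b16, 16) & 0xFFFF
    return array_multiply(a & 0xFF, b & 0xFF, 8)


print(hex(mul8(0xFF, 0xFF, opsigned=0)))  # 255 * 255 = 0xfe01
print(hex(mul8(0xFF, 0xFF, opsigned=1)))  # -1 * -1 = 0x1
```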

Divider (signed/unsigned)
  • Method: uses Full Adders with Muxes as a 'combinational restoring array divider algorithm'.
  • Set up operation mode bits MULDIV=1 and OPSIGNED (unsigned=0/signed=1).
  • Set up Dend (dividend, 8-bit) / Dsor (divisor, 8-bit).
  • Expect result Q (quotient, 8-bit) with R (remainder, 8-bit).
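Similarly, here is a bit-level Python sketch of a restoring array divider. Each row is a ripple of full adders computing a trial subtraction, and a mux per row either keeps the difference or restores the previous partial remainder. Assumptions: signed operation divides the magnitudes, then truncates the quotient toward zero with the remainder taking the dividend's sign (a common convention, not confirmed for this design). The overflow bit is not modelled, since the text states it is always reset, and -128 / -1 simply wraps to Q=0x80 in this sketch. Function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def trial_subtract(a: int, b: int, width: int) -> tuple:
    """Compute a - b as a + ~b + 1 with a ripple of full adders.

    Returns (difference, carry_out); carry_out=1 means no borrow (a >= b).
    """
    carry, diff = 1, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, ((b >> i) & 1) ^ 1, carry)
        diff |= s << i
    return diff, carry


def restoring_divide(dend: int, dsor: int, width: int = 8) -> tuple:
    """Unsigned restoring array division of width-bit values (dsor != 0)."""
    rem, quo = 0, 0
    for i in reversed(range(width)):          # one array row per quotient bit
        rem = (rem << 1) | ((dend >> i) & 1)  # bring down the next dividend bit
        diff, no_borrow = trial_subtract(rem, dsor, width + 1)
        rem = diff if no_borrow else rem      # the mux: keep result or restore
        quo |= no_borrow << i
    return quo, rem


def div8(dend: int, dsor: int, opsigned: int) -> tuple:
    """Return (Q, R, div0) for MULDIV=1; Q and R are invalid when div0=1."""
    dend, dsor = dend & 0xFF, dsor & 0xFF
    if dsor == 0:
        return 0, 0, 1
    if not opsigned:
        q, r = restoring_divide(dend, dsor)
        return q, r, 0
    # Assumed convention: divide the magnitudes, then negate the quotient if
    # the signs differ and give the remainder the dividend's sign
    neg_dend, neg_dsor = dend >> 7, dsor >> 7
    q, r = restoring_divide((-dend) & 0xFF if neg_dend else dend,
                            (-dsor) & 0xFF if neg_dsor else dsor)
    if neg_dend != neg_dsor:
        q = (-q) & 0xFF
    if neg_dend:
        r = (-r) & 0xFF
    return q, r, 0


print(div8(100, 7, opsigned=0))   # (14, 2, 0)
print(div8(0xF9, 2, opsigned=1))  # -7 / 2 gives Q=-3, R=-1: (253, 255, 0)
print(div8(5, 0, opsigned=0))     # (0, 0, 1)
```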

The divider has error-bit indicators that take precedence over any result: if either error bit is set, the outputs Q and R should be disregarded. In multiplier mode the error bits are muted to 0. No input values can cause an overflow error, so that bit is always reset.

How to test

Please check back with the project's GitHub main page and the published docs/ directory. Some instructions are expected to be provided around the time the TT05 chips are received (Q4 2024).

At the time of writing, no physical chip from a previous TT edition has been received back, so there is no experience yet of the best way to test this project; writing this section is deferred to a later time.

There should be sufficient information here to start your own journey.

External hardware

It is expected that the RP2040 and a Python REPL should be sufficient to test this project.
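One possible shape for such a test, assuming you write your own (hypothetical) `run_op(a, b, muldiv, opsigned)` function on the RP2040 that drives the pins and returns `(P,)` for multiply or `(Q, R, div0)` for divide. The reference below uses plain integer arithmetic with the same assumed signed-division convention (truncate toward zero). It skips -128 / -1, whose quotient does not fit in 8 bits, and ignores Q and R whenever divide-by-zero is expected. It sticks to basic Python, so it should be portable to MicroPython (an assumption).

```python
def s8(v: int) -> int:
    """Interpret an 8-bit pattern as a two's-complement value."""
    return v - 256 if v & 0x80 else v


def expected(a, b, muldiv, opsigned):
    """Reference result: (P,) for MUL, (Q, R, div0) for DIV, None to skip."""
    if not muldiv:
        p = s8(a) * s8(b) if opsigned else a * b
        return (p & 0xFFFF,)
    if b == 0:
        return (0, 0, 1)                 # Q and R are disregarded when div0=1
    x, y = (s8(a), s8(b)) if opsigned else (a, b)
    if x == -128 and y == -1:
        return None                      # +128 does not fit in 8 bits: skip
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q                           # assumed: truncate toward zero
    r = x - q * y                        # remainder takes the dividend's sign
    return (q & 0xFF, r & 0xFF, 0)


def check(run_op, values, muldiv, opsigned):
    """Compare run_op against expected(); values must be a list or range."""
    failures = []
    for a in values:
        for b in values:
            want = expected(a, b, muldiv, opsigned)
            if want is None:
                continue
            got = tuple(run_op(a, b, muldiv, opsigned))
            if len(want) == 3 and want[2] == 1:
                ok = len(got) == 3 and got[2] == 1   # only the flag matters
            else:
                ok = got == want
            if not ok:
                failures.append((a, b, got, want))
    return failures


# Stand-in for the hypothetical run_op, so the sketch also runs on a PC
def run_op_model(a, b, muldiv, opsigned):
    return expected(a, b, muldiv, opsigned) or (0, 0, 0)


print(len(check(run_op_model, range(256), muldiv=1, opsigned=1)))  # 0
```

On hardware, `run_op_model` would be replaced by the real pin-driving function; an exhaustive `range(256)` sweep is 65536 operations per mode.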

Thoughts to the future (next iteration)

uio_in[3] might be moved to bit 4, and DIV0/OVER combined into bit 5. This would allow the address to use the contiguous area below. However, a test build of a MULDIV16 version easily exceeds 1x1. At this stage the aim is to make builds with permutations of design/topology and of the method used to generate GDS, so 1x1 is good enough to achieve this.

The uio_in[3] feature is intended to use registered mode to lock the result when the last address is clocked in. In this way results can be pipelined, demonstrating what pipelining can do to increase throughput.

The testbench (TB) is limited to the 4-bit version. I ran out of time to validate the registered output and pipelining.

Encapsulate the SpinalHDL Scala netlist generation, and write a yosys JVM module harness (a yosys C++ module that is a JVM thread/process runner, with a communication interface and data/FFI API/lifecycle). Then write a yosys plugin that allows yosys to directly include, use and request generated data based on parametric details.

Consider emitting a custom cell/macro/GDS object that yosys can request, then emitting Verilog for it like a regular standard cell module.

Consider modifying OpenROAD/OpenLane to incorporate generated macros directly into the detailed routing environment, then have the existing detailed routing work around them as-is.

TODO

Fix up the original Logisim schematic labels.

The input re-ordering (which made the SpinalHDL algorithm easier).

Relabel P6_EXTND_EN to P7_EXTND_EN; the original product index label was a bad choice in retrospect.

Add the SpinalHDL directory to the project, with the sbt project and netlist generation code.

Fill out SpinalHDL unit testing.

Test support for SUPPORT_SIGNED=false (try to completely remove the nets from the output instead of assigning constant False and letting synthesis optimize them away).

Implement support for a separate SUPPORT_SIGNED for each input, with 3 modes of operation: ALWAYS/NEVER/BOTH (BOTH being the current behaviour, using a control input bit).

Implement and test support for odd-sized inputs, so that the widths of X and Y, or of DEND and DSOR, can differ.

When input widths can be unequal, test that EOVERFLOW in the divider is wired to the correct port and works in these scenarios.

Provide unit testing for common multiplier sizes: the obvious byte boundaries, but also the sizes common in FPGA DSP primitives.

IO

#  Input             Output              Bidirectional
0  Data0 (see docs)  Result0 (see docs)  Addr bit0: HI=1/lo=0 mux of Data and Result (input only)
1  Data1 (see docs)  Result1 (see docs)
2  Data2 (see docs)  Result2 (see docs)
3  Data3 (see docs)  Result3 (see docs)  Result mux: registered=1/immediate=0 (input only)
4  Data4 (see docs)  Result4 (see docs)  DIV error: overflow (output only)
5  Data5 (see docs)  Result5 (see docs)  DIV error: divide-by-zero (output only)
6  Data6 (see docs)  Result6 (see docs)  OPSIGNED mode (input only)
7  Data7 (see docs)  Result7 (see docs)  MULDIV mode (input only)
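Helpers for packing the control fields into a uio_in byte and reading the divider error bits back, with the bit positions taken from the IO table. The function names are hypothetical. `result_valid` encodes the rule that the error bits take precedence over Q and R.

```python
# Bit positions from the IO table (bidirectional column)
ADDR_BIT, REGISTERED_BIT, OPSIGNED_BIT, MULDIV_BIT = 0, 3, 6, 7
DIV_OVERFLOW_BIT, DIV_ZERO_BIT = 4, 5


def uio_control(addr: int, registered: int, opsigned: int, muldiv: int) -> int:
    """Pack the input-only control fields into a uio_in byte."""
    return ((addr & 1) << ADDR_BIT
            | (registered & 1) << REGISTERED_BIT
            | (opsigned & 1) << OPSIGNED_BIT
            | (muldiv & 1) << MULDIV_BIT)


def div_errors(uio_out: int) -> tuple:
    """Return (overflow, div0) from the output-only error bits."""
    return (uio_out >> DIV_OVERFLOW_BIT) & 1, (uio_out >> DIV_ZERO_BIT) & 1


def result_valid(uio_out: int) -> bool:
    """Error bits take precedence: Q and R are only meaningful if both are 0."""
    return div_errors(uio_out) == (0, 0)


# Signed divide, high half selected, immediate mode
print(hex(uio_control(addr=1, registered=0, opsigned=1, muldiv=1)))  # 0xc1
print(result_valid(0x20))  # False: divide-by-zero bit is set
```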
