616 MULDIV unit (8-bit signed/unsigned) with sky130 HA/FA cells

616 : MULDIV unit (8-bit signed/unsigned) with sky130 HA/FA cells

Design render
  • Author: Darryl Miles
  • Description: Combinational Multiply and Divide Unit (signed and unsigned)
  • GitHub repository
  • Clock: 0 Hz

Background

Combinational multiply / divider unit (8bit+8bit input)

This is an updated version of the original project that was submitted and manufactured in TT04 https://github.com/dlmiles/tt04-muldiv4. The previous project was hand crafted in Logisim-Evolution then exported as verilog and integrated into a TT04 project.

This version is the same design, extended to 8-bit wide inputs, but instead of hand crafting the logic gates in a GUI we convert functional blocks into SpinalHDL language constructs. Part of the purpose of this design is to understand the area and timing changes introduced by adding more bits, then to explore alternative topologies.

The goal of the next iteration of this design maybe to introduce a FMA (Fused Multiply Add/Accumulate) function and ALU function to explore if there is some useful composition of these functions (that might be useful in an 8bit CPU/MCU design, or scale to something bigger). The next iteration on from this could explore how to draw the transistors directly (instead of using standard cell library) for such an arrangement, this may result in non-rectangular cells that interlock to improve both area density and timing performance. Or it might go up in smoke... who knows.

How It Works

Due to the limited total IOs available at the external TT interface it is necessary to clock the project and setup UI_IN[0] to load each of the 2 8-bit input registers.

The data is latched at the CLK NEGEDGE and the value provided to the combinational logic MUL/DIV operations (which are seperate logic modules) with the answer becoming immediately available (after propagation and ripple settling time) at the outputs.

The result output is also multiplexed and has an immediate and register mode. The immediate mode provides a direct visibility of the MUL/DIV combintational timing between input and outputs (you need to account for address multiplex of high-low 8bit sides of result). The registered mode capture the result in full so that it is possible to pipeline interleave request and result information to achieve higher throughput.

So one half of the answer is immediately available to read and the other half of the answer can be read by toggling UI_IN[0] (address bit0). Clocking is needed for registered output mode, but not necessarily for immediate mode.

// FIXME please check out the original githun for any enahcnaed // documentation for this project, potentially improved information // nearer PCB+IC delivery (to customer) schedule but also post-production // post-physically testing results and information. // I hope to produce some kind graphs showing the timing capture and // reliability to show and demonstrate the cascade effect. This assume // I have the design correct to allow this to happen, but there are some // tricked (like extending CLK on-duty cycle when latches are open) enough // to see result capture output.

// FIXME provide wavedrom diagram (MULU, MULS, DIVU, DIVS)

// FIXME explain IMMediate mode and REGistered mode (to pipeline)

// FIXME provide blockdiagram of functional units // D // MUX // X Y registers (loaded from multiplexed D) // OP -> res flags // P P registers // DEMUX // R

// FIXME explain architective difference to previous example and // considerations why to change.

// FIXME explain addressing mode to allow much wider units and // potentially uneven input sizes.

Multiplier (signed/unsigned) Method uses Ripple Carry Array as 'high speed multiplier' Setup operation mode bits MULDIV=0 and OPSIGNED(unsigned=0/signed=1) Setup A (multiplier 8-bit) * B (multiplicand 8-bit) Expect result P (product 16-bit)

Divider (signed/unsigned) Method uses Full Adder with Mux as 'combinational restoring array divider algorithm'. Setup operation mode bits MULDIV=1 and OPSIGNED(unsigned=0/signed=1) Setup Dend (dividend 8-bit) / Dsor (divisor 8-bit) Expect result Q (quotient 8-bit) with R (remainder 8-bit)

Divider has error bit indicators that take precedence over any result. If any error bit is set then the output Q and R should be disregarded. When in multiplier mode error bits are muted to 0. No input values can cause an overflow error so the bit is always reset.

How to test

Please check back with the project github main page and the published docs/ directory. There is expected to be some instructions provided around the time the TT05 chips a received (Q4 2024).

At the time of writing receiving a physical chip (from a previous TT edition) back has not occured, so there is no experience on the best way to test this project, so I defer the task of writing this section to a later time.

There should be sufficient instructions here start you own journey.

External hardware

It is expect the RP2040 and a Python REPL should be sufficient test this project.

Thoughts to the future (next iteration)

uio_in[3] might moved to bit4 and DIV0/OVER combined into bit5 This would allow the address the contigious area below. However during a test build of a MULDIV16 version it easily exceeds 1x1, as this stage looking towards making builds with permutations of design/topology and method to generate GDS. So 1x1 is good to achieve this.

The uio_in[3] feature wants to use registered mode to lock result when last address is clocked in this way we can pipeline result and demonstration of what pipelining can do to increase thoughput.

The TB is limited to the 4bit version. Ran out of time to validate registered output and pipeline.

Encapsulate the SpinalHDL Scala netlist generation, and write a yosys JVM module harness (a yosys C++ module that is a JVM thread/process runner, with communication interface, data/ffi API/lifecycle). Then write a yosys plugin that allows it to directly include, use and call for generated data based on parametric details.

Consider emitting a custom cell/macro/GDS_object that yosys can call for, then emit verilog like a regular standard cell module.

Consider modifying OpenROAD/OpenLane to incorporate generated macros directly into other detailed routing environment then have the existing detailed routing work around it as-is.

TODO

Fixup the original logicsim schematic labels.

The input re-ordering (which made the SpinalHDL algo easier)

Relabel the P6_EXTND_EN to P7_EXTND_EN the original prodict index label was a bad choice in retrospect.

Provide the SpinalHDL directory to the project with the sbt project and netlist generation code.

Fill out SpinalHDL unit testing testing.

Test support for SUPPORT_SIGNED=false (try to completely remove nets from output instead of assigning constant False and letting synthesis optimize away)

Implement support for seperate SUPPORT_SIGNED for each input with 3 modes of operation ALWAYS/NEVER/BOTH(like now using control input bit)

Implement and test support for odd-sized inputs, so the width of X and Y or DEND and DSOR can be different sizes.

When input width can be unequal, test out the EOVERFLOW in the divider is wired to the correct port and works in this scenarios.

Provide unit testing for commong multipler sizes, obvious byte boudnaries but also the sizes common in FPGA DSP primitives.

IO

#InputOutputBidirectional
0Data0 see docsResult0 see docsAddr bit0 HI=1/lo=0 mux of Data and Result (input only)
1Data1 see docsResult1 see docsunused
2Data2 see docsResult2 see docsunused
3Data3 see docsResult3 see docsResult mux registered=1/immediate=0 (input only)
4Data4 see docsResult4 see docsDIV error overflow (output only)
5Data5 see docsResult5 see docsDIV error divide-by-zero (output only)
6Data6 see docsResult6 see docsOPSIGNED mode (input only)
7Data7 see docsResult7 see docsMULDIV mode (input only)

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (TinyTapeout 06 Factory Test) tt_um_analog_factory_test (TT06 Analog Factory Test) tt_um_analog_factory_test (TT06 Analog Factory Test) tt_um_urish_charge_pump (Dickson Charge Pump) tt_um_psychogenic_wowa (WoWA) tt_um_oscillating_bones (Oscillating Bones) tt_um_kevinwguan (Crossbar Array) tt_um_coloquinte_moosic (Moosic logic-locked design) tt_um_alexsegura_pong (Pong) tt_um_iron_violet_simon (Iron Violet) tt_um_tomkeddie_a (VGA Experiments in Tennis) tt_um_MichaelBell_tinyQV (TinyQV Risc-V SoC) tt_um_andychip1_sn74169 (sn74169) tt_um_mattvenn_r2r_dac (Analog 8bit R2R DAC) tt_um_thorkn_audiochip_v2 (AudioChip_V2) tt_um_faramire_gate_guesser (Gate Guesser) tt_um_urish_simon (Simon Says game) tt_um_TT06_SAR_wulffern (TT06 8-bit SAR ADC) tt_um_soundgen (soundgen) tt_um_ledcontroller_Gatsch (ledcontroller) tt_um_digitaler_filter_rathmayr (Digitaler Filter) tt_um_histefan_top (Snake Game) tt_um_mayrmichael_wave_generator (Wave Generator) tt_um_advanced_counter (jku-tt06-advanced-counter) tt_um_FanCTRL_DomnikBrandstetter (PI-Based Fan Controller) tt_um_ps2_morse_encoder_top (PS/2 Keyboard to Morse Code Encoder) tt_um_calculator_muehlbb (16-bit calculator) tt_um_hpretl_tt06_tempsens (Temperature Sensor NG) tt_um_haeuslermarkus_fir_filter (FIR Filter with adaptable coefficients) tt_um_mattvenn_rgb_mixer (RGB Mixer demo) tt_um_analog_loopback (Analog loopback) tt_um_entwurf_integrierter_schaltungen_hadner (Projekt KEIS Hadner Thomas) tt_um_seven_segment_fun1 (7-segment-FUN) tt_um_moving_average_master (Moving average filter) tt_um_rgbled_decoder (SPI to RGBLED Decoder/Driver) tt_um_4bit_cpu_with_fsm (4-Bit CPU mit FSM) tt_um_flappy_bird (Flappy Bird) tt_um_drops (drops) tt_um_enieman (UART-Programmable RISC-V 32I Core) tt_um_gabejessil_timer (2 Player Game) tt_um_wokwi_384804985843168257 (playwithnumbers) tt_um_wokwi_384711264596377601 (luckyCube) tt_um_hpretl_tt06_tdc (Synthesized Time-to-Digital Converter (TDC)) tt_um_wokwi_384437973887503361 (Asynchronous Down Counter) tt_um_spi_pwm_djuara (spi_pwm) tt_um_SteffenReith_PiMACTop (PiMAC) tt_um_mattvenn_relax_osc (Relaxation oscillator) tt_um_jv_sigdel (1st passive Sigma Delta ADC) tt_um_wokwi_392873974467527681 (PILIPINAS) tt_um_scorbetta_goa (GOA - grogu on ASIC) tt_um_sanojn_ttrpg_dice (TTRPG Dice + simple I2C peripheral) tt_um_urish_dffram (DFFRAM Example (128 bytes)) tt_um_lucaz97_monobit (Monobit Test) tt_um_noritsuna_i4004 (i4004 for TinyTapeout) tt_um_hpretl_tt06_tdc_v2 (Synthesized Time-to-Digital Converter (TDC) v2) tt_um_vaf_555_timer (A 555-Timer Clone for Tiny Tapeout 6) tt_um_obriensp_be8 (8-bit CPU with Debugger (Lite)) tt_um_toivoh_retro_console (Retro Console) tt_um_mattvenn_inverter (Double Inverter) tt_um_SteffenReith_ASGTop (ASG) tt_um_lucaz97_rng_tests (rng Test) tt_um_dieroller_nathangross1 (Die Roller) tt_um_kwilke_cdc_fifo (Clock Domain Crossing FIFO) tt_um_spiff42_exp_led_pwm (LED PWM controller) tt_um_devinatkin_fastreadout (Fast Readout Image Sensor Prototype) tt_um_ja1tye_tiny_cpu (Tiny 8-bit CPU) tt_um_7seg_animated (Animated 7-segment character display) tt_um_neurocore (Neurocore) tt_um_zhwa_rgb_mixer (RGB Mixer) tt_um_wokwi_394704587372210177 (Cambio de giro de motor CD) tt_um_ian_keypad_controller (Keypad controller) tt_um_urish_spell (SPELL) tt_um_vks_pll (PLL blocks) tt_um_fountaincoder_top (multimac) tt_um_dsatizabal_opamp (Simple FET OpAmp with Sky130.) tt_um_obriensp_be8_nomacro (8-bit CPU with Debugger) tt_um_LFSR_shivam (10-bit Linear feedback shift register) tt_um_shivam (Pulse Width Modulation) tt_um_algofoogle_tt06_grab_bag (TT06 Grab Bag) tt_um_meiniKi_tt06_fazyrv_exotiny (FazyRV-ExoTiny) tt_um_wokwi_394888799427677185 (4-bit stochastic multiplier traditional) tt_um_QIF_8bit (8 Bit Digital QIF) tt_um_MATTHIAS_M_PAL_TOP_WRAPPER (easy PAL) tt_um_andrewtron3000 (Rule 30 Engine!) tt_um_tommythorn_4b_cpu_v2 (Silly 4b CPU v2) tt_um_aerox2_jrb8_computer (The James Retro Byte 8 computer) tt_um_wokwi_394898807123828737 (4-bit Stochastic Multiplier Compact with Stochastic Resonator) tt_um_argunda_tiny_opamp (Tiny Opamp) tt_um_fdc_chip (Frequency to digital converters (asynchronous and synchronous)) tt_um_8bit_cpu (8-Bit CPU In a Week) tt_um_mitssdd (co processor for precision farming) tt_um_fstolzcode (Tiny Zuse) tt_um_liu3hao_rv32e_min_mcu (tt06-RV32E_MinMCU) tt_um_kianV_rv32ima_uLinux_SoC (KianV uLinux SoC) tt_um_wokwi_395444977868278785 (*NOT WORKING* HP 5082-7500 Decoder) tt_um_wokwi_394618582085551105 (Keypad Decoder) tt_um_wokwi_395054820631340033 (Workshop Hackaday Juli) tt_um_wokwi_395055035944909825 (Some_LEDs) tt_um_wokwi_395055351144787969 (Hack a day Tiny Tapeout project) tt_um_wokwi_395054823569451009 (First TT Project) tt_um_wokwi_395054823837887489 (Dice) tt_um_wokwi_395055341723330561 (Workshop_chip) tt_um_jduchniewicz_prng (8-bit PRNG) tt_um_wokwi_395054564978002945 (Bestagon LED matrix driver) tt_um_wokwi_395054466384583681 (1-Bit ALU 2) tt_um_wokwi_395058308283408385 (test for tiny tapeout hackaday) tt_um_s1pu11i_simple_nco (Simple NCO) tt_um_wokwi_395055359324730369 (Tiny_Tapeout_6_Frank) tt_um_disp1 (Display test 1) tt_um_pckys_game (PCKY´s Successive Approximation Game) tt_um_tiny_shader_mole99 (Tiny Shader) tt_um_wokwi_393815624518031361 (My Chip) tt_um_minibyte (Minibyte CPU) tt_um_emilian_rf_playground (IDAC8 based on divide current by 2) tt_um_triple_watchdog (Triple Watchdog) tt_um_wokwi_395142547244224513 (EFAB Demo 2) tt_um_chisel_hello_schoeberl (Chisel Hello World) tt_um_aiju_8080 (8080 CPU) tt_um_wokwi_395134712676183041 (Inverters) tt_um_nubcore_default_tape (DEFAULT) tt_um_wuehr1999_servotester (Servotester) tt_um_wokwi_395055722430895105 (Servo Signal Tester) tt_um_exai_izhikevich_neuron (Izhikevich Neuron) tt_um_lisa (LISA 8-Bit Microcontroller) tt_um_wokwi_394707429798790145 (32-Bit Galois Linear Feedback Shift Register) tt_um_CKPope_top (X/Y Controller) tt_um_MNSLab_BLDC (Universal Motor and Actuator Controller) tt_um_couchand_dual_deque (Dual Deque) tt_um_JamesTimothyMeech_inverter (Programmable Thing) tt_um_signed_unsigned_4x4_bit_multiplier (Signed Unsigned multiplyer) tt_um_lipsi_schoeberl (Lipsi: Probably the Smallest Processor in the World) tt_um_i_tree_batzolislefteris (Anomaly Detection using Isolation trees) tt_um_wokwi_394830069681034241 (Cyclic Redundancy Check 8 bit) tt_um_rng_3_lucaz97 (RNG3) tt_um_wokwi_395263962779770881 (Bivium-B Non-Linear Feedback Shift Register) tt_um_dvxf_dj8 (DJ8 8-bit CPU) tt_um_silicon_tinytapeout_lm07 (Digital Temperature Monitor) tt_um_htfab_flash_adc (Flash ADC) tt_um_chisel_pong (Chisel Pong) tt_um_wokwi_395414987024660481 (HELP for tinyTapeout) tt_um_jorgenkraghjakobsen_toi2s (SPDIF to I2S decoder) tt_um_cmerrill_pdm (Parallel / SPI modulation tester) tt_um_csit_luks (CSIT-Luks) tt_um_wokwi_395357890431011841 (Trivium Non-Linear Feedback Shift Register) tt_um_drburke3_top (SADdiff_v1) tt_um_cejmu_riscv (TinyRV1 CPU) tt_um_rejunity_current_cmp (Analog Current Comparator) tt_um_loco_choco (BF Processor) tt_um_qubitbytes_alive (It's Alive) tt_um_wokwi_395061443288867841 (BCD to single 7 segment display Converter) tt_um_SJ (SiliconJackets_Systolic_Array) tt_um_ejfogleman_smsdac (8-bit DEM R2R DAC) tt_um_wokwi_395055455727667201 (Hardware Trojan Part II) tt_um_ericsmi_weste_problem_4_11 (Measurement of CMOS VLSI Design Problem 4.11) tt_um_wokwi_395034561853515777 (2 bit Binary Calculator) tt_um_mw73_pmic (Power Management IC) tt_um_Counter_1_shivam (8-bit Binary Counter) tt_um_wokwi_395054508867644417 (SynchMux) tt_um_otp_encryptor (TT06 OTP Encryptor) tt_um_wokwi_395514572866576385 (Parity Generator) tt_um_ADPCM_COMPRESSOR (ADPCM Encoder Audio Compressor) tt_um_3515_sequenceDetector (Sequence detector using 7-segment) tt_um_faramire_stopwatch (Simple Stopwatch) tt_um_ks_pyamnihc (Karplus-Strong String Synthesis) tt_um_dlmiles_muldiv8 (MULDIV unit (8-bit signed/unsigned)) tt_um_dlmiles_muldiv8_sky130faha (MULDIV unit (8-bit signed/unsigned) with sky130 HA/FA cells) tt_um_tommythorn_ncl_lfsr (NCL LFSR) tt_um_lk_ans_top (ANS Encoder/Decoder) tt_um_MichaelBell_latch_mem (Latch RAM (64 bytes)) tt_um_wokwi_395179352683141121 (Combination Lock) tt_um_Uart_Transciver (UART Transceiver) tt_um_dgkaminski (4-Bit ALU) tt_um_DigitalClockTop (TDM Digital Clock) tt_um_wokwi_394640918790880257 (IFSC Keypad Locker) tt_um_wokwi_395355133883896833 (BIT COMPARATOR) tt_um_alu (SumLatchUART_System) tt_um_alfiero88_VCII (VCII) tt_um_ALU (3-bit ALU) tt_um_topTDC (Convertidor de Tiempo a Digital (TDC)) tt_um_UABCReloj (24 H Clock) tt_um_CDMA_Santiago (CDMA_2024) tt_um_dr_skyler_clock (Clock) tt_um_motor (motor a pasos) tt_um_mult_2b (mult_2b) tt_um_CodHex7seg (Decodificador binario a display 7 segmentos hexadecimal) tt_um_S2P (Serial to Parallel Register) tt_um_PWM (PWM) tt_um_ss_register (serie_serie_register) tt_um_stepper (Stepper) tt_um_g3f (Generador digital trifásico) tt_um_ALU_DECODERS (ALU with a Gray and Octal decoders) tt_um_ram (4 bit RAM) tt_um_sap_1 (SAP-1 Computer) tt_um_guitar_pedal (Integrated Distorion Pedal) tt_um_mbalestrini_usb_cdc_devices (Two ports USB CDC device) tt_um_adammaj (Tiny ALU) tt_um_wokwi_395567106413190145 (4-Bit Full Adder and Subtractor with Hardware Trojan) tt_um_gak25_8bit_cpu_ext (Most minimal extension of friend's 'CPU In a Week' in a day) tt_um_hsc_tdc (UCSC HW Systems Collective, TDC) tt_um_BoothMulti_hhrb98 (UACJ-MIE-Booth 4) tt_um_dlmiles_poc_fskmodem_hdlctrx (FSK Modem +HDLC +UART (PoC)) tt_um_simplez_rcoeurjoly (tt6-simplez) tt_um_nurirfansyah_alits01 (Analog Test Circuit ITS: VCO) tt_um_ppca (drEEm tEEm PPCA) tt_um_wokwi_395522292785089537 (Displays CIt) tt_um_fpu (Dgrid_FPU) tt_um_duk_lif (Leaky Integrate and fire neuron(LIF)) tt_um_bomba1 (Latin_bomba) tt_um_chatgpt_rsnn_paolaunisa (ChatGPT designed Recurrent Spiking Neural Network) tt_um_bit_ctrl (Bit Control) tt_um_array_multiplier_hhrb98 (Array Multiplier) tt_um_wallace_hhrb98 (UACJ-Wallace multiplier) tt_um_I2C_to_SPI (TinyTapeout SPI Master) tt_um_rng (Random number generator) tt_um_wokwi_395599496098067457 (EVEN AND ODD COUNTERS) tt_um_8bitALU (8bit ALU) tt_um_aleena (Analog Sigmoid) tt_um_rejunity_1_58bit (Ternary 1.58-bit x 8-bit matrix multiplier) tt_um_rejunity_fp4_mul_i8 (FP4 x 8-bit matrix multiplier) tt_um_PWM_Controller (PWM Controller) tt_um_couchand_cora16 (CORA-16) tt_um_frq_divider (clk frequency divider controled by rom) tt_um_wokwi_390913889347409921 (Notre Dame Dorms LED) tt_um_timer_counter_UGM (4-Digit Scanning Digital Timer Counter) tt_um_koconnor_kstep (kstep) tt_um_lancemitrex (DIP Switch to HEX 7-segment Display) tt_um_PWM_Sine_UART (PWM_Sinewave_UART) tt_um_nicklausthompson_twi_monitor (TWI Monitor) tt_um_wokwi_395615790979120129 (Cambio de giro de motor CD) tt_um_ancho (Circuito PWM con ciclo de trabajo configurable) tt_um_wokwi_395618714068432897 (32b Fibonacci Original) tt_um_voting_thingey (Voting thingey) tt_um_hsc_tdc_buf (UCSC HW Systems Collective, TDC - BUF2x1) tt_um_hsc_tdc_mux (UCSC HW Systems Collective, TDC - MUX2x1) tt_um_petersn_micro1 (14 Hour Simple Computer) tt_um_sanojn_tlv2556_interface (UART interface to ADC TLV2556 (VHDL Test)) tt_um_gray_sobel (Gray scale and Sobel filter) tt_um_wokwi_395614106833794049 (Universal gates) Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available