838 DL float MAC

Design render
  • Author: Ananya P & Nidhi M D
  • Description: MAC unit for 16 bit DL float data type
  • GitHub repository
  • Clock: 40000000 Hz

Design Description

The design is a five-stage pipelined implementation of a multiply-accumulate (MAC) operation for 16-bit DLFloat numbers. DLFloat is a 16-bit floating-point format designed for deep learning training and inference, where speed is prioritized over precision.

Details of DLFloat:

  • Sign bit: 1 bit
  • Exponent width: 6 bits
  • Significand precision: 9 bits
  • Exponent bias: 31

Value                    Binary format
Max normal               S.111110.111111111
Min normal               S.000001.000000000
Zero                     S.000000.000000000
Infinity-NaN (combined)  S.111111.111111111
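
To make the field layout concrete, the following is a minimal sketch of a software decoder for the format described above (1 sign bit, 6 exponent bits, 9 fraction bits, bias 31). The function name `dlfloat_to_float` is hypothetical and is not part of the design. The treatment of a zero exponent with a nonzero fraction as an IEEE-style subnormal is an assumption, since the table does not define that encoding.

```python
import math

EXP_BITS = 6    # exponent width from the format description
FRAC_BITS = 9   # fraction width from the format description
BIAS = 31       # exponent bias from the format description


def dlfloat_to_float(bits: int) -> float:
    """Decode a 16-bit DLFloat bit pattern into a Python float (hypothetical helper)."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exp = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    frac = bits & ((1 << FRAC_BITS) - 1)

    if exp == 0 and frac == 0:
        return sign * 0.0                      # zero
    if exp == 0x3F and frac == 0x1FF:
        return math.nan                        # combined Infinity-NaN code
    if exp == 0:
        # Assumption: IEEE-style subnormal (no implicit leading 1).
        return sign * (frac / 2**FRAC_BITS) * 2.0 ** (1 - BIAS)
    # Normal number: implicit leading 1, biased exponent.
    return sign * (1 + frac / 2**FRAC_BITS) * 2.0 ** (exp - BIAS)
```

For instance, `dlfloat_to_float(0x3E00)` decodes exponent 31 and fraction 0, which is 1.0, and `0x7DFF` is the "Max normal" row of the table.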

Work Flow Details:

• The two 16-bit DLFloat input operands are supplied through the ui_in pins (bits 7:0) and the uio_in pins (bits 15:8, used as inputs) over two clock cycles and are stored in two registers.

• In the MAC module, the first step multiplies the two inputs, and the product is then added to the accumulated value. The accumulated value starts at zero after reset.

• After the MAC operation, the 16-bit accumulated result is driven onto the uo_out pins over two clock cycles: the most significant 8 bits first, followed by the least significant 8 bits.
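
The byte-level framing described in the steps above can be sketched as follows. This is only an illustration of how a test driver might split and reassemble 16-bit words. The helper names `split_operand` and `join_result` are hypothetical, and the sketch assumes that each operand occupies one clock cycle, with ui_in carrying bits 7:0 and uio_in carrying bits 15:8 as listed in the IO table.

```python
def split_operand(word: int) -> tuple[int, int]:
    """Split one 16-bit operand into (ui_in, uio_in) byte values.

    Assumption: ui_in carries bits 7:0 and uio_in carries bits 15:8.
    """
    return word & 0xFF, (word >> 8) & 0xFF


def join_result(first_byte: int, second_byte: int) -> int:
    """Rebuild the 16-bit result from two uo_out samples, MSB byte first."""
    return ((first_byte & 0xFF) << 8) | (second_byte & 0xFF)


# Example: operand 0x3F00 is presented as ui_in = 0x00, uio_in = 0x3F.
ui_in, uio_in = split_operand(0x3F00)
# Example: uo_out samples 0x3E then 0x00 form the result word 0x3E00.
result = join_result(0x3E, 0x00)
```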

This arrangement yields a pipelined architecture: 5 clock cycles after reset, an output value can be pushed out on every cycle.
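
The latency and throughput behaviour of such a pipeline can be illustrated with a generic shift-register model. This is a sketch only: it does not model what each of the five stages computes, nor the two-cycle input and output framing, and the function name `run_pipeline` is hypothetical.

```python
def run_pipeline(inputs, depth=5):
    """Push one item per cycle through a `depth`-stage pipeline.

    Returns a list of (cycle, value) pairs for every value leaving the last stage.
    """
    stages = [None] * depth            # stages[0] is the first stage
    outputs = []
    # Feed the inputs, then `depth` empty cycles to drain the pipeline.
    for cycle, item in enumerate(list(inputs) + [None] * depth, start=1):
        done = stages[-1]              # value leaving the last stage on this cycle
        stages = [item] + stages[:-1]  # every value advances by one stage
        if done is not None:
            outputs.append((cycle, done))
    return outputs


print(run_pipeline(["a", "b", "c"]))
```

In this model a value entered on cycle 1 leaves the pipeline five cycles later, and the values behind it follow at one per cycle.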

The addition and multiplication follow the IEEE 754 algorithm. The MAC operation handles special cases such as infinity, NaN, subnormals, and zero, and supports the full 16-bit precision range.

The Multiplier and Adder blocks also handle overflow and underflow with saturation logic. On overflow, the result is set to the largest number that can be represented in the DLFloat format. On underflow, the result is set to the smallest representable number, except in the Multiplier, where an underflow is flushed to zero so that it does not affect the accumulated result.
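
A behavioural sketch of this saturation rule is shown below. It is not taken from the design's RTL. The function name `saturate` and its arguments are hypothetical, and it assumes that "largest" and "smallest" refer to the max normal and min normal encodings listed in the format table, and that `biased_exp` is the biased exponent after normalization but before range checking.

```python
MAX_NORMAL = 0x7DFF   # exponent 111110, fraction 111111111 (sign cleared)
MIN_NORMAL = 0x0200   # exponent 000001, fraction 000000000 (sign cleared)
ZERO = 0x0000


def saturate(sign: int, biased_exp: int, frac: int, is_multiplier: bool) -> int:
    """Pack a result into 16 bits, clamping out-of-range exponents (sketch)."""
    sign_bit = (sign & 1) << 15
    if biased_exp > 62:
        # Overflow: clamp to the largest representable magnitude.
        return sign_bit | MAX_NORMAL
    if biased_exp < 1:
        # Underflow: the multiplier flushes to zero, the adder clamps to the minimum.
        return sign_bit | (ZERO if is_multiplier else MIN_NORMAL)
    # In range: pack the fields unchanged.
    return sign_bit | (biased_exp << 9) | (frac & 0x1FF)
```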

How to test

The DLFloat inputs are fed in as the binary or hexadecimal encoding of the floating-point value. The outputs can be read back in the same manner.
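
When preparing test vectors, a small encoder can produce the hexadecimal pattern for a given value. The sketch below is one possible way to do this and is not part of the project. The function name `float_to_dlfloat_bits` is hypothetical, and the rounding mode (round to nearest), saturation to max normal, and flush to zero of values below min normal are assumptions chosen for illustration.

```python
import math

BIAS = 31
FRAC_BITS = 9


def float_to_dlfloat_bits(x: float) -> int:
    """Encode a Python float as a 16-bit DLFloat pattern (hypothetical helper)."""
    sign = 0x8000 if math.copysign(1.0, x) < 0 else 0
    if math.isnan(x) or math.isinf(x):
        return sign | 0x7FFF                 # combined Infinity-NaN code
    if x == 0.0:
        return sign
    m, e = math.frexp(abs(x))                # abs(x) == m * 2**e with 0.5 <= m < 1
    exp = e - 1 + BIAS                       # rewrite as 1.f * 2**(e - 1)
    frac = round((m * 2 - 1) * (1 << FRAC_BITS))
    if frac == 1 << FRAC_BITS:               # rounding carried into the exponent
        frac = 0
        exp += 1
    if exp > 62:
        return sign | 0x7DFF                 # assumption: saturate to max normal
    if exp < 1:
        return sign                          # assumption: flush small values to zero
    return sign | (exp << FRAC_BITS) | frac


for value in (1.0, 1.5, -2.0):
    print(value, format(float_to_dlfloat_bits(value), "04X"))
```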

External hardware

An FPGA is required to drive the inputs to the device. It must also be programmed to capture and display the 16-bit result, which arrives 8 bits at a time over two clock cycles.

IO

#  Input       Output                      Bidirectional
0  FP16 in[0]  FP16 out[0] / FP16 out[8]   FP16 in[8]
1  FP16 in[1]  FP16 out[1] / FP16 out[9]   FP16 in[9]
2  FP16 in[2]  FP16 out[2] / FP16 out[10]  FP16 in[10]
3  FP16 in[3]  FP16 out[3] / FP16 out[11]  FP16 in[11]
4  FP16 in[4]  FP16 out[4] / FP16 out[12]  FP16 in[12]
5  FP16 in[5]  FP16 out[5] / FP16 out[13]  FP16 in[13]
6  FP16 in[6]  FP16 out[6] / FP16 out[14]  FP16 in[14]
7  FP16 in[7]  FP16 out[7] / FP16 out[15]  FP16 in[15]
