425 8-bit Multiply-Accumulate (MAC) with 2-Cycle Serial Interface

  • Author: Bryan Kuang
  • Description: An 8×8→16-bit multiply-accumulate unit with 2-cycle 8-bit serial interface, supporting signed/unsigned operations with overflow detection and clear functionality
  • Clock: 50000000 Hz

How it works

This project is an 8×8→16‑bit Multiply–Accumulate (MAC) peripheral designed for the TinyTapeout platform. It targets DSP-style workloads, such as FIR filtering and dot products, that repeatedly multiply two operands and add the product to a running sum.

To fit within the limited I/O of TinyTapeout, the MAC core uses a 2‑cycle 8‑bit serial interface. This allows two 8-bit operands to be sent to the core and a full 16‑bit result to be read back, all through a standard 8-bit data bus. The module also supports configurable signed/unsigned arithmetic and provides overflow detection.

Key Features

  • Compact MAC Core: Provides a full 8×8 MAC unit with a 17-bit accumulator and overflow detection.
  • 2-Cycle Serial Interface: A simple 2‑cycle input/output protocol allows full 16-bit operations using only 8-bit data ports, making it easy to integrate with microcontrollers or other hosts.
  • Signed/Unsigned Support: A dedicated control pin (signed_mode) allows switching between signed and unsigned arithmetic.
  • High-Speed Operation: Maintains full 50 MHz operation with a deterministic 4‑cycle pipeline latency.
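
To put the timing figures in concrete terms, the short Python sketch below converts clock cycles to nanoseconds at the stated 50 MHz clock. The helper name `cycles_to_ns` is hypothetical and only does the arithmetic; it makes no claim about how the hardware is built.

```python
CLOCK_HZ = 50_000_000  # clock frequency listed for this design


def cycles_to_ns(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a number of clock cycles to nanoseconds."""
    return cycles * 1_000_000_000 / clock_hz


print(cycles_to_ns(1))  # one clock period: 20.0
print(cycles_to_ns(4))  # the stated 4-cycle pipeline latency: 80.0
```

At 50 MHz one clock period is 20 ns, so the 4-cycle pipeline latency corresponds to 80 ns.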

Architecture

The peripheral's architecture is a 4-stage pipeline designed for stable, high-speed operation:

  1. Input Stage & Serial Interface: Captures and assembles the two 8‑bit operands and control signals over two clock cycles. A change detector triggers the pipeline only when new, stable data is available.
  2. Pipeline Register Stage: Registers the inputs for timing closure and passes the signed_mode setting to the multiplier and accumulator.
  3. Multiplier Stage: A configurable 8×8 block that produces a 16‑bit product, supporting both signed and unsigned modes.
  4. Accumulator Stage: A 17‑bit adder with clear_and_mult control. It accumulates results and sets an overflow flag if the result exceeds the 16-bit range.
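
The arithmetic performed by the multiplier and accumulator stages can be sketched as a small behavioral model in Python. This is not the author's RTL and does not model pipeline timing. The names `to_signed` and `mac_step` are hypothetical, and the overflow rule used here (the exact sum does not fit in 16 bits: 0 to 65535 unsigned, -32768 to 32767 signed) is an assumption inferred from the description above.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac_step(acc: int, a: int, b: int, clear_and_mult: bool, signed_mode: bool):
    """Return (new 16-bit accumulator pattern, overflow flag) for one operation.

    `acc` is the current 16-bit accumulator pattern; `a` and `b` are 8-bit
    operand patterns. Assumption: overflow means the exact sum falls outside
    the 16-bit range for the selected mode.
    """
    if signed_mode:
        product = to_signed(a, 8) * to_signed(b, 8)
        base = 0 if clear_and_mult else to_signed(acc, 16)
        total = base + product
        overflow = not (-32768 <= total <= 32767)
    else:
        product = (a & 0xFF) * (b & 0xFF)
        base = 0 if clear_and_mult else (acc & 0xFFFF)
        total = base + product
        overflow = total > 0xFFFF
    return total & 0xFFFF, overflow


# Clear, then multiply 10 * 10; accumulate 5 * 5 on top of it.
acc, ovf = mac_step(0, 10, 10, clear_and_mult=True, signed_mode=False)
acc, ovf = mac_step(acc, 5, 5, clear_and_mult=False, signed_mode=False)
print(acc, ovf)  # 125 False
```

Under this model, two back-to-back accumulations of 255 × 255 in unsigned mode exceed 65535 and raise the overflow flag.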

Block Diagram

Pinout

Pin          Direction  Function
ui_in[7:0]   Input      8-bit data bus. Carries Data A (cycle 1) and Data B (cycle 2).
uio_in[0]    Input      clear_and_mult (0 = accumulate, 1 = clear before multiplying)
uio_in[1]    Input      enable (must be high during the 2-cycle input phase)
uio_in[2]    Input      signed_mode (0 = unsigned, 1 = signed)
uo_out[7:0]  Output     8-bit data bus. Cycles between the high and low bytes of the 16-bit result.
uio_out[0]   Output     overflow flag (high when an arithmetic overflow occurs)
uio_out[1]   Output     data_ready flag (high when a new result is available)
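
A host driving these pins has to pack the three control inputs into the low bits of uio_in and pick the two status flags out of uio_out. The Python sketch below shows one way to do that using the bit positions from the table above; the function names `pack_uio_in` and `unpack_uio_out` are hypothetical.

```python
CLEAR_AND_MULT_BIT = 0  # uio_in[0]
ENABLE_BIT = 1          # uio_in[1]
SIGNED_MODE_BIT = 2     # uio_in[2]

OVERFLOW_BIT = 0        # uio_out[0]
DATA_READY_BIT = 1      # uio_out[1]


def pack_uio_in(clear_and_mult: bool, enable: bool, signed_mode: bool) -> int:
    """Build the uio_in value as {signed_mode, enable, clear_and_mult}."""
    return (
        (int(bool(signed_mode)) << SIGNED_MODE_BIT)
        | (int(bool(enable)) << ENABLE_BIT)
        | (int(bool(clear_and_mult)) << CLEAR_AND_MULT_BIT)
    )


def unpack_uio_out(value: int) -> dict:
    """Extract the overflow and data_ready flags from a uio_out value."""
    return {
        "overflow": bool((value >> OVERFLOW_BIT) & 1),
        "data_ready": bool((value >> DATA_READY_BIT) & 1),
    }


print(bin(pack_uio_in(clear_and_mult=True, enable=True, signed_mode=False)))  # 0b11
print(unpack_uio_out(0b10))  # {'overflow': False, 'data_ready': True}
```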

How to Use the MAC

Operating the MAC involves sending two 8-bit operands and control signals over two clock cycles, waiting for the pipeline to process, and then reading the 16-bit result over two subsequent cycles.

Data Transmission Protocol

Input Protocol (2 cycles):

  • Cycle 1:
    • Place 8-bit Data A on ui_in[7:0].
    • Set uio_in[1] to 1 to enable the interface.
    • Set uio_in[0] (clear_and_mult) and uio_in[2] (signed_mode) as needed. These values are only captured on the first cycle.
  • Cycle 2:
    • Place 8-bit Data B on ui_in[7:0].
    • Keep uio_in[1] (enable) at 1.
  • After Cycle 2, set uio_in[1] (enable) to 0 to complete the input operation.
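
The input steps above can be sketched as a per-cycle list of pin values. The Python helper below, with the hypothetical name `input_sequence`, returns one `(ui_in, uio_in)` pair per clock cycle. Holding Data B on ui_in while enable is dropped in the third step is an assumption, since the protocol does not say what the data bus should carry at that point.

```python
def input_sequence(a: int, b: int, clear_and_mult: bool = False,
                   signed_mode: bool = False) -> list:
    """Return the (ui_in, uio_in) values to drive, one tuple per clock cycle."""
    enable = 1 << 1
    ctrl = (int(signed_mode) << 2) | enable | int(clear_and_mult)
    return [
        (a & 0xFF, ctrl),    # cycle 1: Data A plus control bits (captured here)
        (b & 0xFF, enable),  # cycle 2: Data B, only enable matters
        (b & 0xFF, 0),       # afterwards: enable low (data bus value assumed held)
    ]


for ui_in, uio_in in input_sequence(5, 6, clear_and_mult=True):
    print(f"ui_in=0x{ui_in:02X} uio_in=0b{uio_in:03b}")
```

For 5 × 6 with clear_and_mult set, this yields uio_in values 0b011, 0b010, and 0b000 on successive cycles.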

Output Protocol (2 cycles):

  • After an operation completes (approximately 4 to 6 clock cycles), the data_ready flag (uio_out[1]) goes high. Watch this flag rather than relying on a fixed cycle count.
  • The 16-bit result is available on uo_out[7:0] and cycles between the high and low bytes on every clock edge.
  • Read Cycle 1: Capture the High Byte (bits 15:8) of the result.
  • Read Cycle 2: Capture the Low Byte (bits 7:0) of the result.
  • The overflow flag is available on uio_out[0].
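
Once both bytes have been captured, the host has to join them into a 16-bit value and, in signed mode, reinterpret it as two's complement. A minimal Python sketch follows; the function name `assemble_result` is hypothetical.

```python
def assemble_result(high_byte: int, low_byte: int, signed_mode: bool = False) -> int:
    """Combine the two bytes read from uo_out into one 16-bit result.

    In signed mode the 16-bit pattern is interpreted as two's complement.
    """
    raw = ((high_byte & 0xFF) << 8) | (low_byte & 0xFF)
    if signed_mode and raw & 0x8000:
        return raw - 0x10000
    return raw


print(assemble_result(0x00, 0x1E))                    # 30
print(assemble_result(0xFF, 0xCE, signed_mode=True))  # -50
```

The same byte pair 0xFF, 0xCE reads as 65486 when treated as unsigned, so the host must apply the mode that was used for the operation.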

Usage Examples

Example 1: Basic Multiplication (5 * 6)

// Testbench stimulus fragment. `clk` is an assumed name for the clock signal.
// 1. Send data to calculate 5 * 6 = 30
// Cycle 1: Send Data A (5) and control signals (clear=1, signed=0)
ui_in <= 8'h05;
uio_in <= 3'b011; // {signed_mode, enable, clear_and_mult}
@(posedge clk);

// Cycle 2: Send Data B (6)
ui_in <= 8'h06;
uio_in <= 3'b010; // {signed_mode, enable, clear_and_mult} - only enable matters
@(posedge clk);

// Deassert enable to complete the input operation
uio_in <= 3'b000;

// 2. Wait ~4-6 cycles for the pipeline (data_ready, uio_out[1], goes high).

// 3. Read the result (30 = 0x001E)
// Read Cycle 1: uo_out will be 0x00 (High Byte)
// Read Cycle 2: uo_out will be 0x1E (Low Byte)

Example 2: Accumulation (100 + 25)

// Testbench stimulus fragment. `clk` is an assumed name for the clock signal.
// 1. First, calculate 10 * 10 = 100 with clear_and_mult = 1
// ... send 10 and 10 ...

// 2. Wait for the operation to complete.

// 3. Next, calculate 5 * 5 = 25 with clear_and_mult = 0 to accumulate
// Cycle 1: Send Data A (5) and control signals (clear=0, signed=0)
ui_in <= 8'h05;
uio_in <= 3'b010; // {signed_mode, enable, clear_and_mult}
@(posedge clk);

// Cycle 2: Send Data B (5)
ui_in <= 8'h05;
uio_in <= 3'b010;
@(posedge clk);

// Deassert enable to complete the input operation
uio_in <= 3'b000;

// 4. Wait for the pipeline.

// 5. Read the result (100 + 25 = 125 = 0x007D)
// Read Cycle 1: uo_out will be 0x00 (High Byte)
// Read Cycle 2: uo_out will be 0x7D (Low Byte)

Example 3: Signed Multiplication (10 * -5)

// Testbench stimulus fragment. `clk` is an assumed name for the clock signal.
// 1. Send data for 10 * -5 = -50, with signed_mode = 1
//    -5 in 8-bit two's complement is 0xFB (251)
// Cycle 1: Send Data A (10) and control signals (clear=1, signed=1)
ui_in <= 8'h0A;
uio_in <= 3'b111; // {signed_mode, enable, clear_and_mult}
@(posedge clk);

// Cycle 2: Send Data B (0xFB, i.e. -5)
ui_in <= 8'hFB;
uio_in <= 3'b110;
@(posedge clk);

// Deassert enable to complete the input operation
uio_in <= 3'b000;

// 2. Wait for the pipeline.

// 3. Read the result (-50 = 0xFFCE in 16-bit two's complement)
// Read Cycle 1: uo_out will be 0xFF (High Byte)
// Read Cycle 2: uo_out will be 0xCE (Low Byte)
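
The byte values quoted in the examples can be checked with a few lines of Python. This sketch only verifies the two's complement arithmetic; it does not simulate the hardware, and the helper names `encode_int8` and `split_result` are hypothetical.

```python
def encode_int8(value: int) -> int:
    """Encode a signed operand (-128..127) as the 8-bit pattern placed on ui_in."""
    if not -128 <= value <= 127:
        raise ValueError("operand does not fit in 8-bit two's complement")
    return value & 0xFF


def split_result(value: int):
    """Return (high_byte, low_byte) of the 16-bit pattern for `value`."""
    raw = value & 0xFFFF
    return raw >> 8, raw & 0xFF


print(hex(encode_int8(-5)))                         # 0xfb
print([hex(x) for x in split_result(5 * 6)])        # ['0x0', '0x1e']
print([hex(x) for x in split_result(10 * 10 + 25)]) # ['0x0', '0x7d']
print([hex(x) for x in split_result(10 * -5)])      # ['0xff', '0xce']
```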

External hardware

No external hardware is required. This is a purely digital design that operates with:

  • Clock: A 50 MHz system clock from the TinyTapeout board.
  • Reset: An active-low reset signal.
  • Digital I/O: The standard TinyTapeout pin interface for sending operands and control signals.

IO

#  Input                                                            Output                                                           Bidirectional
0  Data[7:0] - 8-bit data input (Cycle 1: Data A, Cycle 2: Data B)  Result[7:0] - 8-bit data output (cycles between low/high bytes)  Clear_and_Mult (IN) / Overflow (OUT) - Control input or overflow flag output
1  Data[7:0] - 8-bit data input (Cycle 1: Data A, Cycle 2: Data B)  Result[7:0] - 8-bit data output (cycles between low/high bytes)  Enable (IN) / Data_Ready (OUT) - Interface enable input or data ready output
2  Data[7:0] - 8-bit data input (Cycle 1: Data A, Cycle 2: Data B)  Result[7:0] - 8-bit data output (cycles between low/high bytes)  Signed_Mode (IN) - Signed mode control (0=unsigned, 1=signed)
3  Data[7:0] - 8-bit data input (Cycle 1: Data A, Cycle 2: Data B)  Result[7:0] - 8-bit data output (cycles between low/high bytes)
4  Data[7:0] - 8-bit data input (Cycle 1: Data A, Cycle 2: Data B)  Result[7:0] - 8-bit data output (cycles between low/high bytes)
5  Data[7:0] - 8-bit data input (Cycle 1: Data A, Cycle 2: Data B)  Result[7:0] - 8-bit data output (cycles between low/high bytes)
6  Data[7:0] - 8-bit data input (Cycle 1: Data A, Cycle 2: Data B)  Result[7:0] - 8-bit data output (cycles between low/high bytes)
7  Data[7:0] - 8-bit data input (Cycle 1: Data A, Cycle 2: Data B)  Result[7:0] - 8-bit data output (cycles between low/high bytes)
