934 SHA-256 Processor

934 : SHA-256 Processor

Design render

How it works

The design is a fully synthesizable, stream-oriented SHA‑256 implementation built from four simple modules. It consumes a byte stream that has already been padded according to FIPS 180‑4 (§5.1.1), performs the compression over 512‑bit blocks, and emits the 256‑bit digest as 32 raw bytes (MSB first).

  • sha256_core_v3: Compression engine for a single 512‑bit block.

    • 64 rounds in 64 clock cycles (one round per cycle).
    • On‑the‑fly message schedule using a 16‑word circular buffer (w[0..15]).
    • Selects IV vs. chained state via first_run.
    • Asserts ready when the block is complete; digest is available on hash_out.
  • sha256_k_rom_soft: Small combinational ROM supplying the 64 K‑constants indexed by the round counter.

  • sha256_processor: Byte‑stream front‑end for the core.

    • Buffers incoming bytes into a 512‑bit block_buffer.
    • States: IDLE → LOAD → HASH → DONE.
    • Uses start to mark the first byte of a message and data_last to mark the final (already padded) byte of the full message.
    • Chains block results by feeding the previous hash state back into the core when first_run=0.
    • Exposes in_ready to indicate when the next byte may be accepted.
  • top_gpio_sha256: Tiny Tapeout‑friendly GPIO wrapper.

    • Implements a small 2‑byte skid buffer so no bytes are lost when in_ready momentarily de‑asserts.
    • FSM: IDLE → FEED → WAIT → DUMP.
      • FEED passes bytes to sha256_processor while in_ready is high.
      • WAIT stalls until the processor asserts done for the full message.
      • DUMP streams the 32‑byte digest on dout with dvalid asserted; bytes are sent MSB‑first (hash[255:248] … hash[7:0]).
    • Exports a ready indicator so the host can throttle transmission.
  • tt_um_sha256_processor_dvirdc: Tiny Tapeout user‑module wrapper.

    • Maps the GPIO streaming interface to ui_in, uo_out, and uio_* busses.
    • Converts Tiny Tapeout’s active‑low rst_n to the active‑high rst used internally.

Module hierarchy

tt_um_sha256_processor_dvirdc
└─ top_gpio_sha256
   └─ sha256_processor
      ├─ sha256_core_v3
      └─ sha256_k_rom_soft

Tiny Tapeout GPIO streaming interface

All I/O are synchronous to clk. Reset is synchronous, active‑high inside the design (rst = ~rst_n).

  • Inputs

    • ui_in[7:0] — data byte
    • uio_in[0]VALID (assert for one clock when ui_in is valid)
    • uio_in[1]LAST (assert together with the final, already‑padded byte of the message)
  • Outputs

    • uo_out[7:0] — digest byte stream (MSB‑first)
    • uio_out[2]DVALID (digest byte valid strobe during the 32‑cycle dump)
    • uio_out[3]BUSY (high from first accepted byte until digest dump completes)
    • uio_out[4]READY (high when the design can accept the next input byte)
  • Output‑enable

    • uio_oe = 8'b0001_1100 so bits [4,3,2] are driven by the design; other uio_* bits are inputs.

Byte‑streaming protocol (host side)

  1. Apply synchronous reset (rst_n=0 for a few cycles, then rst_n=1).
  2. Wait for READY=1.
  3. For each message byte (already padded):
    • Drive the byte on ui_in[7:0].
    • Pulse VALID for one clock. Assert LAST only with the final padded byte.
    • If READY=0, pause and retry when READY returns high (the 2‑byte skid buffer absorbs short stalls).
  4. After the final byte, the engine processes the data. When done, it emits 32 bytes on uo_out[7:0] with DVALID=1 each cycle. Collect all 32 bytes to form the digest.

Notes:

  • Input data must be padded by the host (append 0x80, zeros to 56 bytes mod 64, then 64‑bit big‑endian bit length).
  • Digest byte order is big‑endian: hash[255:248] first … hash[7:0] last.

Throughput and latency

  • Core latency per 512‑bit block: 64 cycles.
  • Digest dump: 32 cycles.
  • End‑to‑end for a 1‑block message at 50 MHz: ≈ 64 (hash) + 32 (dump) ≈ 1.92 µs plus a few control cycles. Multi‑block messages add 64 cycles per additional block.

How to test

RTL simulation (cocotb)

From the test/ directory:

cd test
make -B    # runs cocotb against the RTL sources

Key testbench: test/test_gpio_sha256.py.

  • It uses sha256_pad() to pad messages on the host, then streams the padded bytes via the GPIO protocol above.
  • It collects 32 digest bytes (raw, not ASCII) and compares against Python’s hashlib.sha256(message).digest().

Minimal host‑side example of padding and streaming logic (conceptual):

def sha256_pad(msg: bytes) -> bytes:
    padded = msg + b"\x80"
    padded += b"\x00" * ((56 - (len(msg) + 1) % 64) % 64)
    padded += (len(msg) * 8).to_bytes(8, "big")
    return padded

# For each byte in sha256_pad(message):
#   wait until READY == 1
#   drive ui_in[7:0] = byte
#   pulse VALID for one clk (and LAST with the final byte only)
# Read 32 bytes when DVALID == 1 to obtain the digest (MSB-first)

Tiny Tapeout silicon / FPGA

Drive the GPIO streaming signals via your harness or a small microcontroller/FPGA test jig at 3.3 V logic levels. Follow the protocol above. No UART is required for the default build.

Legacy UART tops (src/old_modules/top_uart_sha256*.v) are provided for reference but are not used by the Tiny Tapeout wrapper in this project. For FPGA I used Tang Nano 9k and the top module is available on src/old_modules/top_wrapper_tang9k.v


External hardware

No special peripherals are required. The design runs from the Tiny Tapeout 50 MHz clock and uses standard GPIO‑level handshakes. If desired, LEDs can be connected to observe BUSY/DVALID activity.

IO

#InputOutputBidirectional
0DATA_IN[0]DATA_OUT[0]VALID_IN
1DATA_IN[1]DATA_OUT[1]LAST_IN
2DATA_IN[2]DATA_OUT[2]DVALID_OUT
3DATA_IN[3]DATA_OUT[3]BUSY_OUT
4DATA_IN[4]DATA_OUT[4]READY_OUT
5DATA_IN[5]DATA_OUT[5]
6DATA_IN[6]DATA_OUT[6]
7DATA_IN[7]DATA_OUT[7]

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (Tiny Tapeout Factory Test) tt_um_oscillating_bones (Oscillating Bones) tt_um_rebelmike_incrementer (Incrementer) tt_um_rebeccargb_tt09ball_gdsart (TT09Ball GDS Art) tt_um_tt_tinyQV (TinyQV 'Asteroids' - Crowdsourced Risc-V SoC) tt_um_DalinEM_asic_1 (ASIC) tt_um_urish_simon (Simon Says memory game) tt_um_rburt16_bias_generator (Bias Generator) tt_um_librelane3_test (Tiny Tapeout LibreLane 3 Test) tt_um_10_vga_crossyroad (Crossyroad) tt_um_rebeccargb_universal_decoder (Universal Binary to Segment Decoder) tt_um_rebeccargb_hardware_utf8 (Hardware UTF Encoder/Decoder) tt_um_rebeccargb_intercal_alu (INTERCAL ALU) tt_um_rebeccargb_dipped (Densely Packed Decimal) tt_um_rebeccargb_styler (Styler) tt_um_rebeccargb_vga_timing_experiments (VGA Timing Experiments) tt_um_rebeccargb_colorbars (Color Bars) tt_um_rebeccargb_vga_pride (VGA Pride) tt_um_cw_vref (Current-Mode Bandgap Reference) tt_um_tinytapeout_logo_screensaver (VGA Screensaver with Tiny Tapeout Logo) tt_um_rburt16_opamp_3stage (OpAmp 3stage) tt_um_gamepad_pmod_demo (Gamepad Pmod Demo) tt_um_micro_tiles_container (Micro tile container) tt_um_virantha_enigma (Enigma - 52-bit Key Length) tt_um_jamesrosssharp_1bitam (1bit_am_sdr) tt_um_jamesrosssharp_tiny1bitam (Tiny 1-bit AM Radio) tt_um_MichaelBell_rle_vga (RLE Video Player) tt_um_MichaelBell_mandelbrot (VGA Mandelbrot) tt_um_murmann_group (Decimation Filter for Incremental and Regular Delta-Sigma Modulators) tt_um_betz_morse_keyer (Morse Code Keyer) tt_um_urish_giant_ringosc (Giant Ring Oscillator (3853 inverters)) tt_um_tiny_pll (Tiny PLL) tt_um_tc503_countdown_timer (Countdown Timer) tt_um_richardgonzalez_ped_traff_light (Pedestrian Traffic Light) tt_um_analog_factory_test (TT08 Analog Factory Test) tt_um_alexandercoabad_mixedsignal (mixedsignal) tt_um_tgrillz_sixSidedDie (Six Sided Die) tt_um_mattvenn_analog_ring_osc (Ring Oscillators) tt_um_vga_clock (VGA clock) tt_um_mattvenn_r2r_dac_3v3 (Analog 8 bit 3.3v R2R DAC) tt_um_mattvenn_spi_test (SPI test) tt_um_quarren42_demoscene_top (asic design is my passion) tt_um_micro_tiles_container_group2 (Micro tile container (group 2)) tt_um_z2a_rgb_mixer (RGB Mixer demo) tt_um_frequency_counter (Frequency counter) tt_um_urish_sic1 (SIC-1 8-bit SUBLEQ Single Instruction Computer) tt_um_tobi_mckellar_top (Capacitive Touch Sensor) tt_um_log_afpm (16-bit Logarithmic Approximate Floating Point Multiplier) tt_um_uwasic_dinogame (UW ASIC - Optimized Dino) tt_um_ece298a_8_bit_cpu_top (8-Bit CPU) tt_um_tqv_peripheral_harness (Rotary Encoder Peripheral) tt_um_led_matrix_driver (SPI LED Matrix Driver) tt_um_2048_vga_game (2048 sliding tile puzzle game (VGA)) tt_um_mac (MAC) tt_um_dpmunit (DPM_Unit) tt_um_nitelich_riscyjr (RISCY Jr.) tt_um_nitelich_conway (Conway's GoL) tt_um_pwen (Pulse Width Encoder) tt_um_mcs4_cpu (MCS-4 4004 CPU) tt_um_mbist (Design of SRAM BIST) tt_um_weighted_majority (Weighted Majority Voter / Trend Detector) tt_um_brandonramos_VGA_Pong_with_NES_Controllers (VGA Pong with NES Controllers) tt_um_brandonramos_opamp_ladder (2-bit Flash ADC) tt_um_NE567Mixer28 (OTA folded cascode) tt_um_acidonitroso_programmable_threshold_voltage_sensor (Programmable threshold voltage sensor) tt_um_DAC1 (tt_um_DAC1) tt_um_trivium_stream_processor (Trivium Stream Cipher) tt_um_analog_example (Digital OTA) tt_um_sortaALUAriaMitra (Sorta 4-Bit ALU) tt_um_RoyTr16 (Connect Four VGA) tt_um_jnw_wulffern (JNW-TEMP) tt_um_serdes (Secure SERDES with Integrated FIR Filtering) tt_um_limpix31_r0 (VGA Human Reaction Meter) tt_um_torurstrom_async_lock (Asynchronous Locking Unit) tt_um_galaguna_PostSys (Post's Machine CPU Based) tt_um_edwintorok (Rounding error) tt_um_td4 (tt-td04) tt_um_snn (Reward implemented Spiking Neural Network) tt_um_matrag_chirp_top (Tiny Tapeout Chirp Modulator) tt_um_sha256_processor_dvirdc (SHA-256 Processor) tt_um_pchri03_levenshtein (Fuzzy Search Engine) tt_um_AriaMitraClock (12 Hour Clock (with AM and PM)) tt_um_swangust (posit8_add) tt_um_DelosReyesJordan_HDL (Reaction Time Test) tt_um_upalermo_simple_analog_circuit (Simple Analog Circuit) tt_um_swangust2 (posit8_mul) tt_um_thexeno_rgbw_controller (RGBW Color Processor) tt_um_top_layer (Spike Detection and Classification System) tt_um_Alida_DutyCycleMeter (Duty Cycle Meter) tt_um_dco (Digitally Controlled Oscillator) tt_um_8bitalu (8-bit Pipelined ALU) tt_um_resfuzzy (resfuzzy) tt_um_javibajocero_top (MarcoPolo) tt_um_Scimia_oscillator_tester (Oscillator tester) tt_um_ag_priority_encoder_parity_checker (Priority Encoder with Parity Checker) tt_um_tnt_mosbius (tnt's variant of SKY130 mini-MOSbius) tt_um_program_counter_top_level (Test Design 1) tt_um_subdiduntil2_mixed_signal_classifier (Mixed-signal Classifier) tt_um_dac_test3v3 (Analog 8 bit 3.3v R2R DAC) tt_um_LPCAS_TP1 ( LPCAS_TP1 ) tt_um_regfield (Register Field) tt_um_delaychain (Delay Chain) tt_um_tdctest_container (Micro tile container) tt_um_spacewar (Spacewar) tt_um_Enhanced_pll (Enhance PLL) tt_um_romless_cordic_engine (ROM-less Cordic Engine) tt_um_ev_motor_control (PLC Based Electric Vehicle Motor Control System) tt_um_plc_prg (PLC-PRG) tt_um_kishorenetheti_tt16_mips (8-bit MIPS Single Cycle Processor) tt_um_snn_core (Adaptive Leaky Integrate-and-Fire spiking neuron core for edge AI) tt_um_myprocessor (8-bit Custom Processor) tt_um_sjsu (SJSU vga demo) tt_um_vedic_4x4 (Vedic 4x4 Multiplier) tt_um_braun_mult (8x8 Braun Array Multiplier) tt_um_r2r_dac (4-bit R2R DAC) tt_um_stochastic_integrator_tt9_CL123abc (Stochastic Integrator) tt_um_uart (UART Controller with FIFO and Interrupts) tt_um_lfsr_stevej (Linear Feedback Shift Register) tt_um_FFT_engine (FFT Engine) tt_um_tpu (Tiny Tapeout Tensor Processing Unit) tt_um_tt_tinyQVb (TinyQV 'Berzerk' - Crowdsourced Risc-V SoC) tt_um_IZ_RG_22 (IZ_RG_22) tt_um_32_bit_fp_ALU_S_M (32-bit floating point ALU) tt_um_AriaMitraGames (Games (Tic Tac Toe and Rock Paper Scissors)) tt_um_sc_bipolar_qif_neuron (Stochastic Computing based QIF model neuron) tt_um_mac_spst_tiny (Low Power and Enhanced Speed Multiplier, Accumulator with SPST Adder) tt_um_kb2ghz_xalu (4-bit minicomputer ALU) tt_um_emmersonv_tiq_adc (3 Bit TIQ ADC) tt_um_simonsays (Simon Says) tt_um_BNN (8-bit Binary Neural Network) tt_um_anweiteck_2stageCMOSOpAmp (2 Stage CMOS Op Amp) tt_um_6502 (Simplified 6502 Processor) tt_um_swangust3 (posit8_div) tt_um_jonathan_thing_vga (VGA-Video-Player) tt_um_wokwi_412635532198550529 (ttsky-pettit-wokproc-trainer) tt_um_vga_hello_world (VGA HELLO WORLD) tt_um_jyblue1001_pll (Analog PLL) tt_um_BryanKuang_mac_peripheral (8-bit Multiply-Accumulate (MAC) with 2-Cycle Serial Interface) tt_um_rebeccargb_tt09ball_screensaver (TT09Ball VGA Screensaver) tt_um_openfpga22 (Open FGPA 2x2 design) tt_um_andyshor_demux (Demux) tt_um_flash_raid_controller (SPI flash raid controller) tt_um_jonnor_pdm_microphone (PDM microphone) tt_um_digital_playground (Sky130 Digital Playground) tt_um_mod6_counter (Mod-6 Counter) tt_um_BMSCE_T2 (Choreo8) tt_um_Richard28277 (4-bit ALU) tt_um_shuangyu_top (Calculator) tt_um_wokwi_441382314812372993 (Sumador/restador de 4 bits) tt_um_TensorFlowE (TensorFlowE) tt_um_wokwi_441378095886546945 (7SDSC) tt_um_wokwi_440004235377529857 (Latched 4-bits adder) tt_um_dlmiles_tqvph_i2c (TinyQV I2C Controller Device) tt_um_markgarnold_pdp8 (Serial PDP8) tt_um_wokwi_441564414591667201 (tt-parity-detector) tt_um_vga_glyph_mode (VGA Glyph Mode) tt_um_toivoh_pwl_synth (PiecewiseOrionSynth Deluxe) tt_um_minirisc (MiniRISC-FSM) tt_um_wokwi_438920793944579073 (Multiple digital design structures) tt_um_sleepwell (Sleep Well) tt_um_lcd_controller_Andres078 (LCD_controller) tt_um_SummerTT_HDL (SJSU Summer Project: Game of Life) tt_um_chrishtet_LIF (Leaky Integrate and Fire Neuron) tt_um_diff (ttsky25_EpitaXC) tt_um_htfab_split_flops (Split Flops) tt_um_alu_4bit_wrapper (4-bit ALU with Flags) tt_um_tnt_rf_test (TTSKY25A Register File Test) tt_um_mosbius (mini mosbius) tt_um_robot_controller_top_module (AR Chip) tt_um_flummer_ltc (Linear Timecode (LTC) generator) tt_um_stress_sensor (Tiny_Tapeout_2025_three_sensors) tt_um_krisjdev_manchester_baby (Manchester Baby) tt_um_mbikovitsky_audio_player (Simple audio player) tt_um_wokwi_414123795172381697 (TinySnake) tt_um_vga_example (Jabulani Ball VGA Demo ) tt_um_stochastic_addmultiply_CL123abc (Stochastic Multiplier, Adder and Self-Multiplier) tt_um_nvious_graphics (nVious Graphics) tt_um_pe_simonbju (pe) tt_um_mikael (TinyTestOut) tt_um_brent_kung (brent-kung_4) tt_um_7FM_ShadyPong (ShadyPong) tt_um_algofoogle_vga_matrix_dac (Analog VGA CSDAC experiments) tt_um_tv_b_gone (TV-B-Gone) tt_um_sjsu_vga_music (SJSU Fight Song) tt_um_fsm_haz (FSM based RISC-V Pipeline Hazard Resolver) tt_um_dma (DMA controller) tt_um_3v_inverter_SiliconeGuide (Analog Double Inverter) tt_um_rejunity_lgn_mnist (LGN hand-written digit classifier (MNIST, 16x16 pixels)) tt_um_gray_sobel (Gray scale and Sobel filter for Edge Detection) tt_um_Xgamer1999_LIF (Demonstration of Leaky integrate and Fire neuron SJSU) tt_um_dac12 (12 bit DAC) tt_um_voting_machine (Digital Voting Machine) tt_um_updown_counter (8bit_up-down_counter) tt_um_openram_top (Single Port OpenRAM Testchip) tt_um_customalu (Custom ALU) tt_um_assaify_mssf_pll (24 MHz MSSF PLL) tt_um_Maj_opamp (2-Stage OpAmp Design) tt_um_wokwi_442131619043064833 (Encoder 7 segments display) tt_um_wokwi_441835796137492481 (TESVG Binary Counter and shif register ) tt_um_combo_haz (Combinational Logic Based RISC-V Pipeline Hazard Resolver) tt_um_tx_fsm (Design and Functional Verification of Error-Correcting FIFO Buffer with SECDED and ARQ ) tt_um_will_keen_solitaire (solitaire) tt_um_rom_vga_screensaver (VGA Screensaver with embedded bitmap ROM) tt_um_13hihi31_tdc (Time to Digital Converter) tt_um_dteal_awg (Arbitrary Waveform Generator) tt_um_LIF_neuron (AFM_LIF) tt_um_rebelmike_register (Circulating register test) tt_um_MichaelBell_hs_mul (8b10b decoder and multiplier) tt_um_SNPU (random_latch) tt_um_rejunity_atari2600 (Atari 2600) tt_um_bit_serial_cpu_top (16-bit bit-serial CPU) tt_um_semaforo (semaforo) tt_um_bleeptrack_cc1 (Cross stitch Creatures #1) tt_um_bleeptrack_cc2 (Cross stitch Creatures #2) tt_um_bleeptrack_cc3 (Cross stitch Creatures #3) tt_um_bleeptrack_cc4 (Cross stitch Creatures #4) tt_um_bitty (Bitty) tt_um_spi2ws2811x16 (spi2ws2811x8) tt_um_uart_spi (UART and SPI Communication blocks with loopback) tt_um_urish_charge_pump (Dickson Charge Pump) tt_um_adc_dac_tern_alu (adc_dac_BCT_addr_ALU_STI) tt_um_sky1 (GD Sky Processor) tt_um_fifo (ASYNCHRONOUS FIFO) tt_um_TT16 (Asynchronous FIFO) tt_um_axi4lite_top (Axi4_Lite) tt_um_TT06_pwm (PWM Generator) tt_um_hack_cpu (HACK CPU) tt_um_marxkar_jtag (JTAG CONTROLLER) tt_um_cache_controller (Simple Cache Controller) tt_um_stopwatchtop (Stopwatch with 7-seg Display) tt_um_adpll (all-digital pll) tt_um_tnt_rom_test (TT09 SKY130 ROM Test) tt_um_tnt_rom_nolvt_test (TT09 SKY130 ROM Test (no LVT variant)) tt_um_wokwi_414120207283716097 (fulladder) tt_um_kianV_rv32ima_uLinux_SoC (KianV uLinux SoC) tt_um_tv_b_gone_rom (TV-B-Gone-EU) Available Available Available Available Available Available Available Available Available Available Available Available Available