718 Shaman: SHA-256 hasher

718 : Shaman: SHA-256 hasher

  • Author: Pat Deegan, psychogenic.com
  • Description: Generate a SHA256 digest for data of arbitrary length
  • GitHub repository
  • GDS submitted
  • HDL project
  • Extra docs
  • Clock: 10000000 Hz
  • External hardware: An MCU or something to feed in the bytes and receive the results

picture

How it works

This implements the SHA-256 digest to create hashes of the data you feed in. It is a naive, mostly unoptimized, implementation of the algorithm (though you can interleave data input while it’s processing, using parallel mode, if you respect busy).

Data is fed into the system in 64 byte blocks. The hash is available after each 64 byte block has been input (allowing for some cycles to finish processing).

The process is to:

  • toggle start, to reset the digest
  • put data byte on the databyte input (the “in” port)
  • wait until busy is de-asserted (if required)
  • clock the clockin_data pin

After each complete block, the digest will become available after some clocks. In short if

  • busy is not asserted; and
  • result_ready goes high

The first hash byte will be available on the out port. To get the next bytes, clock result_next and read the port.

Parallel mode allows you to start feeding in more input data while the system is still processing the previous block. You need to pay attention to and respect “busy”, here, or things will get badly munged.
Also, in parallel mode, you need to hold the clockin_data for an extra cycle when you bring it high.

Pinout looks a little weird but it is hoped this will be a nice match for the PMOD arrangement on the demo boards.

NOTES

It does NOT massage the input data into suitable blocks. Messages need to be appended with an 0x80 byte, padded such that the entire thing, along with an 8 byte suffix containing the length (big end), is a multiple of 512 bits (64 bytes). You can see this in action in the message_to_blocks() function, in test.py.

I don’t think it’s super fast but, in parallel mode, I think simulation indicates it takes on the order of 8.3 microseconds per byte using a 1MHz system clock. So, if we could feed this say a 50MHz clock, we’d get down to 166 ns/byte.
That’s only on the order of 6 megabytes per second, I dunno maybe 100x slower than my laptop, but my laptop doesn’t run on a 50MHz clock and whatevs: should do the job if it holds in realy life. All this is when processing longer messages, to swamp out the minor overhead of setup etc.

When loading input data, if using parallel mode, hold clockin_data for an extra system clock. So

  • data byte on inputs
  • clockin_data HIGH
  • hold one system clock
  • clockin_data LOW
  • … loop for next byte

How to test

Might be good to run the cocotb test to get VCDs if you really want to see it in action. But we want to play with hardware! So… There will be a python script in the repository to convert any content into the expected 512 bit blocks of bytes padded and everything to make the system happy.

With that list of bytes in hand, this should work nicely:

  1. hold n_reset low for a few clock cycles
  2. bring n_reset high, and give it a few cycles
  3. start a new message digest my clocking start (bring high for one cycle, then low)
  4. for each block in your message
    • while “busy” is HIGH, wait a bit and check again
    • for each byte in that block
      • put the byte on in port (dedicated input pins)
      • while “busy” is HIGH, wait a bit and check again
      • bring clockin_data HIGH
      • if using parallel mode, hold for an extra clock cycle
      • bring clockin_data LOW

Check and wait until “busy” is LOW and “result_ready” goes HIGH. Your first result byte will already be present on the output port. Grab it and stash it. Then, for the next 31 bytes: bring result_next HIGH hold it there for one clock cycle bring result_next LOW grab and stash the byte on output pins

If the hash is going to be, say “90fc0a268f8b81b…”, they’ll be present in that order 0x90, then 0xfc, then 0x0a etc

IO

# Input Output Bidirectional
0 data_input bit 0 result_byte bit 0 OUTPUT, result_ready
1 data_input bit 1 result_byte bit 1 OUTPUT, begin processing data debug
2 data_input bit 2 result_byte bit 2 INPUT, parallel loading enable
3 data_input bit 3 result_byte bit 3 INPUT, result_next
4 data_input bit 4 result_byte bit 4 OUTPUT, busy
5 data_input bit 5 result_byte bit 5 OUTPUT, processing data block debug
6 data_input bit 6 result_byte bit 6 INPUT, start new digest
7 data_input bit 7 result_byte bit 7 INPUT, clockin_data

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (TinyTapeout 05 Factory Test) tt_um_loopback (TinyTapeout 05 Loopback Test Module) tt_um_Leaky_Integrate_Fire_nfjesifb (Leaky Integrate and Fire Neuron Model) tt_um_topModuleKA (Time Multiplexed Neuron Ckt) tt_um_sap_1 (SAP-1 Computer) tt_um_lif (Leaky Integrate-and-Fire Neuron (Verilog Demo)) tt_um_jleugeri_ticktocktokens (TickTockTokens) tt_um_LSNN (Spiking LSTM Network) tt_um_if (Integrate-and-Fire Neuron. (Verilog Demo)) tt_um_mihailocode_neural_network (Neural network on chip) tt_um_hls_lfi (Simple Leaky Integrate and Fire (LIF) Neuron) tt_um_diadatp_spigot_e (e Spigot) tt_um_kskyou (Continued Fraction Calculator) tt_um_wokwi_380005495431181313 (Water Pump Controller) tt_um_EventFilter (Event Denoising Circuit) tt_um_wokwi_380416099936681985 (7 segment seconds (Verilog Demo)) tt_um_freq_hcohensa (Frequency Encoder/Decoder) tt_um_wokwi_380410498092232705 (UART Greeter with RNN Module) tt_um_wokwi_380120751165092865 (WS2812B LED strip driver) tt_um_wokwi_380408486941145089 (Tiny Tapeout 5 Workshop) tt_um_wokwi_380409169798008833 (Tiny Tapeout 1) tt_um_wokwi_380409488188706817 (Supercon Workshop) tt_um_matrix_multiplier (Matrix Multiplier) tt_um_wokwi_380408594272345089 (Clock Divider) tt_um_wokwi_380408784463076353 (Binary Counter) tt_um_wokwi_380408396356749313 (ring osc test) tt_um_7segx4_clock_abhishek_top (7 segment clock with 4 digits) tt_um_wokwi_380409481852161025 (test001) tt_um_hodgkin_huxley (Hodgkin-Huxley Chip Design) tt_um_wokwi_380408823952452609 (Character Selector) tt_um_wokwi_380409904919056385 (Intructouction to PRBS) tt_um_wokwi_380409081067502593 (tto5 Supercon Project) tt_um_jmadden173_delta_modulation (Delta Modulation Spike Encoding) tt_um_wokwi_380409086743445505 (GameOfLife) tt_um_reflex_game (Reflex Game) tt_um_wokwi_380409019830656001 (Logic Gates Tapeout) tt_um_Fiona_CMU (Stream Cipher w/ LSR) tt_um_wokwi_380409532780455937 (tt5modifyd) tt_um_alu_chip (ALU Chip) tt_um_wokwi_380408936929183745 (Tapeout Test) tt_um_rjmorgan11_calculator_chip (Calculator chip) tt_um_wokwi_380409369220404225 (Shifty Snakey) tt_um_synth_GyanepsaaS (Synth) tt_um_wokwi_380408774591779841 (Sawtooth Generator) tt_um_wokwi_380197591775930369 (Blinking A) tt_um_wokwi_380409393770716161 (Supercon 2023) tt_um_mvm (Sparsity Aware Matrix Vector Multiplication) tt_um_wokwi_380408455148316673 (Ring Oscillator and Clock Source Switch) tt_um_mv_mult_alrdelcr (Matrix Vector Multiplication (Verilog Demo)) tt_um_wokwi_380416361853146113 (IDK WHAT TO DO) tt_um_wokwi_379319062779062273 (7-segment display logic system ) tt_um_wokwi_380409568391147521 (Hardware Trojan Example) tt_um_wokwi_379824923824476161 (Analog Clock) tt_um_wokwi_380145600224164865 (7 segment display) tt_um_wokwi_379889284755158017 (W_Li_10/28) tt_um_wokwi_380408409844584449 (Supecon Gate Play) tt_um_manjushettar (ECE 183 - Integrate and Fire Network Chip Design) tt_um_wokwi_380409236812508161 (tto5) tt_um_rebel2_balanced_ternary_ALU (REBEL-2 Balanced Ternary ALU) tt_um_wokwi_380229599886002177 (Stochastic Multiplier) tt_um_jeffdi_seven_segment_seconds (7 segment seconds - count down) tt_um_wokwi_380416616536542209 (TT05 Submission) tt_um_lif_n (Leaky Integrate-and-Fire Neuron) tt_um_wokwi_379764885531576321 (Count via LFSR) tt_um_dlmiles_tt05_i2c_bert (I2C BERT) tt_um_loopback_ericsmi (tt05-loopback tile with input skew measurement) tt_um_flappy_vga_cutout1 (Flappy VGA) tt_um_async_proc_paulschulz (Asynchronous Parallel Processor Demonstrator) tt_um_wokwi_380055891603379201 (Hex Countdown) tt_um_nickjhay_processor (Matrix multiply coprocessor (8x8 1bit)) tt_um_htfab_cell_tester (Standard cell generator and tester) tt_um_wta (Winner-Take-All Network (Verilog Demo)) tt_um_muncherkin_lioncage (Lion cage) tt_um_seven_segment_seconds_ksandov4 (Brain Inspired Random Dropout Circuit) tt_um_seanvenadas (Event-Based Denoising Circuit) tt_um_wokwi_378231665807713281 (RAM cell test) tt_um_rejunity_ay8913 (Classic 8-bit era Programmable Sound Generator AY-3-8913) tt_um_rnn (RNN (Demo)) tt_um_gharenthi_top (STDP Neuron) tt_um_SNN (Basic Spiking Neural Network) tt_um_btflv_8bit_fp_adder (8 bit floating point adder) tt_um_perceptron (Perceptron Hardcoded) tt_um_jkprz (Cheap and quick STDP) tt_um_topLevel_derekabarca (Brain-Inspired Oscillatory Network) tt_um_uwuifier (UART uwuifier) tt_um_perceptron_connorguzi (Perceptron and basic binary neural network) tt_um_hadirkhan10_lif_neuron (Leaky Integrate-and-Fire Neuron) tt_um_wokwi_380119282165535745 (7 segment seconds) tt_um_uabc_electronica_2023 (UABC-ELECTRONICA) tt_um_proppy_bytebeat (bytebeat) tt_um_meriac_play_tune (Super Mario Tune on A Piezo Speaker) tt_um_wrapper_inputblackboxoutput (Byte Computer) tt_um_vhdl_seven_segment_seconds (7 segment seconds (VHDL Demo)) tt_um_cejmu (4-Bit ALU) tt_um_rejunity_sn76489 (Classic 8-bit era Programmable Sound Generator SN76489) tt_um_minipit_stevej (Miniature Programmable Interrupt Timer) tt_um_gchenfc_seven_segment_gerry (7-segment Name Display) tt_um_supertails_tetris (Tetris) tt_um_mabhari_seven_segment_seconds (Simple_Timer-MBA) tt_um_njzhu_uart (UART Receiver) tt_um_wokwi_376553022662786049 (AGL CorticoNeuro-1) tt_um_adaptive_lif (Leaky-Integrated Fire Neuron) tt_um_myuart (MyUART) tt_um_wokwi_380438365946734593 (UART test) tt_um_tkmheart (Heart Rhythm Analyzer) tt_um_stdp (Spike-timing dependent plasticity (Verilog Demo)) tt_um_wokwi_380465686251921409 (Tiny Tapeout 5 TM project1) tt_um_thermocouple (Thermocouple-to-temperature converter (digital backend)) tt_um_wokwi_380412382001715201 (Naive 8-bit Binary Counter) tt_um_asinghani_tinyscanchain_tt05 (tinyscanchain Test Design) tt_um_carlosgs99_cro_udg (6 digit chronometer.) tt_um_suhrojo (Convolutional Network Circuit Chip Design) tt_um_mvm_ (Matrix Vector Multiplication Accelerator) tt_um_perceptron_neuromeme (Perceptron (Neuromeme)) tt_um_czlucius_alu (4 Bit ALU) tt_um_BNNNeuron (Binary Neural Network (Verilog Demo)) tt_um_urish_skullfet (SkullFET) tt_um_ja1tye_sound_generator (Wavetable Sound Generator) tt_um_jaylennee_wta_pwm (PWM signal generation with Winner-Take-All selection) tt_um_joerdsonsilva_top (Multimode Modem) tt_um_toivoh_synth (Analog emulation monosynth) tt_um_tiny_game_of_life (Tiny Game of Life) tt_um_mingkaic1_stack_machine (Stack Machine) tt_um_morningjava_top (ChipTune) tt_um_urish_silife (Game of Life 8x8 (siLife)) tt_um_tt05_analog_test (TT05 Analog Testmacro (Ringo, DAC)) tt_um_wokwi_380409528895479809 (RBUART) tt_um_mattngaw_fp8 (8-bit Floating-Point Adder) tt_um_chip_inventor_music__6_bit_count (6 bit Counter and Piano Music created by Chip Inventor) tt_um_4_bit_pipeline_multiplier (4 Bit Pipelined Multiplier) tt_um_wokwi_380477805171811329 (2-Bit ALU + Dice) tt_um_wokwi_380490286828784641 (TT02 Wokwi 7seg remake) tt_um_wokwi_380361576213660673 (ping pong asic) tt_um_prg (A Boolean function based pseudo random number generator (PRNG)) tt_um_digital_clock_sellicott (Digital Desk Clock) tt_um_haozhezhu_top (4-bit FIFO/LIFO) tt_um_top_mole99 (One Sprite Pony) tt_um_GrayCounter_ariz207 (4 bit Sync Gray Code Counter) tt_um_clkdiv (Clock and Random Number Gen) tt_um_matt_divider_test (TT05 Analog Test) tt_um_tomkeddie_a (VGA Experiments) tt_um_rejunity_rule110 (Rule110 cell automata) tt_um_no_time_for_squares_tommythorn (No Time for Squares) tt_um_urish_silife_max (Game of Life 8x32 (siLife)) tt_um_gfg_development_tros (TROS) tt_um_chatgpt_snn_mtomlin5 (ChatGPT designed Spiking Neural Network) tt_um_ks_pyamnihc (Karplus-Strong String Synthesis) tt_um_dinogame (VGA Dino Game) tt_um_himanshu5_prog_chipTop (Dual Compute Unit) tt_um_rtfb_collatz (Collatz conjecture brute-forcer) tt_um_retospect_neurochip (Field Programmable Neural Array) tt_um_urish_dffram (DFFRAM Example (128 bytes)) tt_um_rejunity_snn (Chonky SNN) tt_um_hh (Hodgkin-Huxley Neuron) tt_um_wokwi_377426511818305537 (PRBS Generator) tt_um_devinatkin_stopwatch (Stop Watch) tt_um_algofoogle_vga_spi_rom (vga_spi_rom) tt_um_blink (RO and counter) tt_um_ttl74hc595_v2 (8-Bit Shift Register with Output Latches 74HC595) tt_um_psychogenic_neptuneproportional (Neptune guitar tuner (proportional window, version b, debug output on bidir pins, larger set of frequencies)) tt_um_urish_simon (Simon Says game) tt_um_kianV_rv32ima_uLinux_SoC (KianV uLinux SoC) tt_um_urish_ringosc_cnt (Ring oscillator with counter) tt_um_sunaofurukawa_cpu_8bit (cpu_8bit) tt_um_vga_clock (VGA clock) tt_um_seven_segment_seconds (7 segment seconds (Verilog Demo)) tt_um_frequency_counter (Frequency counter) tt_um_rgb_mixer (RGB Mixer) tt_um_MichaelBell_spi_peri (SPI Peripheral) tt_um_multiplexed_clock (Multiplexed clock) tt_um_psychogenic_shaman (Shaman: SHA-256 hasher) tt_um_yubex_metastability_experiment (metastability experiment) Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available