326 QOA Decoder

326 : QOA Decoder

Design render
  • Author: Nicholas West
  • Description: A decoder for the QOA audio format.
  • GitHub repository
  • Clock: 30303030 Hz

How it works

This chip is for decoding the QOA audio format, which is designed to be a simple, fast format for 16 bit PCM audio data. The specification is one page, and is availible at qoaformat.org. The chip communicates through an SPI slave mode 0 interface to a controller chip, which handles the file interface and all adjecent functions. The chip only handles decoding samples into their 16 bit uncompressed versions.

Block diagram

A diagram showing the internal structure of the chip

The chip itself consists of two main parts, an SPI interface for communication, and the decoder itself. The decoder contains a parser for the SPI data, the LMS predictor/updater at the heart of the QOA format, and the history/weights for the LMS predictor. For die area savings, we use a sequential multiplier in the LMS predictor, and save on the expensive dequantizing computations by using a precalculated table from the reference code on Github. We save space further by only saving half the values, since every odd index is just the negative counterpart of the previous value, and we can just flip the sign.

The decoder itself

The decoder has three main parts, registers for the LMS history and weights, a parser for handling SPI data, and the QOA decoder in the parser.

A block diagram of the decoder itself, showing the components and how they are linked

First off, whenever data_rdy is pulsed, the main state machine in the parser decodes spi_in into either a hist/weights fill instruction, a sample decode instruction, or a sample send instruction.

  • If the instruction is a hist/weights fill, it takes the next 2 bytes (i.e. 2 data_rdy pulses) and puts them into the history or weights registers specified by the index in the instruction.
  • If the instruction is a sample send request, the parser will set the spi_out register to the upper bit of sample, wait for it to transmit (8 SPI clock cycles, so a data_rdy pulse), then set it to the low byte and finish the transmission.
  • Finally, if it is a sample decode instruction, it will iteratively multiply the history and weights values using a sequential multiplier, adding them to an accumulator. It then uses combinational logic and the ROM to calculate the final sample. This is then used to update history, weights, and is sent if a sample send request is recived.

How to test

Connect the chip to a mode 0 SPI master, with a clock rate at least 6x slower than the chip clock. Then, fill the LMS history and weights, by using the following instruction:

bit[7] bit[6] bit[5] bit[4] bit[3] bit[2] bit[1] bit[0]
0 Adress[1] Adress[0] BankSel 0

BankSel chooses between history and weights, 1 for weights and 0 for history. Adress is just which of the 4 values to fill, as specified by QOA. The next two bytes are the data to fill the history or weights with, MSB first. If you want to then send a sample, the following instruction is used:

bit[7] bit[6] bit[5] bit[4] bit[3] bit[2] bit[1] bit[0]
sf_quant[3] sf_quant[2] sf_quant[1] sf_quant[0] qr[2] qr[1] qr[0] 1

qr and sf_quant are exactly as they are in the QOA specification, with this chip decoding sample by sample.

After sending the sample, wait 40 chip clock cycles, then request the sample with the following instruction:

bit[7] bit[6] bit[5] bit[4] bit[3] bit[2] bit[1] bit[0]
1 0

Once you send that instruction, the next two bytes sent by the chip will be the decoded sample, MSB first. While you are reciving the sample, you can send any data, but it will be ignored. The chip will send unknown data when the instruction is not used.

On the testbench, it can calculate one sample every 5680ns, SPI transfer included, at a clock speed of 50MHz and an SPI frequency of 8MHz, which should be achivable on hardware. This can likely be improved by using a custom/different approach to SPI, since my test bench leaves relatively long periods of inactivity. The current speed results in a max of 176,056 samples per second, more than enough for real time audio streaming.

Eventually I will get arould to writing code for the interface on my Github, please look back there for updates.

External hardware

Since this is a co-processor for the QOA format, a seperate microcontroller is required to interface with it. Since I am used to the RP2040 and it is included on the Tiny Tapeout PCB, I will likely provide software for it on my Github in the future. I plan to take a streaming approach with this software, so a PC supporting USB will also be needed to send, store, and convert the files.

IO

#InputOutputBidirectional
0CS
1MOSI
2MISO
3SCK
4
5
6
7

Chip location

Controller Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux Analog Mux Mux Mux Mux Mux Mux Mux Mux Mux Mux tt_um_chip_rom (Chip ROM) tt_um_factory_test (TinyTapeout 7 Factory Test) tt_um_analog_factory_test (TT07 Analog Factory Test) tt_um_urish_charge_pump (Dickson Charge Pump) tt_um_adennen_inverter (Aron's analog buffer test) tt_um_rejunity_z80 (Zilog Z80) tt_um_kianv_bare_metal (KianV RISC-V RV32E Baremetal SoC) tt_um_macros77_subneg (SUBNEG CPU) tt_um_eater_8bit (Tiny Eater 8 Bit) tt_um_ender_clock (clock) tt_um_wokwi_397140982440144897 (7-Seg 'Tiny Tapeout' Display) tt_um_wokwi_397142450561071105 (Padlock) tt_um_Burrows_Katie (QIF Neuron) tt_um_vga_clock (VGA clock) tt_um_aidenfoxivey (CRC-8 CCITT) tt_um_PUF (Reversible logic based Ring-Oscillator Physically Unclonable Function (RO-PUF)) tt_um_devinatkin_dual_oscillator (dual oscillator) tt_um_urish_simon (Simon Says memory game) tt_um_ajstein_stopwatch (Stopwatch Project) tt_um_rnunes2311_12bit_sar_adc (12 bit SAR ADC) tt_um_DanielZhu123 (calculator) tt_um_wokwi_397268065185737729 (Mini Light Up Game) tt_um_toivoh_basilisc_2816 (Basilisc-2816) tt_um_MichaelBell_rle_vga (RLE Video Player) tt_um_wokwi_397774697322214401 (secret L) tt_um_ccattuto_charmatrix (Serial Character LED Matrix) tt_um_The_Chairman_send_receive (Send Receive) tt_um_mini_aie_2x2 (mini-aie-cgra) tt_um_twin_tee_opamp_osc (Twin Tee Sine Wave Generator) tt_um_brucemack_sb_mixer (Single Balanced Mixer) tt_um_revenantx86_tinytpu (TinyTPU) tt_um_chess (Chess) tt_um_vga_perlin (VGA Perlin Noise) tt_um_calonso88_74181 (ALU 74181) tt_um_tinytapeout_dvd_screensaver (DVD Screensaver with Tiny Tapeout Logo (Tiny VGA)) tt_um_TD4_Assy_KosugiSubaru (4bit_CPU_td4) tt_um_drburke3_top (FastMagnitudeComparator) tt_um_pongsagon_tiniest_gpu (Tiniest GPU) tt_um_jorga20j_prng (8 bit PRNG) tt_um_ejfogleman_smsdac8 (8-bit DEM R2R DAC) tt_um_ccattuto_conway (Conway's Terminal) tt_um_fp_mac (FP-8 MAC Module) tt_um_router (router) tt_um_serdes (SerDes) tt_um_rejunity_analog_dac_ay8913 (AY-8193 single channel DAC) tt_um_riscv_spi_wrapper (RISCV32I with spi wrapper) tt_um_mos_bandgap (MOS Bandgap) tt_um_shadow1229_vga_player (VGA player) tt_um_explorer (Explorer) tt_um_rtmc_top_jrpetrus (Real Time Motor Controller) tt_um_28add11_QOAdecode (QOA Decoder) tt_um_toivoh_basilisc_2816_cpu_OL2 (Basilisc-2816) tt_um_afasolino (integer to posit converter and adder ) tt_um_8bit_vector_compute_in_SRAM (8-bit Vector Compute-in-SRAM) tt_um_lfsr (LFSR) tt_um_tnt_diff_rx (TT07 Differential Receiver test) tt_um_urish_spell (SPELL) tt_um_underserved (underserved) tt_um_dpetrisko_ttdll (TTDLL) tt_um_mitssdd (co processor for precision farming) tt_um_wokwi_399192124046955521 (ECC_test1) tt_um_dusterthefirst_project (Communicate 433) tt_um_xeniarose_sha256 (tiny sha256) tt_um_njp_micro (MicroCode Multiplier) tt_um_VishalBingi_r2r_4b (4-bit R2R DAC) tt_um_lisa (LISA Microcontroller with TTLC) tt_um_template (TT7 Simple Clock) tt_um_seanyen0_SIMON (SIMON) tt_um_agurrier_mastermind (Mastermind) tt_um_KolosKoblasz_mixer (Gilbert Mixer) tt_um_Saitama225_comp (Analog comparator) tt_um_tt7_meonwara (TBD) tt_um_multiplier_mbm (Modified Booth Multiplier) tt_um_delay_line_tmng (Delay Line Time Multiplexed NAND Gate) tt_um_mandelbrot_accel (Mandelbrot Set Accelerator (32-bit IEEE 754)) tt_um_dvxf_dj8v_dac (DJ8 8-bit CPU w/ DAC) tt_um_obriensp_pll (PLL Playground) tt_um_unisnano (unisnano) tt_um_alfiero88_CurrentTrigger (Current Mode Trigger) tt_um_CktA_InstAmp (Instrumentation Amplifier for Electrocardiogram Signal Adquisition) tt_um_lcasimon_tdc (Analog TDC) tt_um_neural_network (Neural Network dinamic) tt_um_PS_PWM (Phase Shifted PWM Modulator) tt_um_litneet64_ro_puf (RO-based Physically Unclonable Function (PUF)) tt_um_wokwi_399163158804194305 (Digital Timer) tt_um_algofoogle_raybox_zero (raybox-zero TT07 edition) tt_um_wokwi_399336892246401025 (UART) tt_um_wokwi_399169514887574529 (Gaussian Blur) tt_um_maheredia (GPS signal generator) tt_um_pwm_elded (UACJ_PWM) tt_um_wokwi_399447152724198401 (8-Bit Register) tt_um_adonairc_dda (DDA solver for van der Pol oscillator) tt_um_6bitaddr (6 bit addr) tt_um_btflv_subleq (Subleq CPU with FRAM and UART) tt_um_emern_top (badGPU) tt_um_wokwi_399469995038350337 (dEFAULt 2hAC) tt_um_mixed_signal_pulse_gen (mixed_signal_pulse_gen) tt_um_maxluppe_digital_analog (All Digital DAC and Analog Comparators) tt_um_pyamnihc_dummy_counter (Dummy Counter) tt_um_wokwi_399488550855755777 (My 9-year-old son made an 8-bit counter chip) tt_um_toivoh_basilisc_2816_cpu_exp (Basilisc-2816 Experimental) tt_um_htfab_fprn (Field Programmable Resistor Network) tt_um_rejunity_ay8913 (Classic 8-bit era Programmable Sound Generator AY-3-8913) tt_um_maxluppe_NIST (Four NIST SP 800-22 tests implementation) tt_um_vga_snake (VGA Snake Game) tt_um_cm_1 (GDS counter-measures experiment 1) tt_um_nurirfansyah_alits02 (Analog Test Circuit ITS 2) tt_um_analog_rf_readout_circuit (RF_peripheral_circuits) tt_um_jleightcap (fractran-tt) tt_um_wokwi_399518371950068737 (Full-adder out of a kmap) tt_um_davidparent_hdl (PRBS Generator) tt_um_adia_psu_seq_test (Adiabatic PSU sequencer test) tt_um_spacecat_chan_john_pong_the_second (John Pong The Second) tt_um_rajum_iterativeMAC (Iterative MAC) tt_um_asinghani_tinywspr (TinyWSPR) tt_um_thatoddmailbox (DuckCPU) tt_um_rejunity_vga (VGA Checkers) tt_um_8bitadder (Ripple Carry Adder 8 bit) tt_um_vzayakov_top (Pong-VGA) tt_um_pa1mantri_cdc_fifo (Clock Domain Crossing FIFO) Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available Available