902 : Systolic Binary Neural Network Accelerator

Author: Dean Foulds

Description: V2 systolic 16-neuron BNN accelerator. XNOR dot product engine, hardware feature expansion, signed bias, balanced popcount tree. 8-cycle systolic compute reuses hardware for smaller silicon footprint.

Clock: 10000000 Hz (10 MHz)

What is this chip?

This chip implements a 16-neuron Binary Neural Network (BNN) inference accelerator in silicon. It classifies an 8-bit input vector using 16 independently programmable neurons, each with learnable binary weights and a signed bias. The design is inspired by the systolic array architecture invented by H.T. Kung at Carnegie Mellon University in 1978, and implements the XNOR-popcount computation that is the standard in modern BNN research.

Architecture

How it works

Each neuron computes a dot product between its weight vector and a set of derived input features, then fires if that result plus its learned signed bias is non-negative:

feat    = feature_expand(ui_in)
S[n]    = popcount( XNOR( weights[n], feat ) )
y[n]    = 1   if   S[n] + bias[n] >= 0
y[n]    = 0   otherwise
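
The rule above can be modelled in a few lines of Python. This is a minimal sketch, not the chip's RTL: the hardware feature expansion is not specified in this document, so `feature_expand` below is a hypothetical identity placeholder, and `neuron_fires` is an illustrative name.

```python
def feature_expand(ui_in: int) -> int:
    # ASSUMPTION: identity placeholder. The chip's real hardware feature
    # expansion is not described here, so the input is passed through as-is.
    return ui_in & 0xFF


def neuron_fires(weights: int, bias: int, ui_in: int, width: int = 8) -> int:
    """Return y[n] for one neuron: 1 if popcount(XNOR(w, feat)) + bias >= 0."""
    mask = (1 << width) - 1
    feat = feature_expand(ui_in)
    xnor = ~(weights ^ feat) & mask   # bit is 1 where weight and feature agree
    s = bin(xnor).count("1")          # popcount: number of agreeing bits
    return 1 if s + bias >= 0 else 0


# All 8 bits agree: S = 8, and 8 + (-6) >= 0, so the neuron fires.
print(neuron_fires(0b10110010, -6, 0b10110010))  # 1
# No bits agree: S = 0, and 0 + (-6) < 0, so it stays silent.
print(neuron_fires(0b10110010, -6, 0b01001101))  # 0
```

Because S[n] is never negative, a neuron in this model can only stay silent when its bias is negative.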

The compute engine is systolic: it processes one bit per clock cycle, completing an inference in 8 cycles, and reuses the same hardware rather than duplicating it 16 times. This makes the silicon area significantly smaller than that of a fully parallel design.
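
To illustrate why a bit-serial schedule can replace a wide parallel datapath, the sketch below accumulates one XNOR result per simulated cycle and compares the total with a single-step popcount. The LSB-first bit order and both function names are assumptions made for this example; the actual cycle-by-cycle schedule of the chip is not given here.

```python
def serial_xnor_popcount(weights: int, feat: int, cycles: int = 8) -> int:
    """Accumulate one bit of the XNOR result per simulated clock cycle."""
    acc = 0
    for cycle in range(cycles):
        w_bit = (weights >> cycle) & 1   # ASSUMPTION: LSB is processed first
        f_bit = (feat >> cycle) & 1
        acc += 1 - (w_bit ^ f_bit)       # single-bit XNOR, added to the running count
    return acc


def parallel_xnor_popcount(weights: int, feat: int, width: int = 8) -> int:
    """Same quantity computed in one step, as a fully parallel design would."""
    return bin(~(weights ^ feat) & ((1 << width) - 1)).count("1")


# Both schedules agree; only the time/area trade-off differs.
for w, f in [(0x00, 0x00), (0xA5, 0x5A), (0xB2, 0xB3), (0xFF, 0x0F)]:
    assert serial_xnor_popcount(w, f) == parallel_xnor_popcount(w, f)
print(serial_xnor_popcount(0xB2, 0xB3))  # 7
```

The serial version needs only a one-bit XNOR and a small counter per step, which is the kind of reuse the paragraph above describes.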

Pin mapping

Pin	Direction	Function
`clk`	in	System clock
`rst_n`	in	Active-low reset
`ui_in[7:0]`	in	Input features (infer) or load data (load)
`uio_in[0]`	in	Mode: 0=load, 1=infer
`uio_in[1]`	in	Target: 0=weights, 1=bias
`uio_in[5:2]`	in	Neuron select 0–15
`uo_out[7:0]`	out	Fire signals neurons 0–7
`uio_out[7:0]`	out	Fire signals neurons 8–15

How to test

Load weights for neuron n

Set uio_in[0]=0, uio_in[1]=0, uio_in[5:2]=n
Set ui_in[7:0] = 8-bit weight pattern
Pulse clock
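
As a sketch of the weight-load steps above, the following builds the pin values for each neuron without touching real hardware. It assumes that weight bit i is placed on `ui_in[i]`; the helper names are hypothetical, and driving the pins and pulsing the clock is left to whatever host is used.

```python
def bits_to_byte(bits):
    """Pack eight 0/1 weights into one byte (ASSUMPTION: bits[i] goes to ui_in[i])."""
    if len(bits) != 8 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected exactly eight 0/1 values")
    return sum(b << i for i, b in enumerate(bits))


def weight_load_steps(patterns):
    """Return one (uio_in, ui_in) pair per neuron; each pair needs one clock pulse."""
    if len(patterns) > 16:
        raise ValueError("the chip has 16 neurons")
    steps = []
    for n, pattern in enumerate(patterns):
        uio_in = n << 2  # uio_in[0]=0 (load), uio_in[1]=0 (weights), uio_in[5:2]=n
        steps.append((uio_in, pattern & 0xFF))
    return steps


steps = weight_load_steps([bits_to_byte([0, 1, 0, 0, 1, 1, 0, 1]), 0xFF])
print(steps)  # [(0, 178), (4, 255)]
```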

Load bias for neuron n

Set uio_in[0]=0, uio_in[1]=1, uio_in[5:2]=n
Set ui_in[3:0] = bias magnitude, ui_in[4] = sign (1=negative)
Pulse clock
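
The bias format described above is sign-magnitude, not two's complement, so a host script has to encode it explicitly. A minimal sketch follows, assuming the unused bits `ui_in[7:5]` can be left at 0; `encode_bias` and `decode_bias` are illustrative names.

```python
def encode_bias(bias: int) -> int:
    """Encode a signed bias as ui_in: magnitude in bits 3:0, sign in bit 4."""
    if not -15 <= bias <= 15:
        raise ValueError("a 4-bit magnitude limits the bias to -15..15")
    sign = 1 if bias < 0 else 0        # 1 = negative
    return (sign << 4) | abs(bias)     # ASSUMPTION: ui_in[7:5] left at 0


def decode_bias(ui_in: int) -> int:
    """Recover the signed bias from its sign-magnitude encoding."""
    magnitude = ui_in & 0x0F
    return -magnitude if ui_in & 0x10 else magnitude


print(bin(encode_bias(-6)))  # 0b10110
print(encode_bias(3))        # 3
```

Note that sign-magnitude has two encodings of zero (`0b00000` and `0b10000`); this sketch always emits the first.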

Run inference

Set uio_in[0]=1
Set ui_in[7:0] = input feature vector
After 8 clock cycles read uo_out and uio_out
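
Once the outputs are read, they can be unpacked into per-neuron fire flags. The sketch below assumes the bit order given in the pin mapping table (bit k of `uo_out` is neuron k, bit k of `uio_out` is neuron 8+k). Which `uio_out` bits are actually driven on the chip should be checked against the pinout, since some `uio` pins are also used as inputs. The function names are hypothetical.

```python
def decode_fires(uo_out: int, uio_out: int) -> list:
    """Return a list of 16 fire bits, index n = neuron n."""
    # ASSUMPTION: uo_out holds neurons 0-7 and uio_out holds neurons 8-15.
    word = ((uio_out & 0xFF) << 8) | (uo_out & 0xFF)
    return [(word >> n) & 1 for n in range(16)]


def fired_neurons(uo_out: int, uio_out: int) -> list:
    """Return the indices of the neurons whose fire bit is set."""
    return [n for n, bit in enumerate(decode_fires(uo_out, uio_out)) if bit]


print(fired_neurons(0b00000101, 0b10000000))  # [0, 2, 15]
```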

Train weights in Python

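import numpy as np

# train_perceptron and generate_load_instructions come from the project's perceptron_trainer module
from perceptron_trainer import train_perceptron, generate_load_instructions

# 200 random 8-bit input vectors, one bit per column
X = np.random.randint(0, 2, (200, 8))
# Label is 1 when more than four of the eight bits are set
y = (X.sum(axis=1) > 4).astype(int)

# Train, then generate the chip load instructions from the result
weights, bias, _ = train_perceptron(X, y, epochs=100)
generate_load_instructions(weights, bias)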

External hardware

No external hardware required. A Raspberry Pi or Arduino can load trained weights and run inference via the ui_in and uio_in pins. See info_v2.md for full Raspberry Pi wiring and Python code.

End

#	Input	Output	Bidirectional
0	x0 - input feature bit 0	fire0 - neuron 0 output	mode - 0=load 1=infer
1	x1 - input feature bit 1	fire1 - neuron 1 output	target - 0=weights 1=bias
2	x2 - input feature bit 2	fire2 - neuron 2 output	sel0 - neuron select bit 0
3	x3 - input feature bit 3	fire3 - neuron 3 output	sel1 - neuron select bit 1
4	x4 - input feature bit 4	fire4 - neuron 4 output	sel2 - neuron select bit 2
5	x5 - input feature bit 5	fire5 - neuron 5 output	sel3 - neuron select bit 3
6	x6 - input feature bit 6	fire6 - neuron 6 output	fire8 - neuron 8 output
7	x7 - input feature bit 7	fire7 - neuron 7 output	fire9 - neuron 9 output
