
This is an 8-bit, 2-stage pipelined CPU with a custom ISA inspired by ARM.
The CPU has two pipeline stages:

- Fetch: the program counter (PC) is driven onto uo_out[6:0]. External memory drives ui_in with the instruction at that address. On the next rising edge the instruction is latched into the instruction register (IR).
- Execute: the instruction held in IR is executed.

Because these two stages overlap, one instruction is being fetched while the previous one is executing. The effective throughput is one instruction per cycle when there are no stalls.
The CPU uses a von Neumann (unified) memory model: instructions and data share the same 7-bit address bus (uo_out[6:0], 128 locations) and 8-bit read data bus (ui_in). External memory must be word-addressable with a one-cycle read latency (combinational output, registered on the CPU's rising edge).
On a load (LDR) or store (STR), the CPU hardware stalls the pipeline for two cycles:

1. uo_out[6:0] switches from PC to the data address (Rs register value). Memory output is not yet valid.
2. uo_out[6:0] holds the data address. Memory output is valid. For LDR, the value on ui_in is written to Rd. For STR, uio_out holds the store value and uo_out[7] (WE#) is driven low.

After the two stall cycles, the pipeline resumes automatically. No software NOP padding is required.
Four general-purpose 8-bit registers: R0, R1, R2, R3.
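Two condition flags, updated only by CMP Rd, Rs:

- Z (zero): set when Rd = Rs.
- C (carry): set when the subtraction produces no borrow, that is when Rd >= Rs.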
All instructions are 8 bits wide. There are three formats:
R-type 0_ooo_dd_ss — register operations
| Mnemonic | Encoding (ooo) | Operation |
|---|---|---|
| ADD Rd, Rs | 000 | Rd = Rd + Rs |
| SUB Rd, Rs | 001 | Rd = Rd − Rs |
| AND Rd, Rs | 010 | Rd = Rd & Rs |
| OR Rd, Rs | 011 | Rd = Rd \| Rs |
| MOV Rd, Rs | 100 | Rd = Rs |
| CMP Rd, Rs | 101 | Sets Z and C flags; no register write |
| LDR Rd, Rs | 110 | Rd = mem[Rs] |
| STR Rd, Rs | 111 | mem[Rs] = Rd |
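
The R-type layout above can be packed with a few shifts. The following is a minimal sketch, not part of the project: `R_OPCODES` and `encode_rtype` are hypothetical names, and the opcode values are copied from the table.

```python
# Opcode field (ooo) values copied from the R-type table.
R_OPCODES = {
    "ADD": 0b000, "SUB": 0b001, "AND": 0b010, "OR": 0b011,
    "MOV": 0b100, "CMP": 0b101, "LDR": 0b110, "STR": 0b111,
}


def encode_rtype(mnemonic: str, rd: int, rs: int) -> int:
    """Pack an R-type instruction as 0_ooo_dd_ss (hypothetical helper)."""
    if not (0 <= rd <= 3 and 0 <= rs <= 3):
        raise ValueError("register index must be 0..3")
    # Bit 7 stays 0, ooo sits in bits 6:4, dd in bits 3:2, ss in bits 1:0.
    return (R_OPCODES[mnemonic] << 4) | (rd << 2) | rs


print(f"{encode_rtype('LDR', 0, 1):08b}")  # encoding of LDR R0, R1
```
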
I-type 10_dd_iiii — load immediate
| Mnemonic | Operation |
|---|---|
| MOVI Rd, #imm | Rd = zero_extend(imm[3:0]) |
The 4-bit immediate is zero-extended to 8 bits. Range: 0–15.
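
As an illustration of the I-type layout, here is a small sketch that packs MOVI and rejects immediates outside 0–15. `encode_movi` is a hypothetical helper, not something shipped with the project.

```python
def encode_movi(rd: int, imm: int) -> int:
    """Pack MOVI Rd, #imm as 10_dd_iiii (hypothetical helper)."""
    if not 0 <= rd <= 3:
        raise ValueError("register index must be 0..3")
    if not 0 <= imm <= 15:
        raise ValueError("immediate must fit in 4 bits (0..15)")
    # Bits 7:6 are the fixed prefix 10, dd sits in bits 5:4, iiii in bits 3:0.
    return 0b10_000000 | (rd << 4) | imm


print(f"{encode_movi(1, 8):08b}")  # encoding of MOVI R1, #8
```
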
B-type 11_cc_oooo — conditional branch
| Mnemonic | Condition (cc) | Taken when |
|---|---|---|
| BEQ offset | 00 | Z = 1 |
| BNE offset | 01 | Z = 0 |
| BCS offset | 10 | C = 1 (no borrow, Rd >= Rs) |
| B offset | 11 | always |
The 4-bit signed offset is sign-extended. Branch target = (PC + 1) + 1 + offset, where PC is the address of the branch and PC + 1 is the already-incremented fetch pointer. To branch to absolute address T from a branch at address N, use offset = T − N − 2. The offset range is −8 to +7, measured from address N + 2, so the reachable targets are N − 6 to N + 9.
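
The offset rule is easy to get wrong by one, so a small helper can do the arithmetic. This is a sketch under the formula stated above (offset = T − N − 2); `branch_offset` and `encode_branch` are hypothetical names, and the condition codes are copied from the B-type table.

```python
# Condition field (cc) values copied from the B-type table.
BRANCH_CC = {"BEQ": 0b00, "BNE": 0b01, "BCS": 0b10, "B": 0b11}


def branch_offset(n: int, t: int) -> int:
    """Offset for a branch at address n that should land on address t."""
    offset = t - n - 2
    if not -8 <= offset <= 7:
        raise ValueError(f"target {t} is out of range for a branch at {n}")
    return offset


def encode_branch(mnemonic: str, n: int, t: int) -> int:
    """Pack a B-type instruction as 11_cc_oooo (hypothetical helper)."""
    offset = branch_offset(n, t)
    # The signed offset is stored as a 4-bit two's-complement field.
    return 0b11_000000 | (BRANCH_CC[mnemonic] << 4) | (offset & 0xF)


print(branch_offset(5, 5))                  # self-loop at address 5
print(f"{encode_branch('B', 5, 5):08b}")    # encoding of that self-loop
```

A self-loop therefore always uses offset −2, whatever the address of the branch.
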
When a branch is taken, the instruction in the fetch stage is flushed (one cycle bubble). No stall occurs for a not-taken branch.
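
For a rough idea of run time, the stall and flush rules can be turned into a cycle estimate. The sketch below assumes each executed instruction costs one cycle, each LDR or STR adds the two stall cycles, and each taken branch adds the one-cycle bubble. It ignores pipeline fill after reset, and `estimate_cycles` is a hypothetical helper, not a model of the RTL.

```python
MEMORY_OPS = {"LDR", "STR"}
BRANCHES = {"BEQ", "BNE", "BCS", "B"}


def estimate_cycles(trace):
    """Estimate cycles for a trace of (mnemonic, branch_taken) pairs."""
    cycles = 0
    for mnemonic, taken in trace:
        cycles += 1                      # base cost: one instruction per cycle
        if mnemonic in MEMORY_OPS:
            cycles += 2                  # two stall cycles for the memory access
        elif mnemonic in BRANCHES and taken:
            cycles += 1                  # flushed fetch after a taken branch
    return cycles


# Two MOVIs, a load, an add, a store, then one taken branch.
trace = [("MOVI", False), ("MOVI", False), ("LDR", False),
         ("ADD", False), ("STR", False), ("B", True)]
print(estimate_cycles(trace))  # 6 base + 4 stall + 1 bubble = 11
```
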
| Pin | Direction | Description |
|---|---|---|
| ui_in[7:0] | Input | Data bus from external memory (instructions and load data) |
| uo_out[6:0] | Output | 7-bit address bus to external memory (PC during fetch, Rs during memory op) |
| uo_out[7] | Output | WE#: active-low write enable, asserted during STR memory cycle |
| uio_out[7:0] | Output | Store data bus (value of Rd during STR memory cycle) |
Run the cocotb testbench:

    cd test && make -B

This runs 15 tests covering all instructions, pipeline stalls, branch conditions, and a STR→LDR round-trip. A waveform is written to test/tb.fst and can be opened in GTKWave or Surfer.
Connect a parallel 8-bit SRAM (e.g. IS61C256AH) to the Tiny Tapeout board:
| CPU pin | Direction | SRAM pin |
|---|---|---|
| uo_out[6:0] | → | address bus A[6:0] (SRAM's upper address bits A[n:7] tied to GND) |
| uo_out[7] | → | write enable WE# (active low, connect directly) |
| ui_in[7:0] | ← | data out (SRAM output → CPU input) |
| uio_out[7:0] | → | data in (CPU output → SRAM input) |
| GND | → | output enable OE# (tie low) |
| GND | → | chip enable CE# (tie low) |

Pre-load the SRAM with your program using a microcontroller or programmer before asserting rst_n. On reset release the CPU begins executing from address 0.
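
Before programming the SRAM it can help to lay the program and data out as a flat 128-byte image. The sketch below only builds that image in memory. How it reaches the SRAM depends on your microcontroller or programmer and is not shown. `build_image` is a hypothetical helper, and filling unused locations with 0x00 is an assumption (0x00 decodes as ADD R0, R0 under the R-type encoding).

```python
def build_image(contents: dict, size: int = 128) -> bytes:
    """Return a memory image with contents[addr] placed at each address."""
    image = bytearray(size)  # assumption: unused locations are left as 0x00
    for addr, value in contents.items():
        if not 0 <= addr < size:
            raise ValueError(f"address {addr} is outside the 7-bit address space")
        if not 0 <= value <= 0xFF:
            raise ValueError(f"value at address {addr} does not fit in 8 bits")
        image[addr] = value
    return bytes(image)


# One instruction at address 0 and one data byte at address 8.
image = build_image({0: 0b10_01_1000, 8: 0x42})
print(len(image), hex(image[8]))
```
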
Program layout guidelines:
The following program loads a value from memory, adds a constant, and stores the result:
| Address | Encoding | Instruction | Comment |
|---|---|---|---|
| 0 | 10_01_1000 | MOVI R1, #8 | R1 = 8 (data address) |
| 1 | 10_10_0011 | MOVI R2, #3 | R2 = 3 (addend) |
| 2 | 0_110_00_01 | LDR R0, R1 | R0 = mem[8] |
| 3 | 0_000_00_10 | ADD R0, R2 | R0 = R0 + 3 |
| 4 | 0_111_00_01 | STR R0, R1 | mem[8] = result |
| 5 | 11_11_1110 | B -2 | halt (self-loop: offset = -2 → target = 5) |
| 6 | | | (unused) |
| 7 | | | (unused) |
| 8 | 0100_0010 | | data value |

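
To check a program like the one above before loading it, an instruction-level reference model is often enough. The following is a sketch, not the project's testbench: it is not cycle-accurate, it assumes Z means Rd = Rs, it assumes the 8-bit Rs value is truncated to the 7-bit address bus, and it treats a taken branch to itself as a halt. `run` and `sext4` are hypothetical names.

```python
def sext4(x: int) -> int:
    """Sign-extend a 4-bit field to a Python int."""
    return x - 16 if x & 0x8 else x


def run(image, max_steps=1000):
    """Execute from address 0 until a self-loop branch or max_steps."""
    mem = list(image) + [0] * (128 - len(image))
    regs, z, c, pc = [0, 0, 0, 0], False, False, 0
    for _ in range(max_steps):
        ir = mem[pc]
        if (ir & 0x80) == 0:                      # R-type: 0_ooo_dd_ss
            op, rd, rs = (ir >> 4) & 7, (ir >> 2) & 3, ir & 3
            a, b = regs[rd], regs[rs]
            if op == 0:
                regs[rd] = (a + b) & 0xFF         # ADD
            elif op == 1:
                regs[rd] = (a - b) & 0xFF         # SUB
            elif op == 2:
                regs[rd] = a & b                  # AND
            elif op == 3:
                regs[rd] = a | b                  # OR
            elif op == 4:
                regs[rd] = b                      # MOV
            elif op == 5:
                z, c = (a == b), (a >= b)         # CMP (Z is an assumption)
            elif op == 6:
                regs[rd] = mem[b & 0x7F]          # LDR
            else:
                mem[b & 0x7F] = a                 # STR
            pc += 1
        elif (ir & 0x40) == 0:                    # I-type: 10_dd_iiii
            regs[(ir >> 4) & 3] = ir & 0xF        # MOVI
            pc += 1
        else:                                     # B-type: 11_cc_oooo
            taken = (z, not z, c, True)[(ir >> 4) & 3]
            target = (pc + 2 + sext4(ir & 0xF)) & 0x7F
            if taken and target == pc:
                break                             # self-loop: treat as halt
            pc = target if taken else pc + 1
        pc &= 0x7F
    return regs, mem


image = [0b10_01_1000, 0b10_10_0011, 0b0_110_00_01,
         0b0_000_00_10, 0b0_111_00_01, 0b11_11_1110, 0, 0, 0b0100_0010]
regs, mem = run(image)
print(f"mem[8] = 0x{mem[8]:02X}")  # 0x42 + 3 = 0x45
```
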
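A byte-addressable 8-bit SRAM with combinational read and one write enable signal. The SRAM must:

- drive ui_in[7:0] combinationally with the byte stored at the address on uo_out[6:0], and
- store the value on uio_out[7:0] at that address when uo_out[7] (WE#) is driven low (active-low, connect directly to SRAM WE#).

| # | Input | Output | Bidirectional |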
|---|---|---|---|
| 0 | DATA_IN0 | ADDR0 | STORE_DATA0 |
| 1 | DATA_IN1 | ADDR1 | STORE_DATA1 |
| 2 | DATA_IN2 | ADDR2 | STORE_DATA2 |
| 3 | DATA_IN3 | ADDR3 | STORE_DATA3 |
| 4 | DATA_IN4 | ADDR4 | STORE_DATA4 |
| 5 | DATA_IN5 | ADDR5 | STORE_DATA5 |
| 6 | DATA_IN6 | ADDR6 | STORE_DATA6 |
| 7 | DATA_IN7 | WE_N | STORE_DATA7 |
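
For simulation without real hardware, the SRAM behaviour described above can be approximated by a small behavioural model. This is a sketch under assumptions, not the model used by the project's testbench: `SramModel` is a hypothetical class, and it treats a write as taking effect whenever it is sampled with WE# low.

```python
class SramModel:
    """128 x 8-bit memory with combinational read and active-low WE#."""

    def __init__(self, image: bytes = b""):
        self.cells = bytearray(128)
        self.cells[:len(image[:128])] = image[:128]

    def read(self, addr: int) -> int:
        """Value to drive onto ui_in[7:0] for the address on uo_out[6:0]."""
        return self.cells[addr & 0x7F]

    def step(self, addr: int, we_n: int, data_in: int) -> None:
        """Sample the pins once; store uio_out[7:0] when WE# is low."""
        if we_n == 0:
            self.cells[addr & 0x7F] = data_in & 0xFF


sram = SramModel(bytes([0x98, 0xA3]))
sram.step(addr=8, we_n=1, data_in=0x45)   # WE# high: no write
sram.step(addr=8, we_n=0, data_in=0x45)   # WE# low: byte is stored
print(hex(sram.read(8)))
```
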