
This design implements a Sequential EML Engine with Operand Feedback — a compact hardware accelerator for computing exponential-logarithmic expressions using Mitchell's approximation. Instead of building massive expression trees in silicon, we use a single reusable EML unit and feedback loops to compute nested expressions over multiple cycles.

The results figure explains how a single EML block can be reused across cycles to build nested expressions.
Core Formula:
out = exp(x) - ln(y)
Where exp() and ln() are computed using Mitchell's fast approximations:
exp(x) = 2^(x / ln(2)) = 2^(x * 1.4427)
ln(y) = log2(y) / log2(e) = log2(y) * 0.693
This work is based on the techniques and evaluation presented in the paper at arXiv:2603.21852.
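The two identities above reduce exp() and ln() to base-2 operations, which is where Mitchell's approximation applies. The following is a minimal floating-point sketch of that idea, not the logic in eml_tile.v: it assumes the textbook form of Mitchell's method (log2(2^k * (1 + f)) ≈ k + f and its inverse), and the function names are hypothetical.

```python
import math

INV_LN2 = 1.4427  # 1 / ln(2), as quoted in the formulas above
LN2 = 0.693       # ln(2), as quoted in the formulas above


def mitchell_log2(y: float) -> float:
    """Mitchell: for y = 2^k * (1 + f) with 0 <= f < 1, log2(y) ~ k + f."""
    if y <= 0:
        raise ValueError("y must be positive")
    m, e = math.frexp(y)   # y = m * 2**e with 0.5 <= m < 1
    k = e - 1              # rewrite as y = (2m) * 2**(e - 1), 1 <= 2m < 2
    f = 2.0 * m - 1.0      # fractional part of the mantissa
    return k + f


def mitchell_exp2(t: float) -> float:
    """Inverse step: for t = k + f with 0 <= f < 1, 2^t ~ 2^k * (1 + f)."""
    k = math.floor(t)
    f = t - k
    return math.ldexp(1.0 + f, k)  # (1 + f) * 2**k


def approx_exp(x: float) -> float:
    return mitchell_exp2(x * INV_LN2)


def approx_ln(y: float) -> float:
    return mitchell_log2(y) * LN2


def eml(x: float, y: float) -> float:
    """Floating-point model of out = exp(x) - ln(y)."""
    return approx_exp(x) - approx_ln(y)
```

Both approximations are exact at powers of two and least accurate in between. For example, `approx_exp(1.0)` evaluates to about 2.885 against a true value of about 2.718. The hardware additionally quantizes to Q6.6, which this sketch does not model.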
The core computation unit (eml_tile.v):
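out = exp(x) - ln(y)
Q6.6 fixed-point values are scaled by 64: 0.5 → 32 (0.5 × 64), 1.0 → 64, 2.0 → 128
Constants (Q6.6):
INV_LN2 = 92 (92/64 = 1.4375, approximating 1/ln(2) ≈ 1.4427)
LN2 = 44 (44/64 = 0.6875, approximating ln(2) ≈ 0.6931)
The stateful wrapper (eml_feedback_cell.v):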
Operation Modes (controlled by sel_x, sel_y):
| sel_x | sel_y | Mode | Computation |
|---|---|---|---|
| 0 | 0 | Feed-forward | out = eml(x_ext, y_ext); single cycle |
| 1 | 0 | Iterate X | out_n = eml(out_{n-1}, y_ext); previous output feeds X |
| 0 | 1 | Iterate Y | out_n = eml(x_ext, out_{n-1}); previous output feeds Y |
| 1 | 1 | Cross-feedback | out_n = eml(out_{n-1}, out_{n-1}); previous output feeds both operands |
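The mode table can be read as two multiplexers in front of the EML unit plus one result register. Here is a small behavioral sketch of that reading. The class name `FeedbackCell`, the reset value of 0, and the use of exact `math.exp`/`math.log` in place of the Mitchell datapath are all assumptions made for illustration; eml_feedback_cell.v may behave differently in these details.

```python
import math


def ideal_eml(x: float, y: float) -> float:
    """Exact stand-in for the EML unit (the hardware uses approximations)."""
    return math.exp(x) - math.log(y)


class FeedbackCell:
    """Behavioral model: operand muxes plus a register holding the last result."""

    def __init__(self, eml=ideal_eml, reset_value: float = 0.0):
        self.eml = eml
        self.prev = reset_value  # assumed reset state of the feedback register

    def step(self, x_ext: float, y_ext: float, sel_x: int, sel_y: int) -> float:
        # sel = 0 picks the external operand, sel = 1 picks the previous result
        x_in = self.prev if sel_x else x_ext
        y_in = self.prev if sel_y else y_ext
        self.prev = self.eml(x_in, y_in)
        return self.prev


cell = FeedbackCell()
first = cell.step(0.0, 1.0, sel_x=0, sel_y=0)   # feed-forward: eml(0, 1)
second = cell.step(0.0, 1.0, sel_x=1, sel_y=0)  # iterate X: eml(first, 1)
```

With these inputs, `first` is exp(0) = 1 and `second` is exp(1); the external `x_ext` passed in the second step is ignored because `sel_x` is 1.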
The control layer (eml_spi_wrapper.v):
Register Map:
Addr 0 (RW=0): Control register
[11:3] = reserved
[2] = valid (write: pulse=1 to trigger; read: n/a)
[1] = sel_y (0=external Y, 1=feedback)
[0] = sel_x (0=external X, 1=feedback)
Addr 1 (RW=0): X input register (signed Q6.6)
[11:0] = x_ext value
Addr 2 (RW=0): Y input register (unsigned Q6.6)
[11:0] = y_ext value
Addr 3 (RW=1): Result register (read-only)
[15:13] = reserved (read as 0)
[12] = overflow flag
[11:0] = result (signed Q6.6)
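To make the register fields concrete, the sketch below packs and unpacks them on the host side. It assumes two's complement for the signed Q6.6 fields and round-to-nearest conversion, and every helper name is hypothetical. It deliberately covers only the field layout listed above, not the SPI frame format.

```python
SCALE = 64       # Q6.6: 6 fractional bits
MASK12 = 0xFFF   # register payloads are 12 bits wide


def encode_q66_signed(value: float) -> int:
    """Float -> 12-bit two's complement Q6.6 (assumed encoding for X)."""
    raw = round(value * SCALE)
    if not -2048 <= raw <= 2047:
        raise ValueError("value outside signed Q6.6 range")
    return raw & MASK12


def encode_q66_unsigned(value: float) -> int:
    """Float -> 12-bit unsigned Q6.6 (assumed encoding for Y)."""
    raw = round(value * SCALE)
    if not 0 <= raw <= 4095:
        raise ValueError("value outside unsigned Q6.6 range")
    return raw


def decode_q66_signed(bits: int) -> float:
    """12-bit two's complement Q6.6 -> float."""
    bits &= MASK12
    if bits & 0x800:          # sign bit set: subtract 2**12
        bits -= 0x1000
    return bits / SCALE


def pack_control(sel_x: bool, sel_y: bool, valid: bool = False) -> int:
    """Control word: [2] = valid, [1] = sel_y, [0] = sel_x."""
    return (int(bool(valid)) << 2) | (int(bool(sel_y)) << 1) | int(bool(sel_x))


def unpack_result(word: int) -> tuple[float, bool]:
    """Result word: [12] = overflow flag, [11:0] = signed Q6.6 result."""
    overflow = bool((word >> 12) & 1)
    return decode_q66_signed(word), overflow
```

As a consistency check against the constants quoted earlier, `encode_q66_unsigned(1.4427)` gives 92 and `encode_q66_unsigned(0.6931)` gives 44.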
TinyTapeout wrapper (project.v):
uio_in[0] = MOSI
uio_in[1] = SCLK
uio_in[2] = CS_N
uio_out[0] = MISO
Example: Computing a nested expression
Goal: Calculate f(x) = eml(eml(x, 1), 1) which represents computing:
temp = exp(x) - ln(1) = exp(x)
f = exp(temp) - ln(1) = exp(exp(x))
Cycle-by-cycle execution:
Cycle 1 — Load inputs and configure for feed-forward:
sel_x = 0, sel_y = 0 (use external inputs)
x_ext = x, y_ext = 1.0
Result: prev = eml(x, 1.0) = exp(x)
Cycle 2 — Switch to iterate mode (reuse X result):
sel_x = 1, sel_y = 0 (X from feedback, Y external)
y_ext = 1.0 (unchanged)
x_in = prev (from cycle 1) = exp(x)
Result: out = eml(exp(x), 1.0) = exp(exp(x)) ✓
Advantage: A single EML unit does the work of a 2-level tree, for roughly 70% area savings compared with a pipelined implementation.
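The two-cycle sequence above can be traced numerically. The sketch below is a floating-point model only: it assumes the textbook Mitchell approximation with the constants 1.4427 and 0.693, ignores Q6.6 quantization, and uses hypothetical function names. Its purpose is to show how approximation error carries from cycle 1 into cycle 2.

```python
import math


def approx_exp(x: float) -> float:
    # 2^(x * 1.4427) with Mitchell's 2^(k + f) ~ 2^k * (1 + f)
    t = x * 1.4427
    k = math.floor(t)
    return math.ldexp(1.0 + (t - k), k)


def approx_ln(y: float) -> float:
    # log2(y) * 0.693 with Mitchell's log2(2^k * (1 + f)) ~ k + f
    m, e = math.frexp(y)  # y = m * 2**e with 0.5 <= m < 1
    return ((e - 1) + (2.0 * m - 1.0)) * 0.693


def eml(x: float, y: float) -> float:
    return approx_exp(x) - approx_ln(y)


def nested_exp(x: float) -> float:
    prev = eml(x, 1.0)     # cycle 1: sel_x = 0, sel_y = 0
    return eml(prev, 1.0)  # cycle 2: sel_x = 1, sel_y = 0


for x in (0.0, 0.5, 1.0):
    approx = nested_exp(x)
    exact = math.exp(math.exp(x))
    rel_err = abs(approx - exact) / exact
    print(f"x={x:.1f}  approx={approx:.4f}  exact={exact:.4f}  rel_err={rel_err:.1%}")
```

For x = 0 the first cycle is exact (the approximation returns exactly 1.0), so the whole error of about 6% comes from the second cycle. Note also that exp(exp(x)) grows quickly, so only a narrow range of x keeps the result within what a signed 12-bit Q6.6 value can represent.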
Mitchell's approximation trades precision for speed:
Typical use cases:
For applications requiring higher precision, use external high-precision math libraries.
pip install -r test/requirements.txt
cd test
make -B
x, y, and suitable sel settings to realize the target expression

| # | Input | Output | Bidirectional |
|---|---|---|---|
| 0 | unused | unused | spi_mosi |
| 1 | unused | unused | spi_sck |
| 2 | unused | unused | spi_cs_n |
| 3 | unused | unused | spi_miso |
| 4 | unused | unused | unused |
| 5 | unused | unused | unused |
| 6 | unused | unused | unused |
| 7 | unused | unused | unused |