MichaelBell: Tested with the linked script to verify that all pairs of 4-bit inputs give the expected output, and that the pipelining works correctly with random inputs.
Did some further testing with fast clocks: it worked at 146 MHz at 1.8 V core voltage, and at 154 MHz at 1.9 V core voltage. Link for more details
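For readers without access to the linked script, the two checks described above (every pair of 4-bit inputs, then random inputs streamed through the pipeline) could look roughly like the sketch below. This is not the actual test script. The names `exhaustive_check`, `pipelined_check` and `ModelPipeline` are hypothetical, the operation is assumed to be multiplication purely as a stand-in (the comment does not say what the design computes), and the pipeline latency is an assumed parameter. The hardware is replaced by a small software model so the sketch runs on its own.

```python
import random
from collections import deque


def exhaustive_check(dut, expected, bits=4):
    """Try every pair of `bits`-wide inputs; return the mismatches found."""
    failures = []
    for a in range(1 << bits):
        for b in range(1 << bits):
            got, want = dut(a, b), expected(a, b)
            if got != want:
                failures.append((a, b, got, want))
    return failures


def pipelined_check(step, expected, latency, n=1000, bits=4, seed=0):
    """Feed one random input pair per clock and compare each output
    with the result expected for the pair sent `latency` clocks earlier."""
    rng = random.Random(seed)
    in_flight = deque()  # expected results still inside the pipeline
    mismatches = 0
    for _ in range(n + latency):
        a, b = rng.randrange(1 << bits), rng.randrange(1 << bits)
        in_flight.append(expected(a, b))
        out = step(a, b)  # one clock: present inputs, read the output
        if len(in_flight) > latency:
            if out != in_flight.popleft():
                mismatches += 1
    return mismatches


class ModelPipeline:
    """Software stand-in for the hardware (assumption: a multiplier
    whose result appears a fixed number of clocks after its inputs)."""

    def __init__(self, latency):
        self.stages = deque([0] * latency)

    def step(self, a, b):
        self.stages.append(a * b)
        return self.stages.popleft()


def multiply(a, b):
    # Assumed reference operation; swap in whatever the design computes.
    return a * b


if __name__ == "__main__":
    print("exhaustive failures:", len(exhaustive_check(multiply, multiply)))
    model = ModelPipeline(latency=2)
    print("pipelined mismatches:", pipelined_check(model.step, multiply, latency=2))
```

On real hardware, `step` would drive the input pins, pulse the clock and read the output pins instead of calling the model, and the same loop could be repeated at each clock frequency and core voltage of interest.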