I completed a simple 8-bit x 8-bit multiply function that works by iterative adds, and decided to run a quick benchmark to check its performance. Since the loop count depends on the operand, the worst case is FF x FF. That takes about 1.61 ms with the clock running at 12 MHz, or 19,397 clock cycles for the operation. That's a lot of moving. If there are 256 loops, each loop takes about 76 clocks. There are currently 3 clocks per instruction, so that is about 25 instructions per loop. That sounds about right.
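For reference, a minimal C sketch of multiplication by iterative adds might look like the following. It is not the routine benchmarked above, only an illustration of the idea: the name `mul8_repeated_add` and the choice of which operand drives the loop are assumptions. In this version FF x FF runs 255 additions, so the real routine may count its loops slightly differently.

```c
#include <stdint.h>

/* Hypothetical sketch: multiply two 8-bit values by repeated addition.
 * The 16-bit result holds the full product (0xFF * 0xFF = 0xFE01).
 * The loop runs once per unit of `multiplier`, so the run time grows
 * linearly with that operand and is zero loops when it is 0. */
uint16_t mul8_repeated_add(uint8_t multiplicand, uint8_t multiplier)
{
    uint16_t product = 0;

    while (multiplier != 0) {
        product += multiplicand;  /* one add per loop */
        multiplier--;             /* count down to zero */
    }
    return product;
}
```

Calling `mul8_repeated_add(0xFF, 0xFF)` is the worst case for this sketch, while a multiplier of zero returns without entering the loop at all.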
A better algorithm, such as shift-and-add, would probably speed this up, but it's good to have a baseline. The number of loops is really the killer. A shift-and-add algorithm needs, I think, 8 loops, with an add and a shift right in each one. Of course, in the best case (a zero operand) my algorithm takes zero loops, so it can be faster for some inputs, but averaged over all inputs I'm sure it's slower.
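A shift-and-add multiply could be sketched in C roughly as follows. This is one common variant, not a tested routine for this CPU: it shifts the multiplier right to examine one bit per loop and shifts a 16-bit copy of the multiplicand left. The name `mul8_shift_add` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical sketch: 8-bit x 8-bit shift-and-add multiply.
 * Always runs exactly 8 loops, one per bit of the multiplier. */
uint16_t mul8_shift_add(uint8_t multiplicand, uint8_t multiplier)
{
    uint16_t product = 0;
    uint16_t shifted = multiplicand;  /* multiplicand * 2^bit */

    for (int bit = 0; bit < 8; bit++) {
        if (multiplier & 1u) {
            product += shifted;       /* add only when the low bit is set */
        }
        shifted <<= 1;                /* next power of two */
        multiplier >>= 1;             /* expose the next bit */
    }
    return product;
}
```

The loop count is fixed at 8 regardless of the operands, which is where the expected speedup over up to 256 iterative adds would come from.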
But now I can store this function in the ROM and call it whenever I need to do a multiply. However, it currently reads and writes dedicated memory locations. I need to work on using the stack to pass values to functions instead.