With integer division, comparison, subtraction already implemented, it was possible to squeeze in integer square root as the algorithm can reuse these operations. Note:
- all values in TOS are treated as positive (unsigned) integers, so it will never generate an error (root of negative numbers)
- taking the root of full 32-bit numbers is possible
- CPU has only 2 counters - bitcnt and loopcnt - the third "level of loop" does not use a counter in the CPU, but simply comparison with value executed in previous iteration
The algorithm uses Heron's method described here. For implementation reference, see to microcode lines 293 to 322 approx.
- First, argument in TOS ("S") is duplicated on the stack, dup() subroutine is used for this
- TOS is then shifted down (halved)
- If zero flag of TOS is set, we are trying isqrt(0) so bail with result 0 (but after 1 stack pop to not leave already duplicated argument in NOS)
- Note that 1 shifted down is also 0, which would give false result isqrt(1) = 0 - this is caught in div() subroutine by examining the delay status bit
- Now the stack is prepared to contain: X0, S, *, *, *, *, X0 - where X0 = S/2
- heron_step() is invoked to calculate the first iterative guess ("X1") which appears in TOS upon return
- root_loop: new "X1" and old "X0" iteration results are compared. This boils down to comparison between TOS and R7, which is a simple subtract with result not stored anywhere, just carry bit examined
- if carry flag is set (X1 >= X0), iterations are over and result in R7 is copied to TOS, bail.
- root_cont: "new" X1 value becomes "old" X0 (which means R7 <= TOS)
- new X1 guess is obtained using heron_step() and proceed to step 7.
heron_step() subroutine implements the following calculation:
x1 = ( x0 + s / x0 ) / 2;
which is:
TOS <= (NOS + DIV(NOS, R7)) >> 1
// used as step in integer square root
heron_step: nuke, matrix_nop1();
connect row_r7 to col_tos, div2(max, set); // X0, S, S, ?, ?, ?, ?, X0
divmod(same); // mod, div, S, ?, ?, ?, ?, X0
nuke, matrix_nop1();
disconnect row_nos from col_nos;
connect row_r2 to col_nos; // NOS <= R2 (S)
connect row_nos to col_adc1;
connect row_r7 to col_adc2; // TOS <= R7 + NOS
connect row_sum to col_tos, c_flag <= zero, div2(max, set);
// NOTE terrible fall-through!!
// shift TOS >> 1
shift_down: prep_shift();
shift_dloop: STATUS = busy_using_mt, opr = m2_m2_np, z_flags <= update, d_flag <= column, if bitcnt_is_zero then return;
bitcnt <= dec, goto shift_dloop;
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.