# Fastest Multi-precision Division Routines?

Discussion in 'Microcontrollers' started by jpanhalt, Sep 16, 2017.

1. ### jpanhaltWell-Known MemberMost Helpful Member

Joined:
Jun 21, 2006
Messages:
6,076
Likes:
524
Location:
Cleveland, OH, USA
After several weeks and many interruptions, let me share the best program I came up with for 32 bits divided by 16 bits with up to a 32 bit quotient and a valid remainder. The program uses the enhanced mid-range PIC instruction set. It can easily be adapted to ordinary mid-range 18F devices. Two versions are attached. The longer, full version (Exp_8, approximately 86 instructions) includes substantial manipulation of operand bytes before calculation. The shorter alternative entry version (Exp_9 approximately 55 instructions) assumes the user can do the byte shifts before calling the routine.

Suggestions for Improvement are welcome.

1) The basic algorithm I used is called by a variety of names, from the catchy "Kenyan" or "Russian" to something more simple like subtract and divide (See: Post #1 for the history).

The more common algorithm for division involves shifting the dividend left into the "remainder" and subtracting the divisor (i.e., multiply and subtract). In both algorithms, when the subtraction is successful, a 1 is rotated into the result. When the subtraction fails (i.e., there is borrow), a 0 is rotated into the result. Both algorithms include a loop for generating the result.

Because of the pre-calculation, subtract-divide can easily reduce the number of loop repetitions. Multiply-subtract methods can also limit loop repetitions by shifting bytes and using other shortcuts. Some of those shortcuts are very clever (See: atferrari's posted above) and fast. However, those shortcuts introduce limitations on the size of the dividend and allowable size of the quotient.

Since subtraction generally destroys the minuend (i.e., the result is stored in the "f" register), failed subtractions create a need to restore that minuend or require performing a trial subtraction first using the w register. A third alternative is to just carry forward with the negative result and add, rather than subtract, at the next iteration. I am not the first to do that, but I did incorporate it in my procedure.

2) My main purpose was to develop a routine that was faster than a typical multiply-subtract routine and did not have the restrictions of the modified routines.

Outline of my method:
1) Use indirect addressing to identify non-zero bytes in both operands and then shift those bytes to avoid unnecessary rotations and subtractions.

2) Detect potential error sources during step #1, e.g., division by zero or dividend=0.

3) Use bit banging to further adjust the divisor and minimize unnecessary operations without adding restrictions.

4) Avoid tables unless they save code space.

5) Do not do trial subtraction or restore the minuend in the calculation loop.

1) Obviously, bit banging and byte shifting take time up front while saving time in the loop. It is a trade off. Notice that I only shift the divisor left to make it bigger than the dividend. One could also shift it right to reduce how much larger than the dividend it is. I tested both left and right shifts as well nibble manipulations to minimize time in the loop. The procedures a were a little faster, but the code also expanded. The small gain in speed was probably a reflection of the efficiency of the loop. In the end, I discarded both right shifts and nibble shifts as adding complexity without great gains in speed.

2) I believe byte shifts and delayed reconstruction of the dividend (minuend) can also be applied to multiply and subtract methods without adding restrictions to them.

3) In practical use, one will often know the actual size of the dividend or divisor and can simply load the registers appropriately without going through the machinations of indirect addressing. I call that "alternate" entry and post a version (Exp_9) with slight revisions that is a little faster and a lot shorter.

Attachments:
#1 Code for full algorithm using indirect addressing to adjust bytes and bits
#2 Shortened code for which the user will need to put the bytes in correctly aligned registers
#3 A table showing comparative speed results for the multiply and subtract method (called Rogers) and the full subtract and divide algorithm

Regards, John

#### Attached Files:

File size:
19.3 KB
Views:
20
File size:
4.9 KB
Views:
13
• ###### Exp_9_Alt_Entry_ETO Post.asm
File size:
3.7 KB
Views:
17
• Informative x 1