A recent project required a 32bit by 16bit division. My previous default method was that by Hemsley (http://www.piclist.com/techref/microchip/math/div/32by16ph.htm ), which has the advantages of relatively few instructions and being generally applicable to any problem, but it takes time (average cycles for test problem set is 682). The fastest method I found was by Golovchenko (http://www.piclist.com/techref/microchip/math/div/div16or32by16to16.htm ). It takes about 350Tcy, but is limited to quotients of 16bits or less.
I became attracted to the socalled Kenyan (or Russian) method that doubles the divisor until it is just larger than the dividend. That is followed by sequential subtractions by the divisor. After each subtraction, the divisor is divided by 2. A subtraction without borrow results in a 1 being rotated into the quotient, and one with borrow results in a zero being rotated into the quotient. See here for a better description: ReichbornKjennerud ( http://www.piclist.com/techref/method/math/muldiv.htm ).
Working with that description, I got some procedures that worked, were a little faster than Hemsley, but were lengthy. Then using indirect addressing, I shortened the code a little for a small cost in time.
Presently, I have two working algorithms. The slower one just uses byte shifts to initiate the sequential subtractions but has fewer instructions. The faster one manipulates bits in the divisor and takes slightly more code. Timings for the same problem set used with the Hemsley method average 498 and 482 Tcy, respectively. I have attached the byteshift method as a file. I have not tried to optimize it yet.
My question:
My test problem set is 12 problems. They are not randomly picked. Division by small numbers tends to be slowest, as expected from the algorithm. Is anyone here willing to do 3 of those test divisions in C just to see how close I am to what is standard. I have no property interest in either method I wrote. They are pretty much bluntforce approaches. Of course, I will also share the whole problem set, but the three problems I have picked are pretty representative:
1) 0x9C40000 ÷ 0x1F4 ; expected answer = 0x50000, bytemethod = 573 Tcy
2) 0x9C40000 ÷ 0x1388 ; expected answer = 0x8000, bytemethod = 443 Tcy
3) 0xC40000 ÷ 0x3100 ; expected answer = 0x400, bytemethod = 394 Tcy
Thanks in advance.
John
I became attracted to the socalled Kenyan (or Russian) method that doubles the divisor until it is just larger than the dividend. That is followed by sequential subtractions by the divisor. After each subtraction, the divisor is divided by 2. A subtraction without borrow results in a 1 being rotated into the quotient, and one with borrow results in a zero being rotated into the quotient. See here for a better description: ReichbornKjennerud ( http://www.piclist.com/techref/method/math/muldiv.htm ).
Working with that description, I got some procedures that worked, were a little faster than Hemsley, but were lengthy. Then using indirect addressing, I shortened the code a little for a small cost in time.
Presently, I have two working algorithms. The slower one just uses byte shifts to initiate the sequential subtractions but has fewer instructions. The faster one manipulates bits in the divisor and takes slightly more code. Timings for the same problem set used with the Hemsley method average 498 and 482 Tcy, respectively. I have attached the byteshift method as a file. I have not tried to optimize it yet.
My question:
My test problem set is 12 problems. They are not randomly picked. Division by small numbers tends to be slowest, as expected from the algorithm. Is anyone here willing to do 3 of those test divisions in C just to see how close I am to what is standard. I have no property interest in either method I wrote. They are pretty much bluntforce approaches. Of course, I will also share the whole problem set, but the three problems I have picked are pretty representative:
1) 0x9C40000 ÷ 0x1F4 ; expected answer = 0x50000, bytemethod = 573 Tcy
2) 0x9C40000 ÷ 0x1388 ; expected answer = 0x8000, bytemethod = 443 Tcy
3) 0xC40000 ÷ 0x3100 ; expected answer = 0x400, bytemethod = 394 Tcy
Thanks in advance.
John
Attachments

4.3 KB Views: 37