A recent project required a 32-bit by 16-bit division. My previous default method was that by Hemsley ( ). It takes about 350Tcy, but is limited to quotients of 16-bits or less.
I became attracted to the so-called Kenyan (or Russian) method that doubles the divisor until it is just larger than the dividend. That is followed by sequential subtractions by the divisor. After each subtraction, the divisor is divided by 2. A subtraction without borrow results in a 1 being rotated into the quotient, and one with borrow results in a zero being rotated into the quotient. See here for a better description: Reichborn-Kjennerud ( ).
Working with that description, I got some procedures that worked, were a little faster than Hemsley, but were lengthy. Then using indirect addressing, I shortened the code a little for a small cost in time.
Presently, I have two working algorithms. The slower one just uses byte shifts to initiate the sequential subtractions but has fewer instructions. The faster one manipulates bits in the divisor and takes slightly more code. Timings for the same problem set used with the Hemsley method average 498 and 482 Tcy, respectively. I have attached the byte-shift method as a file. I have not tried to optimize it yet.
My question:
My test problem set is 12 problems. They are not randomly picked. Division by small numbers tends to be slowest, as expected from the algorithm. Is anyone here willing to do 3 of those test divisions in C just to see how close I am to what is standard. I have no property interest in either method I wrote. They are pretty much blunt-force approaches. Of course, I will also share the whole problem set, but the three problems I have picked are pretty representative:
1) 0x9C40000 ÷ 0x1F4 ; expected answer = 0x50000, byte-method = 573 Tcy
2) 0x9C40000 ÷ 0x1388 ; expected answer = 0x8000, byte-method = 443 Tcy
3) 0xC40000 ÷ 0x3100 ; expected answer = 0x400, byte-method = 394 Tcy
Thanks in advance.
John
I became attracted to the so-called Kenyan (or Russian) method that doubles the divisor until it is just larger than the dividend. That is followed by sequential subtractions by the divisor. After each subtraction, the divisor is divided by 2. A subtraction without borrow results in a 1 being rotated into the quotient, and one with borrow results in a zero being rotated into the quotient. See here for a better description: Reichborn-Kjennerud ( ).
Working with that description, I got some procedures that worked, were a little faster than Hemsley, but were lengthy. Then using indirect addressing, I shortened the code a little for a small cost in time.
Presently, I have two working algorithms. The slower one just uses byte shifts to initiate the sequential subtractions but has fewer instructions. The faster one manipulates bits in the divisor and takes slightly more code. Timings for the same problem set used with the Hemsley method average 498 and 482 Tcy, respectively. I have attached the byte-shift method as a file. I have not tried to optimize it yet.
My question:
My test problem set is 12 problems. They are not randomly picked. Division by small numbers tends to be slowest, as expected from the algorithm. Is anyone here willing to do 3 of those test divisions in C just to see how close I am to what is standard. I have no property interest in either method I wrote. They are pretty much blunt-force approaches. Of course, I will also share the whole problem set, but the three problems I have picked are pretty representative:
1) 0x9C40000 ÷ 0x1F4 ; expected answer = 0x50000, byte-method = 573 Tcy
2) 0x9C40000 ÷ 0x1388 ; expected answer = 0x8000, byte-method = 443 Tcy
3) 0xC40000 ÷ 0x3100 ; expected answer = 0x400, byte-method = 394 Tcy
Thanks in advance.
John