Continue to Site

Welcome to our site!

Electro Tech is an online community (with over 170,000 members) who enjoy talking about and building electronic circuits, projects and gadgets. To participate you need to register. Registration is free. Click here to register now.

  • Welcome to our site! Electro Tech is an online community (with over 170,000 members) who enjoy talking about and building electronic circuits, projects and gadgets. To participate you need to register. Registration is free. Click here to register now.

Number Fun: 17-bit BIN2BCD

Not open for further replies.


Well-Known Member
Most Helpful Member
Sunday afternoons in Cleveland can be spent watching the Browns lose (0-7) or doing something just for fun. I chose the latter.

PicList has several programs for converting radix. One I particularly like for 16-bit to BCD is attributed to John Payson with refinements contributed by Scott Dattalo and others ( ). My "need" was prompted by a counter routine that prints a 5-digit decimal result. Of course, the display poops out at 65535, and it would be nice to scroll up to 99999.

I spent yesterday coding Mansheim's variation in MPASM along the lines of the Payson code, which combines isolation of the nibbles with calculation of the 5 registers needed: ones, tens, hund, thou, and tenK. My code works out to about 43 instructions prior to the "normalization" routine. Total Tcy for 0x1869F (dec 99999) was 312 Tcy.

I thought I would present it here for comments and improvements. Note that there ate two options for "DoA1". Option 1 is probably a little easier to understand, but destroys the binL register. Option 2 is from Payson/Dattalo and preserves the binL register. Steps are identical. One needs simple to comment out a "bra" to switch between the options. The majority of time is spent in "normalization" and suggestions for that wouldbe most helpful.

Regards, John

EDIT: I am not real happy with the way I handled a4 and may take another look at it.


  • 17BIN2BCD_ETO.asm
    4 KB · Views: 179
Last edited:
Sorry, I was not aware he did the 17-bit version. Which tutorial number is it?

Sorry, I was not aware he did the 17-bit version. Which tutorial number is it?
Tutorial 11... But! He only uses 16 bit... However, its virtually the same code.. His code states its from piclist!!
Coding those formulas will have superficial similarities, like a lot of addwf's and a few rotates. However, the 17-bit formulas are different (e.g., Payson's 16-bit formula in PicList uses a constant for the tenK). The tenK register appears in every digit-formula (e.g, ones, tens, etc.) and the odd multipliers cause a small complexity.

In any event, I could not find an already coded version for 17-bits in Assembly -- at least not a shorter one. It was something for the afternoon, and maybe others will find it helpful. It was all done on a spread sheet and just fell together once the first nibble was set. There are at least 5 other ways to do it. ;)

I have thought about submitting it to PicList to close the loop on Mansheim's work, but that list seems pretty dead, and I didn't want one of the clever contributors there -- if still alive -- to point out an obvious way to reduce 4 instructions to just one. I will be giving more thought to the tenK instructions and hope some others here will see a more compact solution.

Hi John,
Here is some code I found on the web many years ago. (I cant remeber where.) to convert 24 bit binary to BCD. It has less instructions than the code you posted but may take longer to execute as it loops.


               ;       rBin 4 Bytes
rBin       EQU   0x3D       ;       MSB
rBin1       EQU   0x3E
rBin2       EQU   0x3F
rBin3       EQU   0x40

               ;       rBCD 4 Bytes
rBCD       EQU   0x41       ;       MSB
rBCD1       EQU   0x42   
rBCD2       EQU   0x43   
rBCD3       EQU   0x44       ;       LSB


           BCF   STATUS,C
           MOVLW   D'24'       ; 24-Bits
           MOVWF   rBitCnt       ; Make cycle counter
           CLRF   rBCD       ; Clear result area
           CLRF   rBCD+1       ;
           CLRF   rBCD+2       ;
           CLRF   rBCD+3       ;

Loop           MOVLW   rBCD       ;Pointer to lowest BCD location
           MOVWF   FSR
           MOVLW   4
           MOVWF   rCnt

           MOVLW   0x33
           ADDWF   INDF,F       ; Add 0x33 to rBCD, rBCD1,rBCD2, or rBCD3 depending on the number of times through the loop.
           BTFSC   INDF,3       ; top bit of low nibble ?  (If the original nibble was 5 or greater after adding 3
                       ; will cause the top bit of nibble to be set)
           ANDLW   0xF0       ; Mask for bits 4 - 7  ( The "3" in the top nibble.)
           BTFSC   INDF,7       ; top bit of high nibble ? (If the original nibble was 5 or greater after adding 3
                       ; will cause the top bit of nibble to be set)
           ANDLW   0x0F       ; Mask for bits 0 - 3
           SUBWF   INDF,F       ;
           INCF   FSR,F       ; Inc pointer to next byte
           DECFSZ   rCnt,F       ; Decrement counter  (Count to 4)
           GOTO   BintoBCD1   ; four times round this loop

           RLF   rBin+2,F   ; Get another bit
           RLF   rBin+1,F   ;
           RLF   rBin+0,F   ;
           RLF   rBCD+3,F   ; Put it into BCD
           RLF   rBCD+2,F   ;
           RLF   rBCD+1,F   ;
           RLF   rBCD+0,F   ;
           DECFSZ   rBitCnt,F   ; All done? (Count to 24)
           GOTO   Loop       ; No, loop  (24 times round this loop)

I have used this code many times.

Hi Les,

Thanks for the addition. I will certainly save it. Never hurts to have more than one arrow in a quiver . There are some pretty short codes on PicList too, but loops can be time killers. For example, the "normalization" routine in what I posted looks short on paper but takes more than 6 times as much time as the first part.


I think the code you posted has the wonderful name of the Double Dabble algorithm.

Could you post the spreadsheet as well? Edit, I see the equations are in the code. Thanks.

P.S. not got time now to go through this now but will in the coming days.
Last edited:
Since you asked (before seeing the formulas), here's the spreadsheet:
For something like this I revert to the good old days when you could doodle while thinking. I also like to put it down for a day or two, them come back fresh. I will play with it a little more today. Since the register in those formulas that supplies tenK is either 1 or 0, I may try to incorporate that in cleaning up the routine.

Thanks for the interest.


Due to several small errors and the difficulty of maintaining code in several places, the "final" working code has been posted here:
Last edited:
Playing with this equation is a great way to waste time, and 3:30 PM was just not coming fast enough (Penn State v. Ohio State).
So, realizing I had changed the additions and only one of the "magic" negating numbers gets doubled, I calculated a newer set that reduced the number of cycles in normalization. That trimmed 10 Tcy from the whole process.

The new "magic" numbers are:
;ones= -20 , carry=2 (same as original)
;tens= -128, carry=13(-64x2)
;hund= -57 , carry=7
;thou= -73 , carry=8
;tenK= +8

Simply plug those into version 3.1 (Post #11) appropriately and it should work.

Interesting series of posts. I hadn't been aware of this method of binary to BCD conversion. My previous projects used the Double Dabble method which I ripped off of a Microchip appnote. So, it's interesting to see this approach. I'd be interested in seeing a speed comparison between your code and the one posted by Les.

The Double Dabble method results in packed BCD, while the polynomial method in your code has one byte per digit. So that is another consideration when choosing a method.

For completeness, there is a third common method which involves successive divisions by ten to chop off one digit at a time. I was stuck inside the house today because of miserable windy rainy weather, so I thought I'd try my hand at that approach. It's sort of working now, but I need to do more testing and clean up the code before I post it. I have no idea how efficient it is at this point.
Hi Bob,

I have not run Les' code. It is for 24 bit, so any comparison might be a bit biased against it. I did run a version of Double Dabble from Microchip's forum that was posted by 1and0 here:

Run time for a 16-bit was 344 Tcy, but that also includes doing the ascii offset. My notes indicate that the same number with the Payson polynomial was 183 Tcy. As seems to be typical, looping methods favor shorter code but longer run time. I suspect successive subtractions would follow that pattern.

Going back to the original Payson method, I think that can be tweaked a little by changing the negative offsets as I did in post #12. Here's a table showing what I did for that post:


I would look first at the offset used for the 10's, which probably affects the later ones as well.

Regards, John

NB: I was a little worried about the offset used for b2 as it is the only instance in which the absolute value of the offset is not greater than the maximum value of the digit's polynomial with carry. I evaluated several inputs to test whether that caused an error, and they all worked fine. Of course the polynomial is negative and the apparent "inadequate" compensation for carry is taken care of in b3 .
Last edited:
Thanks for the additional info. I didn't initially see the theoretical writeup at the bottom of the PIClist page, and when I tried the link to Scott Dattalo's page I got a 404 error. So I was initially confused about how this works. Then after a bit of googling I found a page by Douglas Jones describing the method:

However, he didn't make use of the negative constants, primarily because he was coding in C and was able to do the integer ÷10 and mod 10 directly. That part of the writeup doesn't provide any new info except that near the bottom of his page, he discusses the idea of alternative radices which may be of interest and could possibly give some speed improvement:

Meanwhile, I'm plugging away at my modulo 10 routine. BTW, I have an ulterior motive for this: I need a fast divide by 10 routine for a project at work. So, this is a good test bed for the algorithm.
Thanks for the additional links.

Wouldn't the one's polynominal for the BCD conversion give you modulo 10? These are the fastest routines I found, particularly if you tweak them just a little. Will you be able to post your results?

Good point. I'll have another look at the one's polynomial code.

I should be able to post my code tomorrow. I started out testing it on a spreadsheet, and then a binary math simulation program which I find easier to use when testing out basic concepts. That part is done, and it's working. I've just finished translating it into PIC code, but haven't run it yet. It's getting late. So, I'll continue tomorrow.

I'm not really expecting my code to be terribly efficient, but it will be interesting to see how it compares.
Hi Bob,

Had nothing to do, so I pretty much cut and pasted to get this for modulo 10:
     swapf     binH,w    ;w=16a2+a3
     addwf     binH,w    ;w=16a2+16a3+a2+a3 (carry in DC)
     andlw     0x0F      ;w=a3+a2 (carry in DC)
     btfsc     STATUS,1
     addlw     0x10
     addlw     0x05   
     movwf     binH      ;binH=a3+a2+5   
     swapf     binL,w
     andlw     0x0F
     addwf     binH
     movf      binL,w
     andlw     0x0F      ;w=a0
     movwf     binL       
     lslf      binH,f
     lslf      binH,w
     subwf     binL,f   
     movlw     0x0A
     addwf     binL,f      
     btfss     STATUS,0
     bra       _Mod10
     bra       Done      ;result is in binL
Number is entered in binH:L and result is in binL. I believe it can be done without the additional register using one of the swap register routines, but that will take at least 0ne more instruction. Run time for 0xFFFF was 92 Tcy, 0x000A was the shortest at 20 Tcy, and several other tests were in the mid-40 Tcy's.


Edit: Avoided need for extra register without swapping and with no more steps.
Last edited:
I got my code working and it's faster than I expected. Though yours is still significantly faster. The speed is dependent on the number of digits because it performs one iteration of the outer loop per digit. If the input value is 0, it takes 18 cycles. Otherwise, it's roughly 100 cycles per digit. A value of 99999 takes 568 cycles. Since it requires 3 registers anyway, there was no real performance hit by allowing more than 17 bit numbers. The largest number that it will handle is 655359 which is 10*2^16-1. The reason for the odd upper limit is due to some left and right shifting that goes on in the calculation. There needs to be extra bits so there's room to move. Converting 655359 takes 726 cycles.

The primary motive behind this was to test my divide by 10 algorithm. It's based on the idea that division by a fixed constant can be done by summing an infinite series. For division by ten it is:

[latex]\dfrac{n}{10}=2^{-5}\sum\limits_{k=0}^{\infty}\,\,{2^{-4k} \times 3n}[/latex]

(Dunno why that came out so big. :wideyed:)

Though it looks overly complicated, it only requires a single multiplication by 3 (1 shift and one add), and then all subsequent operations involve shifting and adding the number to itself. Since each successive term decreases by a factor of 16, no more than 4 terms are required to divide a 24 bit number to the necessary precision. The least significant bits of the division contain a fractional value (n mod 10)/32 rather than a remainder. So, that has to be extracted and then multiplied by 10 to get the decimal digit value. So, the inner (div10) loop will execute a maximum of 4 times, and the outer DigitLoop will execute a maximum of 6 times (once per decimal digit).

;  Binary to BCD conversion routine

    radix dec

;  variables
    cblock  0x20

    ;On entry the binary number is in registers y0 (LSB), y1, y2 (MSB)
    ; maximum allowable binary value is 10*2^16-1 = 655359
    clrf digit0 ;initialize the digits to zero
    clrf digit1
    clrf digit2
    clrf digit3
    clrf digit4
    clrf digit5
    movlw Digit0 ; Initialize the digit array pointer
    movwf fsr0L
    clrf fsr0H
    movf y2,w
    iorwf y1,w
; This outer loop iterates once per actual decimal digit. So, if the
; decimal result is only one digit, the loop executes only once.
; For an input value of zero it won't execute at all.
; Do While y0<>0 or y1<>0 or y2<>0
    iorwf y0,w
    btfsc status,z
    incfsz y0,f ; increment y to compensate for series truncation error
    goto Div10
    incfsz y1,f
    goto Div10
    incf y2,f
; Divide by 10 using the infinite series method
; On entry the dividend is in registers y0..y2
; On completion the quotient is in y0..y2 with y0 containing part integer and part fraction
; Registers x0, x1, x2 are temporary registers holding the 3 byte series term(i)
    lslf y0,w  ; multiply dividend by 3
    movwf x0 ;copy shifted y to x
    rlf y1,w
    movwf x1
    rlf y2,w
    movwf x2
    movf x0,w  ;add x to y 
    addwf y0,f
    movf x1,w
    addwfc y1,f
    movf x2,w
    addwfc y2,f
    lslf y0,f ;now shift y left 3 positions
    rlf y1,f
    rlf y2,f
    lslf y0,f
    rlf y1,f
    rlf y2,f
    lslf y0,f
    rlf y1,f
    rlf y2,f
    movf y0,w  ;copy y back to x
    movwf x0
    movf y1,w
    movwf x1
    movf y2,w
    movwf x2
; This inner loop never takes more than 4 iterations.
; Number of iterations is proportional to magnitude of y.
; Do While x2<>0 or x1<>0
    iorwf x1,w
    btfsc status,z
    goto Div10Done
    ;shift x0..x2 right 4 places
    lsrf x2,f
    rrf x1,f
    rrf x0,f
    lsrf x2,f
    rrf x1,f
    rrf x0,f
    lsrf x2,f
    rrf x1,f
    rrf x0,f
    lsrf x2,f
    rrf x1,f
    rrf x0,f
    movf x0,w ;add x to y
    addwf y0,f
    movf x1,w
    addwfc y1,f
    movf x2,w
    addwfc y2,f
    goto Div10Loop ; Wend
; Quotient is in y0..y2 (y0 has fractional part; y1,y2 have integer part)
;****************** End of Divide by 10 routine
; extract current raw BCD digit from y0 and multiply by 10
    lsrf y0,w
    movwf x0
    lsrf x0,w
    movwf x1
    lsrf x1,w
    addwf x0,f  ; digit(i) = high nibble of x0
    swapf x0,w
    andlw 0x0F
    movwi fsr0++ ;save digit in output register
    movf y1,w ; divide y by 2^8 for next iteration
    movwf y0
    movf y2,w
    movwf y1
    clrf y2
    goto DigitLoop  ; rinse and repeat

Edit: I revised the code, optimizing it a bit by removing a few redundant instructions. Conversion of 99999 is down to 513 cycles. Conversion of 655359 is down to 662 cycles.
Last edited:
Not open for further replies.

Latest threads

New Articles From Microcontroller Tips