bits 'game'

electroRF · Jan 10, 2014

Hi,
I'm wondering about the following question:

assume W,X,Y,Z are 8-bit registers.

Given the following expression:

C:

(((W-X)>>3) + Y-Z+4)<<2

how many bits are needed to prevent overflow?

I'm wondering what would be a 'neat' way to solve it?

* it isn't mentioned in the question, but I assumed the registers are signed 8-bit numbers, because I think that gives the maximum number of bits required to avoid overflow.

Ratchit · Jan 10, 2014

electroRF,

"how many bits are needed to prevent overflow?"

No matter how many bits a register has, an overflow can still happen.

"I'm wondering what would be a 'neat' way to solve it?"

You have to define the problem in a way that makes sense first.

"* it isn't mentioned in the question, but I assumed the registers are signed 8-bit numbers, because I think that gives the maximum number of bits required to avoid overflow. "

What you say above does not make sense. For addition and subtraction, signed and unsigned are the same. The results depend on how the result is intrepreted. The ADD and SUB instructions are the same for either type of number.

Ratch

electroRF · Jan 11, 2014

Hi Ratchit,
The question was about how many bits should the result have to avoid overflow.

it DOES matter whether the registers are signed or unsigned.

if W,X are unsigned, then W-X will never require more than 8 bits for the result.
if W,X are signed, then W-X may require 9 bits for the result:
say W = 2^7 - 1 , ans X = - 2^7
Then W-X = 2^8 - 1, which is represented in 9 bits for signed numbers.

Ratchit · Jan 11, 2014

electroRF,

"The question was about how many bits should the result have to avoid overflow."

And the answer is still that an overflow can occur no matter what the size of the register is.

"it DOES matter whether the registers are signed or unsigned"

The hardware of every arithmetic register is the same. There are no signed or unsigned registers. Whether a number is signed or not depends on how it is interpreted. For instance, an 8-bit register whose contents are 0FFh can be interpreted as; 255 for unsigned, -0 in 1's complement, or -1 in 2's complement. Try doing the computations on paper, and see that adding and subtracting numbers are the same regardless whether they are interpreted as signed or unsigned.

I think you have to study what overflow and carryout means to answer your next questions.

Ratch

misterT · Jan 12, 2014

Ratchit said:
No matter how many bits a register has, an overflow can still happen.

You need to apply some common sense and logical thinking when you have a question like this. The description of the problem says:
"assume W,X,Y,Z are 8-bit registers."

This means that the range of these variables are [0, 255] for unsigned values and [-128, 127] for signed values.

If you add two of these values using 32 bit registers, you will never have an overflow.

Now the question is: What is the minimum size working register so that overflow does not happen when you calculate:
(((W-X)>>3)+ Y-Z+4)<<2

You need to assume that you first move the 8-bit variables from their original 8-bit register to the working register/registers.

Ratchit · Jan 12, 2014

misterT,

"You need to apply some common sense and logical thinking when you have a question like this. The description of the problem says:
"assume W,X,Y,Z are 8-bit registers.""

What you really mean to say is that you have to intrepret what the OP meant.

"This means that the range of these variables are [0, 255] for unsigned values and [-128, 127] for signed values."

Yes, that is correct for 2's complement representation.

"If you add two of these values using 32 bit registers, you will never have an overflow"

Yes, that is correct also.

"Now the question is: What is the minimum size working register so that overflow does not happen when you calculate:
(((W-X)>>3)+ Y-Z+4)<<2"

The term "working register" is not mentioned by the OP. Why should I interpret it that way? Also, does the ">>" symbol mean arithmetic shift or logical shift?

"You need to assume that you first move the 8-bit variables from their original 8-bit register to the working register/registers."

No, if you move them to a larger register, then the overflow will not happen. You need to work the the worst case with pencil and paper.

Ratch

NorthGuy · Jan 12, 2014

Assuming that the result is going to be stored in 8-bit register, you can select values of W, X, Y and Z (e.g. Y = 127 and the rest is zero) such than an overflow occurs no matter what.

If you're asked such question on the interview ... run.

Ratchit · Jan 12, 2014

NorthGuy,

I think he wants to know how many bits he needs to do the operation (((W-X)>>3) + Y-Z+4)<<2 using the worst case maximum values for 8-bit 2's complement numbers.

One bit is used for the sign, so that leaves 7-bits for the number. The positive number that will fill all the 7-bits are 127 decimal=7F hex. And -127 decimal = 81 hex.

If W = 7F and X = 81 hex we get W-X = 1FE. Shifting 3 bits right we get 3F. Adding Y =7F gives 46 hex. Subtracting Z = 81 hex gives 1C5. Adding 4 gives 1C9 hex. Shifting 2 bits left gives 724 hex which takes 11 bits to represent plus 1 sign bit for a total of 12 bits.

Ratch

NorthGuy · Jan 12, 2014

Minumum 8-bit signed number is -128 (0x80).

misterT · Jan 12, 2014

Ratchit said:
What you really mean to say is that you have to intrepret what the OP meant.

Yes, and I rather interpret it so that it makes sense. Not like some sleazy lawyer trying to find something he can use against others.

misterT · Jan 12, 2014

Let me show you how you can find the solution step by step. Take this part of the equation for example:

(W-X)
Those are 8-bit variables, so the maximum of that equation is:

255-(-128) = 383
You need 9 bits to store that (10 with 2's complement).

The minimum is:
-128-255 = -383
You need 10 bits to store that in 2's complent.

So, you need at least 10 bit register to avoid overflow.

NorthGuy · Jan 12, 2014

Assuming all signed variables and arithmetic shifts, we can set all variables to their maximums(127) and minimums(-128) in such a way that the end result is maximuzed. It'll give us 1163. If we set them all so that the end result is minimized, you get -1131. To represent numbers between -1131 to 1163, you'll need 12-bit (-2048 to 2047). 11 bits (-1024 to 1023) will not do it.

If you assume unsigned variables then you cannot prevent overflow (going below zero) with any amount of bits.

This all could make some sense if an intermediate result would require more bits than the end result, but this is not the case. This tells us that the person who created this task is lacking imagination. If he asks such question, he cannot offer anything but a very boring job.

Ratchit · Jan 12, 2014

misterT,

"255-(-128) = 383
You need 9 bits to store that (10 with 2's complement)."

You cannot use 255 for a 8-bit signed number. 127 is the maximum positive value for a signed 8-bit number. And you must use signed number representation to activate the overflow flag.

Ratch

misterT · Jan 12, 2014

Ratchit said:
misterT,

"255-(-128) = 383
You need 9 bits to store that (10 with 2's complement)."

You cannot use 255 for a 8-bit signed number. 127 is the maximum positive value for a signed 8-bit number. And you must use signed number representation to activate the overflow flag.

Ratch

You must realize that my example was the "absolute worst case" where both signed and unsigned can be used. It is perfectly fine. In that case both need to be promoted to 10 bit 2's complement and then the operation is executed.
Have you ever seen a C compiler giving an error or a warning if you try to add or subtract uint8_t with int8_t?? No, because it is perfectly fine.

Ratchit · Jan 12, 2014

misterT said:
You must realize that my example was the "absolute worst case" where both signed and unsigned can be used. It is perfectly fine. In that case both need to be promoted to 10 bit 2's complement and then the operation is executed.
Have you ever seen a C compiler giving an error or a warning if you try to add or subtract uint8_t with int8_t?? No, because it is perfectly fine.

No, it is not perfectly fine. 255 unsigned represents a negative signed number for 8 bits. Subtracting signed -128 will guarantee that overflow will never happen. But, overflow will will happen if 127 -(-128).

Ratch

Ratchit · Jan 12, 2014

NorthGuy,

"Minumum 8-bit signed number is -128 (0x80). "

Yes, you are correct. My mistake.

I guess you agree with me that 12 bits are needed.

Ratch

misterT · Jan 12, 2014

Ratchit said:
No, it is not perfectly fine. 255 unsigned represents a negative signed number for 8 bits. Subtracting signed -128 will guarantee that overflow will never happen. But, overflow will will happen if 127 -(-128).

Ratch

I know you are a smart guy and you know exactly what I mean with my previous post.. so just stop trolling.
It is perfectly fine to mix different datatypes in calculations.. There are even specific assembly instructions to "multiply signed with unsigned" etc.
You only need to make sure that the register where you store the result is proper size and type.

Ratchit · Jan 12, 2014

misterT,

255 unsigned represents a negative signed number in 8-bits. Subtracting -128 is the same adding +128. Addition of two numbers with unlike signs will never be an overflow. Multiplication need different CPU instructions for signed or unsigned multiply. Integer addition and subtraction do not.

Ratch

misterT · Jan 12, 2014

Ratchit said:
Multiplication need different CPU instructions for signed or unsigned multiply. Integer addition and subtraction do not.

Exactly.. Signed and unsigned do not need any special CPU instructions. So there is no problem. Just place the 255 in the (bigger) result register (this is called implicit casting and promoting, which happens all the time) and then add/subtract the other variable from it. No problem.
**broken link removed**

Ratchit · Jan 12, 2014

misterT said:
Exactly.. Signed and unsigned do not need any special CPU instructions. So there is no problem. Just place the 255 in the (bigger) result register (this is called implicit casting and promoting, which happens all the time) and then add/subtract the other variable from it. No problem.
**broken link removed**

I program in assembler, not C, therefore that link does not mean anything to me. However, the C compiler uses the native word size which is usually larger than 8-bits. I still stand by what I said about 255 being a negative number in signed 8-bit words. Perhaps the compiler expands the 8-bit word to a larger size if if the value does not fit into 8-bits. If it does, that goes beyond the problem we are discussing.

Ratch

Welcome to our site!

Electro Tech is an online community (with over 170,000 members) who enjoy talking about and building electronic circuits, projects and gadgets. To participate you need to register. Registration is free. Click here to register now.

bits 'game'

Member

Well-Known Member

Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Similar threads

New Articles From Microcontroller Tips