Failure modes in microcontrollers? Industrial encoder feedback board glitchy

Status
Not open for further replies.

fastline

Member
I know this is a stretch, especially with the limited data available. I have been trying to find a problem with a rotary encoder feedback card in a CNC control system. I will explain its function. The problem board receives position data from the rotary encoder, which is coupled to a servo motor. The encoder signal is received and relayed to a trajectory computer via an optical link. The trajectory computer compares the positional data from the feedback card to what it has commanded the servo amplifier to do, and makes constant command updates to control motion.

The feedback card is basically the middle link between the trajectory computer and the encoder, and also sends power commands to the servo amplifier.
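
The comparison described above (commanded position versus position reported by the feedback card) can be sketched roughly as a following-error check. This is only an illustration under assumptions: the function name, the sample data, and the 25-count limit are all made up, and nothing here is known about how this particular control actually decides to raise a fault.

```python
from typing import Optional, Sequence


def first_following_error(
    commanded: Sequence[int],
    measured: Sequence[int],
    limit_counts: int,
) -> Optional[int]:
    """Return the index of the first sample whose position error exceeds the limit.

    commanded / measured are positions in encoder counts, sampled at the same
    instants. Returns None if the error never exceeds limit_counts.
    (Hypothetical helper, not taken from any real CNC control.)
    """
    for i, (cmd, meas) in enumerate(zip(commanded, measured)):
        if abs(cmd - meas) > limit_counts:
            return i
    return None


# Made-up sample data: the axis lags badly at the fourth sample.
commanded = [0, 100, 200, 300, 400]
measured = [0, 98, 197, 260, 395]

fault_index = first_following_error(commanded, measured, limit_counts=25)
print(fault_index)  # 3 (error of 40 counts at that sample)
```

A check like this only trips when the reported position is visibly off, which is why a stable position display alongside a fault is puzzling.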

What is frustrating is the board is NOT 100% bad! Once everything is powered up, the entire system is without fault and returns correct position data to the trajectory computer. When motion is commanded, sometimes it will move around for a minute or two without issue, then just fault out with a "feedback error" to the trajectory computer. However, I have closely monitored the position read directly at the trajectory computer expecting to see a jump or anomaly in position but that seems very stable and repeatable.

There are further feedback test routines in the trajectory computer, and all return error-free.

I remain confident this board is the suspect as there is a board for each motion axis and if we move that card to another axis, which will use a completely different servo/encoder/amp, the problem follows the feedback card!

Among the rack of feedback cards, there is a common optical card which is responsible for converting the signals for the optical link to the trajectory computer. I would almost look at it as well, but unless there is somehow an optical assignment ID (?) which would follow the card, I would think we would see other errors as well. There is also an optical link test routine and that returns no errors.


Now.....the feedback card is a very complex device, with 3 microcontrollers, and is a tri-level board, making repair VERY difficult. I am curious, would it even be conceivable that a uC is glitchy, or do they usually simply fail with an obvious failure mode?

I have combed the board for hours trying to find a damaged trace, obvious issue with a passive, and nothing is standing out. I can even compare to an identical known good card and all seems to test the same.

I would almost be thinking there is dirty power somewhere but all caps to ground test good, and testing what I can in the system, all AC ripple is down in the 1-3mV range.

Would anyone have any advice at all to comb this circuit? Loaded with optos, uCs, DACs, and ADCs. And I realize this is about the time to send it somewhere but there is NO ONE that works on them! So we have a very expensive machine sitting idle over this! I just want something to let the magic smoke out so we can trace it and repair it! LOL
 
However, I have closely monitored the position read directly at the trajectory computer expecting to see a jump or anomaly in position but that seems very stable and repeatable.

So you mean it's stable and repeatable right up until you get a feedback error? I assume the feedback error shuts down and overrides everything so you can't get a position read?

Does a feedback error mean that the trajectory computer is unable to communicate with the card? Or does it mean that the card thinks the encoder gave it garbage?

Is the fault reproducible with the same sequence of movements every time from cold boot? Or a similar sequence of movements? If it is, it could be that some of the program memory has become corrupted. Can you program the CNC axis to do nothing but move back and forth repeatedly across its whole range? Or a different portion of its range? Does it take just as long to fail, if it fails at all? If there's some correlation, then it might be an issue with corruption in part of the firmware that is not entered frequently. This assumes the firmware is deterministic, which it might not be.

Yes, it's conceivable a microcontroller could be glitchy. That's why TI sells microcontrollers with two cores that can run in lock-step and detect hard and soft faults. I've seen chips that just have one dead pin as well as chips where the non-volatile memory somehow got corrupted where a re-flash seemed to fix the problem...

What do you mean by tri-level board?
 
I remain confident this board is the suspect as there is a board for each motion axis and if we move that card to another axis, which will use a completely different servo/encoder/amp, the problem follows the feedback card!
It appears that you've traced the problem to this board, unless there are other things that also move when you move the board. Is it really worth the time to trace out and fix the board, or should you just replace the whole board? How expensive are they?

Digital ICs are very reliable, but there's no such thing as a 100% reliable part. I've had at least one microcontroller partially fail due to a voltage transient or static discharge on one of its I/O pins. Its program still runs correctly, and the rest of the I/O is okay, but that one pin now has its state permanently set to a one or zero. Optocouplers are also known to be problematic.
 
https://www.analog.com/media/en/technical-documentation/data-sheets/AD7892.pdf

OK, the errors typically ONLY happen during motion. The returned position at the control has never appeared out of whack at all. Always seems to be correct.

The very specific error from the card seems to be "ADC synchronization error".

I listed the above link because those are the ADC chips, and in looking at other good boards, it appears they have had certain resistors replaced, specifically the series resistors that feed Vin1 and Vin2 in parallel. There are two of these chips set up the same way. Other boards show these resistors at the same value. The bad one showed one chip to have an extra 1K of resistance, and these are 1% resistors. I pulled the resistor and confirmed it is marked as a 15.8kohm resistor but reads 16.83kohm.
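
For scale, the arithmetic on that reading can be sketched as below. The nominal and measured values are the ones quoted above; the function name is just an illustrative choice.

```python
def deviation_percent(nominal_ohms: float, measured_ohms: float) -> float:
    """Signed deviation of a measured resistance from its nominal value, in percent."""
    return (measured_ohms - nominal_ohms) / nominal_ohms * 100.0


nominal = 15_800.0    # marked value, 1% part
measured = 16_830.0   # value read after pulling the resistor

deviation = deviation_percent(nominal, measured)
within_tolerance = abs(deviation) <= 1.0

print(f"deviation: {deviation:.2f}%  within 1%: {within_tolerance}")
# deviation: 6.52%  within 1%: False
```

So the part had drifted to roughly six and a half times its rated tolerance.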

These ADC are fed at Vin1/Vin2 through these resistors and back to optos.

I don't know what to make of the "sync" error but it seems more like a buffer or lag, or delay issue?

When operating, it almost seems like heat makes it worse. When I changed the bad resistor above, I really thought I found the problem. It ran for 2min by manually jogging the machine around back and forth. No errors at all.......Then boom. Each time I reset the control and try again. It will move around another 5-10sec, and fault again. Sometimes you reset and it will fault immediately. Seems to be pretty random.

We cannot buy these cards!!!! We are really screwed if we can't figure out a way to fix them. Like time to consider a whole new control system!
 
Elaborate what you mean by tri-level board. Does this mean the unit can split up into three different PCBs?

You could try replacing the ADC. Low effort

EDIT: Or not, man those ADCs are unexpectedly expensive. I don't see anything special about them.
 
I used to work in a factory where I did electronic repair on all the electronics in the building. We had 2 machines with encoders, and the fastest way to find and fix the problem and keep the assembly line moving was to swap new parts for used parts. To make a long story short, the encoder was always bad. The problem is not always the electronics; often there was a speck of very fine dust on the encoder disk. Even though the encoders are sealed airtight, they would somehow get specks of dust on the disc. Blow the disk with a can of clean air, then test it. Spray the glass disc with 100% alcohol, then clean air, then test it.

Next thing to do is get data sheets on all the circuit board parts and start testing each part 1 by 1. Start at the power supply and read the voltages on the board through all the parts 1 by 1. It does not take long to see if each part works like the data sheet says it should. If you have a signal going into an IC but no signal coming out, is the IC bad, or is one of the resistors, capacitors, or other parts on that IC bad? If a part is not working, then you need to find out why. Look for overheated, burned resistors. Test or swap capacitors.
 
Even though the encoders are sealed airtight, they would somehow get specks of dust on the disc.

Having had similar, if not identical problems, my theory is that the micro-dust was trapped from the time the encoder was manufactured.
It may have been lodged in a corner or something, then it became loose only to get adhered to the optical wheel.

I have this theory because I've tested the hermeticity of faulty encoders by dunking them in dyed alcohol. I've then opened the encoder and have seen no evidence of dye inside the encoder.
 
Guys, guys....did you read his post carefully? He said he's moved the problem board to other servos and encoders on other axes and the problem follows the board around. He also said he's run other boards on the original servo/encoder and they work just fine. Unless I'm misinterpreting things, of course.
 
You got it! We move the board around, and the problems follow the board. We either get an "ADC synchronization error" or a "tracking error". Both seem closely related to me.

Regarding testing. Obviously if we had all the data, this job would get easier. There is no existing PCB diagram, circuit diagram, test points, test procedures, etc. We are still looking for ANYONE that knows similar systems that can take a look at it. We have to realize that if we cannot find someone to repair these, the control is worthless.

I am pretty surprised I cannot find a simple issue in the board, but it seems these things have problems. I have seen more factory "repairs", or PCB repairs in general, on these boards than I have ever seen on any other industrial control. I think this was a beta that never made alpha grade! lol
 
Might be a long shot, but could you check the board with an IR camera whilst in use, or use some IPA on the PCB and look for rapid evaporation around the time the sync error happens? As you said previously, the board tends to fail initially after a couple of minutes, then subsequently after 5 to 10 seconds after resetting. Kinda sounds like it may be some heat-related failure. Does spraying the board with canned air prolong the time of the failure?
 
I remember that when swapping encoders there is something special that needs to be done for it to work. That was 1985, and I have forgotten more than I ever knew. You can't just remove an encoder and replace it with a new one; the timing needs to be set somehow, and I don't remember how. All I can remember is that the encoder generates numbers, and the computer reads the numbers and knows where movement should be. I remember turning the encoder by hand until it reached the numbers the computer was looking for, then putting the drive belt on the encoder. If I remember correctly, the machine needs to return to the start position before setting the encoder.

The machine I was working on way back then had a sensitivity adjustment inside the electrical cabinet. It needed to be set less sensitive to get the machine started. Once it was running and the machine was in time with the encoder, the sensitivity could be returned to where it was before, but only while the machine was running.
 
Well, there are absolute and incremental encoders. One just generates constantly increasing/decreasing pulses; the other has a set number of pulses from the A/B channels, plus the Z channel tach. So a 1000-line encoder will produce 4000 pulses/rev (counting every edge on A and B) plus a tach return per rev. When homed out, the axis will walk until it taps a limit switch, then walk away from the switch to a predefined pulse within 1 rev of the switch, then set all revs to 0 and start counting.....
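
To show where the 4000 figure comes from, here is a minimal sketch of 4x quadrature counting on the A/B channels. It is a generic textbook-style decoder, not the logic of this feedback card. The function names are made up, and which direction counts as "forward" is an assumption that depends on wiring.

```python
# One encoder line produces four A/B states in Gray-code order.
# Assumed forward sequence (AB as a 2-bit value): 00 -> 01 -> 11 -> 10 -> 00
FORWARD_SEQUENCE = [0b00, 0b01, 0b11, 0b10]


def count_quadrature(states):
    """Return the net count from a sequence of sampled 2-bit A/B states.

    Each valid single-step transition adds +1 (forward) or -1 (reverse).
    A transition that skips a state means a sample was missed or the
    signal glitched, so it raises ValueError.
    """
    count = 0
    prev = states[0]
    for cur in states[1:]:
        if cur == prev:
            continue  # no edge between these two samples
        i = FORWARD_SEQUENCE.index(prev)
        if cur == FORWARD_SEQUENCE[(i + 1) % 4]:
            count += 1
        elif cur == FORWARD_SEQUENCE[(i - 1) % 4]:
            count -= 1
        else:
            raise ValueError(f"invalid transition {prev:02b} -> {cur:02b}")
        prev = cur
    return count


def states_for_revolution(lines):
    """Ideal A/B state sequence for one full forward revolution."""
    return [0b00] + [0b01, 0b11, 0b10, 0b00] * lines


print(count_quadrature(states_for_revolution(1000)))  # 4000
```

Running the same state sequence in reverse order gives -4000, which is how the direction of travel is recovered from two channels.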

I know plenty about machine mechanics and basic electrics, but I'll be damned if I can figure this axis card out! Especially when the control is displaying consistent/accurate position data.
 
I'm assuming the chips aren't socketed. Is the board in good enough condition that desoldering is possible without damaging it? Is freezing components a way to narrow down problems with heat?

Mike.
 