
Robotic vision reference


Sceadwian
Can anyone point me to some equations for determining how to use a reference object (a simple cross) to find the angle of the camera relative to the plane the cross sits on, and how to measure other points in the same frame to correct for perspective distortion? The basic idea is to develop the equations needed to turn a free-floating camera into an optical micrometer, based on the cross reference itself. Mounting the camera perfectly flat to the object isn't practical, as the shots need to be hand-held; the cross would basically help correct for the very subtle perspective shift. I'm thinking multiple crosses: the user would point out to the imaging software where the crosses were, and it would use edge detection from there to determine their exact dimensions. Then you would point out the pixels of the other features in the field to be measured, and get their dimensions with perspective correction.
 
I mistyped when I said perspective correction =) What I'm trying to determine is simple orientation. Photograph a black cross with very specific known dimensions, then determine from that, on that plane (and that plane only), what discrete distance each pixel represents. Since it will be hand shot, the user will click to point out the vertical and horizontal lines of the cross, and from those coordinates, based on the known dimensions of the cross, determine the offsets for rotation and angle; it should be some pretty basic trig. I'm not trying to calibrate the optics per se, just a single reference point on the image.
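For the basic trig, here is a minimal Python sketch of what I think is being described (the arm length, pixel coordinates, and function name are all made up for illustration): given the two clicked endpoints of one arm of the cross and its known real-world length, atan2 gives the in-plane rotation and the ratio of known length to pixel length gives the scale.

```python
import math

ARM_LENGTH_MM = 50.0  # assumed known dimension of one cross arm

def rotation_and_scale(p1, p2, true_length_mm=ARM_LENGTH_MM):
    """Return (rotation in degrees, mm per pixel) for one cross arm.

    p1, p2: (x, y) pixel coordinates of the arm's clicked endpoints.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    pixel_length = math.hypot(dx, dy)
    rotation_deg = math.degrees(math.atan2(dy, dx))  # angle from the image x-axis
    mm_per_pixel = true_length_mm / pixel_length     # scale on the cross's plane
    return rotation_deg, mm_per_pixel

# e.g. endpoints clicked at (120, 340) and (480, 310)
print(rotation_and_scale((120, 340), (480, 310)))
```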
 
For some reason I can't post a reply in that thread (and it seems only that thread)... it keeps sending me to the login page no matter how many times (or how) I log in.

In math terms, I think you have to work backwards from the problem of
"What does a 3D graph look like when it is projected onto a 2D plane?"

1. Start with two planes: the plane containing your reference cross and the plane that is your camera image.
2. We start by assuming the planes are aligned and parallel. Now you tilt and rotate your cross plane, and then project that image onto the camera plane.
3. If you work backwards from that problem, I think that's the solution you're looking for (see the sketch after this list).
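For what it's worth, this working-backwards step is what computer-vision people call a plane-to-plane homography (my framing, not necessarily the equations the original post asks for). A minimal sketch using OpenCV's findHomography, with hypothetical cross dimensions and clicked pixel coordinates: four known points on the cross (e.g. the arm tips) pin down the mapping from image pixels to plane millimetres, rolling the tilt, rotation, and distance effects discussed below into a single matrix.

```python
import numpy as np
import cv2  # OpenCV, assumed available

# Hypothetical cross: arm tips 50 mm from center, so the four tips span
# a known 100 mm x 100 mm pattern on the reference plane.
plane_pts = np.float32([[-50, 0], [50, 0], [0, -50], [0, 50]])  # mm

# Pixel coordinates of the same four tips, as clicked by the user
# (example values: left, right, top, bottom).
image_pts = np.float32([[102, 233], [415, 251], [260, 88], [255, 398]])

# Homography H maps image pixels to plane millimetres.
H, _ = cv2.findHomography(image_pts, plane_pts)

def measure_mm(pix_a, pix_b):
    """Distance in mm between two clicked pixels, measured on the cross's plane."""
    pts = cv2.perspectiveTransform(np.float32([[pix_a, pix_b]]), H)[0]
    return float(np.linalg.norm(pts[0] - pts[1]))
```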

Method 1:
Two ways I can think of to go about it. The first is working with an actual reference image of your cross. You can store the image of the cross as a numerical matrix, like a bitmap. The transform matrices would be three rotation transforms, one size-scaling matrix to account for distance (which requires knowledge of the camera's FOV), and a projection transform: two rotations for "tilt" and one rotation for "rotation of the camera". 3-dimensional matrices are involved for the rotations, since you are rotating the image in 3D space; it turns back to 2D once you apply the projection transform. If you can find the transform matrices by searching online, you can work out the matrix of the inverse transform, apply it to the image, and get your answer in one shot. I don't think finding the actual transform matrices required is terribly difficult.

Working forward:
[2D bitmap matrix of camera image] =
[projection transform of 3D image onto 2D surface] *
[size-scaling matrix to account for distance] *
[3D rotation matrix X] * [3D rotation matrix Y] * [3D rotation matrix Z] *
[2D bitmap matrix of the reference cross that has been "stuffed" to be a 3D matrix]

Multiplying those transform matrices together gives you:
[2D bitmap matrix of camera image] =
[resultant transform matrix] *
[2D bitmap matrix of the reference cross that has been "stuffed" to be a 3D matrix]

Solving for the resultant transform matrix gives you:
[resultant transform matrix] =
[2D bitmap matrix of camera image] *
[2D bitmap matrix of the reference cross that has been "stuffed" to be a 3D matrix]^-1

"^-1" represents means the inverse of that matrix so you're going to have to find the inverse of the camera images. You can order the XYZ rotation matrixes any way you like, but the order in which rotation are applied matters so you have to keep things consistent. You might also use quaternions instead of rotation matrixes.

From here, the LHS will be a matrix full of coefficients and variables representing your XYZ rotation angles and distance. The RHS will be a simple number matrix. Turn the matrix equality back into multiple simultaneous equations and solve for those parameters. It might be really difficult or impossible to do analytically, though; you might have to sit down with pen and paper for a long time, or try math software to find an analytical solution. When all else fails, you can solve for it numerically.
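As a sketch of the numerical route, here is how the forward transform and a least-squares solve might look in Python with scipy (the focal length, cross dimensions, and observed pixel coordinates are all assumed values for illustration; the pose is parameterized as three rotations plus distance, matching the transform chain above):

```python
import numpy as np
from scipy.optimize import least_squares

FOCAL_PX = 800.0  # assumed focal length in pixels (from the camera's FOV)
TIPS_MM = np.array([[-50, 0, 0], [50, 0, 0], [0, -50, 0], [0, 50, 0]], float)

def project(params, pts):
    """Pinhole projection of plane points rotated by (ax, ay, az) at distance d."""
    ax, ay, az, d = params
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    cam = pts @ (Rz @ Ry @ Rx).T + np.array([0, 0, d])  # pose the cross in 3D
    return FOCAL_PX * cam[:, :2] / cam[:, 2:3]           # perspective divide

def residuals(params, observed_px):
    return (project(params, TIPS_MM) - observed_px).ravel()

# Hypothetical measured pixel positions of the four arm tips:
observed = np.array([[-38.0, 2.1], [41.5, -1.7], [1.2, -43.0], [-0.8, 39.4]])
fit = least_squares(residuals, x0=[0, 0, 0, 1000.0], args=(observed,))
print(fit.x)  # tilt-x, tilt-y, rotation (radians) and distance (mm)
```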

To deal with image noise you might try two things. The first is storing the actual reference cross image as a bitmap matrix, loading your camera image into a bitmap matrix, and then de-noising it there with transforms like Gaussian blurring or whatever. But personally, I think it would be better if you just worked with the two thin lines that make up the conceptual cross rather than the fat lines that will appear in the image. So the reference cross bitmap matrix could just be a bunch of zeroes with two lines of ones. The camera image of the cross would have to be averaged and thresholded to reduce the cross down to two thin lines that match the reference image.
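A minimal cleanup pass along those lines, sketched with OpenCV (the filename is hypothetical; Otsu's method picks the black/white threshold automatically):

```python
import cv2

frame = cv2.imread("cross_photo.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
blurred = cv2.GaussianBlur(frame, (5, 5), 0)  # suppress pixel noise first
# The cross is dark on a light background, so invert: ones land on the cross.
_, mask = cv2.threshold(blurred, 0, 255,
                        cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Reducing the fat arms to one-pixel lines needs an extra thinning step,
# e.g. cv2.ximgproc.thinning from opencv-contrib, if that package is installed.
```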

So at the end of all this you're left with rotation and distance parameters that might be slightly off due to image noise, since the reference matrix and camera matrix aren't perfect transforms of each other. I think it's pretty elegant.

But an alternative approach that is analytically easier is an iterative one that uses correlation to check your answer. Start with an assumed distance and set of rotations, transform the reference cross image, and compare it with the camera image to see if it matches. Then keep guessing the distance and rotations intelligently, calculating subsequent transforms of the reference image and correlating each one with the actual camera image until it matches "enough"; then you have your distance and angles. The advantage of this method is that no simultaneous equations need solving. The disadvantage is getting your iterative estimation algorithm to converge, which is probably the hardest part of this approach. But you have bounds on the distance and rotations involved, which should help considerably; you could calculate the results at the boundaries to help your iterations go in the right direction. It might take many, many calculations, but it seems to me that might not matter in your case. You definitely want MATLAB's help with the one-shot approach, though; most importantly, I think it can solve those simultaneous equations for you if you use the symbolic variable toolbox, and in the same way it can help you find the resultant transform matrix.
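A sketch of that iterative loop in Python, dodging the convergence question by brute-forcing a coarse grid over the bounded parameters and scoring each guess with normalized correlation (the image size, focal length, and cross dimensions are assumed values):

```python
import numpy as np
import cv2

IMG = 256     # square render size in pixels (assumed)
F = 800.0     # assumed focal length in pixels, from the camera's FOV
TIPS = np.array([[-50, 0, 0], [50, 0, 0], [0, -50, 0], [0, 50, 0]], float)

def render_cross(ax, ay, az, d):
    """Rasterize the cross's two arms for one pose guess (angles in radians)."""
    Rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
    cam = TIPS @ (Rz @ Ry @ Rx).T + [0, 0, d]                  # pose the cross in 3D
    px = (F * cam[:, :2] / cam[:, 2:3] + IMG / 2).astype(int)  # project to pixels
    img = np.zeros((IMG, IMG), np.uint8)
    cv2.line(img, tuple(map(int, px[0])), tuple(map(int, px[1])), 255, 1)
    cv2.line(img, tuple(map(int, px[2])), tuple(map(int, px[3])), 255, 1)
    return img

def best_pose(camera_img, bounds, steps=8):
    """Coarse grid search over the bounded pose space, scored by correlation."""
    best, best_score = None, -np.inf
    for ax in np.linspace(*bounds["tilt_x"], steps):
        for ay in np.linspace(*bounds["tilt_y"], steps):
            for az in np.linspace(*bounds["rot"], steps):
                for d in np.linspace(*bounds["dist"], steps):
                    guess = render_cross(ax, ay, az, d)
                    if not guess.any():  # cross fell outside the frame
                        continue
                    score = np.corrcoef(guess.ravel().astype(float),
                                        camera_img.ravel().astype(float))[0, 1]
                    if score > best_score:
                        best, best_score = (ax, ay, az, d), score
    return best, best_score
```

With bounds like {"tilt_x": (-0.3, 0.3), ...} this is 8^4 = 4096 renders, which is crude but cheap; refining the grid around the winner gets you arbitrarily close without ever worrying about convergence.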

Method 2:
The second way is to have a conceptual model of the cross rather than an actual image of it. Treat the cross as two equations representing perpendicular lines. Performing a dot product of these equations onto the equation representing the camera plane (i.e., projecting the lines onto the image plane) will give you the equations of what the lines should look like. Reversing the dot product will give you TWO angles representing the relative tilt between the cross plane and the camera plane. The third angle (what I have been calling the rotation of the camera) would have to be figured out before everything I just said could be pulled off. Of course, you could just guess the rotation before performing the dot product and comparing it with what the camera sees, incrementing the rotation before each iterative calculation of the reverse dot product. Eventually you would end up with a rotation that gives you what the camera sees, as well as your two tilt angles. I haven't dealt with scaling image size for distance at all yet, but I'm sure you know it would involve the camera's FOV. Scaling isn't a problem in Method 2, since you can always do it after you've figured out the angles; this method treats the lines as infinitely long.
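Here is one way the Method 2 loop might be sketched in Python, under an orthographic approximation (my simplification, reasonable when the cross is small compared to its distance): for each guessed camera rotation, solve for the two tilt angles that make the projected directions of the two perpendicular arms match the angles measured in the photo.

```python
import numpy as np
from scipy.optimize import least_squares

def projected_arm_angles(ax, ay, az):
    """Image-plane angles of the two arms for tilts (ax, ay) and rotation az."""
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az), np.cos(az), 0],
                   [0, 0, 1]])
    R = Rz @ Ry @ Rx
    arm1, arm2 = R @ np.array([1.0, 0, 0]), R @ np.array([0, 1.0, 0])
    # Dropping z is the orthographic projection onto the camera plane.
    return np.array([np.arctan2(arm1[1], arm1[0]),
                     np.arctan2(arm2[1], arm2[0])])

def solve_tilts(measured_angles, az_guess):
    """Solve the two tilt angles for one guessed camera rotation."""
    f = lambda p: projected_arm_angles(p[0], p[1], az_guess) - measured_angles
    return least_squares(f, x0=[0.1, 0.1]).x

# Hypothetical arm angles measured off the photo (radians); the arms are
# not quite perpendicular in the image, so the plane must be tilted.
measured = np.array([0.05, 1.40])
print(np.degrees(solve_tilts(measured, az_guess=0.0)))
```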

EDIT:
Seriously, I can't reply to this thread either; I can only edit the initial post. Remember when I said you definitely want MATLAB? Yeah, I was wrong. You definitely NEED it for any approach that involves bitmap matrices, since each dimension of the matrix will be however many pixels you decide to have in your reference or camera image prior to the transform. Even a 2x2x2 matrix is weird to solve by hand, let alone one involving variables. Yours are probably going to be at least what... 50 pixels? So a 50x50x50 matrix? You're going to need math software that can handle symbolic matrices and solve simultaneous equations analytically for you.
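As a tiny illustration of the symbolic-matrix workflow (Python's sympy standing in here for the MATLAB symbolic toolbox mentioned above):

```python
from sympy import symbols, cos, sin, Matrix, simplify

theta = symbols("theta")
# A symbolic Z rotation matrix, one factor of the transform chain above.
Rz = Matrix([[cos(theta), -sin(theta), 0],
             [sin(theta),  cos(theta), 0],
             [0,           0,          1]])
# Symbolic inverse; for a pure rotation it equals the transpose.
diff = (Rz.inv() - Rz.T).applyfunc(simplify)
print(diff)  # the zero matrix
```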
 
I tabbed through it real fast, looks like I should be able to work out what I want from there. If I have trouble with the math later I'll give ya a shout, thanks.
 