For some reason I can't post a reply in that thread (and it seems only that thread)...it keeps sending me to the login page no matter how many times (or how) I log in.
In math terms, I think you have to work backwards from the problem of
"What does a 3D graph look like when it is projected onto a 2D plane?"
1. Start with two planes: the plane containing your reference cross and the plane that is your camera image.
2. We start by assuming the planes are aligned and parallel. Now you are going to tilt and rotate your cross plane, and then project that image onto the camera plane.
3. If you work backwards from that problem I think that's the solution you're looking for.
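A minimal numeric sketch of those steps, assuming an orthographic projection (just dropping the z coordinate) for simplicity and a unit cross (numpy; the 30/20 degree tilts are made up):

```python
import numpy as np

def rot_x(a):
    # 3D rotation about the X axis by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    # 3D rotation about the Y axis by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

# Four endpoints of a unit cross lying in the z=0 plane (the "cross plane").
cross = np.array([[-1, 0, 0], [1, 0, 0], [0, -1, 0], [0, 1, 0]], dtype=float)

# Step 2: tilt the cross plane, then project onto the camera plane by
# dropping z (orthographic projection, a stand-in for the real camera model).
tilted = cross @ rot_x(np.radians(30)).T @ rot_y(np.radians(20)).T
projected = tilted[:, :2]
```

Working backwards means asking which tilt angles would have produced the `projected` shape you actually observe.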
Method 1:
Two ways I can think of to go about it. The first is working with an actual reference image of your cross. You can store the image of the cross as a numerical matrix, like a bitmap. The transform matrices would be three rotation transforms, one size-scaling matrix to account for distance (which requires knowledge of the camera's FOV), and a projection transform: two rotations for "tilt" and one rotation for "rotation of the camera". Three-dimensional matrices might be involved for the rotations since you are rotating the image in 3D space; it should turn 2D again once you do the projection transform. If you can figure out the transform matrices by searching online, you can figure out the matrix of the inverse transform, use that on the image, and get your answer in one shot. I don't think finding the actual transform matrices required is terribly difficult.
Working forward:
[2D bitmap matrix of camera image] =
[Projection transform of 3D image onto 2D surface]*
[Size-scaling matrix to account for distance]*
[3D Rotation Matrix X]*[3D Rotation Matrix Y]*[3D Rotation Matrix Z]*
[2D bitmap matrix of the reference cross image that has been "stuffed" to be a 3D matrix]
Multiplying those transform matrices together gives you:
[2D bitmap matrix of camera image] =
[Resultant Transform Matrix]*
[2D bitmap matrix of the reference cross image that has been "stuffed" to be a 3D matrix]
Solving for the resultant transform matrix will give you:
[Resultant Transform Matrix] =
[2D bitmap matrix of camera image]*
[2D bitmap matrix of the reference cross image that has been "stuffed" to be a 3D matrix]^-1
"^-1" means the inverse of that matrix, so you're going to have to find the inverse of the stuffed reference image. You can order the XYZ rotation matrices any way you like, but the order in which rotations are applied matters, so you have to keep things consistent. You might also use quaternions instead of rotation matrices.
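Here's a rough numpy sketch of that forward chain with made-up angles, using an orthographic projection as a stand-in for the real camera model (the assertion at the end shows why rotation order has to be kept consistent):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Orthographic projection onto the camera (x-y) plane.
P = np.array([[1.0, 0, 0], [0, 1.0, 0]])

scale = 0.8  # stand-in for the distance/FOV size scaling
R = rot_x(np.radians(15)) @ rot_y(np.radians(10)) @ rot_z(np.radians(5))
T = scale * (P @ R)  # resultant 2x3 transform applied to 3D cross points

# Rotation order matters: X-then-Z differs from Z-then-X in general.
assert not np.allclose(rot_x(0.4) @ rot_z(0.4), rot_z(0.4) @ rot_x(0.4))
```

Here the chain acts on 3D point coordinates rather than on a stuffed bitmap matrix, but the structure (projection times scaling times three rotations) is the same.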
From here, the LHS will be a matrix full of coefficients and variables representing your XYZ rotation angles and distance. The RHS will be a simple number matrix. Turn the matrix equality back into multiple simultaneous equations and solve for those parameters. It might be really difficult or impossible to do analytically, though; you might have to sit down with pen and paper for a long time. Or you could try using math software to find an analytical solution. When all else fails, you can solve for it numerically.
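A toy example of the numerical fallback, reduced to a single unknown tilt angle recovered by brute-force search over its bounded range (everything here is made up for illustration):

```python
import numpy as np

# Toy version of the numerical route: one unit arm of the cross is tilted
# about X by an unknown angle, and we only observe its projected extent.
true_angle = np.radians(33.0)
observed = np.cos(true_angle)  # projection shortens the arm by cos(angle)

# Brute-force search over the bounded parameter range instead of solving
# the simultaneous equations analytically (0.01-degree steps).
candidates = np.radians(np.linspace(0, 89, 8901))
errors = np.abs(np.cos(candidates) - observed)
best = candidates[np.argmin(errors)]
```

The real problem has several coupled unknowns, so in practice you'd hand this to a proper numerical solver rather than a raw grid search, but the idea is the same.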
To deal with image noise you might try two things. The first is storing the actual reference cross image as a bitmap matrix, passing your camera image into a bitmap matrix, and then de-noising it there with transforms like Gaussian blurring or whatever. But personally, I think it would be better if you just worked with the two thin lines that make up the conceptual cross rather than the fat lines that will appear in the image. So the reference cross bitmap matrix could just be a bunch of zeroes with just two lines of ones. The camera image of the cross would have to be averaged and thresholded to reduce the cross down to two thin lines to try and match the image.
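A small sketch of the threshold-and-thin idea on a made-up 50x50 bitmap containing one fat line plus noise (numpy):

```python
import numpy as np

# Hypothetical noisy camera bitmap: a 3-pixel-wide vertical line plus noise.
rng = np.random.default_rng(0)
img = rng.random((50, 50)) * 0.3   # background noise in [0, 0.3)
img[:, 24:27] = 0.9                # the "fat" line

binary = img > 0.5                 # threshold away the noise

# Average each row's lit pixels down to a single center column,
# reducing the fat line to the thin conceptual line.
cols = np.array([np.mean(np.nonzero(row)[0]) for row in binary])
```

The same averaging would be applied along the other axis for the second arm of the cross.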
So at the end of all this you're left with rotation and distance parameters that might be slightly off due to image noise, since the reference matrix and camera matrix aren't perfect transforms of each other. I think it's pretty elegant. But an alternative approach that is analytically easier is an iterative one that uses correlation to check your answer. Start with an assumed distance and set of rotations, transform the reference cross image accordingly, and compare it with the camera image to see if it matches. Then keep guessing the distance and rotations intelligently, calculating subsequent transforms of the reference image and correlating each one with the actual camera image, until it matches "enough", and then you have your distance and angles. The advantage of this method is that you don't need to solve any simultaneous equations. The disadvantage is getting your iterative estimation algorithm to converge; that's probably the hardest part of this approach. But you have bounds on the distance and rotations involved, which should help considerably. You could also calculate the results at the boundaries to help steer your iterations in the right direction. It might take many, many calculations, but it seems to me that might not matter in your case. You definitely want MATLAB's help with the one-shot approach, though (the most important reason being that I think it can also solve those simultaneous equations for you if you use the symbolic variable toolbox; in the same way, it can also help you find the resultant transform matrix).
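As a toy sketch of that iterative loop (a made-up 18-degree in-plane rotation as the only unknown, point samples standing in for bitmap pixels, and a sum-of-squares match score standing in for image correlation):

```python
import numpy as np

def rotate2d(pts, a):
    # Rotate an array of 2D points by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return pts @ np.array([[c, -s], [s, c]]).T

# Reference cross as thin-line sample points (toy stand-in for a bitmap).
t = np.linspace(-1, 1, 21)
cross = np.vstack([np.column_stack([t, 0 * t]), np.column_stack([0 * t, t])])

camera = rotate2d(cross, np.radians(18.0))  # what the camera "sees"

# Guess within the known bounds, score each candidate rotation by how well
# the transformed reference matches the camera image, and keep the best.
best_a, best_score = None, -np.inf
for a in np.radians(np.linspace(0, 45, 451)):
    guess = rotate2d(cross, a)
    score = -np.sum((guess - camera) ** 2)  # correlation-like match score
    if score > best_score:
        best_a, best_score = a, score
```

A real version would search over the tilt angles and distance at the same time, with a smarter update rule than a flat sweep, but the match-and-refine structure is the same.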
Method 2:
The second way is to have a conceptual model of the cross rather than an actual image of it. Treat the cross as two equations representing perpendicular lines. Projecting these equations onto the equation representing the camera plane (via dot products) will give you the equations of what the lines should look like. Reversing the projection will give you TWO angles representing the relative tilt between the cross plane and the camera plane. The third angle (what I have been calling the rotation of the camera) would have to be figured out after everything I just said has been pulled off. Of course, you could just guess the rotation before performing the dot product and comparing it with what the camera sees, incrementing the rotation before each iterative calculation of the reverse dot product. Eventually you would end up with a rotation that gives you what the camera sees, as well as your two tilt angles. I haven't dealt with scaling image size for distance at all yet, but I'm sure you know it would involve the camera's FOV. Scaling isn't a problem in Method 2 since you can always do it after you've figured out the angles, because this method treats the lines as infinitely long.
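Here's a rough numpy sketch of Method 2 with made-up tilt angles: the two perpendicular unit direction vectors of the cross lines are tilted, projected onto the camera plane by dropping z, and the two tilt angles are then read back off the projected vectors (the exact read-back formulas depend on the rotation order chosen here):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

alpha, beta = np.radians(25.0), np.radians(10.0)  # "unknown" tilts
R = rot_y(beta) @ rot_x(alpha)

# Unit direction vectors of the two perpendicular cross lines, tilted,
# then projected onto the camera plane by keeping only x and y.
x_arm = (R @ np.array([1.0, 0.0, 0.0]))[:2]
y_arm = (R @ np.array([0.0, 1.0, 0.0]))[:2]

# Reversing the projection: for this rotation order, the x-arm's projected
# length is cos(beta) and the y-arm's y-component is cos(alpha).
beta_rec = np.arccos(np.linalg.norm(x_arm))
alpha_rec = np.arccos(y_arm[1])
```

Because only directions are used, the lines are effectively infinitely long, which is why distance scaling can be handled separately afterwards.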
EDIT:
Seriously, I can't reply to this thread either. I can only edit the initial post. Remember when I said you definitely want MATLAB? Yeah, I was wrong. You definitely need it for any approach you take that involves bitmap matrices, since each dimension of the matrix will be however many pixels you decide to have in your reference or camera image prior to the transform. Even a 2x2x2 matrix is weird to solve by hand, let alone one involving variables. Yours are probably going to be at least what... 50 pixels? So a 50x50x50 matrix? You're going to need math software that can handle symbolic matrices and solve simultaneous equations analytically for you.