Homogeneous points whose last element is w~=0 are called ideal points or points at infinity. These points can’t be represented with inhomogeneous coordinates.
2D lines
2D lines can also be expressed using homogeneous coordinates l~=(a,b,c)⊤:
$$\{\bar{\mathbf{x}} \mid \tilde{\mathbf{l}}^\top \bar{\mathbf{x}} = 0\} \;\Leftrightarrow\; \{x, y \mid ax + by + c = 0\}$$
We can normalize l~ so that l~=(nx,ny,d)⊤=(n,d)⊤ with ∥n∥2=1. In this case, n is the normal vector perpendicular to the line and d is its distance to the origin.
An exception is the line at infinity l~∞=(0,0,1)⊤, which passes through all ideal points.
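The normalization can be illustrated with a minimal NumPy sketch. The helper name `normalize_line` and the example line are assumptions made for illustration; note that after normalization d is a signed quantity, so the distance to the origin is its absolute value.

```python
import numpy as np

def normalize_line(l):
    """Scale a homogeneous 2D line (a, b, c) so that its normal (a, b) has unit length."""
    n_norm = np.linalg.norm(l[:2])
    if n_norm == 0.0:
        # (0, 0, c) is the line at infinity and has no unit normal.
        raise ValueError("the line at infinity cannot be normalized")
    return l / n_norm

# Hypothetical example line: 3x + 4y - 10 = 0
l = np.array([3.0, 4.0, -10.0])
l_n = normalize_line(l)        # unit normal n = (0.6, 0.8), signed offset d = -2
dist_to_origin = abs(l_n[2])   # unsigned distance of the line to the origin
```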
Cross Product
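The cross product can be expressed as the product of a skew-symmetric matrix and a vector:
$$\mathbf{a} \times \mathbf{b} = [\mathbf{a}]_\times \mathbf{b} = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$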
In homogeneous coordinates, the intersection of two lines is given by:
x~=l~1×l~2
Similarly, the line joining two points can be compactly written as:
l~=xˉ1×xˉ2
The symbol × denotes the cross product.
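As a small illustration, the following NumPy sketch intersects two lines, joins two points, and checks the skew-symmetric form of the cross product. The helper `skew` and all numeric values are assumptions chosen for the example.

```python
import numpy as np

def skew(a):
    """Return the 3x3 skew-symmetric matrix [a]_x such that skew(a) @ b equals np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

# Two lines in homogeneous form (a, b, c): x = 1 and y = 2.
l1 = np.array([1.0, 0.0, -1.0])
l2 = np.array([0.0, 1.0, -2.0])

x_tilde = np.cross(l1, l2)      # homogeneous intersection point
x = x_tilde[:2] / x_tilde[2]    # inhomogeneous coordinates of the intersection

# Line joining the augmented points (0, 0, 1) and (1, 1, 1).
p1 = np.array([0.0, 0.0, 1.0])
p2 = np.array([1.0, 1.0, 1.0])
l_join = np.cross(p1, p2)

# The parallel lines x = 1 and x = 3 meet in an ideal point (last element 0).
ideal = np.cross(l1, np.array([1.0, 0.0, -3.0]))
```

Because the intersection of parallel lines has last element 0, it cannot be converted to inhomogeneous coordinates, which is why the division is only applied to `x_tilde`.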
2D Conics
More complex algebraic objects can be represented using polynomial homogeneous equations. For example, conic sections (arising as the intersection of a plane and a 3D cone) can be written using quadric equations:
$$\{\bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q} \bar{\mathbf{x}} = 0\}$$
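For instance, the unit circle x² + y² − 1 = 0 is a conic. The sketch below picks the matrix Q = diag(1, 1, −1) as an illustrative assumption and evaluates the quadric form for a point on the circle and a point off it.

```python
import numpy as np

# Assumed example conic: the unit circle x^2 + y^2 - 1 = 0.
Q = np.diag([1.0, 1.0, -1.0])

def conic_residual(Q, x, y):
    """Evaluate x_bar^T Q x_bar for the augmented point x_bar = (x, y, 1)."""
    x_bar = np.array([x, y, 1.0])
    return x_bar @ Q @ x_bar

on_circle = conic_residual(Q, np.cos(0.3), np.sin(0.3))   # zero up to rounding
off_circle = conic_residual(Q, 2.0, 0.0)                  # 2^2 + 0^2 - 1
```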
3D Points
3D points can be written in inhomogeneous coordinates as
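$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3$$
or in homogeneous coordinates as
$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^3$$
with projective space $\mathbb{P}^3 = \mathbb{R}^4 \setminus \{(0,0,0,0)\}$.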
3D Planes
3D planes can also be represented using homogeneous coordinates m~=(a,b,c,d)⊤:
$$\{\bar{\mathbf{x}} \mid \tilde{\mathbf{m}}^\top \bar{\mathbf{x}} = 0\} \;\Leftrightarrow\; \{x, y, z \mid ax + by + cz + d = 0\}$$
Again, we can normalize m~ so that m~=(nx,ny,nz,d)⊤=(n,d)⊤ with ∥n∥2=1. In this case, n is the normal perpendicular to the plane and d is its distance to the origin.
An exception is the plane at infinity m~∞=(0,0,0,1)⊤, which passes through all ideal points (=points at infinity), i.e., those for which w~=0.
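A short sketch of the plane normalization, assuming an example plane z = 3: once the normal has unit length, the product of the plane vector with an augmented point gives the signed distance of that point to the plane. All values are illustrative.

```python
import numpy as np

# Assumed example plane: 2z - 6 = 0, i.e. z = 3.
m = np.array([0.0, 0.0, 2.0, -6.0])
m_n = m / np.linalg.norm(m[:3])          # normalized plane (n, d) with ||n|| = 1

x_bar = np.array([1.0, 2.0, 5.0, 1.0])   # augmented 3D point (x, y, z, 1)
signed_dist = m_n @ x_bar                # signed distance of the point to the plane
```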
3D lines
3D lines are less elegant than either 2D lines or 3D planes. One possible representation is to express points on a line as a linear combination of two points p and q on the line:
$$\{\mathbf{x} \mid \mathbf{x} = (1 - \lambda)\mathbf{p} + \lambda \mathbf{q} \,\wedge\, \lambda \in \mathbb{R}\}$$
However, this representation uses 6 parameters for 4 degrees of freedom.
Alternative minimal representations are the two-plane parameterization or Pluecker coordinates. See Szeliski, Chapter 2.1.
3D Quadrics
The 3D analog of 2D conics is a quadric surface:
$$\{\bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q} \bar{\mathbf{x}} = 0\}$$
Quadrics are useful in the study of multi-view geometry and also serve as modeling primitives (spheres, ellipsoids, cylinders).
2D Transformations
Translation: (2D Translation of the Input, 2 DF)
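$$\mathbf{x}' = \mathbf{x} + \mathbf{t} \;\Leftrightarrow\; \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{I} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$
Using homogeneous representations allows transformations to be chained and inverted by matrix multiplication and matrix inversion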
Augmented vectors xˉ can always be replaced by general homogeneous ones x~
Euclidean: (2D Translation + 2D Rotation, 3 DF)
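$$\mathbf{x}' = \mathbf{R}\mathbf{x} + \mathbf{t} \;\Leftrightarrow\; \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$
R∈SO(2) is an orthonormal rotation matrix with RR⊤=I and det(R)=1
Euclidean transformations preserve Euclidean distances
Similarity: (2D Translation + Scaled 2D Rotation, 4 DF)
$$\mathbf{x}' = s\mathbf{R}\mathbf{x} + \mathbf{t} \;\Leftrightarrow\; \bar{\mathbf{x}}' = \begin{bmatrix} s\mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$
R∈SO(2) is a rotation matrix and s is an arbitrary scale factor
The similarity transform preserves angles between lines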
Affine: (2D Linear Transformation, 6 DF)
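$$\mathbf{x}' = \mathbf{A}\mathbf{x} + \mathbf{t} \;\Leftrightarrow\; \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{A} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}$$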
A∈R2×2 is an arbitrary 2×2 matrix
Parallel lines remain parallel under affine transformations
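Perspective: (Homography, 8 DF)
$$\tilde{\mathbf{x}}' = \tilde{\mathbf{H}} \tilde{\mathbf{x}} \qquad \left(\bar{\mathbf{x}} = \tfrac{1}{\tilde{w}} \tilde{\mathbf{x}}\right)$$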
H~∈R3×3 is an arbitrary homogeneous 3×3 matrix (specified up to scale)
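A line transforms differently from a point: every point on the transformed line must still satisfy $\tilde{\mathbf{l}}'^\top \tilde{\mathbf{x}}' = 0$, and since $(\tilde{\mathbf{H}}^{-\top} \tilde{\mathbf{l}})^\top \tilde{\mathbf{H}} \tilde{\mathbf{x}} = \tilde{\mathbf{l}}^\top \tilde{\mathbf{x}} = 0$, we obtain $\tilde{\mathbf{l}}' = \tilde{\mathbf{H}}^{-\top} \tilde{\mathbf{l}}$. Thus, the action of a projective transformation on a co-vector such as a 2D line or 3D normal can be represented by the transposed inverse of the matrix.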
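The difference between transforming points and transforming lines can be checked numerically. The following is a minimal sketch; the homography values and the helper `apply_homography` are arbitrary assumptions used only for illustration.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of inhomogeneous points through the 3x3 homography H."""
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])   # augment with w = 1
    out = pts_h @ H.T                                      # x' = H x for every row
    return out[:, :2] / out[:, 2:3]                        # divide by w'

# Arbitrary invertible example homography (assumed values).
H = np.array([[1.0, 0.2, 3.0],
              [0.1, 1.5, -2.0],
              [0.001, 0.002, 1.0]])

# Two points on the line x + y - 1 = 0.
l = np.array([1.0, 1.0, -1.0])
pts = np.array([[1.0, 0.0], [0.0, 1.0]])

pts_prime = apply_homography(H, pts)
l_prime = np.linalg.inv(H).T @ l    # lines transform with the transposed inverse

# Incidence is preserved: the transformed points lie on the transformed line.
residuals = np.hstack([pts_prime, np.ones((2, 1))]) @ l_prime
```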
Overview of 2D Transformation
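| Transformation | Matrix | DF | Preserves |
|---|---|---|---|
| translation | $[\mathbf{I}\ \mathbf{t}]_{2\times 3}$ | 2 | orientation |
| rigid | $[\mathbf{R}\ \mathbf{t}]_{2\times 3}$ | 3 | lengths |
| similarity | $[s\mathbf{R}\ \mathbf{t}]_{2\times 3}$ | 4 | angles |
| affine | $[\mathbf{A}]_{2\times 3}$ | 6 | parallelism |
| projective | $[\tilde{\mathbf{H}}]_{3\times 3}$ | 8 | straight lines |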
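The transformations form a nested set of groups (each is closed under composition and inversion)
They can be interpreted as restricted 3×3 matrices operating on 2D homogeneous coordinates
Each transformation also preserves the properties listed in the rows below it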
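The group structure can be made concrete with a small sketch: Euclidean transforms written as 3×3 matrices are chained by matrix multiplication, and the result is again Euclidean. The helper `similarity_2d` and the parameter values are assumptions for illustration.

```python
import numpy as np

def similarity_2d(s, theta, t):
    """Build the 3x3 matrix [sR t; 0^T 1]; s = 1 gives a Euclidean (rigid) transform."""
    c, sn = np.cos(theta), np.sin(theta)
    T = np.eye(3)
    T[:2, :2] = s * np.array([[c, -sn], [sn, c]])
    T[:2, 2] = t
    return T

T1 = similarity_2d(1.0, np.pi / 6, [1.0, 2.0])    # Euclidean
T2 = similarity_2d(1.0, np.pi / 3, [-0.5, 0.0])   # Euclidean
T12 = T2 @ T1                                     # chaining: apply T1 first, then T2
T1_inv = np.linalg.inv(T1)                        # inverting a transform

# The composition is again Euclidean: its upper-left block is a rotation.
R12 = T12[:2, :2]

# Lengths are preserved by the chained transform.
p = np.array([0.0, 0.0, 1.0])
q = np.array([3.0, 4.0, 1.0])
len_before = np.linalg.norm((q - p)[:2])
len_after = np.linalg.norm((T12 @ q - T12 @ p)[:2])
```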
Overview of 3D Transformation
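| Transformation | Matrix | DF | Preserves |
|---|---|---|---|
| translation | $[\mathbf{I}\ \mathbf{t}]_{3\times 4}$ | 3 | orientation |
| rigid | $[\mathbf{R}\ \mathbf{t}]_{3\times 4}$ | 6 | lengths |
| similarity | $[s\mathbf{R}\ \mathbf{t}]_{3\times 4}$ | 7 | angles |
| affine | $[\mathbf{A}]_{3\times 4}$ | 12 | parallelism |
| projective | $[\tilde{\mathbf{H}}]_{4\times 4}$ | 15 | straight lines |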
3D transformations are defined analogously to 2D transformations
The 3×4 matrices are extended with a fourth row [0⊤ 1] to obtain 4×4 homogeneous transforms
Each transformation also preserves the properties listed in the rows below it
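As a sketch of the 3D case, a rigid 3×4 matrix can be extended to 4×4 and inverted in closed form, using the fact that the inverse of [R t; 0⊤ 1] is [R⊤ −R⊤t; 0⊤ 1]. The helper names and the example pose are hypothetical.

```python
import numpy as np

def rigid_3d(R, t):
    """Extend the 3x4 matrix [R t] with the row [0 0 0 1] to a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_rigid(T):
    """Closed-form inverse of a rigid transform: [R^T, -R^T t; 0^T, 1]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Example pose (assumed values): rotation by 30 degrees about the z-axis plus a translation.
a = np.deg2rad(30.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a), np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
T = rigid_3d(R, np.array([1.0, -2.0, 0.5]))
T_inv = invert_rigid(T)
```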
Direct Linear Transform for Homography Estimation
Q: How can we estimate a homography from a set of 2D correspondences?
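Let $\mathcal{X} = \{\tilde{\mathbf{x}}_i, \tilde{\mathbf{x}}'_i\}_{i=1}^{N}$ denote a set of N 2D-to-2D correspondences related by $\tilde{\mathbf{x}}'_i = \tilde{\mathbf{H}} \tilde{\mathbf{x}}_i$. As the correspondence vectors are homogeneous, both sides of this equation have the same direction but may differ in magnitude. Thus, the equation above can be expressed as $\tilde{\mathbf{x}}'_i \times \tilde{\mathbf{H}} \tilde{\mathbf{x}}_i = \mathbf{0}$.
Using $\tilde{\mathbf{h}}_k^\top$ to denote the k-th row of $\tilde{\mathbf{H}}$ and writing $\tilde{\mathbf{x}}'_i = (\tilde{x}'_i, \tilde{y}'_i, \tilde{w}'_i)^\top$, this can be rewritten as a linear equation in $\tilde{\mathbf{h}}$:
$$\begin{bmatrix} \mathbf{0}^\top & -\tilde{w}'_i \tilde{\mathbf{x}}_i^\top & \tilde{y}'_i \tilde{\mathbf{x}}_i^\top \\ \tilde{w}'_i \tilde{\mathbf{x}}_i^\top & \mathbf{0}^\top & -\tilde{x}'_i \tilde{\mathbf{x}}_i^\top \\ -\tilde{y}'_i \tilde{\mathbf{x}}_i^\top & \tilde{x}'_i \tilde{\mathbf{x}}_i^\top & \mathbf{0}^\top \end{bmatrix} \begin{pmatrix} \tilde{\mathbf{h}}_1 \\ \tilde{\mathbf{h}}_2 \\ \tilde{\mathbf{h}}_3 \end{pmatrix} = \mathbf{0}$$
The third row is a linear combination of the first two, so each point correspondence yields two independent equations. Stacking all equations into a 2N×9 matrix A leads to the following constrained least squares problem:
$$\tilde{\mathbf{h}}^* = \operatorname*{argmin}_{\tilde{\mathbf{h}}} \|\mathbf{A}\tilde{\mathbf{h}}\|_2^2 \quad \text{subject to} \quad \|\tilde{\mathbf{h}}\|_2^2 = 1$$
where we have fixed $\|\tilde{\mathbf{h}}\|_2^2 = 1$ as $\tilde{\mathbf{H}}$ is homogeneous (i.e., defined only up to scale) and the trivial solution $\tilde{\mathbf{h}} = \mathbf{0}$ is not of interest. The solution to the above optimization problem is the singular vector corresponding to the smallest singular value of A (i.e., the last column of V when decomposing A=UDV⊤, see also Deep Learning lecture 11.2). The resulting algorithm is called Direct Linear Transformation.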
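The Direct Linear Transformation can be sketched in a few lines of NumPy. This is a minimal illustration rather than a robust implementation: it assumes noise-free inhomogeneous input points (so w = 1 on both sides), at least four correspondences in general position, and it omits the coordinate normalization that is usually recommended for noisy data. The function name `dlt_homography` and the test homography are hypothetical.

```python
import numpy as np

def dlt_homography(x, x_prime):
    """Estimate H (up to scale) from (N, 2) point arrays x -> x_prime with N >= 4."""
    n = x.shape[0]
    x_h = np.hstack([x, np.ones((n, 1))])     # augmented source points (x, y, 1)
    A = np.zeros((2 * n, 9))
    for i in range(n):
        xp, yp, wp = x_prime[i, 0], x_prime[i, 1], 1.0
        # First two rows of the cross-product constraint x'_i x (H x_i) = 0.
        A[2 * i, 3:6] = -wp * x_h[i]
        A[2 * i, 6:9] = yp * x_h[i]
        A[2 * i + 1, 0:3] = wp * x_h[i]
        A[2 * i + 1, 6:9] = -xp * x_h[i]
    # The minimizer of ||A h|| subject to ||h|| = 1 is the right singular vector
    # of the smallest singular value, i.e. the last row of V^T.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)               # rows of H are h_1, h_2, h_3

# Synthetic check with an assumed ground-truth homography.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.3]])
src_h = np.hstack([src, np.ones((5, 1))])
dst_h = src_h @ H_true.T
dst = dst_h[:, :2] / dst_h[:, 2:3]

H_est = dlt_homography(src, dst)
H_est = H_est / H_est[2, 2]                   # remove the arbitrary scale (and sign)
```

Since the estimate is only defined up to scale, it is compared with the ground truth after dividing by its bottom-right entry.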
1.2 Geometric Image Formation
Origins of the Pinhole Camera
In a physical pinhole camera the image is projected upside down onto the image plane, which is located behind the focal point
When modeling perspective projection, we assume the image plane in front
Both models are equivalent, with appropriate change of image coordinates
Projection Models
Orthographic Projection
Perspective Projection
Orthographic Projection
Orthographic projection of a 3D point xc∈R3 to pixel coordinates xs∈R2
The x and y axes of the camera and image coordinate systems are shared
Light rays are parallel to the z-axis of the camera coordinate system
During projection, the z-coordinate is dropped, x and y remain the same
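An orthographic projection simply drops the z component of the 3D point in camera coordinates xc to obtain the corresponding 2D point on the image plane (=screen) xs:
$$\mathbf{x}_s = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \mathbf{x}_c \;\Leftrightarrow\; \bar{\mathbf{x}}_s = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \bar{\mathbf{x}}_c$$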
Orthography is exact for telecentric lenses and an approximation for telephoto lenses. After projection, the distance of the 3D point from the image can’t be recovered.
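In practice, metric world coordinates must be scaled to fit the image sensor, which is measured in pixels. This leads to scaled orthography:
$$\mathbf{x}_s = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \end{bmatrix} \mathbf{x}_c$$
Here, the unit for s is px/m or px/mm to convert metric 3D points into pixels.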
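A minimal sketch of (scaled) orthographic projection follows. The point coordinates and the scale of 1000 px/m are assumed values; the second point shows that depth has no effect on the projection.

```python
import numpy as np

# Orthographic projection matrix: keeps x and y, drops z.
P_ortho = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

x_c = np.array([0.2, -0.1, 5.0])         # 3D point in camera coordinates (metres, assumed)
x_s = P_ortho @ x_c                      # metric image-plane coordinates

s = 1000.0                               # assumed scale in px/m
x_px = s * x_s                           # scaled orthography: pixel coordinates

# A point at a different depth projects to the same pixel, so depth is lost.
x_c_far = np.array([0.2, -0.1, 50.0])
x_px_far = s * (P_ortho @ x_c_far)
```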
Perspective Projection
Perspective projection of a 3D point xc∈R3 to pixel coordinates xs∈R2
Light rays pass through the camera center, the pixel xs and the point xc
Convention: the principal axis (orthogonal to image plane) aligns with the z-axis
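In perspective projection, 3D points in camera coordinates are mapped to the image plane by dividing by their z component and multiplying with the focal length f:
$$\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f x_c / z_c \\ f y_c / z_c \end{pmatrix} \;\Leftrightarrow\; \tilde{\mathbf{x}}_s = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \bar{\mathbf{x}}_c$$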
Note that this projection is linear when using homogeneous coordinates. After the projection, it is not possible to recover the distance of the 3D point from the image.
To ensure positive pixel coordinates, a principal point offset c is usually added
This moves the image coordinate system to the corner of the image plane
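The complete perspective projection model is given by:
$$\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f_x x_c / z_c + s\, y_c / z_c + c_x \\ f_y y_c / z_c + c_y \end{pmatrix} \;\Leftrightarrow\; \tilde{\mathbf{x}}_s = \begin{bmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \bar{\mathbf{x}}_c$$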
The left 3×3 submatrix of the projection matrix is called calibration matrix K
The parameters of K are called camera intrinsics (as opposed to extrinsic pose)
Here, fx and fy are independent, allowing for different pixel aspect ratios
The skew s arises when the sensor is not mounted perpendicular to the optical axis
In practice, we often set fx=fy and s=0, but model c=(cx,cy)⊤
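The perspective model with a calibration matrix can be sketched as follows. The intrinsics (fx = fy = 500 px, zero skew, principal point at (320, 240)) and the 3D point are assumed values, and `project` is a hypothetical helper.

```python
import numpy as np

def project(K, x_c):
    """Project a 3D point in camera coordinates to pixel coordinates using K."""
    x_tilde = K @ x_c                 # homogeneous image point
    return x_tilde[:2] / x_tilde[2]   # divide by depth z_c

# Assumed intrinsics: fx = fy = 500 px, zero skew, principal point (320, 240).
fx, fy, skew, cx, cy = 500.0, 500.0, 0.0, 320.0, 240.0
K = np.array([[fx, skew, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

x_c = np.array([0.2, -0.1, 2.0])      # point 2 m in front of the camera (assumed)
x_s = project(K, x_c)                 # pixel coordinates

# Scaling the 3D point along its viewing ray leaves the pixel unchanged.
x_s_scaled = project(K, 3.0 * x_c)
```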
Chaining Transformations
Let K be the calibration matrix (intrinsics) and [R∣t] the camera pose (extrinsics).
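We chain both transformations to project a point in world coordinates to the image:
$$\tilde{\mathbf{x}}_s = \begin{bmatrix} \mathbf{K} & \mathbf{0} \end{bmatrix} \bar{\mathbf{x}}_c = \begin{bmatrix} \mathbf{K} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}_w = \mathbf{K} \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}_w = \mathbf{P} \bar{\mathbf{x}}_w$$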
It is sometimes preferable to use a full rank 4×4 projection matrix:
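$$\tilde{\mathbf{x}}_s = \begin{bmatrix} \mathbf{K} & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \bar{\mathbf{x}}_w = \tilde{\mathbf{P}} \bar{\mathbf{x}}_w$$
Now, the homogeneous vector $\tilde{\mathbf{x}}_s$ is a 4D vector and must be normalized with respect to its third entry to obtain inhomogeneous image pixels:
$$\bar{\mathbf{x}}_s = \tilde{\mathbf{x}}_s / z_s = (x_s / z_s,\; y_s / z_s,\; 1,\; 1 / z_s)^\top$$
Note that the fourth component of the normalized 4D vector is the inverse depth. If the inverse depth is known, a 3D point can be retrieved from its pixel coordinates via $\tilde{\mathbf{x}}_w = \tilde{\mathbf{P}}^{-1} \bar{\mathbf{x}}_s$ and subsequent normalization of $\tilde{\mathbf{x}}_w$ with respect to its fourth entry.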
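The round trip through the full-rank 4×4 projection matrix can be sketched as follows. The intrinsics, pose, and world point are assumed values chosen so that the point lies in front of the camera.

```python
import numpy as np

# Assumed intrinsics and pose (rotation about the y-axis plus a translation).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.1, 0.0, 1.0])

K4 = np.eye(4)
K4[:3, :3] = K                       # [K 0; 0^T 1]
T = np.eye(4)
T[:3, :3] = R
T[:3, 3] = t                         # [R t; 0^T 1]
P = K4 @ T                           # full-rank 4x4 projection matrix

x_w = np.array([0.5, -0.2, 4.0, 1.0])    # augmented world point (assumed)
x_tilde = P @ x_w
x_bar = x_tilde / x_tilde[2]             # (u, v, 1, inverse depth)

# Back-projection: invert P, then normalize by the fourth entry.
x_w_rec = np.linalg.inv(P) @ x_bar
x_w_rec = x_w_rec / x_w_rec[3]

depth = (T @ x_w)[2]                     # z-coordinate in the camera frame
```

The back-projection only works because the inverse depth is carried along as the fourth component; the pixel coordinates alone do not determine the 3D point.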
Lens Distortion
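The assumption of linear projection is violated in practice because the camera lens introduces distortions. Both radial and tangential distortion effects can be modeled relatively easily: Let $x = x_c / z_c$, $y = y_c / z_c$ and $r^2 = x^2 + y^2$. With radial coefficients $\kappa_1, \kappa_2$ and tangential coefficients $\kappa_3, \kappa_4$, the distorted point is obtained as:
$$\mathbf{x}' = (1 + \kappa_1 r^2 + \kappa_2 r^4) \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 2 \kappa_3 x y + \kappa_4 (r^2 + 2 x^2) \\ 2 \kappa_4 x y + \kappa_3 (r^2 + 2 y^2) \end{pmatrix}$$
$$\mathbf{x}_s = \begin{pmatrix} f_x x' + c_x \\ f_y y' + c_y \end{pmatrix}$$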
Images can be undistorted such that the perspective projection model applies. More complex distortion models must be used for wide-angle lenses (e.g., fisheye).
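A low-order polynomial model with two radial and two tangential coefficients is a common way to express such distortions. The sketch below assumes this form; the coefficient names `k1` to `k4`, the helper `distort`, and all numeric values are assumptions for illustration and not necessarily the exact parameterization used by a particular calibration toolbox.

```python
import numpy as np

def distort(x, y, kappa):
    """Apply radial (k1, k2) and tangential (k3, k4) distortion to normalized coordinates."""
    k1, k2, k3, k4 = kappa
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2
    x_d = radial * x + 2.0 * k3 * x * y + k4 * (r2 + 2.0 * x * x)
    y_d = radial * y + 2.0 * k4 * x * y + k3 * (r2 + 2.0 * y * y)
    return x_d, y_d

# Normalized image coordinates x = x_c / z_c, y = y_c / z_c (assumed values).
x, y = 0.1, 0.2

# Without distortion the point is unchanged.
x_id, y_id = distort(x, y, (0.0, 0.0, 0.0, 0.0))

# Pure radial (barrel) distortion with an assumed coefficient k1 = -0.2.
x_d, y_d = distort(x, y, (-0.2, 0.0, 0.0, 0.0))

# Conversion to pixels with assumed intrinsics fx = fy = 500, c = (320, 240).
u, v = 500.0 * x_d + 320.0, 500.0 * y_d + 240.0
```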