A problem frequently encountered in image processing is determining whether an image is oriented properly. Sometimes this question is so difficult to answer that computer people, like their math cousins, solve it by redefining the problem and solving the redefinition.
In this case, instead of answering the question “Is this image properly oriented?” we answer the question “Is this image aligned to some other image?” We’ll assume that “some other image” *is* oriented properly. So if we can line up our image with the canonical image, we’re good to go.
There are a few techniques that can be brought to bear. One is a frequency-based technique that exploits the fact that the integral of the product of two functions is maximal when the functions are perfectly aligned. This operation, convolution (for alignment we actually want its unflipped sibling, cross-correlation), is excellently visualized in this Wikipedia entry.
When it comes to images, each image can be considered a 2-dimensional function (f(x, y) = z, where x and y are the coordinates of any given pixel) defined over some finite interval. One can visualize sliding one 2D function over another by imagining a multicolored blanket sliding over another blanket. It can be moved in either or both of two directions: x and y.
The finite-interval part is important. In the case of an image, the finite intervals are the dimensions of the image. We assume the function is 0 everywhere else, so that any product involving points outside the image contributes 0.
Since these are 2D functions, unlike the 1D functions depicted in the Wikipedia article, their product at each offset produces a surface. The volume under this surface is maximal when the two images are perfectly aligned (assuming the images are identical).
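Here’s a minimal sketch of that idea in NumPy. Rather than literally sliding one image over the other, it computes the cross-correlation for all offsets at once via the FFT and takes the peak as the best alignment. The function name `find_shift` and the toy images are my own illustration, not part of any particular library:

```python
import numpy as np

def find_shift(reference, shifted):
    # Cross-correlation for every offset at once, computed in the
    # frequency domain: correlate(a, b) = IFFT(FFT(a) * conj(FFT(b))).
    f_ref = np.fft.fft2(reference)
    f_shf = np.fft.fft2(shifted)
    corr = np.fft.ifft2(f_ref * np.conj(f_shf)).real

    # The peak of the correlation surface is the (dy, dx) that best
    # lines the two images up.
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)

    # The FFT treats shifts circularly, so wrap large indices around
    # to negative offsets.
    h, w = reference.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Toy example: a random "image" and a copy shifted by (5, -3).
rng = np.random.default_rng(0)
img = rng.random((64, 64))
moved = np.roll(img, shift=(5, -3), axis=(0, 1))
print(find_shift(moved, img))  # → (5, -3)
```

The circular wrap-around is a consequence of using the FFT; real registration code typically pads or windows the images first to suppress edge effects.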
A coworker recently explained a different, spatial technique for determining image alignment (registration, in the jargon of the trade). In the spatial domain, if two images are identical and perfectly lined up, then subtracting one from the other at each pixel location yields 0 at every location. So that positive and negative differences don’t cancel each other out, the difference at each point is squared. The sum of these squared differences is zero when the images are perfectly aligned.
If the images are identical but not perfectly aligned, you can figure out how to align them by sliding one around the other and examining the sum of squared differences (SSD). This too can be plotted as a surface, the minimum of which represents the amount one image needs to be shifted in x and y to line up perfectly with the other.
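The spatial technique can be sketched as a brute-force search: try every offset in a window, record the SSD at each, and keep the minimum. The name `ssd_search`, the window size, and the circular `np.roll` shift are all illustrative simplifications (real code would crop or pad at the borders rather than wrap):

```python
import numpy as np

def ssd_search(reference, target, max_shift=8):
    # Slide `target` around `reference` over a (2*max_shift+1)^2 window
    # of offsets and return the offset with the smallest SSD.
    best, best_ssd = None, np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(target, shift=(dy, dx), axis=(0, 1))
            diff = reference - shifted
            ssd = np.sum(diff * diff)
            if ssd < best_ssd:
                best_ssd, best = ssd, (dy, dx)
    return best, best_ssd

# Toy example: a random "image" and a copy shifted by (3, -2).
rng = np.random.default_rng(1)
img = rng.random((32, 32))
moved = np.roll(img, shift=(3, -2), axis=(0, 1))
offset, ssd = ssd_search(moved, img)
print(offset, ssd)  # → (3, -2) 0.0
```

Note the cost: the search is O(window² · pixels), which is why the FFT-based correlation approach is preferred when the search window is large.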
There’s a lot, assumption-wise, that I’ve left out of the discussion. The technique assumes that the image has regular features, and that there are strong, sharp features (in frequency space these are represented by high frequencies) that will tend to dominate the SSD, so that when the images are out of alignment the SSD is large versus near zero when in alignment. If the image were uniform noise, the SSD would likely bounce around with no clear minimum, as there’s no sharp edge content to anchor the sum.