Correspondence problem

The correspondence problem is the problem of ascertaining which parts of one image correspond to which parts of another image,^[1] where the differences between the images are due to movement of the camera, the passage of time, and/or movement of objects in the scene.

Correspondence is a fundamental problem in computer vision. The influential computer vision researcher Takeo Kanade once said that the three fundamental problems of computer vision are: “Correspondence, correspondence, and correspondence!”^[2] Indeed, correspondence is arguably the key building block in many related applications: optical flow (in which the two images are consecutive in time), dense stereo vision (in which the two images come from a stereo camera pair), structure from motion (SfM) and visual SLAM (in which the images are from different but partially overlapping views of a scene), and cross-scene correspondence (in which the images are from entirely different scenes).

Overview

Given two or more images of the same 3D scene, taken from different points of view, the correspondence problem refers to the task of finding a set of points in one image which can be identified as the same points in another image. To do this, points or features in one image are matched with the points or features in another image, thus establishing corresponding points or corresponding features, also known as homologous points or homologous features. The images can be taken from a different point of view, at different times, or with objects in the scene in general motion relative to the camera(s).

The correspondence problem can occur in a stereo situation when two images of the same scene are used, or can be generalised to the N-view correspondence problem. In the latter case, the images may come from either N different cameras photographing at the same time or from one camera which is moving relative to the scene. The problem is made more difficult when the objects in the scene are in motion relative to the camera(s).

A typical application of the correspondence problem is panorama creation, or image stitching, in which two or more images that have only a small overlap are stitched into a larger composite image. In this case, a set of corresponding points must be identified in each pair of images in order to calculate the transformation that maps one image onto the other.
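To make the stitching step concrete, the following Python sketch estimates the transformation from a list of already-matched point pairs under a strong simplifying assumption: the two images differ only by a translation, so the least-squares estimate is simply the mean displacement. The function name `estimate_translation` is hypothetical. Real panorama stitching usually needs a more general model, such as a homography, fitted robustly (for example with RANSAC) because some matches will be wrong.

```python
def estimate_translation(points_a, points_b):
    """Least-squares translation (dx, dy) that maps points_a onto points_b.

    Assumption: the i-th point of points_a corresponds to the i-th point of
    points_b, and the two images differ only by a pure translation.
    """
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need the same, non-zero number of points in each list")
    n = len(points_a)
    # For a pure translation, the least-squares solution is the mean displacement.
    dx = sum(xb - xa for (xa, _), (xb, _) in zip(points_a, points_b)) / n
    dy = sum(yb - ya for (_, ya), (_, yb) in zip(points_a, points_b)) / n
    return dx, dy


# Three matches that all moved 100 pixels right and 2 pixels up.
print(estimate_translation([(10, 20), (30, 40), (50, 25)],
                           [(110, 18), (130, 38), (150, 23)]))  # (100.0, -2.0)
```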

Basic methods

Motion estimation showing correspondence between video frames^[3]

There are two basic ways to find the correspondences between two images.

Correlation-based – checking whether a small region around a location in one image looks similar to a region around some location in the other image.

Feature-based – finding distinctive features in each image and checking whether the layout of a subset of features is similar in the two images. To avoid the aperture problem, a good feature should have local variation in two directions.
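One common way to test whether a candidate feature has variation in two directions is the Shi–Tomasi criterion, though it is not the only one: build the 2×2 structure tensor from image gradients over a small patch and take its smaller eigenvalue. The sketch below assumes a grayscale patch given as a list of rows and uses simple central differences. `min_eigenvalue_score` is a hypothetical name.

```python
import math


def min_eigenvalue_score(patch):
    """Smaller eigenvalue of the 2x2 structure tensor of a grayscale patch.

    A large value means intensity varies in two directions (corner-like);
    a value near zero means the patch is flat or a straight edge, where the
    aperture problem makes the match ambiguous along the edge.
    """
    sxx = syy = sxy = 0.0
    rows, cols = len(patch), len(patch[0])
    # Central-difference gradients, computed on interior pixels only.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += gx * gx
            syy += gy * gy
            sxy += gx * gy
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]]; keep the smaller one.
    half_trace = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace - radius


edge = [[0, 0, 10, 10, 10] for _ in range(5)]  # vertical edge: one direction
corner = [[10 if x >= 2 and y >= 2 else 0 for x in range(5)] for y in range(5)]
print(min_eigenvalue_score(edge), min_eigenvalue_score(corner))  # 0.0 75.0
```

The straight edge scores zero because all of its gradients point the same way, while the corner scores high because its gradients span two directions.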

Use

In computer vision, the correspondence problem is studied for the case in which a computer must solve it automatically, with only images as input. Once the correspondence problem has been solved, resulting in a set of image points that are in correspondence, other methods can be applied to this set to reconstruct the position, motion and/or rotation of the corresponding 3D points in the scene.
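As one illustration of such a reconstruction method, the sketch below triangulates a single 3D point from one correspondence in a rectified stereo pair, using the standard relation Z = fB/d between depth Z, focal length f, baseline B and disparity d. It assumes an ideal pinhole camera pair with known, shared intrinsics. The function and parameter names are hypothetical.

```python
def triangulate_rectified(x_left, y, x_right, f, baseline, cx, cy):
    """3D point (X, Y, Z) in the left camera frame from one rectified match.

    Assumptions: both cameras share focal length f (pixels) and principal
    point (cx, cy), and are separated by `baseline` along the x axis, so the
    corresponding points lie on the same image row y.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = f * baseline / disparity  # depth is inversely proportional to disparity
    x = (x_left - cx) * z / f     # back-project through the left camera
    y3d = (y - cy) * z / f
    return x, y3d, z


# A 70-pixel disparity with f = 700 px and a 0.1 m baseline puts the point 1 m away.
print(triangulate_rectified(390, 240, 320, f=700.0, baseline=0.1, cx=320.0, cy=240.0))
```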

The correspondence problem is also the basis of particle image velocimetry (PIV), a measurement technique now widely used in fluid mechanics to measure fluid motion quantitatively.
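In particle image velocimetry, the correspondence is usually found between small interrogation windows of two frames taken a short time apart: the displacement that best aligns the particle pattern, divided by the frame interval, gives a local velocity. The sketch below uses brute-force direct cross-correlation over integer shifts. Practical PIV software typically uses FFT-based correlation and sub-pixel peak fitting instead. The function name and the pixel-scale parameter are hypothetical.

```python
def piv_velocity(window_a, window_b, max_shift, metres_per_pixel, dt):
    """Velocity (vx, vy) of one interrogation window, in metres per second.

    Finds the integer shift (dx, dy) that maximises the direct cross-correlation
    sum(a[y][x] * b[y + dy][x + dx]) over overlapping pixels, then converts that
    displacement to a velocity. Positive vy means motion down the image rows.
    """
    rows, cols = len(window_a), len(window_a[0])
    best_score, best_shift = float("-inf"), (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(rows):
                for x in range(cols):
                    yb, xb = y + dy, x + dx
                    if 0 <= yb < rows and 0 <= xb < cols:
                        score += window_a[y][x] * window_b[yb][xb]
            if score > best_score:  # correlation peak so far
                best_score, best_shift = score, (dx, dy)
    dx, dy = best_shift
    return dx * metres_per_pixel / dt, dy * metres_per_pixel / dt


# Two particles move 1 px right and 2 px down between frames 10 ms apart.
a = [[0.0] * 8 for _ in range(8)]
b = [[0.0] * 8 for _ in range(8)]
a[2][2] = a[3][5] = 1.0
b[4][3] = b[5][6] = 1.0
print(piv_velocity(a, b, max_shift=3, metres_per_pixel=1e-4, dt=0.01))
```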

Simple example

To find the correspondence between set A [1,2,3,4,5] and set B [3,4,5,6,7], find where they overlap and how far one set is offset from the other. Here the last three numbers in set A correspond to the first three numbers in set B: the value at position i in B equals the value at position i + 2 in A. This shows that B is offset 2 positions to the left of A.
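The same reasoning can be written as a brute-force search: try every shift k at which the sequences still overlap, count the positions where B[i] equals A[i + k], and keep the shift with the most agreements. This Python sketch (the function name `best_offset` is hypothetical) returns both the shift and the number of matching positions.

```python
def best_offset(a, b):
    """Return (k, matches) for the shift k that maximises b[i] == a[i + k].

    Every shift at which the two sequences still overlap is tried, and the
    positions where the values agree are counted.
    """
    best_k, best_matches = 0, -1
    for k in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i in range(len(b))
            if 0 <= i + k < len(a) and b[i] == a[i + k]
        )
        if matches > best_matches:
            best_k, best_matches = k, matches
    return best_k, best_matches


print(best_offset([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))  # (2, 3)
```

The result (2, 3) says that three values agree when each element of B is compared with the element two positions further along in A.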

Simple correlation-based example

A simple method is to compare small patches between rectified images. This works best with images taken from roughly the same point of view and either at the same time or with little to no movement of the scene between image captures, such as stereo image pairs.
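Patches are often compared with normalized cross-correlation (NCC), which is insensitive to a uniform brightness offset and a positive contrast scaling between the two images. Sums of squared or absolute differences are common, cheaper alternatives. The sketch below is one possible NCC implementation for equal-sized grayscale patches given as lists of rows. `ncc` is a hypothetical helper name, not a library function.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equal-sized grayscale patches.

    Returns a value in [-1, 1]; 1 means the patches are identical up to a
    brightness offset and a positive contrast scale. Returns 0.0 when either
    patch is constant, because the correlation is undefined there.
    """
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b):
        raise ValueError("patches must have the same size")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]  # subtract the mean: removes brightness offset
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom  # divide: removes contrast scale


p = [[1, 2], [3, 4]]
print(ncc(p, [[12, 14], [16, 18]]))  # brighter, higher-contrast copy of p
```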

A small window is passed over a number of positions in one image. At each position, the window is compared with the same location in the other image. Several nearby locations in the other image are also compared, because an object in one image may not be at exactly the same image location in the other image. It is possible that no candidate fits well enough. This may mean that the feature is not present in both images, that it has moved farther than the search range allowed for, that its appearance has changed too much, or that it is hidden by other parts of the scene.
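The search described above can be sketched for rectified images, where the match for a pixel in the left image lies on the same row of the right image. The sketch below slides a square window over candidate disparities, scores each with the sum of absolute differences (SAD), and returns None when even the best candidate exceeds a cost threshold, which is the case where no fit is good enough. The names, the search direction (leftward in the right image), and the threshold parameter `max_cost` are assumptions for illustration.

```python
def match_scanline(left, right, row, x, half_window, max_disparity, max_cost):
    """Disparity of pixel (row, x) of `left` within `right`, or None if no good fit.

    Assumes rectified grayscale images (lists of rows), so the match lies on the
    same row of `right`, between columns x - max_disparity and x. The cost is
    the SAD over a (2 * half_window + 1) square window.
    """
    height, width = len(left), len(left[0])
    if not (half_window <= row < height - half_window
            and half_window <= x < width - half_window):
        raise ValueError("window around (row, x) must lie inside the image")

    def sad(d):
        total = 0
        for dy in range(-half_window, half_window + 1):
            for dx in range(-half_window, half_window + 1):
                total += abs(left[row + dy][x + dx] - right[row + dy][x - d + dx])
        return total

    best_d, best_cost = None, None
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # candidate window would leave the right image
        cost = sad(d)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    # Best candidate still too costly: occluded, out of range, or changed too much.
    if best_cost is None or best_cost > max_cost:
        return None
    return best_d


left = [[10 * c + r for c in range(12)] for r in range(5)]
right = [[left[r][c + 3] if c + 3 < 12 else 0 for c in range(12)] for r in range(5)]
print(match_scanline(left, right, row=2, x=8, half_window=1, max_disparity=5, max_cost=50))
```

Here the right image is the left image shifted three columns, so the search recovers a disparity of 3. Replacing `right` with an unrelated image makes every candidate exceed `max_cost`, and the function returns None.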

References

D. Scharstein; R. Szeliski. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms.

^ W. Bach; J.K. Aggarwal (29 February 1988). Motion Understanding: Robot and Human Vision. Springer Science & Business Media. ISBN 978-0-89838-258-7.
^ X. Wang (September 2019). Learning and Reasoning with Visual Correspondence in Time.
^ John X. Liu (2006). Computer Vision and Robotics. Nova Publishers. ISBN 978-1-59454-357-9.

External links

Middlebury Stereo Vision page
