In computer vision, a camera model is a mathematical description of how the light reflected or emitted at points in a 3D scene is projected onto the image plane of the camera. The term is normally reserved for the projection of light through a space that contains no medium interacting with the light, for example by causing dispersion or otherwise altering its path.
The pinhole camera model
All camera models are based on the pinhole camera, which can be described as a closed box with a hole, or aperture, on one side that lets light enter the box, and with the image plane on the inside of the opposite side, where light intensity can be measured by some physical process. The exact type of process used to measure light intensity in the image plane, whether photo-chemical or photo-electrical, is not relevant to the pinhole camera model, although it may have some consequences for the digital camera model, described below. A pinhole camera can be built and used in practice; see the pinhole camera article.
One of the main issues in designing and using a camera is how to obtain sharp images. For a real pinhole camera this can be done in two ways. The first is to make the aperture small enough that the light rays emanating from any particular point in the scene travel along lines which fall on the image plane at approximately a single point. However, the smaller the aperture is, the less light enters the camera and falls onto the image plane per unit time. More precisely, the product of the aperture area and the exposure time must be constant to produce the same light intensity measurement in the image plane. Consequently, if the aperture is made relatively small to ensure sharp images, the exposure time has to be increased to obtain the same light intensity in the image plane. Alternatively, we can use a relatively large aperture, and a correspondingly shorter exposure time, if the scene depicted in the camera image is at a sufficiently large distance from the camera to ensure the same condition on the light rays and thereby obtain a sharp image.
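The trade-off between aperture area and exposure time can be illustrated with a minimal sketch. It assumes a circular aperture and takes the constant-product rule stated above at face value; the function names and the example values are hypothetical and chosen only for illustration.

```python
import math


def aperture_area(diameter_mm: float) -> float:
    """Area of a circular aperture (assumed shape) in square millimetres."""
    return math.pi * (diameter_mm / 2.0) ** 2


def equivalent_exposure_time(ref_diameter_mm: float,
                             ref_time_s: float,
                             new_diameter_mm: float) -> float:
    """Exposure time that keeps (aperture area * exposure time) constant."""
    if new_diameter_mm <= 0:
        raise ValueError("aperture diameter must be positive")
    return ref_time_s * aperture_area(ref_diameter_mm) / aperture_area(new_diameter_mm)


# Halving the diameter quarters the area, so the exposure time must quadruple.
new_time = equivalent_exposure_time(ref_diameter_mm=1.0, ref_time_s=0.5,
                                    new_diameter_mm=0.5)
print(new_time)
```

Because the area grows with the square of the diameter, even a modest reduction in aperture size calls for a much longer exposure.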
Both of the above strategies make the pinhole camera impractical to use. Instead, we can use a large or even variable-size aperture in combination with lenses. By choosing a suitable lens or system of lenses, it is possible to make light rays emanating from any point in the focal plane converge at a single point in the image, even if they leave that point in slightly different, diverging directions. This "trick" is precisely what a lens camera does: it allows you to adjust the aperture size as well as the distance to the focal plane. The general observation is that the larger the aperture is, the shorter the exposure time can be, but at the same time the focal plane becomes "thinner", i.e., points now have to be very close to the focal plane to appear sharp in the image. In practice, all cameras used in computer vision are lens cameras.
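The "thinner" focal plane can be made concrete with a rough sketch based on the idealized thin-lens model, which is an assumption introduced here and not part of the text above. Under that model, a point away from the plane in focus is imaged as a blur circle whose diameter grows in proportion to the aperture diameter. The function and parameter names are hypothetical.

```python
def blur_circle_diameter(aperture_mm: float,
                         focal_length_mm: float,
                         focus_dist_mm: float,
                         point_dist_mm: float) -> float:
    """Blur circle diameter on the image plane under a thin-lens assumption.

    focus_dist_mm is the distance from the lens to the plane in focus,
    point_dist_mm is the distance from the lens to the scene point.
    """
    if focus_dist_mm <= focal_length_mm:
        raise ValueError("the plane in focus must lie beyond the focal length")
    if point_dist_mm <= 0:
        raise ValueError("the scene point must lie in front of the lens")
    magnification = focal_length_mm / (focus_dist_mm - focal_length_mm)
    defocus = abs(point_dist_mm - focus_dist_mm) / point_dist_mm
    return aperture_mm * defocus * magnification


# A point on the plane in focus is imaged sharply regardless of the aperture.
in_focus = blur_circle_diameter(25.0, 50.0, 2000.0, 2000.0)

# The same off-plane point blurs twice as much when the aperture is doubled.
small = blur_circle_diameter(12.5, 50.0, 2000.0, 3000.0)
large = blur_circle_diameter(25.0, 50.0, 2000.0, 3000.0)
print(in_focus, small, large)
```

For a fixed tolerance on blur, doubling the aperture therefore roughly halves how far a point may lie from the plane in focus, which is the thinning effect described above.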
The pinhole camera model is a mathematical model of the pinhole camera which assumes that the aperture is infinitely small, i.e., a single point. Consequently, in the pinhole camera model all light rays which emanate from a particular point in the scene must fall onto the image at a single point. This implies that every point in the scene is depicted sharply in the image, but also that virtually no light enters the camera. In view of the above discussion, however, the pinhole camera model also applies to a lens camera if we can assume that the scene depicted in the camera image is sufficiently close to the focal plane of the camera. Exactly what "sufficiently close" means depends on the application and on the type of camera and lenses used. In general, however, we assume that these have been chosen with some care to make the pinhole camera model applicable to all or most of the points depicted in the camera image. This means that the pinhole camera model is only an approximation of how light emanating from a point in the scene is projected onto the image plane, and the validity of the approximation depends primarily on how well the lens system of the camera is adapted to the scene.
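Since the model reduces the aperture to a single point, each scene point maps to exactly one image point along the straight line through the pinhole. The following is a minimal sketch of that mapping using the standard perspective division. It assumes camera-centred coordinates with the optical axis along z and a virtual image plane in front of the pinhole, so the image is not inverted; the function name and conventions are illustrative assumptions, not taken from the text above.

```python
from typing import Tuple


def project_pinhole(point: Tuple[float, float, float],
                    focal_length: float) -> Tuple[float, float]:
    """Project a 3D point (camera coordinates) onto the image plane.

    The pinhole sits at the origin and the image plane at z = focal_length
    (assumed convention).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two points on the same ray through the pinhole share one image point.
near = project_pinhole((1.0, 2.0, 4.0), focal_length=1.0)
far = project_pinhole((2.0, 4.0, 8.0), focal_length=1.0)
print(near, far)
```

The projection depends only on the direction of the ray, not on the distance along it, which is why the ideal model renders every point sharply whatever its depth.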