User:TacticBucket/sandbox

Original article: Saliency map.

Saliency map

A view of the fort of Marburg (Germany) and the saliency map of the image, computed using color, intensity and orientation.

In computer vision, a saliency map is an image that highlights the region on which people's eyes focus first. The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system. For example, in this image, a person first looks at the fort and light clouds, so they should be highlighted on the saliency map.

Applications

Saliency maps are used in a variety of problems. Some general applications are:

  • Image and video compression: The human eye focuses only on a small region of interest in the frame. Therefore, it is not necessary to compress the entire frame with uniform quality. According to Guo and Zhang, using a saliency map reduces the final size of the video while preserving the same perceived visual quality.[1]
  • Image and video quality assessment: The main goal of an image or video quality metric is to correlate strongly with user opinions. Differences in salient regions are given more weight and thus contribute more to the quality score.[2]
  • Image retargeting: Retargeting aims to resize an image by expanding or shrinking its non-informative regions. Retargeting algorithms therefore rely on saliency maps that accurately estimate all the salient image details.[3]
  • Object detection and recognition: Instead of applying a computationally complex algorithm to the whole image, it can be applied only to the most salient regions, which are the most likely to contain an object.[4]
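
The quality-assessment idea above, in which differences in salient regions count for more, can be illustrated with a saliency-weighted mean squared error. This is only a minimal sketch: the function name `saliency_weighted_mse`, the grayscale toy images and the binary saliency map are assumptions made for illustration, and it is not the metric proposed in the cited paper.

```python
import numpy as np

def saliency_weighted_mse(reference, distorted, saliency, eps=1e-8):
    """Mean squared error in which each pixel is weighted by its saliency."""
    reference = np.asarray(reference, dtype=np.float64)
    distorted = np.asarray(distorted, dtype=np.float64)
    weights = np.asarray(saliency, dtype=np.float64)
    if not (reference.shape == distorted.shape == weights.shape):
        raise ValueError("all inputs must have the same shape")
    squared_error = (reference - distorted) ** 2
    # Normalise by the total weight so the result stays on the MSE scale.
    return float((weights * squared_error).sum() / (weights.sum() + eps))

# Toy data (hypothetical): the left half of an 8x8 image is salient.
reference = np.zeros((8, 8))
saliency = np.zeros((8, 8))
saliency[:, :4] = 1.0

# The same one-pixel distortion, once inside and once outside the salient half.
salient_damage = reference.copy()
salient_damage[2, 1] = 1.0
background_damage = reference.copy()
background_damage[2, 6] = 1.0

score_salient = saliency_weighted_mse(reference, salient_damage, saliency)
score_background = saliency_weighted_mse(reference, background_damage, saliency)
print(score_salient, score_background)
```

With these toy inputs the distortion inside the salient half is penalised, while the identical distortion in the background does not change the score at all. A practical metric would more likely use a continuous saliency map so that background errors still contribute a little.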

Algorithms

There are three forms of classic saliency estimation algorithms implemented in OpenCV:

  • Static saliency: Relies on image features and statistics to localize the regions of interest of an image.
  • Motion saliency: Relies on motion in a video, detected by optical flow. Objects that move are considered salient.
  • Objectness: Objectness reflects how likely an image window is to cover an object. These algorithms generate a set of bounding boxes indicating where an object may lie in an image.
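
To give a feel for static saliency, the sketch below scores each pixel by how much its intensity differs from the mean of its surroundings (a simple center-surround contrast). It is a toy stand-in written with NumPy only, not the algorithm that OpenCV implements; the function names and the `surround_size` parameter are assumptions chosen for illustration.

```python
import numpy as np

def box_mean(image, size):
    """Mean of each pixel's size x size neighbourhood (size must be odd)."""
    radius = size // 2
    padded = np.pad(image, radius, mode="edge")
    height, width = image.shape
    total = np.zeros_like(image, dtype=np.float64)
    # Sum the shifted copies of the image that make up the window.
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + height, dx:dx + width]
    return total / (size * size)

def center_surround_saliency(gray, surround_size=15):
    """Absolute difference between a pixel and its local mean, scaled to [0, 1]."""
    gray = np.asarray(gray, dtype=np.float64)
    contrast = np.abs(gray - box_mean(gray, surround_size))
    peak = contrast.max()
    return contrast / peak if peak > 0 else contrast

# Toy image (hypothetical): a small bright square on a dark background.
image = np.zeros((64, 64))
image[28:36, 28:36] = 1.0

saliency_map = center_surround_saliency(image)
print(saliency_map[32, 32], saliency_map[0, 0])
```

The bright square stands out from its neighbourhood and receives the highest score, while the flat background far from the square scores zero. Real static-saliency methods combine several features, such as color and orientation, rather than intensity alone.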

In addition to classic approaches, neural-network-based approaches are also popular. Examples of neural networks for motion saliency estimation include:

  • TASED-Net: It consists of two building blocks. First, the encoder network extracts low-resolution spatiotemporal features, and then the following prediction network decodes the spatially encoded features while aggregating all the temporal information.
  • STRA-Net: It emphasizes two essential issues. First, spatiotemporal features are integrated via appearance and optical flow coupling, and then multi-scale saliency is learned via an attention mechanism.
  • STAViS: It combines spatiotemporal visual and auditory information. This approach employs a single network that learns to localize sound sources and to fuse the two saliencies to obtain a final saliency map.

Datasets

A saliency dataset usually contains records of human eye movements on a set of images or video sequences. Such datasets are valuable for developing new saliency algorithms and for benchmarking existing ones. The most important dataset parameters are spatial resolution, size, and eye-tracking equipment. As an example, part of the large dataset table from the MIT/Tübingen Saliency Benchmark is reproduced here.

Saliency datasets
Dataset       Resolution    Size         Observers  Duration  Eye tracker
CAT2000       1920×1080 px  4000 images  24         5 sec     EyeLink 1000 (1000 Hz)
EyeTrackUAV2  1280×720 px   43 videos    30         33 sec    EyeLink 1000 Plus (1000 Hz, binocular)
CrowdFix      1280×720 px   434 videos   26         1-3 sec   The Eyetribe Eyetracker (60 Hz)
SAVAM         1920×1080 px  43 videos    50         20 sec    SMI iViewX™ Hi-Speed 1250 (500 Hz)

To collect a saliency dataset, image or video sequences and eye-tracking equipment must be prepared, and observers must be invited. Observers must have normal or corrected-to-normal vision and must all sit at the same distance from the screen. At the beginning of each recording session, the eye tracker is recalibrated. To do this, the observer fixates their gaze on the center of the screen. The session then starts, and saliency data are collected by showing the sequences and recording the observers' eye gaze.
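
One common way to turn recorded gaze points into a map, assumed here rather than taken from any particular dataset, is to count the gaze samples that fall on each pixel and blur the counts with a Gaussian. The sketch below does this with NumPy; the function names, the `sigma` value and the toy gaze coordinates are all hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian truncated at three standard deviations."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def fixation_density_map(gaze_points, height, width, sigma=2.0):
    """Accumulate (x, y) gaze points into a blurred map scaled to [0, 1].

    Assumes the image is larger than the Gaussian kernel in both directions.
    """
    counts = np.zeros((height, width), dtype=np.float64)
    for x, y in gaze_points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:  # drop off-screen samples
            counts[row, col] += 1.0
    kernel = gaussian_kernel_1d(sigma)
    # Separable Gaussian blur: filter every row, then every column.
    blurred = np.apply_along_axis(np.convolve, 1, counts, kernel, mode="same")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, kernel, mode="same")
    peak = blurred.max()
    return blurred / peak if peak > 0 else blurred

# Toy gaze samples in pixel coordinates (hypothetical values).
# Three samples cluster near (10, 5), one lies at (25, 15), one is off-screen.
gaze_points = [(10.2, 5.1), (10.0, 5.0), (9.8, 4.9), (25.0, 15.0), (40.0, 3.0)]

density = fixation_density_map(gaze_points, height=20, width=32)
print(np.unravel_index(density.argmax(), density.shape))
```

The location that attracted the most gaze samples becomes the brightest point of the map, and the off-screen sample is ignored. In practice the blur width is usually chosen from the viewing distance and screen size, which is one reason why observers are kept at the same distance from the screen.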

The eye-tracking device is a high-speed camera capable of recording eye movements at a rate of at least 250 frames per second. Images from the camera are processed by software running on a dedicated computer, which returns the gaze data.

References

  1. Guo, Chenlei; Zhang, Liming (January 2010). "A Novel Multiresolution Spatiotemporal Saliency Detection Model and Its Applications in Image and Video Compression". IEEE Transactions on Image Processing. 19 (1): 185–198. doi:10.1109/TIP.2009.2030969. ISSN 1057-7149.
  2. Tong, Yubing; Konik, Hubert; Cheikh, Faouzi; Tremeau, Alain (May 2010). "Full Reference Image Quality Assessment Based on Saliency Map Analysis". Journal of Imaging Science and Technology. 54 (3): 30503-1–30503-14. doi:10.2352/J.ImagingSci.Technol.2010.54.3.030503.
  3. Goferman, Stas; Zelnik-Manor, Lihi; Tal, Ayellet (October 2012). "Context-Aware Saliency Detection". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (10): 1915–1926. doi:10.1109/TPAMI.2011.272. ISSN 1939-3539.
  4. Jiang, Huaizu; Wang, Jingdong; Yuan, Zejian; Wu, Yang; Zheng, Nanning; Li, Shipeng (June 2013). "Salient Object Detection: A Discriminative Regional Feature Integration Approach". 2013 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. doi:10.1109/cvpr.2013.271.