Video imprint (computer vision)

Video imprint, proposed as an extension of image epitomes in the field of video content analysis, is obtained by recasting video content into a fixed-size tensor representation[1][2] regardless of video resolution or duration. Statistical characteristics of the video are retained to some degree, so that common video recognition tasks, e.g., event retrieval and temporal action localization, can be carried out directly on such imprints.[2] It is claimed that spatio-temporal interdependences are accounted for and redundancies are mitigated during the computation of video imprints.

Computing video imprints with the epitome model[3] has the advantage of accepting more flexible input feature formats and offering a more efficient training stage for video content analysis.
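The fixed-size property can be illustrated with a toy sketch. This is not the epitome-based method of the cited papers: `toy_imprint`, its random anchors, and the softmax pooling are hypothetical stand-ins, chosen only to show how any number of local features (from any resolution or duration) can be pooled into a tensor of constant shape. It does not model spatio-temporal interdependences.

```python
import math
import random


def toy_imprint(features, grid=4, seed=0):
    """Pool any number of D-dim local features into a grid x grid x D tensor.

    Illustrative assumption, not the published algorithm: each grid cell owns
    a fixed random anchor vector, and every feature is softly assigned to the
    cells by a softmax over negative squared distances to the anchors.
    """
    dim = len(features[0])
    rng = random.Random(seed)
    anchors = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(grid * grid)]
    sums = [[0.0] * dim for _ in anchors]
    mass = [0.0] * len(anchors)

    for feat in features:
        logits = [-sum((a - x) ** 2 for a, x in zip(anchor, feat))
                  for anchor in anchors]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(v - top) for v in logits]
        total = sum(weights)
        for k, w in enumerate(weights):
            w /= total
            mass[k] += w
            for d in range(dim):
                sums[k][d] += w * feat[d]

    # Weighted mean per cell; cells that received almost no mass stay at zero.
    cells = [[s / m if m > 1e-12 else 0.0 for s in row]
             for row, m in zip(sums, mass)]
    return [cells[r * grid:(r + 1) * grid] for r in range(grid)]


def random_video_features(frames, height, width, dim, seed):
    """Stand-in for per-location features extracted from a video."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(frames * height * width)]


short_clip = random_video_features(frames=5, height=3, width=3, dim=8, seed=1)
long_clip = random_video_features(frames=40, height=6, width=4, dim=8, seed=2)

for clip in (short_clip, long_clip):
    imprint = toy_imprint(clip)
    print(len(clip), "features ->",
          len(imprint), "x", len(imprint[0]), "x", len(imprint[0][0]))
```

Both clips yield a 4 × 4 × 8 tensor even though they contribute 45 and 960 local features respectively, so a downstream model can consume either without resizing or temporal cropping.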

See also


References

  1. Gao, Zhanning; Wang, Le; Jojic, Nebojsa; Niu, Zhenxing; Zheng, Nanning; Hua, Gang (2019-12-01). "Video Imprint". IEEE Transactions on Pattern Analysis and Machine Intelligence. 41 (12). Institute of Electrical and Electronics Engineers (IEEE): 3086–3099. arXiv:2106.03283. doi:10.1109/tpami.2018.2866114. ISSN 0162-8828. PMID 30130178. S2CID 52059105.
  2. Gao, Zhanning; Wang, Le; Zhang, Qilin; Niu, Zhenxing; Zheng, Nanning; Hua, Gang (2019-07-17). "Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos" (PDF). Proceedings of the AAAI Conference on Artificial Intelligence. 33 (1): 8328–8335. doi:10.1609/aaai.v33i01.33018328. ISSN 2374-3468.
  3. Jojic, N.; Frey, B.J.; Kannan, A. (2003). "Epitomic analysis of appearance and shape". Proceedings Ninth IEEE International Conference on Computer Vision. IEEE ICCV. Vol. 1. pp. 34–41. doi:10.1109/iccv.2003.1238311. ISBN 0-7695-1950-4.