Advances and Challenges in Video Saliency Prediction: A Comprehensive Survey

Manuscript in preparation, 2024

Authors

Iman Kianian, Pooria Omrani, Zahra Ebrahimian, Alireza Hosseini, Ramin Toosi, Mohammad Ali Akhaee

Abstract

Video saliency prediction is a computer vision task that aims to automatically identify and localize the regions of each video frame that are most likely to attract human visual attention. It plays an important role in applications such as video summarization, object tracking, and content recommendation. The task is challenging because video is dynamic and temporal: a model must capture how human visual attention changes over time as a viewer watches a video. Current approaches rely on machine learning, and deep learning in particular, to analyze video sequences and predict their salient regions, which in turn supports better video understanding and user experience. This survey examines the diverse approaches to the problem and reviews the evaluation metrics, benchmark datasets, and applications in which saliency prediction plays a central role. It also compares state-of-the-art approaches on benchmark datasets along two dimensions: accuracy and time complexity. By identifying existing gaps and open research directions, the survey aims to inspire future work on more accurate and robust video saliency models for a range of applications.