Prediction and Description of Near-Future Activities in Video

Authors:

Highlights:

• Our work provides a description of near-future activities from current observations.

• Ours is one of the earliest works on captioning near-future events in videos.

• We use the spatio-temporal relationships between activities and objects for label prediction.

• We use a sequence-to-sequence learning approach to map labels to captions.

• We perform extensive experiments to show the effectiveness of the proposed framework.

Keywords: Label, Caption, LSTM, Fully connected network, Sequence-to-sequence

Article history: Received 29 May 2020, Revised 21 May 2021, Accepted 24 May 2021, Available online 29 May 2021, Version of Record 12 June 2021.

Official paper URL: https://doi.org/10.1016/j.cviu.2021.103230