Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation
Published in IJCAI, 2018
Authors: Yang Li, Kan Li, and Xinxin Wang
Published in: International Joint Conference on Artificial Intelligence
Abstract
In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits the powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train the model with deep supervision, which improves both performance and efficiency. Meanwhile, to capture the temporal structure of actions while preserving more detail, we propose a trainable aggregation module. It models the temporal evolution of each spatial location and projects it into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. The deeply-supervised CNN model integrating this powerful aggregation module provides a promising solution for recognizing actions in videos. We conduct experiments on two action recognition datasets, HMDB51 and UCF101, and the results show that our model outperforms the state-of-the-art methods.
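As a rough illustration of the VLAD-style aggregation the abstract refers to, the sketch below computes a soft-assignment VLAD encoding in NumPy: local descriptors are softly assigned to a set of cluster centers, and the assignment-weighted residuals are aggregated per cluster. This is a generic sketch (in the spirit of differentiable NetVLAD-type layers), not the paper's actual module, which is trained end-to-end inside the CNN; the function name, `alpha` sharpness parameter, and normalization choices are assumptions for illustration.

```python
import numpy as np

def soft_vlad(descriptors, centers, alpha=1.0):
    """Soft-assignment VLAD sketch (hypothetical, not the paper's exact module).

    descriptors: (N, D) array of local features.
    centers:     (K, D) array of cluster centers (learnable in a real layer).
    Returns a (K * D,) L2-normalized aggregated representation.
    """
    # Squared distances between every descriptor and every center: (N, K)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)

    # Soft assignment weights via a softmax over clusters (sharpness alpha)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # (N, K)

    # Residuals of each descriptor to each center, weighted and summed: (K, D)
    resid = descriptors[:, None, :] - centers[None, :, :]  # (N, K, D)
    vlad = (w[:, :, None] * resid).sum(axis=0)             # (K, D)

    # Intra-normalize each cluster vector, then L2-normalize the flattened code
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    flat = vlad.ravel()
    return flat / (np.linalg.norm(flat) + 1e-12)
```

Because every step is differentiable with respect to the centers and the assignment weights, an aggregation of this form can be trained jointly with the CNN, which is the key property the paper's trainable module relies on.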
Recommended citation:
@inproceedings{YangLi2018DeeplySupervisedCM,
title={Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation},
author={Yang Li and Kan Li and Xinxin Wang},
booktitle={International Joint Conference on Artificial Intelligence},
year={2018}
}