Summary: Behavior recognition methods focus mainly on the action itself, but a short video carries relatively little information, so accurate recognition requires exploiting as much of the feature information in the video as possible. This work therefore studies a short video behavior recognition method based on joint scene and behavior features, using scene information as context to improve on traditional single behavior recognition networks. First, scene features are extracted from the short video with a deep fusion network. Next, behavioral features are extracted from the RGB frames and the optical flow of the short video with a deformable convolutional network. Finally, dictionary learning is used to sparsely represent the joint features, yielding more explanatory feature information for short video behavior recognition. The method reaches a top-5 accuracy of 33% on the Charades test set, outperforming the traditional single behavior recognition network.
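The final step, sparse representation of the joint features by dictionary learning, can be illustrated with a minimal NumPy sketch. The abstract does not specify the dictionary learning algorithm, the feature dimensions, or how scene and behavior features are combined, so this sketch assumes simple concatenation of L2-normalized features, ISTA for sparse coding, and the MOD least-squares dictionary update. The helper names (`joint_features`, `sparse_code`, `learn_dictionary`), the dimensions, and the random stand-in features are hypothetical and are not the authors' implementation.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def joint_features(scene, rgb, flow):
    """Stack L2-normalized scene, RGB and flow features (one video per column).

    Assumption: joint features are formed by plain concatenation.
    """
    parts = []
    for p in (scene, rgb, flow):
        norms = np.linalg.norm(p, axis=0, keepdims=True)
        parts.append(p / np.maximum(norms, 1e-12))
    return np.vstack(parts)

def sparse_code(X, D, lam, n_iter=200):
    """ISTA for min_A 0.5 * ||X - D A||_F^2 + lam * ||A||_1 with D fixed."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A

def learn_dictionary(X, k, lam, n_outer=20, seed=0):
    """Alternate ISTA sparse coding with a MOD (least-squares) dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        A = sparse_code(X, D, lam)
        # MOD step: D = X A^T (A A^T + eps I)^{-1}, solved without forming an inverse
        G = A @ A.T + 1e-6 * np.eye(k)
        D = np.linalg.solve(G, A @ X.T).T
        # Rescale atoms to unit norm; unused (all-zero) atoms stay zero
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, sparse_code(X, D, lam)

# Synthetic stand-ins for network outputs (hypothetical dimensions, 100 videos)
rng = np.random.default_rng(1)
n_videos = 100
scene = rng.standard_normal((64, n_videos))  # scene-branch features
rgb = rng.standard_normal((32, n_videos))    # RGB-stream behavior features
flow = rng.standard_normal((32, n_videos))   # optical-flow-stream behavior features

LAM = 0.2
X = joint_features(scene, rgb, flow)         # (128, 100) joint features
D, A = learn_dictionary(X, k=64, lam=LAM)    # A: (64, 100) sparse codes
print("fraction of non-zero coefficients:", np.mean(A != 0))
```

In a pipeline like the one described, the sparse codes `A`, rather than the raw joint features `X`, would be passed to a classifier; the L1 penalty `LAM` trades reconstruction fidelity against sparsity and would need tuning on real features.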