'High speed action recognition and localization in compressed domain videos'
(Chuohao Yeo, Parvez Ahammad, Kannan Ramchandran and Shankar Sastry)
to appear in IEEE Transactions on Circuits and Systems for Video Technology
[bibtex] [pdf]

Abstract
We present a compressed domain scheme that is able to recognize and localize actions at high speeds. The recognition problem is posed as performing an action video query on a test video sequence. Our method is based on computing motion similarity using compressed domain features which can be extracted with low complexity. We introduce a novel motion correlation measure that takes into account differences in motion directions and magnitudes. Our method is appearance invariant, requires no prior segmentation, alignment or stabilization, and is able to localize actions in both space and time. We evaluated our method on a benchmark action video database consisting of 6 actions performed by 25 people under 3 different scenarios. Our proposed method achieved a classification accuracy of 90%, comparing favorably with existing methods in action classification accuracy, and is able to localize a template video of 80 x 64 pixels with 23 frames in a test video of 368 x 184 pixels with 835 frames in just 11 seconds, easily outperforming other methods in localization speed. We also perform a systematic investigation of the effects of various encoding options on our proposed approach. In particular, we present results on the compression-classification trade-off, which would provide valuable insight into jointly designing a system that performs video encoding at the camera front-end and action classification at the processing back-end.
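The abstract does not define the motion correlation measure, so the following is only an illustrative sketch of one way a similarity between compressed-domain motion vectors could account for both direction and magnitude. The functions `vector_similarity` and `field_similarity` and the formula (cosine of the angle scaled by the ratio of magnitudes) are assumptions made for illustration, not the measure from the paper.

```python
import math


def vector_similarity(u, v):
    """Hypothetical similarity in [-1, 1] between two 2-D motion vectors.

    Direction enters through the cosine of the angle between the vectors,
    magnitude through the ratio of the shorter length to the longer one.
    """
    norm_u = math.hypot(*u)
    norm_v = math.hypot(*v)
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a static block carries no directional evidence
    cosine = (u[0] * v[0] + u[1] * v[1]) / (norm_u * norm_v)
    magnitude_ratio = min(norm_u, norm_v) / max(norm_u, norm_v)
    return cosine * magnitude_ratio


def field_similarity(field_a, field_b):
    """Average vector similarity over blocks where at least one field moves.

    Each field is a list of (dx, dy) motion vectors, one per macroblock,
    listed in the same block order.
    """
    total, count = 0.0, 0
    for u, v in zip(field_a, field_b):
        if u == (0, 0) and v == (0, 0):
            continue  # skip blocks that are static in both fields
        total += vector_similarity(u, v)
        count += 1
    return total / count if count else 0.0
```

Because the score depends only on motion vectors, it is unaffected by appearance (clothing, lighting), which is the kind of invariance the abstract describes.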

'Unsupervised discovery of action hierarchies in large collections of activity videos'
(Parvez Ahammad, Chuohao Yeo, Kannan Ramchandran and Shankar Sastry)
in IEEE International Workshop on Multimedia Signal Processing (MMSP) 2007
[bibtex] [pdf]

Abstract
Given a large collection of videos containing activities, we investigate the problem of organizing it in an unsupervised fashion into a hierarchy based on the similarity of actions embedded in the videos. We use spatio-temporal volumes of filtered motion vectors to compute appearance-invariant action similarity measures efficiently - and use these similarity measures in hierarchical agglomerative clustering to organize videos into a hierarchy such that neighboring nodes contain similar actions. This naturally leads to a simple automatic scheme for selecting videos of representative actions (exemplars) from the database and for efficiently indexing the whole database. We compute a performance metric on the hierarchical structure to evaluate goodness of the estimated hierarchy, and show that this metric has potential for predicting the clustering performance of various joining criteria used in building hierarchies. Our results show that perceptually meaningful hierarchies can be constructed based on action similarities with minimal user supervision, while providing favorable clustering performance and retrieval performance.
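As a rough illustration of hierarchical agglomerative clustering on pairwise action similarities, the sketch below builds a merge history from a symmetric similarity matrix and picks an exemplar for a cluster. The names `build_hierarchy` and `pick_exemplar`, the use of average linkage as the joining criterion, and the choice of exemplar as the member most similar to the rest of its cluster are all assumptions for illustration; the abstract does not say which criteria the paper uses.

```python
def build_hierarchy(sim):
    """Agglomerative clustering with average linkage (an assumed criterion).

    sim is a symmetric matrix where sim[i][j] is the action similarity
    between videos i and j. Returns the merges in order as tuples
    (members_a, members_b, linkage_score).
    """
    clusters = [[i] for i in range(len(sim))]
    merges = []

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        score, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, score))
        # Replace the two joined clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a + b)
    return merges


def pick_exemplar(cluster, sim):
    """Return the member with the highest total similarity to the others."""
    return max(cluster, key=lambda i: sum(sim[i][j] for j in cluster if j != i))
```

For example, with four videos where 0 and 1 are similar and 2 and 3 are similar, the first two merges join those pairs and the final merge joins the two groups at a much lower score.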

Results (Videos obtained from Laptev's action dataset)
 

'Compressed domain real-time action recognition'
(Chuohao Yeo, Parvez Ahammad, Kannan Ramchandran and Shankar Sastry)
in IEEE International Workshop on Multimedia Signal Processing (MMSP) 2006
[bibtex] [pdf]

Abstract
We present a compressed domain scheme that is able to recognize and localize actions in real-time. The recognition problem is posed as performing a video query on a test video sequence. Our method is based on computing motion similarity using compressed domain features which can be extracted with low complexity. We introduce a novel motion correlation measure that takes into account differences in motion magnitudes. Our method is appearance invariant, requires no prior segmentation, alignment or stabilization, and is able to localize actions in both space and time. We evaluated our method on a large action video database consisting of 6 actions performed by 25 people under 3 different scenarios. Our classification results compare favorably with existing methods at only a fraction of their computational cost.
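To make the idea of a video query concrete, here is a minimal sketch of temporal localization only, assuming a frame-to-frame similarity matrix has already been computed. The function `localize_query` and the scoring rule (mean similarity along the diagonal that aligns query frames with consecutive test frames) are hypothetical and are not taken from the paper; spatial localization is not covered.

```python
def localize_query(frame_sim, query_len):
    """Find the test-frame offset where the query matches best.

    frame_sim[t][q] is an assumed precomputed similarity between test
    frame t and query frame q. Returns (best_start, best_score), or
    (None, -inf) if the test sequence is shorter than the query.
    """
    best_start, best_score = None, float("-inf")
    for start in range(len(frame_sim) - query_len + 1):
        # Align query frame q with test frame start + q and average.
        score = sum(frame_sim[start + q][q] for q in range(query_len)) / query_len
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

Thresholding the per-offset score instead of taking only the maximum would allow several occurrences of the queried action to be reported in one test sequence.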

Results (Test videos obtained from Michal Irani's dataset)
Example query:

Test frame and output:
...
...