Praveen Nayak, Tech Lead at PathPartner Technology, presents the "Using Deep Learning for Video Event Detection on a Compute Budget" tutorial at the May 2019 Embedded Vision Summit.

Convolutional neural networks (CNNs) have made tremendous strides in object detection and recognition in recent years. However, extending the CNN approach to the understanding of video or volumetric data poses tough challenges, including trade-offs between representation quality and computational complexity. These trade-offs are a particular concern on embedded platforms with tight computational budgets. This presentation explores the use of CNNs for video understanding.

Nayak reviews the evolution of deep representation learning methods involving spatio-temporal fusion from C3D to Conv-LSTMs for vision-based human activity detection. He proposes a decoupled alternative to this fusion, describing an approach that combines a low-complexity predictive temporal segment proposal model and a fine-grained (perhaps high-complexity) inference model. PathPartner Technology finds that this hybrid approach, in addition to reducing computational load with minimal loss of accuracy, enables effective solutions to these high-complexity inference tasks.
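The decoupled idea described above can be illustrated with a minimal sketch. This is not the presenter's implementation: the abstract does not specify either model, so the cheap proposal stage is stood in for by a simple frame-difference score, and the fine-grained model is a caller-supplied placeholder. The names `motion_scores`, `propose_segments`, `detect_events`, and the `threshold` and `min_len` parameters are all hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

Frame = Sequence[float]  # assumption: a frame is a flat sequence of pixel values


def motion_scores(frames: Sequence[Frame]) -> List[float]:
    """Cheap stand-in for a proposal model: mean absolute frame difference."""
    scores = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        scores.append(diff / len(cur))
    return scores


def propose_segments(
    scores: Sequence[float], threshold: float, min_len: int = 2
) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring >= threshold into [start, end) segments."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a segment that runs to the end of the video.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))
    return segments


def detect_events(
    frames: Sequence[Frame],
    expensive_model: Callable[[Sequence[Frame]], Any],
    threshold: float,
    min_len: int = 2,
) -> List[Tuple[int, int, Any]]:
    """Run the costly model only on proposed segments, not on every frame."""
    scores = motion_scores(frames)
    return [
        (start, end, expensive_model(frames[start:end]))
        for start, end in propose_segments(scores, threshold, min_len)
    ]
```

In this sketch the compute saving comes from how rarely `expensive_model` is called: frames outside the proposed segments never reach it. A real system would presumably replace the frame-difference score with a learned low-complexity proposal network.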

