Chris Rowen, Vice President of AI Engineering for Webex Collaboration at Cisco Systems, presents the “Efficient Many-function Video ML at the Edge” tutorial at the May 2023 Embedded Vision Summit.
Video streams are so rich, and video workloads so sophisticated, that we can now expect video ML to deliver many insights and transformations simultaneously. It will be increasingly common to need video segmentation, object and motion recognition, SLAM, 3D model extraction, relighting, avatarization and neural compression, all running in parallel. Conventionally, this combination would overwhelm edge compute resources, but novel multi-headed ML models and unified video pipelines make it feasible on existing personal devices and embedded compute subsystems.
In this talk, Rowen discusses the goals for advanced video intelligence in secure, edge-powered video communications, and shows how new model structures can achieve very high accuracy, resolution and frame rate at low cost per function. He also discusses improved objective and subjective quality metrics, training set synthesis and his company’s optimized portable edge implementation methodology. Rowen wraps up with some observations on the challenges of even larger video workloads at the edge.
A PDF of the slides is also available.