Frontiers in Perceptual AI: First-person Video and Multimodal Perception
First-person or “egocentric” perception requires understanding the video and multimodal data that streams from wearable cameras and other sensors. The egocentric view offers a special window into the camera wearer’s attention, goals, and interactions with people and objects in the environment, making it an exciting avenue for both augmented reality and robot learning. The multimodal nature is particularly compelling, with opportunities to bring together audio, language, and vision.
Kristen Grauman, Professor at the University of Texas at Austin and Research Director at Facebook AI Research, begins her 2023 Embedded Vision Summit keynote presentation by introducing Ego4D, a massive new open-sourced multimodal egocentric dataset that captures the daily-life activity of people around the world. The result of a multi-year, multi-institution effort, Ego4D pushes the frontiers of first-person multimodal perception with a suite of research challenges ranging from activity anticipation to audio-visual conversation.
Building on this resource, Grauman presents her group’s ideas for searching egocentric videos with natural language queries (“Where did I last see X? Did I leave the garage door open?”), injecting semantics from text and speech into powerful video representations, and learning audio-visual models to understand a camera wearer’s physical environment or augment their hearing in busy places. She also touches on interesting performance-oriented challenges raised by having very long video sequences (hours!) and ideas for learning to scale retrieval and encoders.
Making Sense of Sensors: Combining Visual, Laser and Wireless Sensors to Power Occupancy Insights for Smart Workplaces
Just as humans rely on multiple senses to understand our environment, electronic systems are increasingly equipped with multiple types of perceptual sensors. Combining and integrating data from heterogeneous sensors (e.g., image, thermal, motion, LiDAR, RF) along with other types of data (e.g., card key swipes) can provide valuable insights, but doing so is challenging. In this presentation, Rakshit Agrawal, Vice President of Research and Development at Camio, shares his company’s experiences developing and applying practical techniques that combine heterogeneous data to enable real-world solutions within buildings. For example, occupancy insights in work spaces help to optimize use of space, improve staff engagement and productivity, enhance energy efficiency and inform maintenance scheduling decisions.
Accelerating the Era of AI Everywhere
Join the panel on a journey towards the era of AI everywhere, where perceptual AI at the edge is as commonplace as LCD displays and wireless connectivity. The panel comprises Jeff Bier, panel moderator and Founder of the Edge AI and Vision Alliance; Dean Kamen, Founder of DEKA Research and Development; Lokwon Kim, CEO of DEEPX; Jason Lavene, Director of Advanced Development Engineering at Keurig Dr Pepper; and Pete Warden, Chief Executive Officer of Useful Sensors. These distinguished industry experts share their insights on what it will take to unlock the full potential of this groundbreaking technology, empowering it to enhance ease of use, safety, autonomy and numerous other capabilities across a wide range of applications. The panelists delve into the challenges that early adopters of perceptual AI have faced, why some product developers may still perceive it as too complicated, expensive or unreliable, and what can be done to address these issues. Above all, they chart a path forward for the industry, aiming to “cross the chasm” and make perceptual AI an accessible and indispensable feature of everyday products.
Generative AI: How Will It Impact Edge Applications and Machine Perception?
Seemingly overnight, ChatGPT has spurred massive interest in, and excitement around, generative AI, and has become the fastest-growing application in history. How will generative AI transform how we think about AI, and how we use it? What types of commercial applications are best suited for solutions powered by today’s generative AI technology? Will recent advances in generative AI change how we create and use discriminative AI models, like those used for machine perception? Will generative AI obviate the need for massive reservoirs of hand-labeled training data? Will it accelerate our ability to create systems that effortlessly meld multiple types of data, such as text, images and sound? With state-of-the-art generative models exceeding 100 billion parameters, will generative models ever be suitable for deployment at the edge? If so, for what use cases? This lively and insightful panel discussion explores these and many other questions around the rapidly evolving role of generative AI in edge and machine-perception applications. Panelists include Sally Ward-Foxton, panel moderator and Senior Reporter at EE Times; Greg Kostello, CTO and Co-Founder of Huma.AI; Vivek Pradeep, Partner Research Manager at Microsoft; Steve Teig, CEO of Perceive; and Roland Memisevic, Senior Director at Qualcomm AI Research.
Qualcomm Cognitive ISP (Best Camera or Sensor)
Qualcomm’s Cognitive ISP is the 2023 Edge AI and Vision Product of the Year Award winner in the Cameras and Sensors category. The Cognitive ISP (within the Snapdragon 8 Gen 2 Mobile Platform) is the only ISP for smartphones that can apply the AI photo-editing technique called “Semantic Segmentation” in real time. Semantic Segmentation is like “Photoshop layers,” but handled completely within the ISP. It will turn great photos into spectacular photos. Because it operates in real time, it runs while you’re capturing photos and videos, or even before: you can see objects in the viewfinder being enhanced as you’re getting ready to shoot. A real-time Segmentation Filter is groundbreaking because it means the camera is truly contextually aware of what it’s seeing. Qualcomm achieved this by building a physical bridge between the ISP and the DSP, called “Hexagon Direct Link”. The DSP runs Semantic Segmentation neural networks in real time. Thanks to Hexagon Direct Link, the DSP and the ISP can operate simultaneously: the ISP captures images and the DSP assigns context to every image in real time.
Please see here for more information on Qualcomm’s Cognitive ISP. The Edge AI and Vision Product of the Year Awards celebrate the innovation of the industry’s leading companies that are developing and enabling the next generation of edge AI and computer vision products. Winning a Product of the Year award recognizes a company’s leadership in edge AI and computer vision as evaluated by independent industry experts.