“Quantization Techniques for Efficient Deployment of Large Language Models: A Comprehensive Review,” a Presentation from AMD
Dwith Chenna, MTS Product Engineer for AI Inference at AMD, presents the “Quantization Techniques for Efficient Deployment of Large Language Models: A Comprehensive Review” tutorial at the May 2025 Embedded Vision Summit. The deployment of large language models (LLMs) in resource-constrained environments is challenging due to their significant computational and memory requirements.
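As background for the talk's topic, the sketch below (not taken from the presentation) illustrates one of the simplest techniques in this family: symmetric per-tensor int8 quantization of a weight matrix. The function names and the use of NumPy are illustrative assumptions, chosen only to show how precision reduction cuts memory from 32 bits to 8 bits per value.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float weights to int8
    using a single scale so the largest magnitude lands at 127."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Illustrative example: quantize a random weight matrix and compare sizes.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
print("memory: fp32 =", w.nbytes, "bytes, int8 =", q.nbytes, "bytes")
```

Real LLM deployments typically go beyond this sketch (per-channel or per-group scales, activation quantization, calibration, or quantization-aware training), which is the scope the presentation's title suggests it surveys.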