Held in conjunction with MICRO 2025
October 18th, 1pm–5pm @ Seoul, South Korea
Room: Bellevue
Time | Topic | Speaker | Institution |
---|---|---|---|
13:00–13:05 | Opening and Welcome | Freddy Gabbay | Hebrew University |
13:05–13:50 | Keynote talk: Data-Centric Multi-Acceleration for Gen-AI Inference | Mohammad Alian | Cornell |
13:50–14:20 | Dynamic Multi-Precision Representation for Bandwidth-Efficient KV-Cache | Maayan Ella | Technion |
14:20–15:00 | Blazingly Fast LLM Serving | Baris Kasikci | University of Washington |
15:00–15:10 | Break | | |
15:10–15:40 | A Generic Framework for High Robustness and High Accuracy Mixed Precision AI Training | Yann Delorme | Huawei |
15:40–16:20 | From KV Caches to Tensor Cores: Practical Unstructured Sparsity for LLM Inference | Ramyad Hadidi | d-Matrix |
16:20–17:00 | Enabling HW/SW Co-design for Practical and Scalable LLM Serving with General-purpose Near-Data Processing | Gwangsun Kim | POSTECH |
The rapid evolution of Large Language Models (LLMs) and the emergence of Large Multimodal Models (LMMs) are revolutionizing various domains. Simultaneously, the pursuit of very long-context LLMs (e.g., 1M context length) is pushing the boundaries of what these models can achieve. However, the immense computational, memory, and power requirements of these advanced models present formidable challenges to current hardware and system designs.
Fortunately, large models, including LLMs, LMMs, and those handling extended contexts, often exhibit inherent resiliency to noise and approximation. This workshop aims to harness this property by exploring microarchitectural innovations and system-level techniques that exploit such resiliency to significantly improve performance, power efficiency, and memory utilization. Our focus will extend beyond traditional LLMs to encompass the unique challenges and opportunities presented by multimodal data and extremely long contexts.
Topics will include, but are not limited to, approximate computing, dynamic quantization, and adaptive methods that apply different levels of approximation or quantization across layers within these complex models. The workshop will also address critical memory-efficiency concerns through novel data compression techniques that leverage model resiliency to reduce memory footprint, which is especially crucial for LMMs and long-context LLMs, while maintaining or even improving model performance.
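As a purely illustrative sketch of the per-layer idea (the function names, bit-width menu, and 2% error threshold below are hypothetical choices for this example, not taken from any of the talks above), the snippet fake-quantizes each layer's weights at several candidate precisions and keeps the narrowest one whose reconstruction error stays within a tolerance:

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, bits: int) -> np.ndarray:
    """Fake-quantize x on a symmetric signed uniform grid with `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for int8
    scale = max(float(np.abs(x).max()), 1e-12) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                            # dequantize to measure error

def pick_bits_per_layer(layers, budgets=(4, 6, 8), max_rel_err=0.02):
    """Assign each layer the narrowest bit-width whose relative L2
    reconstruction error stays under max_rel_err (illustrative policy)."""
    choice = {}
    for name, w in layers.items():
        for bits in sorted(budgets):            # try the narrowest format first
            err = np.linalg.norm(w - quantize_symmetric(w, bits)) / np.linalg.norm(w)
            if err <= max_rel_err:
                choice[name] = bits
                break
        else:                                   # nothing met the target: keep widest
            choice[name] = max(budgets)
    return choice

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for per-layer weight tensors of a small model.
    layers = {f"layer{i}": rng.normal(size=(256, 256)) for i in range(4)}
    print(pick_bits_per_layer(layers))          # chosen bit-width per layer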
By bringing together researchers, practitioners, and industry experts, the ReLAMP workshop seeks to foster discussions and drive advancements in efficient microarchitectures and systems for the next generation of large model processing. This will pave the way for more sustainable, scalable, and capable AI solutions.
Any questions may be directed to: freddy.gabbay@mail.huji.ac.il
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.