CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference

January 2025

Publication

arXiv preprint arXiv:2502.04416
