CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference

Publication
arXiv preprint arXiv:2502.04416
