Huawei AI-Solver Group
Sinno Jialin Pan
Latest
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers