Huawei AI-Solver Group
News
Research Papers
Members
Contact
Wulong Liu
Latest
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval
TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling