Huawei AI-Solver Group
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
Xing Li, Zeyu Xing, Yiming Li, Linping Qu, Hui-Ling Zhen, Wulong Liu, Yiwu Yao, Sinno Jialin Pan, Mingxuan Yuan
January 2025
Type: Journal article
Publication: ICML2025