KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference

Publication
ICML 2025
