Towards Uncertainty-Aware Low-Bit Quantized LLMs for On-Device Inference
(2026-03-06)
Quantizing large language models (LLMs) significantly reduces memory usage and computational requirements, enabling efficient on-device inference. However, aggressive quantization can degrade model performance and exacerbate ...



