CVE-2026-53923
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements.
The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
- ⚠ NVD has not scored this CVE yet - manual triage required (common for recent CVEs)
ATT&CK techniques
20Techniques this CVE enables. Pills with a solid outline are high confidence - named directly in ATT&CK or Nuclei, or human-curated by CTID; the rest are inferred from the weakness type using MITRE's CVE Mapping Methodology and the CWE → CAPEC chain. Broad, generic-weakness guesses are filtered out. A small N× marks a technique that N independent sources agree on.
CAPEC attack patterns
12Attack patterns this CVE enables - the bridge from weakness to ATT&CK technique.