Home/CAPEC/Using UTF-8 Encoding to Bypass Validation Logic
Attack Pattern

Using UTF-8 Encoding to Bypass Validation Logic

CAPEC-80 · Detailed · Draft
This attack is a specific variation on leveraging alternate encodings to bypass validation logic. This attack leverages the possibility to encode potentially harmful input in UTF-8 and submit it to applications not expecting or effective at validating this encoding standard making input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. Legal UTF-8 characters are one to four bytes long. However, early version of the UTF-8 specification got some entries wrong (in some cases it permitted overlong characters). UTF-8 encoders are supposed to use the "shortest possible" encoding, but naive decoders may accept encodings that are longer than necessary. According to the RFC 3629, a particularly subtle form of this attack can be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain illegal octet sequences as characters.
likelihood: High severity: High
External lookups - second-class, for what we don’t hold ourselves
Vulnerabilities
CISA KEV catalog
CWE weaknesses
CAPEC attack patterns
Package vulnerabilities
Threat intelligence
Threat actors
Tools & malware
ATT&CK techniques
IOCs
Detection & defense
Sigma rules
YARA rules
Atomic Red Team tests
D3FEND countermeasures
Compliance
NIST 800-53
ISO 27001:2022
SOC 2 TSC
PCI-DSS v4.0
CIS Controls v8.1
About
All capabilities
Live statistics
Data sources
Privacy policy
Terms of service
threatengine.sh  ·  Open-source threat intelligence platform  ·  100+ authoritative sources  ·  Every fact traces to its origin