How to parse dblp.xml?

The dblp.xml file is a simple, plain ASCII XML file that uses the named entities declared in the accompanying dblp.dtd file. A daily updated (but unversioned) XML dump can be found on the dblp web server:

Furthermore, each month, a persistent snapshot release is archived:

We strongly encourage you to use these snapshot releases for your experiments and to cite them by their persistent URLs in published articles. This will allow your experiments to be reproducible in the future.

Detailed information on the XML structure of the dblp records and several design decisions can be found in the following paper:

The dblp.xml file can be parsed by essentially any out-of-the-box XML parser.
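
To illustrate the point above, here is a minimal sketch that uses only the SAX parser shipped with the JDK. It counts the records directly below the root element, grouped by element name. The class name DblpRecordCounter and the method countRecords are hypothetical names chosen for this illustration and are not part of any dblp software. The sketch assumes that the DTD named in the file's DOCTYPE declaration can be found next to the XML file, and that the JDK's entity expansion limit has to be lifted because dblp.xml contains far more entity references than the default limit allows. Depending on the JDK version, further processing limits may need to be raised as well.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DblpRecordCounter {

    /** Counts the direct children of the root element, grouped by element name. */
    public static Map<String, Integer> countRecords(InputSource source) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // 0 = outside the root element, 1 = directly inside it

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (depth == 1) {
                    counts.merge(qName, 1, Integer::sum);
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: 0 disables the JDK's limit on the number of entity expansions.
        System.setProperty("jdk.xml.entityExpansionLimit", "0");
        // Passing the file as a system ID lets the parser resolve a relative DTD reference.
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        countRecords(source).forEach((name, n) -> System.out.println(name + "\t" + n));
    }
}
```

Because SAX streams through the file, this approach needs very little memory. The trade-off is that nothing is kept for later queries unless the handler stores it.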

Example parser

As an example, we provide a simple main-memory data structure, written in Java, for parsing and querying the complete dblp data. The code in this section has been tested using the following environment:

Running the parser

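Please download the example parser source (DblpExampleParser.java), the mmdb library (mmdb-2019-04-29.jar), a dblp XML release (e.g., dblp-2019-04-01.xml.gz), and the matching DTD (dblp-2017-08-29.dtd) from our web server into a local directory. E.g., you may run the following command: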

wget https://dblp.org/src/DblpExampleParser.java \
	https://dblp.org/src/mmdb-2019-04-29.jar \
	https://dblp.org/xml/release/dblp-2019-04-01.xml.gz \
	https://dblp.org/xml/release/dblp-2017-08-29.dtd

Unzip the downloaded .xml.gz file using:

gunzip dblp-2019-04-01.xml.gz

Compile the parser:

javac -cp mmdb-2019-04-29.jar DblpExampleParser.java

Run the example application:

java -Xmx8G -cp mmdb-2019-04-29.jar:. DblpExampleParser dblp-2019-04-01.xml dblp-2017-08-29.dtd
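
The example application above expects the unzipped XML file. When writing your own parser with the JDK's standard XML tools, the unzip step can be avoided by reading the .xml.gz file through java.util.zip.GZIPInputStream. The following is a minimal sketch of that idea and is independent of the org.dblp.mmdb package. The class name DblpGzipReader and the method countElements are hypothetical names used only for this illustration. As before, it is an assumption that the DTD referenced in the DOCTYPE declaration is located in the same directory as the compressed file, and that the JDK's entity expansion limit must be lifted for the full dblp data.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicLong;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DblpGzipReader {

    /** Counts the elements with the given name in a gzip-compressed XML file. */
    public static long countElements(Path gzFile, String elementName) throws Exception {
        AtomicLong count = new AtomicLong();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    count.incrementAndGet();
                }
            }
        };
        // Decompress on the fly; the second argument is the size of the read buffer.
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile), 1 << 16)) {
            InputSource source = new InputSource(in);
            // The system ID gives the parser a base URI for resolving a relative DTD reference.
            source.setSystemId(gzFile.toUri().toString());
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        }
        return count.get();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: 0 disables the JDK's limit on the number of entity expansions.
        System.setProperty("jdk.xml.entityExpansionLimit", "0");
        System.out.println(countElements(Paths.get(args[0]), "author"));
    }
}
```

A possible invocation, assuming the class has been compiled in the download directory, is `java DblpGzipReader dblp-2019-04-01.xml.gz`, which prints the number of author elements in the file.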

JavaDoc and sources

The JavaDoc pages and the sources for the org.dblp.mmdb package are also available for download:


a service of Schloss Dagstuhl - Leibniz Center for Informatics