What do I find in dblp.xml?

The dblp XML format is modeled after the BibTeX *.bib file format. The format is defined in the DTD file in the same directory. Please understand that (by design) our DTD is not very strict: it places no restrictions on element order or multiplicity, and it even allows nonsensical combinations (e.g., ‹school› tags in ‹article› elements, or ‹editor› and ‹author› elements at the same time) that you will never find in the actual dblp data set. Our priority was to keep the definition clean and simple, not to model every aspect of the publication landscape.

More information on the XML structure of the dblp records and several design decisions can be found in the following paper:

In general, our XML is a shallow but very long list of XML records. The root element has several million child elements, but usually no element is deeper than level three. An excerpt of the XML file looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>

[...]

<article key="journals/cacm/Gentry10" mdate="2010-04-26">
<author>Craig Gentry</author>
<title>Computing arbitrary functions of encrypted data.</title>
<pages>97-105</pages>
<year>2010</year>
<volume>53</volume>
<journal>Commun. ACM</journal>
<number>3</number>
<ee>http://doi.acm.org/10.1145/1666420.1666444</ee>
<url>db/journals/cacm/cacm53.html#Gentry10</url>
</article>

[...]

<inproceedings key="conf/focs/Yao82a" mdate="2011-10-19">
<title>Theory and Applications of Trapdoor Functions (Extended Abstract)</title>
<author>Andrew Chi-Chih Yao</author>
<pages>80-91</pages>
<crossref>conf/focs/FOCS23</crossref>
<year>1982</year>
<booktitle>FOCS</booktitle>
<url>db/conf/focs/focs82.html#Yao82a</url>
<ee>http://doi.ieeecomputersociety.org/10.1109/SFCS.1982.45</ee>
</inproceedings>

[...]

<www mdate="2004-03-23" key="homepages/g/OdedGoldreich">
<author>Oded Goldreich</author>
<title>Home Page</title>
<url>http://www.wisdom.weizmann.ac.il/~oded/</url>
</www>

[...]
</dblp>
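
Because the file is one very long, shallow list, it is usually read as a stream rather than loaded as a whole. The following is a minimal sketch of that idea in Python, using only the standard library. The function name `iter_records` and the shortened sample are assumptions made for illustration. The sample deliberately omits the DOCTYPE line: the standard-library parser does not load the external dblp.dtd, so the named entities used in the real file would have to be resolved by other means.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the excerpt above (no DOCTYPE, no named entities).
SAMPLE = b"""<dblp>
<article key="journals/cacm/Gentry10" mdate="2010-04-26">
<author>Craig Gentry</author>
<title>Computing arbitrary functions of encrypted data.</title>
<year>2010</year>
</article>
<www mdate="2004-03-23" key="homepages/g/OdedGoldreich">
<author>Oded Goldreich</author>
<title>Home Page</title>
</www>
</dblp>"""


def iter_records(source):
    """Yield (tag, key, title) for every child of the root element (hypothetical helper)."""
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first element started is the root
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # a level-1 record has just been closed
                # findtext returns only the text before any nested markup element
                yield elem.tag, elem.get("key"), elem.findtext("title")
                root.clear()  # drop finished records so memory use stays flat


records = list(iter_records(io.BytesIO(SAMPLE)))
for tag, key, title in records:
    print(tag, key, title)
```

Tracking the depth by hand keeps the sketch independent of which record element names occur at level 1.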

Level 1: data records

The children of the root element represent the individual data records that are stored in dblp. In general, there are two types of records: publication records and person records.

Publication records are inspired by the BibTeX syntax and are given by one of the following elements:

Please note that while the BibTeX type of a record does define certain categories of dblp data records, these record categories differ slightly from the publication types that are used throughout the dblp website.
Please note that while there is a record type for proceedings volumes, there is no record type for journal volumes. Consequently, the dblp XML file contains no data entities for whole journal volumes or series. This is a (sometimes unfortunate) heritage of the BibTeX data model.

Person records are described separately here.

All records share a number of common attributes:

The values of the publtype attribute are from a controlled vocabulary. Multiple publtypes can be provided as a space-separated list. In the near future, we will replace some of the current publtype values to simplify parsing. The following table lists the publtypes in use for records. The scope column denotes whether the publtype is used for publication records or person records. Note that the annotation of records is partial. E.g., only a small number of edited publications are annotated as edited.

scope | current value | future value | description
publication | encyclopedia entry | encyclopedia | Publication is a reference work, e.g., an encyclopedia article.
publication | informal publication | informal | Publication is gray literature, e.g., a preprint.
publication | edited publication | edited | Edited publication, e.g., an editorial or a news announcement.
publication | survey | survey | Publication is a survey article.
publication | withdrawn | withdrawn | Publication was officially withdrawn by the publisher.
person | disambiguation page | disambiguation | The author profile associated with this person record does not represent a single author. See "Why are some names followed by a four digit number?" for details.
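
Since several current publtype values contain a space themselves, splitting the attribute on whitespace alone is not enough. The following Python sketch shows one possible way to handle this: it matches two-word current values first and normalizes everything to the announced future values from the table. The function name `parse_publtype` and the choice to raise an error on unknown tokens are assumptions for illustration, not part of dblp.

```python
# Current values mapped to the announced future values (taken from the table above).
CURRENT_TO_FUTURE = {
    "encyclopedia entry": "encyclopedia",
    "informal publication": "informal",
    "edited publication": "edited",
    "survey": "survey",
    "withdrawn": "withdrawn",
    "disambiguation page": "disambiguation",
}
FUTURE_VALUES = set(CURRENT_TO_FUTURE.values())


def parse_publtype(raw):
    """Split a publtype attribute into a list of future-style values (hypothetical helper)."""
    tokens = (raw or "").split()
    result = []
    i = 0
    while i < len(tokens):
        two_words = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and two_words in CURRENT_TO_FUTURE:
            # a two-word current value such as "informal publication"
            result.append(CURRENT_TO_FUTURE[two_words])
            i += 2
        elif tokens[i] in CURRENT_TO_FUTURE:
            # a one-word current value such as "survey"
            result.append(CURRENT_TO_FUTURE[tokens[i]])
            i += 1
        elif tokens[i] in FUTURE_VALUES:
            # already in the future one-word form
            result.append(tokens[i])
            i += 1
        else:
            raise ValueError(f"unknown publtype token: {tokens[i]!r}")
    return result


print(parse_publtype("informal publication withdrawn"))
```

A missing attribute (passed as `None`) yields an empty list, so the helper can be applied to every record.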

Level 2: bibliographic metadata

Record elements do not contain any text, but they contain a number of child elements to specify the record's bibliographic metadata entries. See the Wikipedia page on BibTeX to learn which data entries are meaningful in which record type.

Note that in contrast to BibTeX, there are no key elements, since the key is already an attribute of the record node. Also, there is a custom url element that specifies a local hyperlink relative to the dblp website's homepage.
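
As a small illustration of the relative url element, the following Python sketch resolves a url value against a base address. The base `https://dblp.org/` and the helper name `resolve_url` are assumptions made for this example; substitute the homepage of the dblp site you actually use.

```python
from urllib.parse import urljoin

DBLP_HOME = "https://dblp.org/"  # assumed homepage; not specified in this document


def resolve_url(url_text, base=DBLP_HOME):
    """Turn the content of a url element into an absolute link (hypothetical helper)."""
    # urljoin leaves values that are already absolute untouched
    return urljoin(base, url_text)


article_link = resolve_url("db/journals/cacm/cacm53.html#Gentry10")
homepage_link = resolve_url("http://www.wisdom.weizmann.ac.il/~oded/")
print(article_link)
print(homepage_link)
```

The second call shows why a join is preferable to plain string concatenation: the url of a www record, as in the excerpt, is already absolute and is returned unchanged.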

Most record elements can have one or more of the following optional attributes:

A detailed description of record elements can be found at How are data annotations used in dblp.xml.

Level 3: optional HTML markup

In the XML file, only title or booktitle elements may contain HTML markup, and only a select few markup elements are allowed:

In theory, the elements of this level may be nested arbitrarily deep to describe complex structures like formulas, e.g. ‹i›x‹sub›y‹sup›2‹/sup›‹/sub›‹/i› to describe x with the subscript y² (in LaTeX notation, x_{y^2}). However, such cases are very rare.
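
One practical consequence of nested markup is that the text of a title is spread over several nodes. The following Python sketch shows one way to flatten a title to plain text with the standard library. The sample title and the helper name `title_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def title_text(title_elem):
    """Concatenate all text inside a title, ignoring the markup elements (hypothetical helper)."""
    return "".join(title_elem.itertext())


# Made-up title that uses the nested markup from the example above.
title = ET.fromstring(
    "<title>On <i>x<sub>y<sup>2</sup></sub></i> and related terms.</title>"
)
plain = title_text(title)
print(plain)       # flattened text; sub- and superscript structure is lost
print(title.text)  # only the text before the first markup element
```

Flattening discards the sub- and superscript structure, so it suits search or display of plain titles, not a faithful rendering of formulas.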

Entities

The dblp XML file is encoded in plain ASCII. Additional ISO/IEC 8859-1 (latin-1) characters are defined as named entities in the DTD and used whenever necessary.

At the moment, most parts of dblp are restricted to ISO-8859-1 (latin-1) characters, i.e. the first 256 Unicode code points. With the exception of the ‹author› and ‹editor› elements, where you will still find only latin-1 characters, you may find numerical entities outside of this range. For example, ‹title› elements may contain Greek letters like ε, or the ‹note› elements of a person record may contain a Chinese name in the original Unicode spelling. All characters outside the latin-1 range are given as numerical entities.
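
To illustrate the two kinds of entities, here is a short Python sketch using the standard library. Numerical entities are decoded by any XML parser without further help. Named entities need their declarations; in this sketch a single declaration in an inline DOCTYPE stands in for the real dblp.dtd. The sample strings, including the author name, are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A numerical entity: &#949; is the Greek letter epsilon (U+03B5).
numeric = ET.fromstring("<title>An &#949;-approximation scheme</title>")
numeric_text = numeric.text
print(numeric_text)

# A named latin-1 entity. The inline declaration below is an assumption that
# stands in for dblp.dtd, which the standard-library parser does not load.
NAMED_SAMPLE = (
    '<!DOCTYPE dblp [<!ENTITY uuml "&#252;">]>'
    "<dblp><author>J&uuml;rgen Example</author></dblp>"
)
named_text = ET.fromstring(NAMED_SAMPLE).findtext("author")
print(named_text)
```

Without a declaration for `&uuml;`, the standard-library parser rejects the document, which is why the DTD file needs to be available when processing the real data.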

maintained by Schloss Dagstuhl LZI at University of Trier