How accurate is the data in dblp?

Unfortunately, we do not have a metric or study of our own to answer this question with scientific rigor. However, a recent, independent study of author name disambiguation in dblp states:

In conclusion, the evaluation results reported in this paper suggest that scholars can regard DBLP data as highly accurate in disambiguating author names. But a caveat to keep in mind is that some homonym cases (distinct authors with the same names) may not be properly distinguished.

So, if you are looking for a rigorous analysis, you might want to have a look there:

The dblp data curation workflow

Having said that, be assured that we take our data quality very seriously and put a lot of effort into a process that helps make the data in dblp as reliable as humanly possible. To see this, please have a look at our data curation workflow:

dblp always indexes the tables of contents of complete proceedings or journal volumes in bulk. Usually, we obtain the necessary metadata for each volume directly from the publisher of the volume or the organizer of the event. In a smaller number of cases, metadata is submitted to dblp by voluntary helpers from the community. Once we have obtained the data, a rigorous data cleaning process is applied by an editor from the dblp team. This process is supported by some simple algorithms that check the consistency of the data, but it is mainly carried out by hand. This manual curation process has four major goals:

Only after a full data cleaning pass has been applied are the new records added to the dblp data set. However, the data cleaning process does not end there. In an iterative process over the next few days, newly added data is monitored by special helper scripts for any suspicious signs of data inconsistency. For example, we often observe a certain ripple effect: on the first day, a newly added publication helps to uncover formerly unrecognized homonymous or synonymous data records. Once that information is fixed, further inconsistencies in other records become evident on the next day, and so on. Such an iterative effect may last for several days.
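To give a rough idea of what the "simple algorithms checking the consistency of the data" mentioned above might look like, here is a minimal sketch in Python. The record structure (a plain dict with fields such as "authors", "title", "year", and "pages") and the individual rules are assumptions made for this example; they are not dblp's actual tooling.

```python
# Minimal, hypothetical sketch of per-record consistency checks.
# Field names and thresholds are illustrative assumptions, not dblp code.
from typing import Iterable


def check_record(record: dict) -> list[str]:
    """Return human-readable warnings for a single publication record."""
    warnings = []

    # Basic bibliographic fields should be present and non-empty.
    for field in ("authors", "title", "year", "pages"):
        if not record.get(field):
            warnings.append(f"missing field: {field}")

    # A page range such as "123-135" should not be reversed.
    first, sep, last = str(record.get("pages", "")).partition("-")
    if sep and first.isdigit() and last.isdigit() and int(first) > int(last):
        warnings.append(f"page range looks reversed: {record['pages']}")

    # The publication year should at least be a plausible number.
    year = str(record.get("year", ""))
    if year and (not year.isdigit() or not 1900 <= int(year) <= 2100):
        warnings.append(f"implausible year: {year}")

    return warnings


def check_volume(records: Iterable[dict]) -> dict[str, list[str]]:
    """Run the checks over all records of a proceedings or journal volume."""
    return {r.get("key", "?"): w for r in records if (w := check_record(r))}
```

Checks like these only flag records; an editor still reviews every flagged entry by hand.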
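Similarly, a helper script that monitors newly added data for suspicious name variants could be as simple as the following sketch. The heuristic (flagging two names that differ only in a middle initial but share a coauthor) and the data model are assumptions chosen for illustration; real author name disambiguation relies on far more signals.

```python
# Hypothetical sketch of a synonym-flagging heuristic: two name variants that
# differ only in a middle initial and share a coauthor may be the same person.
# Data model and rule are illustrative assumptions, not dblp's actual scripts.
from collections import defaultdict
from itertools import combinations


def name_key(name: str) -> str:
    """Reduce 'Ada B. Lovelace' and 'Ada Lovelace' to the same key."""
    parts = name.split()
    return f"{parts[0]} {parts[-1]}" if len(parts) > 1 else name


def possible_synonyms(publications: list[dict]) -> list[tuple[str, str]]:
    """Return pairs of author name variants that may denote the same person."""
    coauthors: dict[str, set[str]] = defaultdict(set)
    for pub in publications:
        for author in pub["authors"]:
            coauthors[author].update(a for a in pub["authors"] if a != author)

    variants_by_key = defaultdict(list)
    for author in coauthors:
        variants_by_key[name_key(author)].append(author)

    flagged = []
    for variants in variants_by_key.values():
        for a, b in combinations(variants, 2):
            if coauthors[a] & coauthors[b]:  # shared coauthors -> suspicious
                flagged.append((a, b))
    return flagged
```

The converse case, homonyms (one name string shared by several people), is harder to detect automatically, which is exactly the caveat pointed out by the study quoted above.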

In the end – as a lower bound, if you will – the data listed in dblp should be at least as accurate as the data provided by the publishers. However, we spend a lot of our limited resources on removing as many mistakes, mis-assignments, and inconsistencies as humanly possible.
