High-Level Documentation of CryptoDB
CryptoDB was started by Kevin McCurley, with the original goal of organizing bibliographic information about IACR publications, including a record for each author, a record for each paper, and links to the original copies of the papers. It has some overlap with the IACR archive as well as ePrint and other bibliographic projects. Since then it has grown to collect additional information about videos, presentation materials, program committees, and awards.
A public JSON autosuggest API is available for any web forms that handle IACR data.
Most of the data in CryptoDB is stored in a relational database. There are five major tables:
- The pub table, which has one row per publication. It contains fields like the title, booktitle, year, pages, type of publication, etc. Each paper has a row, and each book has a row. If a paper has multiple versions, this is currently stored in the pub table.
- The author table, which has one row per person. People often use multiple names during their lifetime, so we sometimes see multiple records for the same person. These end up getting merged when we find them. Authors should strive to always use the same canonicalization of their name, and citations should always use the full name as it appears on the paper.
- The authorship table, which connects papers with human authors. Each record is a relation between a pub record and an author record, and contains a representation of the person's name as it was used on that publication. Thus a person like Nigel P. Smart might have an authorship record with the name N. Smart to indicate that they used a shortened version of their name on that publication, but have the name Nigel P. Smart in the author table. This also handles the case of women who change their name after marriage. The authorship table also contains the order of authors on a paper, and has started collecting affiliations on papers as well.
- The conference table, which represents information about the venue other than the publication. Each row here should have a corresponding row in the pub table that represents the proceedings for the conference. Each proceedings in the pub table can have editors (represented as authors). This table has considerable overlap with the events database.
- The cryptostats table, which stores only committee membership, linking author records to program committees for conferences.
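The relationship between the pub, author, and authorship tables can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table names come from the description above, but every column name, the placeholder paper titles and years, and the helper functions are hypothetical and do not reflect the actual CryptoDB schema. The sketch also shows one possible way that duplicate author records might be merged once they are found.

```python
import sqlite3

# Table names come from the text; every column name below is a hypothetical
# choice for illustration, not the real CryptoDB schema.
SCHEMA = """
CREATE TABLE pub (
    pubkey    INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    booktitle TEXT,
    year      INTEGER,
    pages     TEXT,
    type      TEXT
);
CREATE TABLE author (
    authorkey INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE authorship (
    pubkey        INTEGER NOT NULL REFERENCES pub(pubkey),
    authorkey     INTEGER NOT NULL REFERENCES author(authorkey),
    publishedname TEXT NOT NULL,   -- name as printed on this publication
    authororder   INTEGER NOT NULL,
    affiliation   TEXT,
    PRIMARY KEY (pubkey, authororder)
);
"""


def build_example_db():
    """Create an in-memory database with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        # Two records that turn out to describe the same person.
        conn.executemany(
            "INSERT INTO author (authorkey, name) VALUES (?, ?)",
            [(1, "Nigel P. Smart"), (2, "N. Smart")],
        )
        # Placeholder publications, not real bibliographic data.
        conn.executemany(
            "INSERT INTO pub (pubkey, title, year, type) VALUES (?, ?, ?, ?)",
            [(10, "Placeholder paper A", 2001, "inproceedings"),
             (11, "Placeholder paper B", 2002, "inproceedings")],
        )
        conn.executemany(
            "INSERT INTO authorship (pubkey, authorkey, publishedname, authororder) "
            "VALUES (?, ?, ?, ?)",
            [(10, 1, "Nigel P. Smart", 1), (11, 2, "N. Smart", 1)],
        )
    return conn


def merge_authors(conn, keep, duplicate):
    """Point all authorships of `duplicate` at `keep`, then drop the duplicate."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE authorship SET authorkey = ? WHERE authorkey = ?",
            (keep, duplicate),
        )
        conn.execute("DELETE FROM author WHERE authorkey = ?", (duplicate,))


def names_for_pub(conn, pubkey):
    """Return (canonical name, name as published) pairs in author order."""
    rows = conn.execute(
        "SELECT a.name, s.publishedname "
        "FROM authorship AS s JOIN author AS a ON a.authorkey = s.authorkey "
        "WHERE s.pubkey = ? ORDER BY s.authororder",
        (pubkey,),
    )
    return rows.fetchall()
```

Calling `merge_authors(conn, keep=1, duplicate=2)` on the example database leaves a single author record, while the authorship row for the second paper still records the shortened name that appeared on it. Keeping the published name on the authorship row rather than the author row is what lets a merge proceed without losing how the name was printed.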
What's missing from CryptoDB
- Session information
- Preface and frontmatter to the books. This data resides only in the archive.
- Revision information about papers
- Discussion forum about papers
- Citation data
- Links to other versions (e.g., ePrint)
Data life cycle of a publication
The life cycle for a publication depends on the vehicle of publication. For the open access journals Transactions on Symmetric Cryptology and Transactions on Cryptographic Hardware and Embedded Systems, the papers are submitted to a system that runs Open Journal Systems. This is where the papers end up being hosted under Gold Open Access, and the metadata about publications flows to CryptoDB within a day of publication through an automated process.
Papers published in the ePrint archive are submitted directly there and are available under Gold Open Access. They are not refereed, and are not currently tracked by CryptoDB (but plans exist to include them).
Papers that are submitted to the Journal of Cryptology are handled by the Springer publishing system. We attempt to harvest data from the Springer metadata API, but this is a manual process that often fails.
Papers submitted to Asiacrypt, Crypto, Eurocrypt, PKC, or TCC are currently submitted to websubrev, where they are reviewed, and the accepted papers are then passed to Springer's proprietary system for publication. Once again, we attempt to harvest the metadata through the Springer API and the Crossref API.
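As a rough illustration of the harvesting step, the sketch below maps a Crossref work record onto fields resembling those of the pub and authorship tables. It is not the actual CryptoDB harvester. The output field names are hypothetical, and the sample response is invented for illustration: it imitates the general shape of a reply from the public Crossref REST API (`https://api.crossref.org/works/<DOI>`) but contains no real data. The sketch makes no network request, so fetching the record is left to the caller.

```python
import urllib.parse

CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_url(doi):
    """Build the Crossref REST API URL for a single DOI."""
    return CROSSREF_WORKS_URL + urllib.parse.quote(doi)


def parse_crossref_work(payload):
    """Map a decoded Crossref work response onto hypothetical pub fields."""
    message = payload["message"]
    # Crossref returns titles and container titles as lists of strings.
    title = (message.get("title") or [""])[0]
    booktitle = (message.get("container-title") or [""])[0]
    # The issue date is nested as {"date-parts": [[year, month, day]]}.
    date_parts = message.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    authors = []
    for order, person in enumerate(message.get("author", []), start=1):
        parts = (person.get("given"), person.get("family"))
        authors.append({
            "publishedname": " ".join(p for p in parts if p),
            "authororder": order,
        })
    return {
        "doi": message.get("DOI"),
        "title": title,
        "booktitle": booktitle,
        "year": year,
        "authors": authors,
    }


# Invented sample that imitates the shape of a Crossref reply; not real data.
SAMPLE_RESPONSE = {
    "status": "ok",
    "message-type": "work",
    "message": {
        "DOI": "10.0000/example-doi",
        "title": ["Placeholder title"],
        "container-title": ["Placeholder Proceedings"],
        "issued": {"date-parts": [[2020, 5, 1]]},
        "author": [
            {"given": "Alice", "family": "Example", "sequence": "first"},
            {"given": "Bob", "family": "Sample", "sequence": "additional"},
        ],
    },
}
```

A caller might fetch `crossref_url(doi)` with `urllib.request.urlopen`, decode the JSON body, and pass the result to `parse_crossref_work`. Records with missing titles, authors, or dates are common in practice, which is why the parser falls back to empty values instead of raising.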
Data life cycle of a conference
The data for a conference typically flows through several systems:
- The events database
- The program committee tool, which enables importation of program committees into CryptoDB
- The conference website
- websubrev for reviewing (optional, and this data is private)
- Springer (proprietary)
- The conference program editor where sessions are formed and links are added for the papers presented at the conference.
- The IACR archive
- YouTube or videos on iacr.org (optional)
- CryptoDB ties them together.
There is only minimal cooperation between these, as they are maintained by different groups with limited goals.
If anyone would like a dump of the database, or has suggestions for other APIs, they should contact me (Kevin McCurley) at .