CryptoDB was started
by Kevin McCurley, with the original goal of organizing bibliographic
information about IACR publications,
including a record for each author, a record for each paper, and links
to the original copies of the papers. It has some overlap with the
IACR archive as well as ePrint and other bibliographic projects. It
has grown to collect additional information about videos, presentation
materials, and program committees.
I had hoped that it would
eventually evolve into part of an
integrated IACR digital library that combines
the ePrint archive, the IACR archive, and CryptoDB. At this point CryptoDB
is supported by only one person.
A public JSON autosuggest API is
available for any web forms that handle IACR data.
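The request and response format of the autosuggest API is not described here. As a rough sketch of the kind of matching such a service might perform, the following Python function returns a JSON list of author names in which any word begins with the typed prefix. The sample names, the function name suggest, and its limit parameter are assumptions made for illustration; they are not part of the actual API.

```python
import json

# Hypothetical sample data; the real names live in the CryptoDB author table.
AUTHORS = ["Nigel P. Smart", "Kevin McCurley", "Alice Example"]


def suggest(prefix, names=AUTHORS, limit=10):
    """Return a JSON array of names in which any word starts with prefix."""
    needle = prefix.strip().lower()
    if not needle:
        # An empty query suggests nothing rather than everything.
        return json.dumps([])
    hits = [
        name
        for name in names
        if any(part.lower().startswith(needle) for part in name.split())
    ]
    return json.dumps(sorted(hits)[:limit])
```

Under these assumptions, suggest("sm") matches on the last name and suggest("kev") matches on the first name, which mirrors searching by either first or last name.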
Most of the data in CryptoDB is stored in a relational database. This makes it easy
to provide multiple views on the data, and keep the data separate from the
HTML views. There are five major tables:
- The pub table, which has one row per publication. It contains fields
like the title, booktitle, year, pages, type of publication, etc. Each paper has a row,
and each book has a row. If a paper has multiple versions, these are currently
stored in the pub table.
- The author table, which has one row per person. When people use more than
one representation for their name, we sometimes get multiple records for the same person.
These end up getting merged when we find them. Authors should strive to always use
the same canonicalization of their name, and citations should always use the full
name as it appears on the paper.
- The authorship table, which connects papers with humans who are authors.
Each record is a relation between a pub record and an
author record, and contains the representation of the person's name
as it was used on that publication. Thus a person like
Nigel P. Smart might have an authorship record with the name
N. Smart to indicate that they used a shortened version of their name
on that publication, but have the name Nigel P. Smart in the
author table. This also handles the case of women who change their name after
marriage. The authorship table also contains the order of
authors on a paper.
- The conference table, which represents information about the venue
other than the publication. Each row here should have a corresponding row
in the pub table that represents the proceedings for the conference.
Each proceedings in the pub table can have editors (represented as
authors). This table has considerable overlap with the events database.
- The cryptostats table, which stores only committee membership, linking
author records to program committees for conferences.
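The relationship between the pub, author, and authorship tables can be sketched with an in-memory SQLite database. This is a simplified illustration, not the actual CryptoDB schema: every column name (pubkey, authorkey, publishedname, position), the sample rows, and the helper functions are assumptions. It shows how the name printed on a paper is kept apart from the canonical name, and how a duplicate author record might be merged once it is found.

```python
import sqlite3

# Hypothetical, simplified schema; the real column names are not documented here.
SCHEMA = """
CREATE TABLE pub (
    pubkey INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    year INTEGER
);
CREATE TABLE author (
    authorkey INTEGER PRIMARY KEY,
    name TEXT NOT NULL            -- canonical name of the person
);
CREATE TABLE authorship (
    pubkey INTEGER NOT NULL REFERENCES pub(pubkey),
    authorkey INTEGER NOT NULL REFERENCES author(authorkey),
    publishedname TEXT NOT NULL,  -- name as printed on this publication
    position INTEGER NOT NULL     -- order of the author on the paper
);
"""


def build_demo():
    """Create a small database in which one person has two author records."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO pub VALUES (?, ?, ?)",
        [(1, "First Hypothetical Paper", 2000),
         (2, "Second Hypothetical Paper", 2001)],
    )
    db.executemany(
        "INSERT INTO author VALUES (?, ?)",
        [(1, "Nigel P. Smart"), (2, "N. Smart")],  # record 2 is a duplicate
    )
    db.executemany(
        "INSERT INTO authorship VALUES (?, ?, ?, ?)",
        [(1, 1, "Nigel P. Smart", 1), (2, 2, "N. Smart", 1)],
    )
    return db


def authors_of(db, pubkey):
    """Return (printed name, canonical name) pairs in author order."""
    rows = db.execute(
        "SELECT s.publishedname, a.name "
        "FROM authorship s JOIN author a ON a.authorkey = s.authorkey "
        "WHERE s.pubkey = ? ORDER BY s.position",
        (pubkey,),
    )
    return rows.fetchall()


def merge_authors(db, keep, duplicate):
    """Point the duplicate's authorship rows at the kept record, then drop it."""
    with db:  # commit both statements together, or roll both back
        db.execute(
            "UPDATE authorship SET authorkey = ? WHERE authorkey = ?",
            (keep, duplicate),
        )
        db.execute("DELETE FROM author WHERE authorkey = ?", (duplicate,))
```

After merge_authors(db, keep=1, duplicate=2), the second paper still shows N. Smart as the printed name but now resolves to the canonical record Nigel P. Smart. Because the printed name lives in the authorship row, merging records loses no information about how the name appeared on each paper.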
What's missing from CryptoDB
- Session information
- Preface and frontmatter to the books. This data resides only in
- Revision information
- Discussion forum about papers
- Citation data
Data life cycle for a conference.
The life cycle for a conference is long, and data is continually added to the databases.
The data is collected in several places, and CryptoDB tries to eventually
incorporate all of it:
- the events database
- websubrev for reviewing (optional, and this data is private)
- Springer (proprietary)
- The IACR archive
- YouTube or videos on iacr.org (optional)
- CryptoDB ties them together.
There is only minimal cooperation between these, as they are maintained
by different groups with limited goals.
Sequence of events for a conference
- IACR board or president approves the location or "in cooperation with" (ICW) status. The general chair and program
chair are generally known at this time.
- General chair enters data for a conference or workshop into the events database.
The data is marked pending/hidden until the events administrator approves it, and an
email is generated to initiate the review in the next step.
- Events administrator approves or disapproves, and may edit data from the
general chair (e.g., the geocoding or ICW status). At this point the data is only in
the events database, and the location may only be known approximately
(city and country).
- At some point in the future, the general chair may specify the hotel or venue
and the geocoding would get updated. This is handled by the events administrator.
- The call for papers comes out, and the program chair and program committee are
selected. At this
time we can create a record for the conference in the conference table
of cryptodb, and add the program committee memberships.
- Authors submit their papers to whatever the program chair has decided to use for reviewing. If they use websubrev, there is no matching of authors to existing records in
cryptodb at the time of
submission, but this would be easy using a public JSON interface from cryptodb (e.g., start typing
your name in the search bar above - first or last name). The author has just created the first
metadata record for the paper, though title, PDF, and authors may change during the review process.
- The program of accepted papers is finalized, so that entries can appear in cryptodb for the papers.
This would typically happen either by the program chair using a form from websubrev, or manual entry.
- Authors submit final versions of their papers to the program chair using whatever ad hoc method
the program chair chooses. The program chair prepares a tarball of files according to Springer's
process. At this point the PDFs diverge between the two versions.
Springer may add DOIs and
work on improving the citations. Authors could have done this if they had an incentive, but
they generally do a poor job of it because they are unaware of their responsibilities.
- The archive receives a copy of the tarball of files prepared for Springer. This data is
uploaded to iacr.org in order to be processed for the IACR archive.
It includes PDFs of the
papers, but also the preface, table of contents, and a keyword index of some sort. The data is not
in any format that is easy to incorporate into cryptodb, but is similar to what the printed
proceedings looks like (Springer may perform further editing). At this time the
integration with cryptodb is undocumented.
- The invited talks are finalized, including titles. This often happens after the program of
submitted talks is announced, but becomes part of the program. Such talks may or may not have
a paper in the proceedings. Invited talks may include the IACR distinguished lecture (which is
known well in advance, but tracked elsewhere).
- The proceedings is published, including the assignment of DOIs, page numbers, ISBNs, and volume
number, and the final PDFs are produced by the publisher. There is currently no process to
receive this data from the publisher.
- Best paper awards
are announced, and may be added to cryptodb.
- Papers are distributed to attendees by whatever ad hoc mechanism is negotiated between
the general chair and the publisher. This could be a USB key, a paper proceedings, or
access to the online password-protected version on the Springer web site.
- The event stops showing up in the calendar as a "future event".
- Authors optionally supply presentation materials to the general chair, and the general
chair optionally arranges for the talks to be videotaped. These are later processed for incorporation
into cryptodb and the YouTube channel.
If anyone would like a dump of the database,
or has suggestions for other APIs, they should contact me (Kevin McCurley) at