crossref-lmdb

A command-line application and Python library for accessing DOI metadata from a CrossRef public data export via a Lightning key:value (DOI:metadata) database.

The public data export from CrossRef is a very useful way to access large amounts of DOI metadata because it avoids the need to acquire data over the web API. However, the export represents the metadata as a large number of compressed JSON files, which makes it difficult and time-consuming to access the metadata for a given DOI. This project imports the metadata into a Lightning Memory-Mapped Database (LMDB), in which the DOIs are the database keys and the associated metadata are the database values.
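To illustrate the idea, here is a minimal sketch using only the Python standard library; it is not this package's implementation. It assumes each export file is a gzip-compressed JSON document with a top-level `items` list whose entries carry a `DOI` field. The function names `iter_export_items` and `build_index` are hypothetical, and a plain `dict` stands in for the LMDB database.

```python
import gzip
import json
import pathlib
from collections.abc import Iterator


def iter_export_items(export_dir: pathlib.Path) -> Iterator[tuple[str, dict]]:
    """Yield (DOI, metadata) pairs from every *.json.gz file in export_dir.

    Assumption: each file holds {"items": [...]} and each item has a "DOI" key.
    """
    for path in sorted(export_dir.glob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            data = json.load(handle)
        for item in data.get("items", []):
            doi = item.get("DOI")
            if doi is not None:
                # DOIs are case-insensitive, so normalise before using as a key.
                yield doi.lower(), item


def build_index(export_dir: pathlib.Path) -> dict[bytes, bytes]:
    """Build an in-memory key:value store (a stand-in for the LMDB database)."""
    index: dict[bytes, bytes] = {}
    for doi, item in iter_export_items(export_dir):
        # Keys and values are stored as bytes, as a key:value database would.
        index[doi.encode("utf-8")] = json.dumps(item).encode("utf-8")
    return index
```

Without such an index, finding one DOI means decompressing and parsing files until it turns up; with it, the lookup is a single key access.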

[!WARNING] This database is mainly useful for projects that require a relatively small portion of the total metadata; creating and updating the database is likely to be prohibitively slow otherwise.

Features

  • Create a Lightning database from the CrossRef public data export, with optional filtering of DOI items based on custom Python code.
  • Update the database with items from the CrossRef web API that have been added or modified since a given date.
  • Read from the database in Python via a dict-like data structure.
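The filtering and dict-like reading features can be pictured with the following sketch. It does not use this package's API: the names `keep_journal_articles` and `DOIReader` are hypothetical, the filter signature (one metadata item in, a boolean out) is an assumption, and a plain `dict` of bytes stands in for the Lightning database.

```python
import json
from collections.abc import Iterator, Mapping


def keep_journal_articles(item: dict) -> bool:
    """Hypothetical filter: keep only items whose "type" is "journal-article"."""
    return item.get("type") == "journal-article"


class DOIReader(Mapping):
    """Read-only, dict-like view over a bytes-to-bytes key:value store."""

    def __init__(self, store: Mapping) -> None:
        self._store = store

    def __getitem__(self, doi: str) -> dict:
        # Raises KeyError if the DOI is absent, as a dict would.
        raw = self._store[doi.lower().encode("utf-8")]
        return json.loads(raw)

    def __iter__(self) -> Iterator[str]:
        return (key.decode("utf-8") for key in self._store)

    def __len__(self) -> int:
        return len(self._store)


# Illustrative items only; real export items carry many more fields.
items = [
    {"DOI": "10.1000/a", "type": "journal-article"},
    {"DOI": "10.1000/b", "type": "book-chapter"},
]

# Apply the filter while building the stand-in store.
store = {
    item["DOI"].lower().encode("utf-8"): json.dumps(item).encode("utf-8")
    for item in items
    if keep_journal_articles(item)
}

reader = DOIReader(store)
```

Because `DOIReader` subclasses `Mapping`, membership tests, `get`, `keys`, `values`, and `items` come for free from the three methods defined above.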

Limitations

  • The Lightning database format is not very efficient with disk space for this data (see the LMDB documentation for more details).
  • Creating the database is very slow: building it from the full 2024 public data export takes multiple days.
  • Updating the database is even slower.
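Updates draw on the CrossRef web API for items added or modified since a given date. The sketch below only builds a request URL and makes no network call. It assumes the CrossRef REST API `works` endpoint with its `from-update-date` filter and cursor-based paging; the function name `build_update_url` is hypothetical, and the request this package actually issues may differ.

```python
import datetime
import urllib.parse

# Assumption: the public CrossRef REST API endpoint for works.
API_URL = "https://api.crossref.org/works"


def build_update_url(since: datetime.date, cursor: str = "*", rows: int = 1000) -> str:
    """Build a URL requesting works updated on or after the given date."""
    query = {
        "filter": f"from-update-date:{since.isoformat()}",
        "cursor": cursor,  # "*" starts a new cursor-based traversal
        "rows": rows,  # number of items requested per page
    }
    # Keep ":" and "*" unescaped so the URL stays readable.
    return f"{API_URL}?{urllib.parse.urlencode(query, safe=':*')}"


url = build_update_url(datetime.date(2024, 6, 1))
```

Each response page would then supply the cursor for the next request, so a large set of modified items takes many sequential round trips.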

[!NOTE] This project is not affiliated with, supported by, or endorsed by CrossRef.

Installation

The package can be installed using pip:

pip install crossref-lmdb 

Using the package requires the CrossRef public data export files (2024 release) to have been downloaded. See the instructions from CrossRef for obtaining these files.

Documentation

See https://unimelbmdap.github.io/crossref-lmdb/ for documentation.

Contact

Issues can be raised via the issue tracker.

Authors

Please feel free to email if you find this package to be useful or have any suggestions or feedback.
