Efficient relational database queries over the entire Crossref and ORCID data sets

Project description

Alexandria3k

The alexandria3k package supplies a library and a command-line tool providing efficient relational query access to the following large scientific publication open data sets. Data are decompressed on the fly, thus allowing the package's use even on storage-restricted laptops.

  • Crossref (157 GB compressed, 1 TB uncompressed). This contains metadata for about 134 million publications from all major international publishers, with full citation data for 60 million of them.
  • PubMed (43 GB compressed, 327 GB uncompressed). This comprises more than 36 million citations for biomedical literature from MEDLINE, life science journals, and online books, with rich domain-specific metadata, such as MeSH indexing, funding, genetic, and chemical details.
  • ORCID summary data set (25 GB compressed, 435 GB uncompressed). This contains about 78 million author details records.
  • DataCite (22 GB compressed, 197 GB uncompressed). This comprises research outputs and resources, such as data, pre-prints, images, and samples, containing about 50 million work entries.
  • United States Patent Office issued patents (11 GB compressed, 115 GB uncompressed). This contains about 5.4 million records.

Further supported data sets include funder bodies, journal names, open access journals, and research organizations.

The alexandria3k package installation contains all elements required to run it. It does not require the installation, configuration, and maintenance of a third party relational or graph database. It can therefore be used out-of-the-box for performing reproducible publication research on the desktop.
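
The idea of querying compressed data relationally without a separate database server can be illustrated with a minimal sketch that uses only the Python standard library. This is not the alexandria3k API and makes no claim about how the package is implemented: the `works` table, the helper functions, and the three sample records are all made up for illustration. The sketch reads a gzip-compressed stream record by record, so the uncompressed data is never stored, and loads it into an embedded SQLite database that is then queried with SQL.

```python
import gzip
import io
import json
import sqlite3


def make_sample_archive():
    """Build a tiny gzip-compressed JSON-lines archive in memory.

    The records are invented placeholders, not real publication data.
    """
    records = [
        {"doi": "10.0000/example.1", "title": "First example work", "year": 2021},
        {"doi": "10.0000/example.2", "title": "Second example work", "year": 2022},
        {"doi": "10.0000/example.3", "title": "Third example work", "year": 2022},
    ]
    payload = "\n".join(json.dumps(record) for record in records)
    return gzip.compress(payload.encode("utf-8"))


def stream_records(compressed_bytes):
    """Yield one record at a time, decompressing as the data is read."""
    source = io.BytesIO(compressed_bytes)
    with gzip.open(source, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            yield json.loads(line)


def load_works(connection, compressed_bytes):
    """Create the hypothetical works table and fill it from the stream."""
    connection.execute(
        "CREATE TABLE works (doi TEXT PRIMARY KEY, title TEXT, year INTEGER)"
    )
    connection.executemany(
        "INSERT INTO works (doi, title, year) VALUES (:doi, :title, :year)",
        stream_records(compressed_bytes),
    )


def works_per_year(connection):
    """Count works per publication year with a plain SQL query."""
    return connection.execute(
        "SELECT year, COUNT(*) FROM works GROUP BY year ORDER BY year"
    ).fetchall()


# SQLite is embedded in Python, so no database server has to be set up.
connection = sqlite3.connect(":memory:")
load_works(connection, make_sample_archive())
print(works_per_year(connection))
```

With a real data set, the in-memory archive would be replaced by a compressed file on disk; see the package documentation for the actual commands and library calls.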

Installation and documentation

  • 📦 The alexandria3k package is available on PyPI.
  • 📄 Full reference and use documentation for alexandria3k is available here.

Publication

Details about the rationale, design, implementation, and use of this software can be found in the following paper.

Diomidis Spinellis. Open reproducible scientometric research with Alexandria3k. PLoS ONE 18(11): e0294946. November 2023. doi: 10.1371/journal.pone.0294946
