Efficient relational database queries over the entire Crossref and ORCID data sets
Project description
Alexandria3k
The alexandria3k package supplies a library and a command-line tool that provide efficient relational query access to diverse open publication data sets. The largest is the entire Crossref data set (157 GB compressed, 1 TB uncompressed), which contains metadata for about 134 million publications from all major international publishers, with full citation data for 60 million of them. Alternatively, scientific publications can be selected from the PubMed data set (43 GB compressed, 327 GB uncompressed), which comprises more than 36 million citations for biomedical literature from MEDLINE, life science journals, and online books, with rich domain-specific metadata such as MeSH indexing, funding, genetic, and chemical details. Other data sets that can be used or linked together are the ORCID summary data set (25 GB compressed, 435 GB uncompressed), containing about 78 million author records; the DataCite set of research outputs and resources, such as data, pre-prints, images, and samples (22 GB compressed, 197 GB uncompressed), containing about 50 million work entries; the patents issued by the United States Patent Office (11 GB compressed, 115 GB uncompressed), containing about 5.4 million records; and data sets of funder bodies, journal names, open access journals, and research organizations.
The alexandria3k package installation contains all elements required to run it. It does not require the installation, configuration, or maintenance of a third-party relational or graph database. It can therefore be used out of the box for reproducible publication research on the desktop.
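To give a feel for what relational query access without a database server can look like, here is a minimal sketch using Python's built-in `sqlite3` module. It does not call alexandria3k itself: the table names (`works`, `work_authors`), their columns, and the rows are assumptions made up for illustration, not real Crossref or ORCID records or the tool's confirmed schema.

```python
import sqlite3

# Toy in-memory database. The schema and rows are illustrative placeholders,
# not real Crossref/ORCID data and not necessarily the schema alexandria3k uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE works (id INTEGER PRIMARY KEY, doi TEXT, published_year INTEGER);
    CREATE TABLE work_authors (work_id INTEGER, orcid TEXT);
    INSERT INTO works VALUES (1, '10.0000/example.1', 2021),
                             (2, '10.0000/example.2', 2022),
                             (3, '10.0000/example.3', 2022);
    INSERT INTO work_authors VALUES (1, '0000-0000-0000-0001'),
                                    (2, '0000-0000-0000-0001'),
                                    (3, NULL);
""")


def works_per_year_with_orcid(connection):
    """Count, per publication year, the works with at least one ORCID-identified author."""
    query = """
        SELECT works.published_year, COUNT(DISTINCT works.id)
        FROM works
        JOIN work_authors ON work_authors.work_id = works.id
        WHERE work_authors.orcid IS NOT NULL
        GROUP BY works.published_year
        ORDER BY works.published_year
    """
    return connection.execute(query).fetchall()


print(works_per_year_with_orcid(conn))
```

Because the whole database lives in a single embedded engine, a query like this join across publication and author tables runs on a desktop with no server to install or configure.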
Documentation
The complete reference and use documentation for alexandria3k can be found here.
Major contributors
- Aggelos Margkas: US patents
- Bas Verlooy: PubMed
- Evgenia Pampidi: DataCite
Publication
Details about the rationale, design, implementation, and use of this software can be found in the following paper.
Diomidis Spinellis. Open reproducible scientometric research with Alexandria3k. PLoS ONE 18(11): e0294946. November 2023. doi: 10.1371/journal.pone.0294946
Download files
Source Distribution
Built Distribution
File details
Details for the file alexandria3k-3.3.0.tar.gz.
File metadata
- Download URL: alexandria3k-3.3.0.tar.gz
- Upload date:
- Size: 655.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6314d720bc0bc3b206e90d4c6f112c743cc721d17d229efb963fdbdd9d911a87
MD5 | 99d3d7ca8ed7787890aee44165fbeaad
BLAKE2b-256 | c352a0d8c5a44303cbbac00021065feaee17552bf32200f8bde9a5832d12cf9e
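The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `file_digests` and `matches_published` and the local file path in the usage comment are assumptions introduced for illustration.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return the SHA256, MD5, and BLAKE2b-256 hex digests of the file at path."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so that large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, expected):
    """Return True if every expected digest matches the file's actual digest."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in expected.items())


# Digests copied from the table for alexandria3k-3.3.0.tar.gz.
EXPECTED_SDIST = {
    "SHA256": "6314d720bc0bc3b206e90d4c6f112c743cc721d17d229efb963fdbdd9d911a87",
    "MD5": "99d3d7ca8ed7787890aee44165fbeaad",
    "BLAKE2b-256": "c352a0d8c5a44303cbbac00021065feaee17552bf32200f8bde9a5832d12cf9e",
}

# Usage, assuming the file was downloaded to the current directory:
# matches_published("alexandria3k-3.3.0.tar.gz", EXPECTED_SDIST)
```

A result of `False` would suggest the download is incomplete or differs from the uploaded release. The same check applies to the wheel by substituting its digests.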
File details
Details for the file alexandria3k-3.3.0-py3-none-any.whl.
File metadata
- Download URL: alexandria3k-3.3.0-py3-none-any.whl
- Upload date:
- Size: 123.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6af938bb8aac902c71b3349ba3af8b2ea17170f80aa37a549dd12696f81e4548
MD5 | 6f50a79183a0b835551059adde3a50ea
BLAKE2b-256 | fb437228c744e0658251efc70181b70716555b4457f311954c85b8869c86ceba