Read full OpenAlex snapshot and convert to reduced dataset.
oasnap-reader
Read a full OpenAlex snapshot and convert it to a reduced dataset for analysis.
The reduced dataset keeps only works that are not paratext and that carry subfield information.
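The selection rule can be pictured as a simple predicate over one work record. This is a minimal sketch, not the package's own code: `keep_work` is a hypothetical helper, and it assumes the OpenAlex work fields `is_paratext`, `primary_topic`, and `topics`, with subfield data stored under a `subfield` key on each topic.

```python
from typing import Any


def keep_work(work: dict[str, Any]) -> bool:
    """Return True if a work is not paratext and has subfield information.

    Hypothetical helper for illustration; oasnap-reader may apply
    different or additional criteria.
    """
    # Assumption: paratext is flagged by the boolean `is_paratext` field.
    if work.get("is_paratext"):
        return False
    # Assumption: subfields are attached to `primary_topic` and/or `topics`.
    candidates = [work.get("primary_topic")] + list(work.get("topics") or [])
    return any(
        isinstance(topic, dict) and topic.get("subfield")
        for topic in candidates
    )
```

A record with `is_paratext` set to true is dropped regardless of its topics, and a record without any topic that names a subfield is dropped as well.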
Conversion time depends on the available hardware. As a reference point, converting a full snapshot of more than 2000 files took about 2 hours with 24 workers and sufficient RAM.
Install
pip install oasnap-reader
Usage
from pathlib import Path
from oasnap_reader.reader import ReadGZ
reader = ReadGZ(
    in_path=Path("/data/openalex/works"),
    out_path=Path("/data/reduced"),
)
reader.read_all()
in_path should point to the root of the OpenAlex works snapshot directory.
Output is one gzip-compressed JSONL file per input file, written to out_path.
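Because the output is gzip-compressed JSONL, it can be streamed back with the standard library alone. The following is a minimal sketch: `iter_reduced_works` is a hypothetical helper that is not part of oasnap-reader, and it assumes the output files end in `.gz`.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator


def iter_reduced_works(out_path: Path) -> Iterator[dict[str, Any]]:
    """Yield one dict per JSONL line from every .gz file below out_path."""
    # Assumption: output files end in ".gz"; adjust the pattern if needed.
    for gz_file in sorted(out_path.rglob("*.gz")):
        # Text mode decompresses and decodes line by line, so a file
        # never has to fit in memory as a whole.
        with gzip.open(gz_file, "rt", encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    yield json.loads(line)
```

Iterating over `iter_reduced_works(Path("/data/reduced"))` then yields the reduced works one at a time, which keeps memory use flat even for a full snapshot.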
Documentation
See the full documentation, including usage options and the API reference, at the project docs site.
Development
git clone https://gitlab.gwdg.de/mpigea/dt/oasnap-reader
cd oasnap-reader
uv sync --dev