Read full OpenAlex snapshot and convert to reduced dataset.

oasnap-reader

Read a full OpenAlex snapshot and convert it to a reduced dataset for analysis.

The reduced dataset keeps only works that are not paratext and that carry subfield information.
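The selection rule can be sketched as a small predicate. This is a minimal illustration, not the package's actual code: the field names `is_paratext`, `topics`, `primary_topic` and `subfield` follow the OpenAlex work schema, but whether oasnap-reader checks exactly these fields is an assumption, and `has_subfield` and `keep_work` are hypothetical helper names.

```python
from typing import Any


def has_subfield(work: dict[str, Any]) -> bool:
    """Return True if any topic attached to the work names a subfield."""
    # Assumption: subfield information lives under `topics` and/or `primary_topic`.
    topics = list(work.get("topics") or [])
    primary = work.get("primary_topic")
    if primary:
        topics.append(primary)
    return any((topic or {}).get("subfield") for topic in topics)


def keep_work(work: dict[str, Any]) -> bool:
    """Keep works that are not paratext and carry subfield information."""
    return not work.get("is_paratext", False) and has_subfield(work)
```

A record with `is_paratext` set to true, or with no topic that names a subfield, would be dropped under this rule.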

Conversion time depends on the available hardware. As a reference point, a full conversion of more than 2000 files took about 2 hours with 24 workers and sufficient RAM.
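For rough planning on other hardware, the reference run can be scaled by hand. The sketch below assumes run time grows linearly with the number of files and shrinks in proportion to the number of workers, which is an idealization (disk throughput, RAM and uneven file sizes will all bend it). `estimate_hours` is a hypothetical helper, not part of the package, and its defaults simply restate the rounded figures above.

```python
def estimate_hours(
    n_files: int,
    workers: int,
    ref_files: int = 2000,
    ref_hours: float = 2.0,
    ref_workers: int = 24,
) -> float:
    """Scale the reference run linearly in files and inversely in workers."""
    # Worker-hours spent per file in the reference run (about 0.024 here).
    worker_hours_per_file = ref_hours * ref_workers / ref_files
    return n_files * worker_hours_per_file / workers


# Under these assumptions, 8 workers would need roughly three times as long
# as the 24-worker reference run.
hours_with_8_workers = estimate_hours(2000, workers=8)
```

Treat the result as an order-of-magnitude guide only.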

Install

pip install oasnap-reader

Usage

from pathlib import Path
from oasnap_reader.reader import ReadGZ

reader = ReadGZ(
    in_path=Path("/data/openalex/works"),
    out_path=Path("/data/reduced"),
)
reader.read_all()

in_path should point to the root of the OpenAlex works snapshot directory. Output is one gzip-compressed JSONL file per input file, written to out_path.
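Because the output is gzip-compressed JSONL, it can be read back with the standard library alone. The following is a minimal sketch under two assumptions: that the output files end in `.gz`, and that each non-empty line holds one JSON object. `iter_records` and `count_records` are hypothetical helpers and are not part of oasnap-reader.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator


def iter_records(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .gz JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_records(out_path: Path) -> int:
    """Count records across all .gz files found below out_path."""
    # rglob is used in case the output mirrors a nested input layout (assumption).
    return sum(1 for gz in sorted(out_path.rglob("*.gz")) for _ in iter_records(gz))
```

Streaming line by line keeps memory use flat, which matters when the reduced files are still large.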

Documentation

See the full documentation, including usage options and the API reference, at the project docs site.

Development

git clone https://gitlab.gwdg.de/mpigea/dt/oasnap-reader
cd oasnap-reader
uv sync --dev
