openalexpy

A modern, async-first Python library for the OpenAlex API.

Acknowledgements

This library was inspired by PyAlex by Jonathan de Bruin. PyAlex pioneered the Python interface to OpenAlex and proved the value of a fluent, pipe-able query builder for scholarly data. openalexpy builds on those foundations with native async support, full type safety via Pydantic, and cost-aware API usage.

Features

  • Async-first with sync wrappers — httpx-based, asyncio-native
  • Fully typed — Pydantic v2 models for all entities with IDE autocomplete
  • Cost-aware — Parses X-RateLimit-* headers, distinguishes credit exhaustion from temporary rate limits, exposes cost_usd on every response
  • Correct API key handling — Uses api_key query parameter as documented by OpenAlex (not undocumented headers)
  • Immutable query builder — Each method returns a new instance, no shared mutable state bugs
  • Two-step content download — Properly handles PDF/TEI redirects while preserving rate-limit headers
  • Semantic search — First-class support for search.semantic with automatic 1 req/s rate limiting

Installation

uv add openalex-py

or

pip install openalex-py

Requires Python 3.10+.

Quick Start

Configure your API key

import openalexpy

openalexpy.config.api_key = "YOUR_API_KEY"

Or set the OPENALEX_API_KEY environment variable.
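
In a POSIX shell that looks like (the key value is a placeholder):

```shell
# Make the key available to any process started from this shell session
export OPENALEX_API_KEY="YOUR_API_KEY"
```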

Async usage (recommended)

import asyncio
import openalexpy

async def main():
    # Get a single work
    work = await openalexpy.Works().get_by_id("W2741809807")
    print(work.title)
    print(work.abstract)

    # Filter works
    results = await openalexpy.Works().filter(
        publication_year=2024, is_oa=True
    ).sort(cited_by_count="desc").get(per_page=10)

    for w in results:
        print(f"{w.title} ({w.cited_by_count} citations)")

    # Semantic search
    similar = await openalexpy.Works().similar(
        "machine learning for drug discovery"
    ).filter(publication_year=">2022").get(per_page=50)

    # Paginate through all results
    pager = openalexpy.Works().filter(
        author={"id": "A5023888391"}
    ).paginate(per_page=100)

    async for page in pager:
        for work in page:
            print(work.id)

asyncio.run(main())

Sync usage

import openalexpy

openalexpy.config.api_key = "YOUR_API_KEY"

# Get a single work
work = openalexpy.WorksSync().get_by_id("W2741809807")

# Filter and get
results = openalexpy.WorksSync().filter(
    publication_year=2024
).sort(cited_by_count="desc").get(per_page=10)

# Paginate
pager = openalexpy.WorksSync().filter(
    author={"id": "A5023888391"}
).paginate(per_page=100)

for page in pager:
    for work in page:
        print(work.id)

Supported Entities

Entity        Async             Sync
Works         Works()           WorksSync()
Authors       Authors()         AuthorsSync()
Sources       Sources()         SourcesSync()
Institutions  Institutions()    InstitutionsSync()
Topics        Topics()          TopicsSync()
Publishers    Publishers()      PublishersSync()
Funders       Funders()         FundersSync()
Awards        Awards()          AwardsSync()
Keywords      Keywords()        KeywordsSync()
Domains       Domains()         DomainsSync()
Fields        Fields()          FieldsSync()
Subfields     Subfields()       SubfieldsSync()

Query Building

All query methods return a new instance — the original query is never mutated.

base = openalexpy.Works()

# Chain filters
q = base.filter(publication_year=2024).filter(is_oa=True)

# base is unchanged
assert "filter" not in base.params
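
The guarantee rests on a copy-on-write builder pattern. A minimal self-contained sketch of the idea (a toy class, not openalexpy's actual internals):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Query:
    """Toy immutable query builder: every method returns a fresh instance."""
    params: dict = field(default_factory=dict)

    def filter(self, **kwargs) -> "Query":
        # Merge into a copy of params so the original instance stays untouched
        return Query(params={**self.params, **kwargs})


base = Query()
q = base.filter(publication_year=2024).filter(is_oa=True)

assert base.params == {}  # original unchanged
assert q.params == {"publication_year": 2024, "is_oa": True}
```

Because each call returns a new object, partial queries can be saved and reused as templates without the filters of one branch leaking into another.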

Filter operators

# AND (default)
Works().filter(institutions={"country_code": ["fr", "gb"]})

# OR
Works().filter_or(doi=["10.1234/a", "10.1234/b"])

# NOT
Institutions().filter_not(country_code="us")

# Greater than / Less than
Works().filter_gt(cited_by_count=100)
Works().filter_lt(publication_year=2020)
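
These helpers correspond to OpenAlex's native filter grammar (OR joins values with `|`, NOT prefixes a value with `!`, comparisons use `>` and `<`). The exact strings openalexpy renders are internal to the library; the target URL grammar looks roughly like this:

```python
# Illustration of the OpenAlex filter syntax the helpers above target.
# This is the documented URL grammar, not a dump of openalexpy's internals.
examples = {
    "filter_or(doi=[...])":             "doi:10.1234/a|10.1234/b",  # OR: values joined with |
    "filter_not(country_code='us')":    "country_code:!us",         # NOT: value prefixed with !
    "filter_gt(cited_by_count=100)":    "cited_by_count:>100",      # greater-than
    "filter_lt(publication_year=2020)": "publication_year:<2020",   # less-than
}
for call, rendered in examples.items():
    print(f"{call:36s} -> filter={rendered}")
```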

Semantic search

# Basic semantic search (capped at 50 results per request)
results = await Works().similar("climate change impacts").get()

# Combined with filters
results = await (
    Works()
    .similar("quantum computing applications")
    .filter(publication_year=">2022", is_oa=True)
    .get(per_page=50)
)

Pagination

# Cursor pagination (default, for deep pagination)
pager = Works().filter(publication_year=2024).paginate(per_page=100, n_max=5000)

async for page in pager:
    print(len(page))

# Page-based pagination (limited to 10,000 results)
pager = Works().search("dna").paginate(method="page", per_page=100)

Content download (PDF/TEI)

work = await Works().get_by_id("W4412002745")

# Get PDF bytes
pdf_bytes = await work._pdf.get()

# Download to file
await work._pdf.download("paper.pdf")

# TEI XML
tei_bytes = await work._tei.get()

Cost Monitoring

Every API response includes cost information:

response = await Works().filter(publication_year=2024).get(
    per_page=100, return_meta=True
)
print(f"Cost: ${response.meta.cost_usd}")
print(f"Total results: {response.meta.count}")

# Check rate limit status
client = openalexpy.client.AsyncOpenAlexClient()
status = await client.get_rate_limit_status()
print(status)

Error Handling

from openalexpy import QueryError, RateLimitError, CreditsExhaustedError

try:
    results = await Works().filter(bad_filter=True).get()
except QueryError as e:
    print(f"Bad query: {e}")
except CreditsExhaustedError as e:
    print(f"Daily credits exhausted. Resets at: {e.reset_at}")
except RateLimitError as e:
    print(f"Temporarily rate limited. Retry after: {e.retry_after}s")
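
A common pattern built on these exceptions is a retry wrapper that backs off on temporary rate limits but gives up immediately when credits are exhausted. A minimal self-contained sketch (the exception classes below are stand-ins for openalexpy's, so the example runs on its own):

```python
import asyncio


class RateLimitError(Exception):
    """Stand-in for openalexpy.RateLimitError (temporary, retryable)."""
    def __init__(self, retry_after: float):
        self.retry_after = retry_after


class CreditsExhaustedError(Exception):
    """Stand-in for openalexpy.CreditsExhaustedError (not retryable today)."""


async def with_retries(fetch, max_attempts: int = 3):
    """Retry on temporary rate limits; let unrecoverable errors propagate."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fetch()
        except RateLimitError as e:
            if attempt == max_attempts:
                raise
            # Honor the server-suggested delay before trying again
            await asyncio.sleep(e.retry_after)
        # CreditsExhaustedError is deliberately not caught:
        # retrying won't help until the daily reset.


calls = []

async def flaky_fetch():
    calls.append(1)
    if len(calls) < 3:
        raise RateLimitError(retry_after=0)  # fail twice, then succeed
    return "ok"

result = asyncio.run(with_retries(flaky_fetch))
print(result, len(calls))  # -> ok 3
```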

License

MIT
