Modern async Python library for the OpenAlex API
openalexpy
A modern, async-first Python library for the OpenAlex API.
Acknowledgements
This library was inspired by PyAlex by Jonathan de Bruin.
PyAlex pioneered the Python interface to OpenAlex and proved the value of a fluent,
pipe-able query builder for scholarly data. openalexpy builds on those foundations
with native async support, full type safety via Pydantic, and cost-aware API usage.
Features
- Async-first with sync wrappers: httpx-based, asyncio-native
- Fully typed: Pydantic v2 models for all entities, with IDE autocomplete
- Cost-aware: parses X-RateLimit-* headers, distinguishes credit exhaustion from temporary rate limits, and exposes cost_usd on every response
- Correct API key handling: uses the api_key query parameter as documented by OpenAlex (not undocumented headers)
- Immutable query builder: each method returns a new instance, so there are no shared-mutable-state bugs
- Two-step content download: properly handles PDF/TEI redirects while preserving rate-limit headers
- Semantic search: first-class support for search.semantic with automatic 1 req/s rate limiting
Installation
uv add openalex-py
or
pip install openalex-py
Requires Python 3.10+.
Quick Start
Configure your API key
import openalexpy
openalexpy.config.api_key = "YOUR_API_KEY"
Or set the OPENALEX_API_KEY environment variable.
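The two configuration routes can be combined in application code. The following is a minimal sketch, assuming you want an explicitly passed key to win over the environment variable; `resolve_api_key` is a hypothetical helper, not part of openalexpy.

```python
import os


def resolve_api_key(explicit=None, environ=None):
    """Return the explicit key if given, else OPENALEX_API_KEY, else None.

    Hypothetical helper for illustration; `environ` can be swapped in tests.
    """
    env = os.environ if environ is None else environ
    return explicit or env.get("OPENALEX_API_KEY") or None


# Usage (assumes openalexpy is installed):
#   import openalexpy
#   openalexpy.config.api_key = resolve_api_key()
```

Returning None rather than raising leaves the decision about unauthenticated requests to the caller.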
Async usage (recommended)
import asyncio
import openalexpy
async def main():
    # Get a single work
    work = await openalexpy.Works().get_by_id("W2741809807")
    print(work.title)
    print(work.abstract)

    # Filter works
    results = await openalexpy.Works().filter(
        publication_year=2024, is_oa=True
    ).sort(cited_by_count="desc").get(per_page=10)
    for w in results:
        print(f"{w.title} ({w.cited_by_count} citations)")

    # Semantic search
    similar = await openalexpy.Works().similar(
        "machine learning for drug discovery"
    ).filter(publication_year=">2022").get(per_page=50)

    # Paginate through all results
    pager = openalexpy.Works().filter(
        author={"id": "A5023888391"}
    ).paginate(per_page=100)
    async for page in pager:
        for work in page:
            print(work.id)

asyncio.run(main())
Sync usage
import openalexpy
openalexpy.config.api_key = "YOUR_API_KEY"
# Get a single work
work = openalexpy.WorksSync().get_by_id("W2741809807")
# Filter and get
results = openalexpy.WorksSync().filter(
    publication_year=2024
).sort(cited_by_count="desc").get(per_page=10)
# Paginate
pager = openalexpy.WorksSync().filter(
    author={"id": "A5023888391"}
).paginate(per_page=100)
for page in pager:
    for work in page:
        print(work.id)
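For readers curious how a sync wrapper can sit on top of an async-first class, here is a minimal sketch of the general pattern. It is not openalexpy's implementation: `FakeAsyncWorks` and `FakeWorksSync` are hypothetical stand-ins, and the sketch assumes the wrapper is called from code that is not already running inside an event loop.

```python
import asyncio


class FakeAsyncWorks:
    """Hypothetical async class standing in for an async entity client."""

    async def get(self, per_page=3):
        await asyncio.sleep(0)  # placeholder for a real network call
        return [f"W{i}" for i in range(per_page)]


class FakeWorksSync:
    """Hypothetical sync wrapper: drives the coroutine to completion."""

    def __init__(self):
        self._inner = FakeAsyncWorks()

    def get(self, per_page=3):
        # asyncio.run creates a fresh event loop, runs the coroutine, closes the loop.
        return asyncio.run(self._inner.get(per_page=per_page))


results = FakeWorksSync().get(per_page=2)
```

Because `asyncio.run` cannot be nested, a wrapper built this way only suits plain scripts; inside an async application the async classes are the better fit.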
Supported Entities
| Entity | Async | Sync |
|---|---|---|
| Works | Works() | WorksSync() |
| Authors | Authors() | AuthorsSync() |
| Sources | Sources() | SourcesSync() |
| Institutions | Institutions() | InstitutionsSync() |
| Topics | Topics() | TopicsSync() |
| Publishers | Publishers() | PublishersSync() |
| Funders | Funders() | FundersSync() |
| Awards | Awards() | AwardsSync() |
| Keywords | Keywords() | KeywordsSync() |
| Domains | Domains() | DomainsSync() |
| Fields | Fields() | FieldsSync() |
| Subfields | Subfields() | SubfieldsSync() |
Query Building
All query methods return a new instance — the original query is never mutated.
base = openalexpy.Works()
# Chain filters
q = base.filter(publication_year=2024).filter(is_oa=True)
# base is unchanged
assert "filter" not in base.params
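The immutability guarantee can be illustrated with a small standalone sketch. `Query` below is a hypothetical stand-in, not the library's class; it assumes parameters live in a `params` dict and that every builder method copies that dict instead of modifying it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Query:
    """Hypothetical immutable query builder (illustration only)."""

    params: dict = field(default_factory=dict)

    def filter(self, **kwargs):
        # Merge into a copy of the existing filters; self.params is never touched.
        merged = {**self.params.get("filter", {}), **kwargs}
        return Query(params={**self.params, "filter": merged})

    def sort(self, **kwargs):
        merged = {**self.params.get("sort", {}), **kwargs}
        return Query(params={**self.params, "sort": merged})


base = Query()
q = base.filter(publication_year=2024).filter(is_oa=True)
recent = q.sort(cited_by_count="desc")
```

Since `base` and `q` are never modified, a shared base query can safely be reused across concurrent tasks.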
Filter operators
# AND (default)
Works().filter(institutions={"country_code": ["fr", "gb"]})
# OR
Works().filter_or(doi=["10.1234/a", "10.1234/b"])
# NOT
Institutions().filter_not(country_code="us")
# Greater than / Less than
Works().filter_gt(cited_by_count=100)
Works().filter_lt(publication_year=2020)
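To make the operators concrete, the sketch below shows how such keyword arguments could be serialized into the OpenAlex `filter` query parameter, which uses commas for AND, `|` for OR, `!` for negation, and `>` / `<` for comparisons. `build_filter` is a hypothetical helper; the exact string openalexpy sends is not confirmed by this README.

```python
def _fmt(value):
    """Render booleans as lowercase strings, everything else via str()."""
    if isinstance(value, bool):
        return str(value).lower()
    return str(value)


def _flatten(filters, prefix=""):
    """Yield (dotted_key, value) pairs from a possibly nested dict."""
    for key, value in filters.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from _flatten(value, path)
        else:
            yield path, value


def build_filter(filters, mode="and"):
    """Serialize filters; mode is one of 'and', 'or', 'not', 'gt', 'lt'."""
    prefixes = {"and": "", "not": "!", "gt": ">", "lt": "<"}
    parts = []
    for path, value in _flatten(filters):
        values = value if isinstance(value, list) else [value]
        if mode == "or":
            # OR: one clause, alternatives joined with a pipe.
            parts.append(f"{path}:" + "|".join(_fmt(v) for v in values))
        else:
            # AND-style modes: one clause per value, joined with commas.
            parts.extend(f"{path}:{prefixes[mode]}{_fmt(v)}" for v in values)
    return ",".join(parts)
```

For example, `build_filter({"doi": ["10.1234/a", "10.1234/b"]}, mode="or")` yields `doi:10.1234/a|10.1234/b`.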
Semantic search
# Basic semantic search (capped at 50 results per request)
results = await Works().similar("climate change impacts").get()
# Combined with filters
results = await (
    Works()
    .similar("quantum computing applications")
    .filter(publication_year=">2022", is_oa=True)
    .get(per_page=50)
)
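The 1 req/s limit on semantic search amounts to enforcing a minimum interval between calls. Below is a minimal sketch of that idea using only asyncio; `MinIntervalLimiter` is a hypothetical class and does not describe how openalexpy implements its limiter.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Hypothetical limiter: at most one call per `min_interval` seconds."""

    def __init__(self, min_interval=1.0):
        self._min_interval = min_interval
        self._lock = asyncio.Lock()
        self._last = None  # monotonic timestamp of the previous call

    async def wait(self):
        # The lock serializes concurrent callers so they queue up in turn.
        async with self._lock:
            if self._last is not None:
                delay = self._last + self._min_interval - time.monotonic()
                if delay > 0:
                    await asyncio.sleep(delay)
            self._last = time.monotonic()


async def demo(n_calls=3, min_interval=0.05):
    """Run n_calls through the limiter and return the elapsed seconds."""
    limiter = MinIntervalLimiter(min_interval)
    start = time.monotonic()
    for _ in range(n_calls):
        await limiter.wait()
    return time.monotonic() - start
```

Calling `await limiter.wait()` immediately before each request is enough: the first call passes straight through and later ones are spaced out.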
Pagination
# Cursor pagination (default, for deep pagination)
pager = Works().filter(publication_year=2024).paginate(per_page=100, n_max=5000)
async for page in pager:
    print(len(page))
# Page-based pagination (limited to 10,000 results)
pager = Works().search("dna").paginate(method="page", per_page=100)
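Cursor pagination in OpenAlex starts with `cursor=*` and follows `meta.next_cursor` until it is null or a page comes back empty. The sketch below reproduces that loop against an in-memory fake so it runs offline; `paginate`, `make_fake_api`, and `fetch_page` are hypothetical names, and the `n_max` handling is an assumption about how a cap on total results might behave.

```python
import asyncio


async def paginate(fetch_page, per_page=100, n_max=None):
    """Yield pages of results by following next_cursor (illustrative sketch)."""
    cursor, seen = "*", 0
    while cursor is not None:
        payload = await fetch_page(cursor=cursor, per_page=per_page)
        results = payload["results"]
        if not results:
            break
        if n_max is not None:
            results = results[: n_max - seen]  # trim the final page to the cap
        seen += len(results)
        yield results
        if n_max is not None and seen >= n_max:
            break
        cursor = payload["meta"].get("next_cursor")


def make_fake_api(total):
    """Build a fake endpoint whose cursor is simply the next offset."""
    items = [f"W{i}" for i in range(total)]

    async def fetch_page(cursor, per_page):
        start = 0 if cursor == "*" else int(cursor)
        chunk = items[start:start + per_page]
        end = start + len(chunk)
        next_cursor = str(end) if end < total else None
        return {"meta": {"next_cursor": next_cursor}, "results": chunk}

    return fetch_page


async def page_sizes(total, per_page, n_max=None):
    """Collect the length of every page produced by paginate()."""
    return [len(p) async for p in paginate(make_fake_api(total), per_page, n_max)]
```

Unlike page-based pagination, this loop has no 10,000-result ceiling because each request depends only on the previous cursor.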
Content download (PDF/TEI)
work = await Works().get_by_id("W4412002745")
# Get PDF bytes
pdf_bytes = await work._pdf.get()
# Download to file
await work._pdf.download("paper.pdf")
# TEI XML
tei_bytes = await work._tei.get()
Cost Monitoring
Every API response includes cost information:
response = await Works().filter(publication_year=2024).get(
per_page=100, return_meta=True
)
print(f"Cost: ${response.meta.cost_usd}")
print(f"Total results: {response.meta.count}")
# Check rate limit status
client = openalexpy.client.AsyncOpenAlexClient()
status = await client.get_rate_limit_status()
print(status)
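As a rough illustration of what parsing X-RateLimit-* headers involves, the sketch below turns a header mapping into a small typed snapshot. The header suffixes (`Limit`, `Remaining`, `Reset`), the `RateLimitSnapshot` class, and the rule used to tell credit exhaustion from a temporary limit are all assumptions made for this example; consult the OpenAlex documentation for the real semantics.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitSnapshot:
    """Hypothetical container for rate-limit header values."""

    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[int]


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int, returning None when absent or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitSnapshot:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitSnapshot(
        limit=_to_int(lowered.get("x-ratelimit-limit")),        # assumed name
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),  # assumed name
        reset=_to_int(lowered.get("x-ratelimit-reset")),        # assumed name
    )


def classify_429(snapshot: RateLimitSnapshot) -> str:
    """Assumed rule: zero remaining credits means exhaustion, else a burst limit."""
    return "credits_exhausted" if snapshot.remaining == 0 else "rate_limited"
```

Missing or malformed headers become None instead of raising, so a caller can still handle a response that lacks the headers.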
Error Handling
from openalexpy import QueryError, RateLimitError, CreditsExhaustedError
try:
    results = await Works().filter(bad_filter=True).get()
except QueryError as e:
    print(f"Bad query: {e}")
except CreditsExhaustedError as e:
    print(f"Daily credits exhausted. Resets at: {e.reset_at}")
except RateLimitError as e:
    print(f"Temporarily rate limited. Retry after: {e.retry_after}s")
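The distinction between the two rate-limit errors matters most when retrying: a temporary limit is worth waiting out, while exhausted daily credits are not. The sketch below shows one way to act on that. The exception classes are local stand-ins so the example runs without the library, and `with_retry` is a hypothetical helper; only the `retry_after` and `reset_at` attributes come from the example above.

```python
import asyncio


class RateLimitError(Exception):
    """Local stand-in for openalexpy.RateLimitError."""

    def __init__(self, retry_after):
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


class CreditsExhaustedError(Exception):
    """Local stand-in for openalexpy.CreditsExhaustedError."""

    def __init__(self, reset_at):
        super().__init__(f"credits reset at {reset_at}")
        self.reset_at = reset_at


async def with_retry(call, max_attempts=3, sleep=asyncio.sleep):
    """Retry `call` on temporary rate limits; never retry exhausted credits."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except CreditsExhaustedError:
            raise  # waiting a few seconds will not help; surface immediately
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise
            await sleep(exc.retry_after)
```

With the real library, usage might look like `await with_retry(lambda: Works().filter(publication_year=2024).get())`, with the imports swapped for the library's own exception classes.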
License