A Pydantic-powered Python client for the OpenAlex API
Project description
A Pydantic-powered Python client for the OpenAlex API.
OpenAlex is an open catalog of the global research system: 270M+ scholarly works, 90M+ authors, and 100K+ sources. PyOpenAlex gives you typed access to all of it with an API that follows the patterns of FastAPI and Pydantic.
from pyopenalex import OpenAlex, gt
with OpenAlex() as client:
for work in client.works.filter(cited_by_count=gt(1000), publication_year=2024).limit(10):
print(f"{work.title} ({work.cited_by_count} citations)")
Installation
pip install pyopenalex
Requires Python 3.13+.
Quick Start
from pyopenalex import OpenAlex
client = OpenAlex(api_key="your-key") # or set OPENALEX_API_KEY env var
# Get a single work by ID
work = client.works.get("W2741809807")
print(work.title)
print(work.doi)
print(work.abstract) # reconstructed from inverted index
# Search for authors
results = client.authors.search("Einstein").per_page(5).get()
for author in results.results:
print(f"{author.display_name}: {author.works_count} works")
Entities
PyOpenAlex supports all core OpenAlex entity types:
client.works # Scholarly documents (articles, books, datasets)
client.authors # Researcher profiles
client.sources # Journals, repositories, conferences
client.institutions # Universities, research organizations
client.topics # Subject classifications
client.keywords # Extracted keywords
client.publishers # Publishing organizations
client.funders # Funding agencies
Every entity is a Pydantic model with fully typed fields:
work = client.works.get("W2741809807")
work.title # str | None
work.publication_year # int | None
work.cited_by_count # int | None
work.open_access.is_oa # bool
work.open_access.oa_status # str (gold, green, hybrid, bronze, diamond, closed)
work.authorships[0].author.display_name # str | None
work.authorships[0].institutions # list[DehydratedInstitution]
work.primary_location.source # DehydratedSource | None
Looking Up Entities
By OpenAlex ID
work = client.works.get("W2741809807")
author = client.authors.get("A5023888391")
By External ID
Works accept DOIs, authors accept ORCIDs, institutions accept ROR IDs:
work = client.works.get("https://doi.org/10.7717/peerj.4375")
author = client.authors.get("https://orcid.org/0000-0001-6187-6610")
institution = client.institutions.get("https://ror.org/0161xgx34")
Batch Lookup
Fetch up to 100 entities at once:
works = client.works.get(["W2741809807", "W2100837269", "W1775749144"])
Random Entity
work = client.works.random()
Filtering
Chain .filter() calls to narrow results. Multiple filters combine with AND:
results = (
client.works
.filter(publication_year=2024, is_oa=True)
.sort("cited_by_count", desc=True)
.per_page(100)
.get()
)
Filter Expressions
PyOpenAlex provides expression functions for building filters, similar to how FastAPI uses Query(), Path(), and Body():
from pyopenalex import gt, lt, ne, or_, between
# Greater than / less than
client.works.filter(cited_by_count=gt(100))
client.works.filter(publication_year=lt(2020))
# Not equal
client.works.filter(type=ne("paratext"))
# OR (up to 100 values)
client.works.filter(doi=or_(
"https://doi.org/10.7717/peerj.4375",
"https://doi.org/10.1038/nature12373",
))
# Range
client.works.filter(publication_year=between(2020, 2024))
Nested Filters
Use dicts for dot-notation filter paths. PyOpenAlex flattens them automatically:
# These are equivalent:
client.works.filter(authorships={"institutions": {"id": "I136199984"}})
client.works.filter_raw("authorships.institutions.id:I136199984")
Raw Filters
For full control, pass the filter string directly:
client.works.filter_raw("publication_year:2024,is_oa:true,cited_by_count:>100")
Searching
Full-Text Search
results = client.works.search("machine learning").get()
Field-Specific Search
results = client.works.search_filter(title="neural networks").get()
Search and filters can be combined:
results = (
client.works
.search("CRISPR")
.filter(publication_year=2024, is_oa=True)
.sort("cited_by_count", desc=True)
.get()
)
Sorting
# Ascending (default)
client.works.sort("publication_date")
# Descending
client.works.sort("cited_by_count", desc=True)
Field Selection
Request only the fields you need to reduce response size:
results = client.works.select("id", "title", "doi", "cited_by_count").get()
Pagination
Page-Based
page1 = client.works.filter(publication_year=2024).page(1).per_page(100).get()
page2 = client.works.filter(publication_year=2024).page(2).per_page(100).get()
Cursor-Based (Automatic)
Iterate over any query and PyOpenAlex handles cursor pagination automatically:
for work in client.works.filter(publication_year=2024, is_oa=True):
print(work.title)
Use .limit() to cap the total number of results:
for work in client.works.filter(publication_year=2024).limit(500):
process(work)
Counting
Get the total number of matching results without fetching them:
count = client.works.filter(publication_year=2024, is_oa=True).count()
Grouping
Aggregate results by a field:
response = client.works.filter(publication_year=2024).group_by("type").get()
for group in response.group_by:
print(f"{group.key_display_name}: {group.count}")
Sampling
Get a random sample of results:
results = client.works.sample(100, seed=42).get()
Autocomplete
Fast typeahead search returning up to 10 results:
results = client.institutions.autocomplete("harvard")
for r in results:
print(f"{r.display_name} ({r.works_count} works)")
Query Reuse
The query builder is immutable. Each method returns a new instance, so you can safely branch from a base query:
base = client.works.filter(publication_year=2024, is_oa=True)
most_cited = base.sort("cited_by_count", desc=True).per_page(10).get()
recent = base.sort("publication_date", desc=True).per_page(10).get()
count = base.count()
Response Objects
List queries return a ListResponse with three parts:
response = client.works.search("CRISPR").get()
response.meta # Meta: count, page, per_page, cost_usd, ...
response.results # list[Work]: the entities
response.group_by # list[GroupByResult]: populated when using group_by
Configuration
API Key
Set your API key in any of these ways (in order of precedence):
# 1. Constructor argument
client = OpenAlex(api_key="your-key")
# 2. Environment variable
# export OPENALEX_API_KEY=your-key
client = OpenAlex()
Get a free API key at openalex.org/settings/api.
Other Settings
client = OpenAlex(
api_key="your-key",
base_url="https://api.openalex.org", # default
timeout=30.0, # request timeout in seconds
max_retries=3, # retries on 429/5xx errors
)
All settings can be set via environment variables with the OPENALEX_ prefix:
export OPENALEX_API_KEY=your-key
export OPENALEX_TIMEOUT=60
export OPENALEX_MAX_RETRIES=5
Context Manager
The client can be used as a context manager to ensure the HTTP connection is closed:
with OpenAlex() as client:
work = client.works.get("W2741809807")
Error Handling
PyOpenAlex raises typed exceptions:
from pyopenalex.exceptions import NotFoundError, RateLimitError, APIError
try:
work = client.works.get("W0000000000")
except NotFoundError:
print("Work not found")
except RateLimitError:
print("Daily rate limit exceeded")
except APIError as e:
print(f"HTTP {e.status_code}: {e}")
Retries with exponential backoff are automatic for 429 (rate limit) and 5xx (server error) responses.
Abstract Reconstruction
OpenAlex stores abstracts as inverted indexes. PyOpenAlex reconstructs them for you:
work = client.works.get("W2741809807")
print(work.abstract) # full abstract text, or None if unavailable
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pyopenalex-0.1.0.tar.gz.
File metadata
- Download URL: pyopenalex-0.1.0.tar.gz
- Upload date:
- Size: 11.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f7ea4e7056ccaaf8ad117b4403cd6fe7fa1942a1fbd1235b75e3cad365191a1d
|
|
| MD5 |
21987ee4ebdfb267ae01802f07f2f363
|
|
| BLAKE2b-256 |
9f0d8d436d872c3d9811f53b32bb3d6f1ad033f36a32f0f06d4a6c12cc66cf9d
|
File details
Details for the file pyopenalex-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pyopenalex-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e19fd401bcc611bd91fd01868da173507d0bf2f9dda2d5e7092d796e79dfd37a
|
|
| MD5 |
1a44496b05acdc47154a27cc9cfad360
|
|
| BLAKE2b-256 |
36fe8d45ed4a25b03d39398bf2c4558562bbbee61edd36c954b6039547a3dd9c
|