PyOpenAlex

A Pydantic-powered Python client for the OpenAlex API.

OpenAlex is an open catalog of the global research system: 270M+ scholarly works, 90M+ authors, and 100K+ sources. PyOpenAlex gives you typed access to all of it with an API that follows the patterns of FastAPI and Pydantic.

from pyopenalex import OpenAlex, gt

with OpenAlex() as client:
    for work in client.works.filter(cited_by_count=gt(1000), publication_year=2024).limit(10):
        print(f"{work.title} ({work.cited_by_count} citations)")

Installation

pip install pyopenalex

Requires Python 3.13+.

Quick Start

from pyopenalex import OpenAlex

client = OpenAlex(api_key="your-key")  # or set OPENALEX_API_KEY env var

# Get a single work by ID
work = client.works.get("W2741809807")
print(work.title)
print(work.doi)
print(work.abstract)  # reconstructed from inverted index

# Search for authors
results = client.authors.search("Einstein").per_page(5).get()
for author in results.results:
    print(f"{author.display_name}: {author.works_count} works")

Entities

PyOpenAlex supports all core OpenAlex entity types:

client.works          # Scholarly documents (articles, books, datasets)
client.authors        # Researcher profiles
client.sources        # Journals, repositories, conferences
client.institutions   # Universities, research organizations
client.topics         # Subject classifications
client.keywords       # Extracted keywords
client.publishers     # Publishing organizations
client.funders        # Funding agencies

Every entity is a Pydantic model with fully typed fields:

work = client.works.get("W2741809807")

work.title                              # str | None
work.publication_year                   # int | None
work.cited_by_count                     # int | None
work.open_access.is_oa                  # bool
work.open_access.oa_status              # str (gold, green, hybrid, bronze, diamond, closed)
work.authorships[0].author.display_name # str | None
work.authorships[0].institutions        # list[DehydratedInstitution]
work.primary_location.source            # DehydratedSource | None

Looking Up Entities

By OpenAlex ID

work = client.works.get("W2741809807")
author = client.authors.get("A5023888391")

By External ID

Works accept DOIs, authors accept ORCIDs, institutions accept ROR IDs:

work = client.works.get("https://doi.org/10.7717/peerj.4375")
author = client.authors.get("https://orcid.org/0000-0001-6187-6610")
institution = client.institutions.get("https://ror.org/0161xgx34")

Batch Lookup

Fetch up to 100 entities at once:

works = client.works.get(["W2741809807", "W2100837269", "W1775749144"])
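A batch lookup can be expressed as a single OR filter on the ID field. The sketch below splits a long ID list into chunks of at most 100 and builds one filter string per chunk. The helper `batch_filters` and the `openalex_id` filter key are assumptions for illustration, and PyOpenAlex's own implementation may differ.

```python
from collections.abc import Iterator

MAX_IDS_PER_REQUEST = 100  # batch limit stated in this README


def batch_filters(ids: list[str], key: str = "openalex_id",
                  size: int = MAX_IDS_PER_REQUEST) -> Iterator[str]:
    """Hypothetical helper: yield one OR filter string per chunk, e.g. 'openalex_id:W1|W2'."""
    for start in range(0, len(ids), size):
        chunk = ids[start:start + size]
        yield f"{key}:" + "|".join(chunk)  # '|' means OR in OpenAlex filters


ids = [f"W{n}" for n in range(1, 251)]  # 250 made-up IDs
filters = list(batch_filters(ids))
print(len(filters))  # 3 (chunks of 100, 100, and 50 IDs)
```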

Random Entity

work = client.works.random()

Filtering

Chain .filter() calls to narrow results. Multiple filters combine with AND:

results = (
    client.works
    .filter(publication_year=2024, is_oa=True)
    .sort("cited_by_count", desc=True)
    .per_page(100)
    .get()
)

Filter Expressions

PyOpenAlex provides expression functions for building filters, similar to how FastAPI uses Query(), Path(), and Body():

from pyopenalex import gt, lt, ne, or_, between

# Greater than / less than
client.works.filter(cited_by_count=gt(100))
client.works.filter(publication_year=lt(2020))

# Not equal
client.works.filter(type=ne("paratext"))

# OR (up to 100 values)
client.works.filter(doi=or_(
    "https://doi.org/10.7717/peerj.4375",
    "https://doi.org/10.1038/nature12373",
))

# Range
client.works.filter(publication_year=between(2020, 2024))
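The sketch below shows how these expression functions could render into OpenAlex's filter syntax: `>` and `<` for inequalities, `!` for negation, `|` for OR, and commas for AND. The result has the same comma-separated form that `filter_raw()` accepts. The `Expr` class and the `to_filter_param` helper are hypothetical stand-ins rather than PyOpenAlex's actual classes, and the hyphenated `2020-2024` form for `between()` is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for the objects returned by gt(), lt(), ne(), etc."""
    text: str

    def __str__(self) -> str:
        return self.text


def gt(value: object) -> Expr:
    return Expr(f">{value}")  # OpenAlex "greater than" prefix


def lt(value: object) -> Expr:
    return Expr(f"<{value}")  # OpenAlex "less than" prefix


def ne(value: object) -> Expr:
    return Expr(f"!{value}")  # OpenAlex negation prefix


def or_(*values: object) -> Expr:
    return Expr("|".join(str(v) for v in values))  # OpenAlex OR separator


def between(low: object, high: object) -> Expr:
    return Expr(f"{low}-{high}")  # assumed hyphenated range syntax


def render(value: object) -> str:
    """Turn a Python filter value into its OpenAlex string form."""
    if isinstance(value, bool):
        return "true" if value else "false"  # booleans are lowercase in URLs
    return str(value)


def to_filter_param(**filters: object) -> str:
    """Join key:value pairs with commas, which OpenAlex treats as AND."""
    return ",".join(f"{key}:{render(value)}" for key, value in filters.items())


print(to_filter_param(cited_by_count=gt(100), type=ne("paratext"), is_oa=True))
# cited_by_count:>100,type:!paratext,is_oa:true
```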

Nested Filters

Use dicts for dot-notation filter paths. PyOpenAlex flattens them automatically:

# These are equivalent:
client.works.filter(authorships={"institutions": {"id": "I136199984"}})
client.works.filter_raw("authorships.institutions.id:I136199984")
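Flattening amounts to a short recursive walk that joins nested keys with dots. This sketch assumes leaf values are plain scalars. `flatten` is a hypothetical helper, not part of the PyOpenAlex API.

```python
from typing import Any


def flatten(filters: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into dot-separated keys (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in filters.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into the nested level
        else:
            flat[path] = value
    return flat


nested = {"authorships": {"institutions": {"id": "I136199984"}}}
flat = flatten(nested)
print(",".join(f"{k}:{v}" for k, v in flat.items()))
# authorships.institutions.id:I136199984
```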

Raw Filters

For full control, pass the filter string directly:

client.works.filter_raw("publication_year:2024,is_oa:true,cited_by_count:>100")

Searching

Full-Text Search

results = client.works.search("machine learning").get()

Field-Specific Search

results = client.works.search_filter(title="neural networks").get()

Search and filters can be combined:

results = (
    client.works
    .search("CRISPR")
    .filter(publication_year=2024, is_oa=True)
    .sort("cited_by_count", desc=True)
    .get()
)

Sorting

# Ascending (default)
client.works.sort("publication_date")

# Descending
client.works.sort("cited_by_count", desc=True)

Field Selection

Request only the fields you need to reduce response size:

results = client.works.select("id", "title", "doi", "cited_by_count").get()

Pagination

Page-Based

page1 = client.works.filter(publication_year=2024).page(1).per_page(100).get()
page2 = client.works.filter(publication_year=2024).page(2).per_page(100).get()

Cursor-Based (Automatic)

Iterate over any query and PyOpenAlex handles cursor pagination automatically:

for work in client.works.filter(publication_year=2024, is_oa=True):
    print(work.title)

Use .limit() to cap the total number of results:

for work in client.works.filter(publication_year=2024).limit(500):
    process(work)

Counting

Get the total number of matching results without fetching them:

count = client.works.filter(publication_year=2024, is_oa=True).count()

Grouping

Aggregate results by a field:

response = client.works.filter(publication_year=2024).group_by("type").get()
for group in response.group_by:
    print(f"{group.key_display_name}: {group.count}")

Sampling

Get a random sample of results:

results = client.works.sample(100, seed=42).get()

Autocomplete

Fast typeahead search returning up to 10 results:

results = client.institutions.autocomplete("harvard")
for r in results:
    print(f"{r.display_name} ({r.works_count} works)")

Query Reuse

The query builder is immutable. Each method returns a new instance, so you can safely branch from a base query:

base = client.works.filter(publication_year=2024, is_oa=True)

most_cited = base.sort("cited_by_count", desc=True).per_page(10).get()
recent = base.sort("publication_date", desc=True).per_page(10).get()
count = base.count()

Response Objects

List queries return a ListResponse with three parts:

response = client.works.search("CRISPR").get()

response.meta        # Meta: count, page, per_page, cost_usd, ...
response.results     # list[Work]: the entities
response.group_by    # list[GroupByResult]: populated when using group_by

Configuration

API Key

Set your API key in either of these ways (listed in order of precedence):

# 1. Constructor argument
client = OpenAlex(api_key="your-key")

# 2. Environment variable
# export OPENALEX_API_KEY=your-key
client = OpenAlex()

Get a free API key at openalex.org/settings/api.

Other Settings

client = OpenAlex(
    api_key="your-key",
    base_url="https://api.openalex.org",  # default
    timeout=30.0,                          # request timeout in seconds
    max_retries=3,                         # retries on 429/5xx errors
)

All settings can be set via environment variables with the OPENALEX_ prefix:

export OPENALEX_API_KEY=your-key
export OPENALEX_TIMEOUT=60
export OPENALEX_MAX_RETRIES=5

Context Manager

The client can be used as a context manager to ensure the HTTP connection is closed:

with OpenAlex() as client:
    work = client.works.get("W2741809807")

Error Handling

PyOpenAlex raises typed exceptions:

from pyopenalex.exceptions import NotFoundError, RateLimitError, APIError

try:
    work = client.works.get("W0000000000")
except NotFoundError:
    print("Work not found")
except RateLimitError:
    print("Daily rate limit exceeded")
except APIError as e:
    print(f"HTTP {e.status_code}: {e}")

Retries with exponential backoff are automatic for 429 (rate limit) and 5xx (server error) responses.
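A typical exponential backoff schedule doubles the wait after each failed attempt. It also adds random jitter so that many clients do not retry in lockstep. The base delay of 0.5 s, the 30 s cap, and full jitter below are assumptions, not PyOpenAlex's documented values. `max_retries=3` matches the default shown under Other Settings.

```python
import random

RETRYABLE = {429} | set(range(500, 600))  # rate limit plus any 5xx


def should_retry(status: int, attempt: int, max_retries: int = 3) -> bool:
    """True if the status is retryable and retries remain (attempt counts from 0)."""
    return status in RETRYABLE and attempt < max_retries


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter: bool = True) -> float:
    """Exponential delay base * 2**attempt, capped, with optional full jitter."""
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay


print([backoff_delay(a, jitter=False) for a in range(4)])  # [0.5, 1.0, 2.0, 4.0]
print(should_retry(503, attempt=1), should_retry(404, attempt=0))  # True False
```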

Abstract Reconstruction

OpenAlex stores abstracts as inverted indexes. PyOpenAlex reconstructs them for you:

work = client.works.get("W2741809807")
print(work.abstract)  # full abstract text, or None if unavailable

License

MIT

Download files

Source Distribution

pyopenalex-0.1.1.tar.gz (21.5 kB)

Uploaded Source

Built Distribution

pyopenalex-0.1.1-py3-none-any.whl (18.0 kB)

Uploaded Python 3

File details

Details for the file pyopenalex-0.1.1.tar.gz.

File metadata

  • Download URL: pyopenalex-0.1.1.tar.gz
  • Upload date:
  • Size: 21.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyopenalex-0.1.1.tar.gz
Algorithm Hash digest
SHA256 3ec2e0d03e1567f637555690706e07bd41870327c9e2c55f2d46dca2a72cacb1
MD5 7ab869a701f6f43a968e00d730f9e41a
BLAKE2b-256 08efcdc22a7a1bd7b71dadba20e58bc26389447736cea45a17b35ce4102dbb3f

Provenance

The following attestation bundles were made for pyopenalex-0.1.1.tar.gz:

Publisher: publish.yml on nthomsencph/pyopenalex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyopenalex-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pyopenalex-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyopenalex-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6af2a8764d6ba15764d2ef5e5fd1d836496b1899704effa7b31cb5284f6cdf0c
MD5 65b3510feff0c2b3b0dba5d1627db524
BLAKE2b-256 bcf8d606a22b3a10ffdccc2797a21f10ffec8812bd24cde7708406b44177c3ee

Provenance

The following attestation bundles were made for pyopenalex-0.1.1-py3-none-any.whl:

Publisher: publish.yml on nthomsencph/pyopenalex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
