One downloader for many scientific data and code repositories!

PyAlex - a Python wrapper for OpenAlex

PyAlex is a Python library for OpenAlex. OpenAlex is an index of hundreds of millions of interconnected scholarly papers, authors, institutions, and more. OpenAlex offers a robust, open, and free REST API to extract, aggregate, or search scholarly data. PyAlex is a lightweight and thin Python interface to this API. PyAlex tries to stay as close as possible to the design of the original service.

The following features of OpenAlex are currently supported by PyAlex:

  • Get single entities
  • Filter entities
  • Search entities
  • Group entities
  • Search filters
  • Pagination
  • Autocomplete endpoint
  • N-grams

We aim to cover the entire API, and we are looking for help. Pull requests are welcome.

Key features

  • Pipe operations - PyAlex can handle multiple operations in a sequence. This allows the developer to write understandable queries. For examples, see code snippets.
  • Plaintext abstracts - OpenAlex doesn't include plaintext abstracts due to legal constraints. PyAlex converts the inverted abstracts into plaintext abstracts on the fly.
  • Permissive license - OpenAlex data is CC0 licensed. PyAlex is published under the MIT license.

Installation

PyAlex requires Python 3.6 or later.

pip install pyalex

Getting started

PyAlex offers support for all Entity Objects (Works, Authors, Venues, Institutions, Concepts).

from pyalex import Works, Authors, Venues, Institutions, Concepts

The polite pool

The polite pool has much faster and more consistent response times. To get into the polite pool, set your email address:

import pyalex

pyalex.config.email = "mail@example.com"
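For context, OpenAlex identifies polite-pool requests by an email address supplied with the request, for example as a `mailto` query parameter. The sketch below only illustrates what such a request URL looks like; it does not show how PyAlex sends the address internally, and the helper name `polite_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public OpenAlex endpoint for works.
OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def polite_url(base_url, email, **params):
    """Return base_url with the given query parameters plus a mailto address.

    Hypothetical helper for illustration; not part of PyAlex.
    """
    query = dict(params)
    query["mailto"] = email
    return f"{base_url}?{urlencode(query)}"


url = polite_url(OPENALEX_WORKS_URL, "mail@example.com", search="fierce creatures")
print(url)
# https://api.openalex.org/works?search=fierce+creatures&mailto=mail%40example.com
```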

Get single entity

Get a single Work, Author, Venue, Institution or Concept from OpenAlex by the OpenAlex ID, or by DOI or ROR.

Works()["W2741809807"]

# same as
Works()["https://doi.org/10.7717/peerj.4375"]

The result is a Work object, which is very similar to a dictionary. Find the available fields with .keys().

For example, get the open access status:

Works()["W2741809807"]["open_access"]
{'is_oa': True, 'oa_status': 'gold', 'oa_url': 'https://doi.org/10.7717/peerj.4375'}

The same approach works for Authors, Venues, Institutions and Concepts:

Authors()["A2887243803"]
Authors()["https://orcid.org/0000-0002-4297-0502"]  # same

Get random

Get a random Work, Author, Venue, Institution or Concept.

Works().random()
Authors().random()
Venues().random()
Institutions().random()
Concepts().random()

Get abstract

Only for Works. Request a work from the OpenAlex database:

w = Works()["W3128349626"]

All attributes documented under Works are available, plus abstract (only if abstract_inverted_index is not None).

w["abstract"]
'Abstract To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.'

Please respect the legal constraints when using this feature.
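To make the conversion concrete: an inverted abstract maps each word to the list of positions where it occurs. The following is a minimal sketch of how such an index can be turned back into plaintext. It is not necessarily how PyAlex implements it; the function name `invert_abstract` and the sample index are made up for illustration.

```python
def invert_abstract(inverted_index):
    """Rebuild plaintext from a {word: [positions]} mapping.

    Returns None when no inverted index is available.
    """
    if inverted_index is None:
        return None

    # Pair every occurrence with its position, then order by position.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


# Hypothetical sample index, not taken from a real work.
sample_index = {"the": [0, 3], "cat": [1], "chased": [2], "mouse": [4]}
print(invert_abstract(sample_index))  # the cat chased the mouse
```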

Get lists of entities

results = Works().get()

For lists of entities, you can return the metadata along with the results. By default, only the results are returned.

results, meta = Concepts().get(return_meta=True)
print(meta)
{'count': 65073, 'db_response_time_ms': 16, 'page': 1, 'per_page': 25}
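The metadata is enough to estimate how many requests a full download would take. As a small worked example using the values printed above (the helper name `total_pages` is hypothetical):

```python
import math

# Metadata as printed above.
meta = {'count': 65073, 'db_response_time_ms': 16, 'page': 1, 'per_page': 25}


def total_pages(count, per_page):
    """Number of pages needed to cover `count` records."""
    return math.ceil(count / per_page)


print(total_pages(meta["count"], meta["per_page"]))  # 2603
print(total_pages(meta["count"], 200))               # 326
```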

Filter records

Works().filter(publication_year=2020, is_oa=True).get()

which is identical to:

Works().filter(publication_year=2020).filter(is_oa=True).get()

Nested attribute filters

Some attribute filters are nested and separated with dots by OpenAlex. For example, filter on authorships.institutions.ror.

For nested attribute filters, use a dict to build the query.

Works() \
  .filter(authorships={"institutions": {"ror": "04pp8hn57"}}) \
  .get()
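To see how a nested dict relates to the dotted filter names, here is a minimal sketch that flattens a dict into dotted keys and joins them in the `attribute:value` form used by the OpenAlex filter parameter. The helpers `flatten_filter` and `to_filter_param` are hypothetical and are not PyAlex functions; PyAlex's own implementation may differ.

```python
def flatten_filter(nested, prefix=""):
    """Flatten {"a": {"b": {"c": 1}}} into {"a.b.c": 1}."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_filter(value, path))
        else:
            flat[path] = value
    return flat


def to_filter_param(flat):
    """Join flat filters as 'key:value' pairs separated by commas."""

    def fmt(value):
        # Booleans are written in lowercase in the filter string.
        return str(value).lower() if isinstance(value, bool) else str(value)

    return ",".join(f"{key}:{fmt(value)}" for key, value in flat.items())


flat = flatten_filter({"authorships": {"institutions": {"ror": "04pp8hn57"}}})
print(to_filter_param(flat))  # authorships.institutions.ror:04pp8hn57
```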

Search entities

OpenAlex reference: The search parameter

Works().search("fierce creatures").get()

Search filter

OpenAlex reference: The search filter

Authors().search_filter(display_name="einstein").get()
Works().search_filter(title="cubist").get()

Sort entity lists

OpenAlex reference: Sort entity lists.

Works().sort(cited_by_count="desc").get()

Paging

OpenAlex offers two methods for paging: basic paging and cursor paging. Both methods are supported by PyAlex, although cursor paging seems to be easier to implement and less error-prone.

Basic paging

See the limitations of basic paging in the OpenAlex documentation. Basic paging is relatively easy to implement with PyAlex, but the built-in pager based on cursor paging is recommended.
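As an illustration of the basic paging pattern, the sketch below requests numbered pages until an empty page comes back. The `fetch_page` callable is a hypothetical stand-in for a real query; with PyAlex it could wrap a query's get call, assuming that call accepts page and per_page arguments. A fake in-memory data source keeps the example runnable offline.

```python
def iter_basic_pages(fetch_page, per_page=25, max_pages=None):
    """Yield pages 1, 2, 3, ... until an empty page or max_pages is reached.

    fetch_page is a hypothetical callable taking page and per_page.
    """
    page = 1
    while max_pages is None or page <= max_pages:
        results = fetch_page(page=page, per_page=per_page)
        if not results:
            break
        yield results
        page += 1


# Stand-in data source with 60 fake records (not real OpenAlex data).
RECORDS = [{"id": f"W{i}"} for i in range(60)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return RECORDS[start:start + per_page]


sizes = [len(p) for p in iter_basic_pages(fake_fetch, per_page=25)]
print(sizes)  # [25, 25, 10]
```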

Cursor paging

Use paginate() for paging results. By default, the n_max argument of paginate() is set to 10000. Set it to None to retrieve all results.

from pyalex import Authors

pager = Authors().search_filter(display_name="einstein").paginate(per_page=200)

for page in pager:
    print(len(page))
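Since the pager yields one list of records per page, it is often convenient to flatten it into a single stream of records. The sketch below assumes only that the pager is an iterable of lists, as in the loop above; `take_records` is a hypothetical helper, and a fake pager stands in for a real query so the example runs offline.

```python
from itertools import chain, islice


def take_records(pager, limit=None):
    """Collect up to `limit` records from an iterable of pages.

    Pages are consumed lazily, so iteration stops once enough
    records have been collected.
    """
    return list(islice(chain.from_iterable(pager), limit))


# Fake pager with made-up records, standing in for paginate().
fake_pager = iter([[{"id": "A1"}, {"id": "A2"}], [{"id": "A3"}], []])

records = take_records(fake_pager, limit=2)
print([r["id"] for r in records])  # ['A1', 'A2']
```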

Get N-grams

OpenAlex reference: Get N-grams.

Works()["W2023271753"].ngrams()

Code snippets

A list of awesome use cases of the OpenAlex dataset.

Cited publications (referenced works)

from pyalex import Works

# the work to extract the referenced works of
w = Works()["W2741809807"]

Works()[w["referenced_works"]]

Dataset publications in the global south

from pyalex import Works

# count dataset publications from global south institutions, grouped by country
w = Works() \
  .filter(institutions={"is_global_south": True}) \
  .filter(type="dataset") \
  .group_by("institutions.country_code") \
  .get()

Most cited publications in your organisation

from pyalex import Works

Works() \
  .filter(authorships={"institutions": {"ror": "04pp8hn57"}}) \
  .sort(cited_by_count="desc") \
  .get()

Same, but only for first authors

from pyalex import Works

Works() \
  .filter(authorships={"institutions": {"ror": "04pp8hn57"},
                       "author_position": "first"}) \
  .sort(cited_by_count="desc") \
  .get()

License

MIT

Contact

Feel free to reach out with questions, remarks, and suggestions. The issue tracker is a good starting point. You can also email me at jonathandebruinos@gmail.com.
