Simple access to Google Scholar authors and citations

Project description

scholarly2

scholarly2 is a fork of scholarly, maintained independently and strictly for academic and nonprofit purposes. It retrieves author and publication data from Google Scholar and returns plain Python dictionaries. The current public workflow covers:

author profiles by Scholar ID with search_author_id(...)
publication lookup with search_single_pub(...)
publication search iterators with search_pubs(...)
citation traversal with citedby(...) and search_citedby(...)
BibTeX export with bibtex(...)
journal ranking and mandate CSV endpoints
proxy configuration, including automatic .env.socks5 loading and explicit load_socks5_proxy_file(path)

Google Scholar behavior changes over time, so exact ranking, citation counts, snippets, and query-token URLs can vary between runs. The parsed result examples below are representative outputs from the current code.

Installation

Install the latest release from PyPI:

pip3 install scholarly2

Install from GitHub:

pip3 install -U git+https://github.com/ma-ji/scholarly2.git

scholarly2 follows Semantic Versioning.

Optional dependencies

Tor support has been deprecated since v1.5 and is neither actively tested nor supported. If you still want it:

pip3 install scholarly2[tor]

For zsh, quote the extra:

pip3 install scholarly2'[tor]'

Quick Start

from itertools import islice
from scholarly2 import scholarly

# Best-match publication lookup.
pub = scholarly.search_single_pub("10.1007/s11266-018-00057-5")
print(pub["bib"]["title"])

# Publication search returns an iterator, not a list.
results = list(islice(scholarly.search_pubs("machine learning"), 3))
for item in results:
    print(item["gsrank"], item["bib"]["title"])

# Author profile lookup is ID-first.
author = scholarly.search_author_id("Smr99uEAAAAJ")
print(author["name"], author["citedby"])

What You Get Back

All main APIs return plain dictionaries.

Common publication fields:

container_type
source
bib
filled
gsrank
pub_url
author_id
num_citations
citedby_url
url_related_articles
url_scholarbib
eprint_url

Common author fields:

container_type
scholar_id
source
name
affiliation
interests
email_domain
homepage
citedby
filled

filled is important:

filled: False means the object only contains the fields parsed from the current search result or profile page.
filled: True means scholarly.fill(...) fetched and merged extra metadata.
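
Because results are plain dictionaries, they can be read defensively with .get(...). The sketch below is illustrative only: summarize_pub is a hypothetical helper, not part of scholarly2, and it assumes that some common fields may be absent from a given result. The sample data is trimmed from the representative search_single_pub(...) result in this README.

```python
def summarize_pub(pub):
    """Return a one-line summary of a publication dictionary.

    Hypothetical helper, not part of scholarly2. Uses .get() so that a
    missing field yields a placeholder instead of raising KeyError.
    """
    bib = pub.get("bib", {})
    title = bib.get("title", "<no title>")
    year = bib.get("pub_year", "n.d.")
    cites = pub.get("num_citations", 0)
    state = "filled" if pub.get("filled") else "not filled"
    return f"{title} ({year}), cited by {cites}, {state}"


# Sample data trimmed from the representative result in this README.
sample = {
    "bib": {
        "title": "A century of nonprofit studies: Scaling the knowledge of the field",
        "pub_year": "2018",
    },
    "filled": False,
    "num_citations": 124,
}
print(summarize_pub(sample))
```

The same pattern works for author dictionaries, with name and citedby in place of the bib fields.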

Parsed Result Examples

`search_single_pub(...)`

Best-match publication lookup is useful for exact titles and DOIs.

from scholarly2 import scholarly

pub = scholarly.search_single_pub("10.1007/s11266-018-00057-5")

Representative parsed result:

{'container_type': 'Publication',
 'source': <PublicationSource.PUBLICATION_SEARCH_SNIPPET: 'PUBLICATION_SEARCH_SNIPPET'>,
 'bib': {'title': 'A century of nonprofit studies: Scaling the knowledge of the field',
         'author': ['J Ma', 'S Konrath'],
         'pub_year': '2018',
         'venue': 'VOLUNTAS: International Journal of Voluntary and ...',
         'abstract': 'This empirical study examines knowledge production between 1925 and 2015 in nonprofit and philanthropic studies from quantitative and thematic perspectives. Quantitative results suggest that scholars in this field have been actively generating a considerable amount of literature and a solid intellectual base for developing this field toward a new discipline. Thematic analyses suggest that knowledge production in this field is also growing in cohesion-several main themes have been formed and actively advanced since 1980s, and the study of volunteering can be identified as a unique core theme of this field. The lack of geographic and cultural diversity is a critical challenge for advancing nonprofit studies. New paradigms are needed for developing this research field and mitigating the tension between academia and practice. Methodological and pedagogical implications, limitations, and future studies are discussed.'},
 'filled': False,
 'gsrank': 1,
 'pub_url': 'https://www.cambridge.org/core/journals/voluntas/article/century-of-nonprofit-studies-scaling-the-knowledge-of-the-field/...',
 'author_id': ['iVGd04UAAAAJ', '-bDW1IwAAAAJ'],
 'url_scholarbib': '/scholar?hl=en&q=info:veUUt9BplfoJ:scholar.google.com/&output=cite&scirp=0&hl=en',
 'url_add_sclib': '/citations?...&info=veUUt9BplfoJ&json=',
 'num_citations': 124,
 'citedby_url': '/scholar?cites=18056454626157585853&as_sdt=2005&sciodt=0,5&hl=en',
 'url_related_articles': '/scholar?q=related:veUUt9BplfoJ:scholar.google.com/&scioq=10.1007/s11266-018-00057-5&hl=en&as_sdt=0,5',
 'eprint_url': 'https://www.cambridge.org/core/services/aop-cambridge-core/content/view/...pdf'}

Notes:

search_single_pub(...) returns one best-match publication.
When Scholar exposes the expanded Show more abstract markup, scholarly2 prefers that full abstract.
For exact DOI or exact-title lookups, this path often returns richer abstracts than a broad search page.
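
Several URL fields in the representative result above (citedby_url, url_scholarbib, url_related_articles) are relative paths. A minimal sketch for turning one into a full link follows. It assumes https://scholar.google.com is the right base; that base and the helper name absolute_url are assumptions for illustration, not part of scholarly2.

```python
from urllib.parse import urljoin

SCHOLAR_BASE = "https://scholar.google.com"  # assumed base URL


def absolute_url(pub, field):
    """Return pub[field] as an absolute URL, or None if the field is absent."""
    value = pub.get(field)
    if not value:
        return None
    # urljoin leaves already-absolute URLs (such as pub_url) unchanged.
    return urljoin(SCHOLAR_BASE, value)


sample = {
    "citedby_url": "/scholar?cites=18056454626157585853&as_sdt=2005&sciodt=0,5&hl=en",
}
print(absolute_url(sample, "citedby_url"))
```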

`search_pubs(...)`

search_pubs(...) returns an iterator over search results. next(...) gives only the first result. Use itertools.islice or a loop if you want more than one.

from itertools import islice
from scholarly2 import scholarly

results = list(islice(scholarly.search_pubs("machine learning"), 3))

Notes:

search_pubs(...) returns whatever the live Scholar result page exposes for each row.
If Scholar serves the expanded abstract markup for a result row, scholarly2 returns the full abstract.
If Scholar only serves the short snippet, scholarly2 returns the snippet.
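
The difference between next(...) and itertools.islice(...) can be tried offline. In the sketch below, fake_search is a hypothetical stand-in generator that only mimics the shape of the result rows; it makes no network calls and is not part of scholarly2.

```python
from itertools import islice


def fake_search(query):
    """Stand-in generator that mimics an iterator of result dictionaries."""
    for rank in range(1, 6):
        yield {"gsrank": rank, "bib": {"title": f"{query} result {rank}"}}


# next(...) consumes exactly one item; the None default avoids
# StopIteration when the iterator is empty.
first = next(fake_search("machine learning"), None)

# islice(...) takes a bounded number of items and stops there.
top3 = list(islice(fake_search("machine learning"), 3))

print(first["gsrank"], [item["gsrank"] for item in top3])
```

Bounding the iteration this way also bounds the number of items pulled from the iterator, which matters when each batch of items may cost a request.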

`fill(...)` on a publication

Use fill(...) when you want additional publication metadata after the initial search result.

from scholarly2 import scholarly

pub = scholarly.search_single_pub("10.1007/s11266-018-00057-5")
filled_pub = scholarly.fill(pub)

fill(...) is where publication objects usually gain fields such as publisher, journal, pages, volume, number, pub_type, and bib_id.
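
To see which fields a fill added, the bib keys can be compared before and after. This is a rough sketch with stand-in dictionaries: added_bib_fields is a hypothetical helper, and it assumes the extra fields land under bib. Taking a deep copy first guards against the possibility that fill(...) updates the object in place, which is also an assumption here.

```python
import copy


def added_bib_fields(before, after):
    """Return the sorted bib keys present in `after` but not in `before`."""
    return sorted(set(after.get("bib", {})) - set(before.get("bib", {})))


# Stand-in dictionaries. With the library, one might instead take
# before = copy.deepcopy(pub) and then after = scholarly.fill(pub).
before = {"bib": {"title": "Example", "pub_year": "2018"}, "filled": False}
after = copy.deepcopy(before)
after["bib"].update({"journal": "Example Journal", "volume": "30"})
after["filled"] = True

print(added_bib_fields(before, after))
```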

`search_author_id(...)`

Anonymous Google Scholar author-name discovery is not part of the current public workflow. Start from a stable Scholar profile ID.

from scholarly2 import scholarly

author = scholarly.search_author_id("Smr99uEAAAAJ")

You can then fetch more sections:

author = scholarly.fill(author, sections=['basics', 'indices', 'counts', 'publications'])

Search Semantics

`search_single_pub(...)` vs `search_pubs(...)`

search_single_pub(query) returns one best-match result.
search_pubs(query) returns an iterator over search result rows.
next(scholarly.search_pubs(...)) returns only the first result.
Use itertools.islice(...) or a loop to consume more results.

`filled`

filled: False means initial parsed result.
filled: True means additional metadata was fetched.
Authors use a list of filled sections, such as ['basics'] or ['basics', 'indices', 'counts'].
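
For authors, the list form of filled makes it possible to work out which sections are still missing. The helper below is a hypothetical sketch, not part of scholarly2; it assumes author['filled'] is a list of section names and treats any other value as nothing filled yet.

```python
def missing_sections(author, wanted):
    """Return the sections in `wanted` not yet listed in author['filled'].

    Assumes author['filled'] is a list of section names; any other value
    (missing key, False, and so on) is treated as nothing filled yet.
    """
    filled = author.get("filled")
    done = set(filled) if isinstance(filled, (list, tuple, set)) else set()
    return [section for section in wanted if section not in done]


author = {"name": "Example Author", "filled": ["basics"]}
todo = missing_sections(author, ["basics", "indices", "counts", "publications"])
print(todo)
```

The resulting list could then be passed as the sections argument of scholarly.fill(...), so that only the missing sections are requested.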

Finding Author IDs

If you have a Scholar profile URL like:

https://scholar.google.com/citations?user=4bahYMkAAAAJ&hl=en

use the value of its user parameter (here, 4bahYMkAAAAJ) with search_author_id(...).
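
Extracting the ID from a profile URL can be done with the standard library. The helper name author_id_from_url is hypothetical and not part of scholarly2.

```python
from urllib.parse import parse_qs, urlparse


def author_id_from_url(url):
    """Return the value of the `user` query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("user")
    return values[0] if values else None


profile_url = "https://scholar.google.com/citations?user=4bahYMkAAAAJ&hl=en"
print(author_id_from_url(profile_url))
```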

You can also collect author IDs from publication results:

from scholarly2 import scholarly

pub = scholarly.search_single_pub("Creating correct blur and its effect on accommodation")
print(pub["author_id"])
# ['4bahYMkAAAAJ', '3xJXtlwAAAAJ', 'Smr99uEAAAAJ']
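
When collecting IDs across several publication results, duplicates are likely. The sketch below gathers unique IDs in first-seen order from stand-in dictionaries. unique_author_ids is a hypothetical helper, and skipping empty entries is a precaution based on the assumption that an author without a profile might appear as an empty string.

```python
def unique_author_ids(pubs):
    """Collect author IDs from publication dicts, preserving first-seen order."""
    seen = {}
    for pub in pubs:
        for author_id in pub.get("author_id", []):
            if author_id:  # skip empty placeholders, if any (assumption)
                seen.setdefault(author_id, None)
    return list(seen)


pubs = [
    {"author_id": ["4bahYMkAAAAJ", "3xJXtlwAAAAJ", "Smr99uEAAAAJ"]},
    {"author_id": ["Smr99uEAAAAJ", "", "iVGd04UAAAAJ"]},
    {},  # a result without the field
]
print(unique_author_ids(pubs))
```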

Citations and BibTeX

Get citations for a publication:

from itertools import islice
from scholarly2 import scholarly

pub = scholarly.search_single_pub("10.1007/s11266-018-00057-5")
first_citations = list(islice(scholarly.citedby(pub), 3))

Export BibTeX:

from scholarly2 import scholarly

pub = scholarly.search_single_pub("10.1007/s11266-018-00057-5")
print(scholarly.bibtex(pub))

Proxies

Google Scholar rate-limits aggressively. If you make enough requests, you should expect blocking and CAPTCHA pages. Use proxies for anything non-trivial.

Many proxy providers are available; I often use IPRoyal (disclaimer: this is a referral link). You are welcome to use your own, but make sure you choose residential proxies (they may be named differently depending on the provider).

For simplicity, only SOCKS5 workflows are recommended. The legacy methods ScraperAPI(), Luminati(), FreeProxies(), SingleProxy(), Tor_External(), and Tor_Internal() remain for compatibility but are deprecated and will be removed in future releases.

Automatic `.env.socks5` loading

If a .env.socks5 file exists in your working directory, scholarly2 loads it automatically at import time. Put one proxy per line, in this format:

USER:PASS@HOST:PORT

Example:

user1:password1@127.0.0.1:1080
user2:password2@proxy.example.com:2080

See .env.socks5.example for the expected format.
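
Before relying on a proxy file, it may help to check that each line matches USER:PASS@HOST:PORT. The function below is a hypothetical validator written for this README, not the parser scholarly2 uses, so the library may accept or reject lines differently. It does not handle IPv6 host literals.

```python
def parse_socks5_line(line):
    """Split one USER:PASS@HOST:PORT line into its parts.

    Returns None for blank lines and raises ValueError on malformed input.
    The error message deliberately omits the line to avoid echoing credentials.
    """
    line = line.strip()
    if not line:
        return None
    # Split at the last "@" so that a password containing "@" stays intact.
    creds, at, hostport = line.rpartition("@")
    user, colon_user, password = creds.partition(":")
    host, colon_port, port = hostport.rpartition(":")
    if not (at and colon_user and colon_port and user and host and port.isdigit()):
        raise ValueError("expected USER:PASS@HOST:PORT")
    return {"user": user, "password": password, "host": host, "port": int(port)}


lines = [
    "user1:password1@127.0.0.1:1080",
    "",
    "user2:password2@proxy.example.com:2080",
]
proxies = [p for p in map(parse_socks5_line, lines) if p is not None]
print([(p["host"], p["port"]) for p in proxies])
```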

Direct SOCKS5 configuration

Use ProxyGenerator.Socks5Proxies(...) when you want to configure the proxy pool in code:

from scholarly2 import ProxyGenerator, scholarly

pg = ProxyGenerator()
pg.Socks5Proxies([
    "user1:password1@127.0.0.1:1080",
    "user2:password2@proxy.example.com:2080",
])
scholarly.use_proxy(pg)

pub = scholarly.search_single_pub("10.1007/s11266-018-00057-5")

If you pass only one proxy generator to scholarly.use_proxy(pg), that same SOCKS5 pool is reused for all requests.

Explicit file loading

Use load_socks5_proxy_file(path) to load a proxy file from any location at runtime:

from scholarly2 import scholarly

ok = scholarly.load_socks5_proxy_file("/path/to/my.env.socks5")
if ok:
    print("Proxies loaded")

This is useful when your proxy file lives outside the working directory or has a non-standard name. The file format is the same one-proxy-per-line format as .env.socks5.

Deprecated legacy proxy methods

ProxyGenerator.ScraperAPI(), Luminati(), FreeProxies(), SingleProxy(), Tor_External(), and Tor_Internal() are deprecated compatibility paths. Existing code can still call them, but new setups should use .env.socks5, Socks5Proxies(...), Socks5ProxyFile(...), or load_socks5_proxy_file(path).

Availability Notes

Generally usable anonymously:

search_author_id
search_pubs
search_single_pub
search_citedby
fill
citedby
bibtex
journal endpoints
mandates CSV retrieval

Google may gate these Citations author-discovery endpoints behind sign-in:

search_keyword
search_keywords
search_author_custom_url
search_org
search_author_by_organization

If you need a reliable author workflow, prefer search_author_id(...).

Tests

From the repository root:

python -m unittest -v testdata.test_module

Target a smaller subset while iterating:

python -m unittest -v testdata.test_module.TestPublicationParser
python -m unittest -v testdata.test_module.TestNavigator

Documentation

See the hosted docs for the full API reference and quickstart.

Contributing

Contributions are welcome. Please create an issue, fork the repository, and submit a pull request. See .github/CONTRIBUTING.md for details.

License

The original code that this project was forked from was released by Luciano Bello under a WTFPL license. In keeping with that spirit, all code is released under the Unlicense.

This version

2.0.0

Mar 6, 2026

Download files

Source Distribution

scholarly2-2.0.0.tar.gz (45.7 kB view details)

Uploaded Mar 6, 2026 Source

Built Distribution

scholarly2-2.0.0-py3-none-any.whl (44.4 kB view details)

Uploaded Mar 6, 2026 Python 3

File details

Details for the file scholarly2-2.0.0.tar.gz.

File metadata

Download URL: scholarly2-2.0.0.tar.gz
Upload date: Mar 6, 2026
Size: 45.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scholarly2-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`175cea4a6b195850b8b3568d8416a4b2b6d48b4bd388cf58e57af2a28888fd4a`
MD5	`d6e0fb9cdff7ff41dcc9b40e4c3c9684`
BLAKE2b-256	`b7c84dfb14c87d74fb9e95aecca0d44384b6131eebeae1a4fcff951a0de712b3`

File details

Details for the file scholarly2-2.0.0-py3-none-any.whl.

File metadata

Download URL: scholarly2-2.0.0-py3-none-any.whl
Upload date: Mar 6, 2026
Size: 44.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scholarly2-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ec9012a57cd4fccfbba647726aaedbeeffe98a502d45bccbda6ea9f665c23a96`
MD5	`b6eb9ff93332ff3bccab0ac660c231e3`
BLAKE2b-256	`b805a26311d736afd6efd43b74c00c65999a788f345a6253e6cdce28c3d2f1f5`
