
Reliably scrape and clean Google Scholar citations and automatically upload them to Zotero. Local BibTeX file exports are also supported.


pyserpZotero

Google Scholar citation download, parsing, BibTeX export, search for free PDFs, and Zotero cloud upload. SerpAPI is used for stable access to Google Scholar without IP throttling.

What does it do?

pyserpZotero is a Python library that automates the management of scholarly citations and PDF downloads. It uses SerpAPI for reliable access to Google Scholar and Zotero for citation management. The library simplifies searching for academic papers, downloading them (where freely available), and organizing the citations directly in your Zotero library.
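To make the SerpAPI step concrete, here is a minimal sketch of how a Google Scholar query URL could be assembled. It is not pyserpZotero's internal code: the function name `build_scholar_query` is hypothetical, and the endpoint and parameter names (`engine`, `q`, `api_key`, `start`, `as_ylo`) are assumed from SerpAPI's public Google Scholar documentation, so check them against the current docs before relying on them.

```python
from urllib.parse import urlencode

# Assumed SerpAPI JSON endpoint; verify against the SerpAPI documentation.
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_scholar_query(term, api_key, min_year=None, start=0):
    """Return a SerpAPI URL for one page of Google Scholar results.

    Hypothetical helper for illustration; no network request is made here.
    """
    params = {
        "engine": "google_scholar",  # selects the Google Scholar engine
        "q": term,                   # the search term
        "api_key": api_key,          # your SerpAPI key
        "start": start,              # result offset, used for pagination
    }
    if min_year is not None:
        params["as_ylo"] = min_year  # only include papers from this year onward
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


url = build_scholar_query("retinal imaging", api_key="YOUR_SERPAPI_KEY", min_year=2020)
print(url)
```

Fetching that URL with any HTTP client returns JSON; how pyserpZotero extracts identifiers from the response is not shown here.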

pyserpZotero offers the following functions for (semi-)automating literature review:

  • SerpZot (class) - Instantiates a SerpZot object for API management.
    • SearchScholar - Searches Google Scholar for papers matching one or more search terms and captures their identifiers.
    • Search2Zotero - Pulls references from Google using the identifiers from SearchScholar, converts them to BibTeX via CrossRef, reformats them for Zotero, looks for PDFs, and uploads to your Zotero cloud library (results sync automatically to the desktop client, if installed).
    • CleanZot - Attempts to remove or replace broken LaTeX and other formatting in titles.
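As an illustration of the kind of cleanup CleanZot is described as doing, the sketch below strips common LaTeX residue from a title. It is an assumption about the general technique, not the library's implementation, and `clean_title` is a hypothetical name.

```python
import re


def clean_title(title):
    """Strip common LaTeX residue from a citation title (illustrative only)."""
    # Unwrap formatting commands such as \textbf{...}, keeping the inner text.
    title = re.sub(r"\\[a-zA-Z]+\{([^{}]*)\}", r"\1", title)
    # Drop any remaining bare commands; note this discards symbols such as
    # \alpha rather than translating them to Unicode.
    title = re.sub(r"\\[a-zA-Z]+", "", title)
    # Remove leftover braces and math-mode dollar signs.
    title = re.sub(r"[{}$]", "", title)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", title).strip()


print(clean_title(r"{Deep} \textbf{learning}  for {OCT}"))
```

A production cleaner would likely map LaTeX symbols to Unicode instead of dropping them; this version only shows the regular-expression approach.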

Why SerpAPI?

I'm not a shill for their company, but after a decade of scraping data I've gotten tired of code breaking due to upstream changes, dealing with proxies, and concerns over intellectual property. SerpAPI handles those things for you. They offer a free tier, currently 100 searches per month, and decent pricing beyond that.

How to configure it?

You'll need to provide API keys for SerpAPI and Zotero, as well as a Zotero library ID. You can provide these directly as arguments to the functions, enter them via the interactive mode, or manage them more securely in a YAML configuration file.
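A minimal sketch of the configuration-file approach follows. The key names (`API_KEY`, `ZOT_ID`, `ZOT_KEY`) and the helper `load_credentials` are hypothetical, chosen only for illustration; consult the project's own examples for the names it expects. To stay dependency-free the sketch parses flat `KEY: value` lines itself and assumes values contain no `#` characters; with PyYAML installed, `yaml.safe_load` would be the usual choice.

```python
from pathlib import Path

# Hypothetical key names for the three credentials described above.
REQUIRED_KEYS = ("API_KEY", "ZOT_ID", "ZOT_KEY")


def load_credentials(path):
    """Read flat `KEY: value` pairs from a YAML-style file and check required keys."""
    creds = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything that is not a key/value pair
        key, value = line.split(":", 1)
        # Remove optional single or double quotes around the value.
        creds[key.strip()] = value.strip().strip("'\"")
    missing = [k for k in REQUIRED_KEYS if not creds.get(k)]
    if missing:
        raise KeyError(f"Missing credentials: {', '.join(missing)}")
    return creds
```

Keeping the file outside version control (for example, by listing it in `.gitignore`) is what makes this more secure than hard-coding keys in scripts or notebooks.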

How to use it?

Beginning with v1.1, an interactive mode is available by running the main file (pyserpZotero.py). See quickstart.ipynb for a Jupyter notebook demonstration of API access.

What's new?

March 2024: Added support for additional portals and PDF sources, including medRxiv and bioRxiv. Improved matching of citations to arXiv PDFs.

Why do you sometimes align assignment operators across lines like that?

As a data scientist, I'm a programming polyglot and a long-time R programmer. Following the top style guides for R, some of us align assignment operators so that code can be read quickly by human beings; it's an easy trick for giving code visible structure.



