
Reliably scrape and clean Google Scholar citations. Automatically uploads to Zotero. Local BibTeX file exports are also supported.

Project description

pyserpZotero

Google Scholar citation download, parsing, BibTeX export, and Zotero cloud upload via serpAPI.

GitHub Repo: https://github.com/hack-r/pyserpZotero

PyPI Package: https://pypi.org/project/pyserpZotero/

serpAPI: https://serpAPI.com

Zotero: https://zotero.org

What does it do?

pyserpZotero offers the following functions for (semi-)automating literature reviews:

  • pyserpCite Module
    • serpZot (class) - Instantiates a serpZot object for API management.
    • searchScholar - Searches Google Scholar for papers matching one or more search terms and captures their identifiers.
    • search2Zotero - Pulls references from Google using the identifiers from searchScholar, converts them to BibTeX via CrossRef, reformats them for Zotero, and uploads them to your Zotero cloud library (results automatically sync to any connected desktop applications).
    • cleanZot - Attempts to remove or replace broken LaTeX and other formatting in titles.
    • arxivDownload - Checks arXiv to see whether items in your Zotero library have free PDFs available. Downloads matching PDFs from arXiv, attaches them to the corresponding library items, and uploads the changes to Zotero.
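
To give a feel for the kind of title cleanup cleanZot is aimed at, here is a minimal sketch using only Python's standard `re` module. It is not the package's actual implementation: `clean_title` is a hypothetical helper, and the rules it applies (drop LaTeX commands, braces, math delimiters, and stray backslashes) are assumptions about what "broken LaTeX" typically looks like in scraped titles.

```python
import re

# Hypothetical helper for illustration only; NOT pyserpZotero's cleanZot logic.
# Matches LaTeX commands such as \textit or \emph, plus trailing whitespace.
_LATEX_COMMAND = re.compile(r"\\[a-zA-Z]+\s*")


def clean_title(title: str) -> str:
    """Strip common LaTeX residue from a citation title (lossy by design)."""
    text = _LATEX_COMMAND.sub("", title)            # drop commands like \textit
    text = text.replace("{", "").replace("}", "")   # drop grouping braces
    text = text.replace("$", "")                    # drop math delimiters
    text = text.replace("\\", "")                   # drop stray backslashes (e.g. \&)
    return re.sub(r"\s+", " ", text).strip()        # collapse runs of whitespace


print(clean_title(r"{\textit{Deep}} Learning \& {RL}"))
```

A rule set this blunt also deletes commands that carry meaning (for example `\alpha`), so it is best treated as a starting point rather than a faithful LaTeX-to-text converter.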

Why serpAPI?

I'm not a shill for their company, but after a decade of scraping data I've grown tired of code breaking due to upstream changes, of dealing with proxies, and of concerns over intellectual property. serpAPI handles those things for you. They offer a free tier, currently 100 searches per month, and decent pricing. If there are other comparable services, feel free to mention them in an "Issue" and perhaps I'll be able to add support.

How to Configure?

You'll need to provide API keys for serpAPI and Zotero, as well as a Zotero library ID. You can either pass these directly as arguments to the functions or manage them more securely via a YAML configuration file, as in the Example Usage below.
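
Before making any API calls, it can help to confirm that all three settings are present. The sketch below assumes a flat `config.yaml` whose keys match the names used in the Example Usage (`API_KEY`, `ZOT_ID`, `ZOT_KEY`); the file layout and the `missing_credentials` helper are illustrative assumptions, not part of pyserpZotero. It works on a plain dict, such as the one `yaml.safe_load` would return for that layout.

```python
# Setting names taken from the Example Usage; the flat layout is an assumption.
REQUIRED_KEYS = ("API_KEY", "ZOT_ID", "ZOT_KEY")


def missing_credentials(cfg: dict) -> list:
    """Return the names of required settings that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not cfg.get(key)]


# A config.yaml with this (assumed) layout would load into a dict like `cfg`:
#   API_KEY: "your-serpapi-key"
#   ZOT_ID: "your-zotero-library-id"
#   ZOT_KEY: "your-zotero-api-key"
cfg = {"API_KEY": "your-serpapi-key", "ZOT_ID": "", "ZOT_KEY": "your-zotero-api-key"}

missing = missing_credentials(cfg)
if missing:
    print("Missing settings:", ", ".join(missing))
```

Failing early on an empty value avoids spending a serpAPI search only to have the Zotero upload fail afterwards.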

How to Use?

See quickstartDemo.ipynb for a Jupyter notebook demonstration or check out the example below:

Example Usage


#### Build a list of search terms:
TERMS = ['reinforcement learning', 'traveling salesman', 'nowcasting', 'propensity score']

MIN_YEAR = "2010" # Oldest year to search
SAVE_BIB = False  # Save a Bibtex file (.bib)?
USE_ZOT  = True   # Upload to Zotero?
CLEAN    = False  # Attempt to remove/repair broken LaTeX and other formatting


#### Load libraries
from box import Box

import importlib
import pyserpZotero
import yaml

importlib.reload(pyserpZotero)

#### Import Credentials from Your YAML File
with open("config.yaml", "r") as ymlfile:
    cfg = Box(yaml.safe_load(ymlfile), default_box=True, default_box_attr=None)

API_KEY = cfg.API_KEY
ZOT_ID  = cfg.ZOT_ID
ZOT_KEY = cfg.ZOT_KEY

#### Instantiate a serpZot object for API management
citeObj = pyserpZotero.serpZot(API_KEY  = API_KEY, 
                             ZOT_ID   = ZOT_ID, 
                             ZOT_KEY  = ZOT_KEY)

#### Call the search method
for term in TERMS:
    print(citeObj.searchScholar(TERM     = term,
                                MIN_YEAR = MIN_YEAR,
                                SAVE_BIB = SAVE_BIB))
    print("This should've returned 0 (success)")
    # Upload the parsed results
    print(citeObj.search2Zotero())


#### Clean Ugly Raw LaTeX (as Much as Possible)
if CLEAN:
    citeObj.cleanZot(ZOT_ID      = ZOT_ID, 
                     ZOT_KEY     = ZOT_KEY,
                     SEARCH_TERM = "\\") # optional (defaults to all items)

    # Check Arxiv for Free PDFs of Papers and Attach / Upload Them To Zotero
    citeObj.arxivDownload()
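
The example above prints the search status and then uploads regardless. If you would rather skip the upload when a search fails, a small wrapper along these lines may help. It assumes, as the example's printed message suggests, that searchScholar returns 0 on success; `search_and_upload` is a hypothetical helper, not a function provided by pyserpZotero.

```python
def search_and_upload(cite_obj, terms, min_year="2010", save_bib=False):
    """Search each term and upload only when the search reports success.

    Assumes searchScholar returns 0 on success (per the example above).
    Returns a dict mapping each term to "uploaded" or "search failed".
    """
    outcome = {}
    for term in terms:
        status = cite_obj.searchScholar(TERM=term,
                                        MIN_YEAR=min_year,
                                        SAVE_BIB=save_bib)
        if status == 0:
            cite_obj.search2Zotero()       # upload the parsed results
            outcome[term] = "uploaded"
        else:
            outcome[term] = "search failed"
    return outcome
```

With the objects from the example, this would be called as `search_and_upload(citeObj, TERMS, MIN_YEAR, SAVE_BIB)`.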

