Reliably scrape and clean Google Scholar citations, upload them to Zotero automatically, and optionally export them to a local BibTeX file.
pyserpZotero
Google Scholar citation download, parsing, BibTeX export, and Zotero cloud upload via serpAPI.
GitHub Repo: https://github.com/hack-r/pyserpZotero/edit/main/README.md
PyPi Package: https://pypi.org/project/pyserpZotero/1.0.2/
serpAPI: https://serpAPI.com
Zotero: https://zotero.org
What does it do?
pyserpZotero offers 2 modules with the following functions for (semi-)automating literature reviews:
- pyserpCite Module
  - serpZot - Instantiates a serpZot object for API management.
  - searchScholar - Searches Google Scholar for papers matching one or more search terms and captures their identifiers.
  - search2Zotero - Pulls references from Google using the identifiers from searchScholar, converts them to BibTeX via CrossRef, reformats them for Zotero, and uploads them to your Zotero cloud library (results automatically sync to any connected desktop applications).
- cleanZot Module
  - serpZot - Attempts to remove or replace broken LaTeX and other formatting in titles.
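To give a feel for what "removing broken LaTeX from titles" can involve, here is a minimal sketch using only the standard library. It is not cleanZot's actual implementation; the function name `clean_title` and the specific regular expressions are assumptions made for illustration.

```python
import re

# Hypothetical helper for illustration only; not part of cleanZot.
_LATEX_COMMAND = re.compile(r"\\[a-zA-Z]+\s*\{([^{}]*)\}")  # \emph{text} -> text
_MATH_DELIMS = re.compile(r"(?<!\\)\$([^$]*)(?<!\\)\$")     # $x$ -> x
_ESCAPED_CHAR = re.compile(r"\\([&%_#$])")                  # \& -> &


def clean_title(title: str) -> str:
    """Strip common LaTeX markup from a citation title."""
    previous = None
    # Repeat until stable so nested commands like \textbf{\emph{x}} unwrap fully.
    while previous != title:
        previous = title
        title = _LATEX_COMMAND.sub(r"\1", title)
    title = _MATH_DELIMS.sub(r"\1", title)    # drop unescaped math delimiters
    title = _ESCAPED_CHAR.sub(r"\1", title)   # unescape \&, \%, \_, \#, \$
    title = title.replace("{", "").replace("}", "")  # drop case-protection braces
    return re.sub(r"\s+", " ", title).strip()
```

For example, `clean_title(r"{D}eep \textbf{\emph{Q}}-learning for the \& $k$-armed bandit")` yields `Deep Q-learning for the & k-armed bandit`. Real-world titles contain many more edge cases (accents, unbalanced braces), so treat this as a starting point.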
Why serpAPI?
I'm not a shill for their company, but after a decade of scraping data I've grown tired of code breaking due to upstream changes, dealing with proxies, and worrying about intellectual property. serpAPI handles those things for you. They offer a free tier (currently 100 searches per month) and decent pricing. If there are other comparable services, feel free to mention them in an "Issue" and perhaps I'll be able to add support.
How to Configure?
You'll need to provide API keys for serpAPI and Zotero, as well as a Zotero library ID. You can either pass these directly as arguments to the functions or manage them more securely via a YAML configuration file, as in the Example Usage below.
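As a rough sketch of how the three credentials might be checked before any API call is made, the helper below takes an already-loaded configuration mapping and verifies that every key is present. The key names `API_KEY`, `ZOT_ID`, and `ZOT_KEY` match the example usage in this README; the function name `resolve_credentials` and the fallback to environment variables are assumptions, not part of pyserpZotero.

```python
import os
from typing import Mapping, Optional

# Assumed layout of config.yaml (values are placeholders):
#   API_KEY: "your-serpapi-key"
#   ZOT_ID: "your-zotero-library-id"
#   ZOT_KEY: "your-zotero-api-key"
REQUIRED_KEYS = ("API_KEY", "ZOT_ID", "ZOT_KEY")


def resolve_credentials(cfg: Mapping[str, object],
                        env: Optional[Mapping[str, str]] = None) -> dict:
    """Return all required credentials as strings, or raise if any is missing.

    Hypothetical helper: values come from cfg first, then from environment
    variables of the same name (the fallback is an assumption).
    """
    env = os.environ if env is None else env
    resolved, missing = {}, []
    for key in REQUIRED_KEYS:
        value = cfg.get(key) or env.get(key)
        if value:
            resolved[key] = str(value)
        else:
            missing.append(key)
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return resolved
```

Failing early with the list of missing keys tends to be easier to debug than an authentication error raised deep inside a search or upload call.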
Example Usage
#### Build a list of search terms:
TERMS = ['reinforcement learning', 'traveling salesman', 'nowcasting', 'propensity score']
MIN_YEAR = "2010" # Oldest year to search
SAVE_BIB = False # Save a BibTeX file (.bib)?
USE_ZOT = True # Upload to Zotero?
CLEAN = False # Attempt to remove/repair broken LaTeX and other formatting
#### Load libraries
from box import Box
import cleanZot
import importlib
import pyserpCite
import yaml
importlib.reload(pyserpCite)
#### Import Credentials from Your YAML File
with open("config.yaml", "r") as ymlfile:
    cfg = Box(yaml.safe_load(ymlfile), default_box=True, default_box_attr=None)
API_KEY = cfg.API_KEY
ZOT_ID = cfg.ZOT_ID
ZOT_KEY = cfg.ZOT_KEY
#### Instantiate a serpZot object for API management
citeObj = pyserpCite.serpZot(API_KEY = API_KEY,
                             ZOT_ID = ZOT_ID,
                             ZOT_KEY = ZOT_KEY)
#### Call the search method
for term in TERMS:
    print(citeObj.searchScholar(TERM = term,
                                MIN_YEAR = MIN_YEAR,
                                SAVE_BIB = SAVE_BIB))
    print("This should've returned 0 (success)")
#### Upload the parsed results
if USE_ZOT:
    print(citeObj.search2Zotero())
#### Clean Ugly Raw LaTeX (as Much as Possible)
if CLEAN:
    cleanZot.serpZot(ZOT_ID = ZOT_ID,
                     ZOT_KEY = ZOT_KEY,
                     SEARCH_TERM = "\\") # optional (defaults to all items)
File details
Details for the file pyserpzotero-1.0.2.tar.gz.
File metadata
- Download URL: pyserpzotero-1.0.2.tar.gz
- Upload date:
- Size: 12.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 63f7b270183cadf9b2318874848af78deed79ad20fe896d9c113ea31cc80121f
MD5 | e3029b782d2bb18c3e13d7fcd021e519
BLAKE2b-256 | 81b499f40b0c9176c04248000256bd333192296c809461dd5a994234326b4b10
File details
Details for the file pyserpzotero-1.0.2-py3-none-any.whl.
File metadata
- Download URL: pyserpzotero-1.0.2-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 45601f3de43c6d41df792d1d6019b827c888ddc008d251a62c5c69185f79a9b1
MD5 | 4b29e502d46e3aa7adbd1d872854a717
BLAKE2b-256 | 95cf1d92f116d9fbe8f5de5069f7917b754d37bdf2e5cf9bed94ba56735bb65a
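If you download one of these files manually, the SHA256 digests listed above can be compared against the local copy. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches_published_hash` are made up for this example, and the file path is whatever location you saved the download to.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `matches_published_hash("pyserpzotero-1.0.2.tar.gz", "63f7b270183cadf9b2318874848af78deed79ad20fe896d9c113ea31cc80121f")` should return `True` for an unmodified copy of the source distribution saved in the current directory.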