A package to scrape patents from 'https://patents.google.com/'
Patent Scraper
A Python package to scrape patents from 'https://patents.google.com/'. The package is made up of a single Python class, scraper_class, which can retrieve the parsed HTML of a single patent page or of a list of patents.
The main elements returned by the scraper class are:
- application_number (str): application number
- inventor_name (json): inventors of the patent
- assignee_name_orig (json): original assignees of the patent
- assignee_name_current (json): current assignees of the patent
- pub_date (str): publication date
- filing_date (str): filing date
- priority_date (str): priority date
- grant_date (str): grant date
- forward_cites_no_family (json): forward citations that are not family-to-family citations
- forward_cites_yes_family (json): forward citations that are family-to-family citations
- backward_cites_no_family (json): backward citations that are not family-to-family citations
- backward_cites_yes_family (json): backward citations that are family-to-family citations
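The (json) fields come back as JSON-encoded strings, which is why the list example further down decodes `inventor_name` with `json.loads`. The sketch below decodes all of them in one pass and counts citations per category. It assumes a parsed patent is a plain dict keyed by the field names above and that each citation field decodes to a list; `decode_json_fields`, `citation_counts`, and the sample record are hypothetical and not part of the package.

```python
import json
from typing import Any, Dict

# Field names marked (json) in the list above.
JSON_FIELDS = (
    "inventor_name",
    "assignee_name_orig",
    "assignee_name_current",
    "forward_cites_no_family",
    "forward_cites_yes_family",
    "backward_cites_no_family",
    "backward_cites_yes_family",
)

# The four citation fields (forward/backward, family/non-family).
CITATION_FIELDS = tuple(f for f in JSON_FIELDS if "cites" in f)


def decode_json_fields(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed patent with its (json) fields decoded.

    Assumption: each (json) field holds a JSON string. Missing fields,
    empty strings, and non-string values are left untouched.
    """
    decoded = dict(parsed)  # shallow copy; the input dict is not modified
    for field in JSON_FIELDS:
        value = decoded.get(field)
        if isinstance(value, str) and value:
            decoded[field] = json.loads(value)
    return decoded


def citation_counts(parsed: Dict[str, Any]) -> Dict[str, int]:
    """Count entries in each citation field (assumes each decodes to a list)."""
    decoded = decode_json_fields(parsed)
    return {field: len(decoded.get(field) or []) for field in CITATION_FIELDS}


# Hypothetical record for illustration only; entry contents are placeholders.
example = {
    "application_number": "US-EXAMPLE",
    "inventor_name": json.dumps([{"inventor_name": "Inventor A"}]),
    "forward_cites_no_family": json.dumps(["placeholder-1", "placeholder-2"]),
    "backward_cites_no_family": json.dumps([]),
}

print(decode_json_fields(example)["inventor_name"][0]["inventor_name"])  # Inventor A
print(citation_counts(example)["forward_cites_no_family"])  # 2
```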
Package Installation
The package is available on PyPI and can be installed using pip:

```bash
pip install google_patent_scraper
```
Main Use Cases
There are two primary ways to use this package:
- Scrape a single patent
```python
# ~ Import packages ~ #
from google_patent_scraper import scraper_class

# ~ Initialize scraper class ~ #
scraper = scraper_class()

# ~~ Scrape patents individually ~~ #
patent_1 = 'US2668287A'
patent_2 = 'US266827A'
# request_single_patent returns (status, parsed HTML, url)
err_1, soup_1, url_1 = scraper.request_single_patent(patent_1)
err_2, soup_2, url_2 = scraper.request_single_patent(patent_2)

# ~ Parse results of scrape ~ #
patent_1_parsed = scraper.get_scraped_data(soup_1, patent_1, url_1)
patent_2_parsed = scraper.get_scraped_data(soup_2, patent_2, url_2)
```
- Scrape a list of patents
```python
# ~ Import packages ~ #
from google_patent_scraper import scraper_class
import json

# ~ Initialize scraper class ~ #
scraper = scraper_class()

# ~ Add patents to list ~ #
scraper.add_patents('US2668287A')
scraper.add_patents('US266827A')

# ~ Scrape all patents ~ #
scraper.scrape_all_patents()

# ~ Get results of scrape ~ #
patent_1_parsed = scraper.parsed_patents['US2668287A']
patent_2_parsed = scraper.parsed_patents['US266827A']

# ~ Print inventors of patent US2668287A ~ #
for inventor in json.loads(patent_1_parsed['inventor_name']):
    print('Patent inventor : {0}'.format(inventor['inventor_name']))
```
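After `scrape_all_patents()` runs, `scraper.parsed_patents` maps each patent number to its parsed result, so the inventor loop above can be generalized to every scraped patent. The helper `inventors_by_patent` below is a hypothetical sketch: it accepts any dict shaped like `parsed_patents` and assumes, as in the loop above, that `inventor_name` is a JSON list of objects with an `inventor_name` key. The sample data uses placeholder keys and names, not real scrape results.

```python
import json
from typing import Dict, List


def inventors_by_patent(parsed_patents: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each patent number to the list of its inventor names.

    Assumes each parsed patent stores 'inventor_name' as a JSON string of
    objects that each carry an 'inventor_name' key.
    """
    result = {}
    for patent, parsed in parsed_patents.items():
        # Treat a missing or empty field as "no inventors".
        inventors = json.loads(parsed.get("inventor_name") or "[]")
        result[patent] = [entry["inventor_name"] for entry in inventors]
    return result


# Hypothetical stand-in for scraper.parsed_patents (placeholder values only).
sample = {
    "PATENT-1": {"inventor_name": json.dumps([{"inventor_name": "Inventor A"},
                                              {"inventor_name": "Inventor B"}])},
    "PATENT-2": {"inventor_name": json.dumps([])},
}

for patent, names in inventors_by_patent(sample).items():
    print("{0}: {1}".format(patent, ", ".join(names) or "(none)"))
# PATENT-1: Inventor A, Inventor B
# PATENT-2: (none)
```

With the real package, the same helper would be called as `inventors_by_patent(scraper.parsed_patents)` after `scraper.scrape_all_patents()`.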
Example Files
I have provided two separate example scripts showing how to use this package in the /example/ folder:
- Examples from this readme: readme_example
- Scrape multiple patents using Python's multiprocessing package: multiprocess_example
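As a rough illustration of scraping several patents concurrently, here is a minimal sketch that uses a thread pool from `concurrent.futures` rather than the `multiprocessing` approach of multiprocess_example. `scrape_in_parallel` and `scrape_one_with_package` are hypothetical names; the worker only uses the `request_single_patent` and `get_scraped_data` calls shown in the single-patent example, and it is defined but not called here because it needs network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def scrape_in_parallel(
    patents: Iterable[str],
    scrape_one: Callable[[str], Any],
    max_workers: int = 4,
) -> Dict[str, Any]:
    """Apply scrape_one to every patent number concurrently.

    Returns a dict mapping each patent number to its result.
    """
    patent_list = list(patents)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(patent_list, pool.map(scrape_one, patent_list)))


def scrape_one_with_package(patent: str) -> Any:
    """Hypothetical worker that calls google_patent_scraper (needs network access)."""
    from google_patent_scraper import scraper_class

    scraper = scraper_class()  # one scraper per call, so threads share no state
    err, soup, url = scraper.request_single_patent(patent)
    # err is not inspected in this sketch; check it before parsing in real use.
    return scraper.get_scraped_data(soup, patent, url)


# Offline demonstration with a stand-in worker instead of real scraping.
print(scrape_in_parallel(["A", "B", "C"], str.lower))  # {'A': 'a', 'B': 'b', 'C': 'c'}
```

With the package installed, `scrape_in_parallel(['US2668287A', 'US266827A'], scrape_one_with_package)` would scrape both patents from the examples above. Threads are a reasonable fit because scraping mostly waits on the network; processes, as in multiprocess_example, avoid the GIL at the cost of pickling work between processes.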
Download files
| Filename | Size | File type | Python version |
|---|---|---|---|
| google_patent_scraper-1.0.7-py3-none-any.whl | 6.9 kB | Wheel | py3 |
| google_patent_scraper-1.0.7.tar.gz | 5.0 kB | Source | None |
Hashes for google_patent_scraper-1.0.7-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 17be2be4cc65cab9464ebf0eeb2baa8f962cc83172d09df9b27be2f11edc6281 |
| MD5 | 5b249ec563bed30fe8f75184c29e7298 |
| BLAKE2-256 | fb3e6e311dfac38e9113d004dd1f12e4342997d576dec29a3f29f82ab698c7cd |
Hashes for google_patent_scraper-1.0.7.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1d81eb8b39b1e0a996dca0faa5ac19c4935d88686b47c1379d5c986f39211019 |
| MD5 | ba2161a1b4c5760cb8b848a37d2c8fd3 |
| BLAKE2-256 | 246c3cd9be8123f0ded1d223af122bff1ad411292e34535d53b79c07588c478d |
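The published SHA256 digests can be used to check a downloaded file before installing it. A minimal sketch with Python's standard `hashlib`; the helper names are hypothetical, and the commented usage line assumes the source archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare a file against a published SHA256 digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# SHA256 published for the 1.0.7 source distribution.
SDIST_SHA256 = "1d81eb8b39b1e0a996dca0faa5ac19c4935d88686b47c1379d5c986f39211019"

# Usage (assumes the archive is in the current directory):
# matches_published_hash("google_patent_scraper-1.0.7.tar.gz", SDIST_SHA256)
```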