A package to scrape patents from 'https://patents.google.com/'
Patent Scraper
A Python package to scrape patents from 'https://patents.google.com/'. The package consists of a single Python class, scraper_class, which can retrieve and parse the HTML of either a single patent's page or a list of patents.
The main elements returned by the scraper class are:
- application_number (str) : application number
- inventor_name (json) : inventors of patent
- assignee_name_orig (json) : original assignees to patent
- assignee_name_current (json) : current assignees to patent
- pub_date (str) : publication date
- filing_date (str) : filing date
- priority_date (str) : priority date
- grant_date (str) : grant date
- forward_cites_no_family (json) : forward citations that are not family-to-family cites
- forward_cites_yes_family (json) : forward citations that are family-to-family cites
- backward_cites_no_family (json) : backward citations that are not family-to-family cites
- backward_cites_yes_family (json) : backward citations that are family-to-family cites
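The fields marked (json) are JSON-encoded strings, so they need `json.loads` before use. The helper below is a minimal, hypothetical sketch (not part of the package) that decodes every field listed above as JSON and leaves the plain string fields and any other keys untouched. It assumes a parsed record is a dictionary keyed by these field names.

```python
import json
from typing import Any, Dict

# Fields documented as (json) in the list above; the remaining fields
# (application_number and the dates) are plain strings.
JSON_FIELDS = (
    "inventor_name",
    "assignee_name_orig",
    "assignee_name_current",
    "forward_cites_no_family",
    "forward_cites_yes_family",
    "backward_cites_no_family",
    "backward_cites_yes_family",
)


def decode_patent_record(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed patent record with its JSON fields decoded.

    Hypothetical helper: only values that are still strings are decoded,
    so calling it twice on the same record is harmless.
    """
    decoded = dict(parsed)  # shallow copy; the input record is not modified
    for field in JSON_FIELDS:
        value = decoded.get(field)
        if isinstance(value, str):
            decoded[field] = json.loads(value)
    return decoded


# Usage (assumes a record obtained as in the examples in this README):
# record = decode_patent_record(scraper.parsed_patents['US2668287A'])
# print(record['inventor_name'])
```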
Package Installation
The package is available on PyPI and can be installed with pip:
```bash
pip install google_patent_scraper
```
Main Use Cases
There are two primary ways to use this package:
- Scrape a single patent
```python
# ~ Import packages ~ #
from google_patent_scraper import scraper_class

# ~ Initialize scraper class ~ #
scraper = scraper_class()

# ~~ Scrape patents individually ~~ #
patent_1 = 'US2668287A'
patent_2 = 'US266827A'
err_1, soup_1, url_1 = scraper.request_single_patent(patent_1)
err_2, soup_2, url_2 = scraper.request_single_patent(patent_2)

# ~ Parse results of scrape ~ #
patent_1_parsed = scraper.get_scraped_data(soup_1, patent_1, url_1)
patent_2_parsed = scraper.get_scraped_data(soup_2, patent_2, url_2)
```
- Scrape a list of patents
```python
# ~ Import packages ~ #
from google_patent_scraper import scraper_class
import json

# ~ Initialize scraper class ~ #
scraper = scraper_class()

# ~ Add patents to list ~ #
scraper.add_patents('US2668287A')
scraper.add_patents('US266827A')

# ~ Scrape all patents ~ #
scraper.scrape_all_patents()

# ~ Get results of scrape ~ #
patent_1_parsed = scraper.parsed_patents['US2668287A']
patent_2_parsed = scraper.parsed_patents['US266827A']

# ~ Print inventors of patent US2668287A ~ #
for inventor in json.loads(patent_1_parsed['inventor_name']):
    print('Patent inventor : {0}'.format(inventor['inventor_name']))
```
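Since `scraper.parsed_patents` maps each patent number to a dictionary of fields, the results can be saved for later analysis. The sketch below is a hypothetical helper (not part of the package) that writes one CSV row per patent using the standard-library `csv` module. It assumes each record is a flat dictionary, and it writes the JSON fields as their raw JSON strings so they can be decoded later with `json.loads`.

```python
import csv
from typing import Any, Dict, List


def write_parsed_patents_csv(parsed_patents: Dict[str, Dict[str, Any]], path: str) -> List[str]:
    """Write one CSV row per patent to `path` and return the column names.

    Hypothetical helper: the columns are the union of keys across all
    records (in first-seen order), preceded by a "patent" column. Records
    missing a column get an empty cell.
    """
    columns: List[str] = ["patent"]
    for record in parsed_patents.values():
        for key in record:
            if key not in columns:
                columns.append(key)

    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for patent, record in parsed_patents.items():
            # The dictionary key (the patent number) fills the "patent" column.
            writer.writerow({**record, "patent": patent})
    return columns


# Usage, after scraper.scrape_all_patents() as in the example above:
# write_parsed_patents_csv(scraper.parsed_patents, 'patents.csv')
```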
Example Files
I have provided two separate example scripts showing how to use this package in the /example/ folder:
- Examples from this readme: readme_example
- Scrape multiple patents using Python's multiprocessing package: multiprocess_example
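Parallel scraping can also be sketched with `multiprocessing.pool.ThreadPool`, which lives in the same `multiprocessing` package but runs workers as threads. Fetching pages is mostly waiting on the network, and threads avoid having to pickle the worker function. This is a hypothetical sketch, not the code in multiprocess_example. The worker uses only the calls shown in this README, and it assumes (unverified here) that `request_single_patent` returns the string 'Success' as its first element when the page was fetched.

```python
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, Dict, Iterable, Optional


def scrape_one_patent(patent: str) -> Optional[Dict[str, Any]]:
    """Fetch and parse one patent; return None if the request failed.

    Assumption: a first return value other than 'Success' signals an error.
    """
    from google_patent_scraper import scraper_class  # imported lazily

    scraper = scraper_class()  # one scraper per call, so threads share no state
    err, soup, url = scraper.request_single_patent(patent)
    if err != "Success":
        return None
    return scraper.get_scraped_data(soup, patent, url)


def scrape_in_parallel(
    patent_ids: Iterable[str],
    worker: Callable[[str], Any] = scrape_one_patent,
    processes: int = 4,
) -> Dict[str, Any]:
    """Apply `worker` to each patent ID on a pool of threads.

    Returns {patent_id: result}, preserving the input order.
    """
    ids = list(patent_ids)
    with ThreadPool(processes=processes) as pool:
        results = pool.map(worker, ids)  # blocks until every ID is processed
    return dict(zip(ids, results))


# Usage (requires network access):
# results = scrape_in_parallel(['US2668287A', 'US266827A'], processes=2)
```

The `worker` parameter is there so the pooling logic can be exercised without network access; keeping `processes` small also limits how many requests are sent at once.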
Download files
Filename | Size | File type | Python version |
---|---|---|---|
google_patent_scraper-1.0.8-py3-none-any.whl | 6.9 kB | Wheel | py3 |
google_patent_scraper-1.0.8.tar.gz | 5.0 kB | Source | None |
Hashes for google_patent_scraper-1.0.8-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 26f9813ce2bf433285bdd756b9c7dc5501e9f0210e97019e3ee2a45ec85c3b2a |
MD5 | 09b02fd7dc6866cdd964eeced5397060 |
BLAKE2-256 | 4731552cd9979c17331be0ba66e1eb10c70a9c33d8aa323b93aef85295906708 |
Hashes for google_patent_scraper-1.0.8.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | cfe6abade27867c94a4d3abb522f76d4a7da2827942c1c548fbd10608731ea48 |
MD5 | cfb9dea37088040f61ff7b37fcee54db |
BLAKE2-256 | 4165b791565a7dd1488142cc159222c285cad8dd63a6c9ece4ee8586f7c28e64 |
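To check that a manually downloaded file matches the published digests, you can compute its SHA256 with the standard-library `hashlib` module and compare it against the value in the tables above. The following is a minimal sketch. The function names are hypothetical, and the usage assumes the wheel was saved in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel stops once read() hits end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Usage (assumes the wheel was downloaded to the current directory):
# matches_published_hash(
#     "google_patent_scraper-1.0.8-py3-none-any.whl",
#     "26f9813ce2bf433285bdd756b9c7dc5501e9f0210e97019e3ee2a45ec85c3b2a",
# )
```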