A package to scrape patents from 'https://patents.google.com/'
Patent Scraper
A Python package to scrape patents from 'https://patents.google.com/'. The package is made up of a single Python class, google_scraper(). The scraper can be used to retrieve the parsed HTML of a single patent page or of a list of patents.
Main Use Cases
There are two primary ways to use this package:
- Scrape a single patent
# ~ Import packages ~ #
from patent_scraper import google_scraper
import json
# ~ Initialize scraper class ~ #
scraper = google_scraper()
# ~ Scrape patents individually ~ #
# request_single_patent returns a status for the scrape
# and the page HTML parsed with bs4
err_1, soup_1 = scraper.request_single_patent('US2668287A')
err_2, soup_2 = scraper.request_single_patent('US266827A')
# ~ Parse results of scrape ~ #
patent_1_parsed = scraper.process_patent_html(soup_1)
patent_2_parsed = scraper.process_patent_html(soup_2)
- Scrape a list of patents
# ~ Import packages ~ #
from patent_scraper import google_scraper
import json
# ~ Initialize scraper class ~ #
scraper = google_scraper()  # <- Initialize class
# ~ Add patents to list ~ #
scraper.add_patents('US2668287A')
scraper.add_patents('US266827A')
# ~ Scrape all patents ~ #
scraper.scrape_all_patents()
# ~ Get results of scrape ~ #
patent_1_parsed = scraper.parsed_patents['US2668287A']
patent_2_parsed = scraper.parsed_patents['US266827A']
# ~ Print inventors of patent US2668287A ~ #
for inventor in json.loads(patent_1_parsed['inventor_name']):
    print('Patent inventor : {0}'.format(inventor['inventor_name']))
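The loop above suggests that the 'inventor_name' field holds a JSON-encoded list of objects, each with its own 'inventor_name' key. As a minimal sketch under that assumption, the helper below (inventor_names is a hypothetical name, not part of the package) decodes that field and returns the names as a plain list. The sample record is made up for illustration and is not real scraper output.

```python
import json


def inventor_names(parsed_patent):
    """Return the inventor names stored in a parsed patent record.

    Assumes 'inventor_name' is a JSON string encoding a list of
    dicts that each carry an 'inventor_name' key, as the loop in
    the example above implies. A missing field yields an empty list.
    """
    raw = parsed_patent.get('inventor_name', '[]')
    return [entry['inventor_name'] for entry in json.loads(raw)]


# Made-up record that only mimics the assumed field layout
sample = {
    'inventor_name': json.dumps([
        {'inventor_name': 'Jane Doe'},
        {'inventor_name': 'John Roe'},
    ])
}

for name in inventor_names(sample):
    print('Patent inventor : {0}'.format(name))
```

With a real result, the call would presumably look like inventor_names(scraper.parsed_patents['US2668287A']).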
Example Files
I have provided two separate example scripts that show how to use this package:
- Scrape a patent
- Scrape many patents using multiprocessing module
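The example scripts themselves are not reproduced here. As a rough sketch of the second idea, the code below spreads a list of patent numbers over a pool of workers. It is not the author's script: scrape_one is a hypothetical placeholder that performs no network request, and the pool is multiprocessing.pool.ThreadPool (thread-backed, from the same standard-library module) rather than a process pool, which keeps the sketch simple for I/O-bound work.

```python
from multiprocessing.pool import ThreadPool


def scrape_one(patent_number):
    """Hypothetical worker: stands in for one scrape-and-parse call.

    In real use, this is where a scraper instance would request and
    parse the patent page. Here it returns a placeholder record so
    that the sketch runs without network access.
    """
    return patent_number, {'patent_number': patent_number}


def scrape_many(patent_numbers, workers=4):
    """Run scrape_one over all patent numbers using a worker pool."""
    with ThreadPool(processes=workers) as pool:
        # map blocks until every worker has returned its (key, value) pair
        return dict(pool.map(scrape_one, patent_numbers))


results = scrape_many(['US2668287A', 'US266827A'])
print(sorted(results))
```

Swapping in multiprocessing.Pool would need the worker to be picklable and the entry point guarded by if __name__ == '__main__'.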