Python library that makes web scraping very simple.
Project description
Documentation is hosted at http://learnwebscraping.com/docs. Note: Documentation is currently being written.
Simplewebscraper is a library designed to make web scraping easier. It includes built-in code for standard web requests, proxy usage, importing browser cookies, and file downloads.
Homepage: https://github.com/alexanderward/simplewebscraper
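File downloads are listed among the built-in features, but the usage examples do not show that part of the API. As a rough illustration of what such a helper typically does, here is a Python 3 standard-library sketch that streams a URL to disk in chunks. `download_file` and its `chunk_size` parameter are hypothetical names, not part of simplewebscraper.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, destination, chunk_size=64 * 1024):
    """Stream the resource at `url` into `destination`; return its size in bytes.

    Hypothetical helper for illustration only; not part of simplewebscraper.
    """
    destination = Path(destination)
    # urlopen handles http://, https:// and file:// URLs.
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        # Copy in fixed-size chunks so a large file never sits fully in memory.
        shutil.copyfileobj(response, out, chunk_size)
    return destination.stat().st_size
```

Streaming with `shutil.copyfileobj` keeps memory use flat regardless of file size, which matters more than speed for most scraping downloads.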
Simple Usage - More details to come once documentation is complete.
from simplewebscraper import Browser, HTTPMethod, Scraper, ProxyPool

if __name__ == "__main__":
    # Toggle which examples run.
    example_GET = True
    example_GET_parameters = True
    example_POST = False
    example_Proxy = False
    example_cookie_import = False

    if example_GET:
        # Plain GET request.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())

    if example_GET_parameters:
        # GET request with query parameters.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.parameters = {"InData": "75791",
                                 "submit": "Search"}
        my_scraper.url = "http://www.melissadata.com/lookups/GeoCoder.asp"
        print(my_scraper.fetch())

    if example_POST:
        # POST request with form parameters.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.POST
        my_scraper.parameters = {"email": "example@gmail.com",
                                 "pass": "samplepassword"}
        my_scraper.url = "https://www.dnsdynamic.org/auth.php"
        print(my_scraper.fetch())

    if example_Proxy:
        # GET request routed through a proxy pool.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.use_per_proxy_count = 5
        # You can also provide your own group of proxies, e.g.
        # {"https": ["https://212.119.246.138:8080"], "http": []}
        my_scraper.proxy_pool = ProxyPool.Hidester
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())

    if example_cookie_import:
        # GET request using cookies imported from a local browser.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.cookies = Browser.Chrome  # Chrome or Firefox
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())
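The GET-with-parameters and POST examples differ mainly in where `parameters` end up. Assuming the library follows the usual HTTP convention (query string for GET, form-encoded body for POST), this Python 3 sketch builds the equivalent requests with `urllib` without sending them. `build_request` is a hypothetical helper, not part of simplewebscraper.

```python
import urllib.parse
import urllib.request


def build_request(url, parameters, method="GET"):
    """Build (but do not send) a urllib Request carrying `parameters`.

    Hypothetical helper; it only mirrors the idea behind setting
    Scraper.parameters together with HTTP_mode.
    """
    encoded = urllib.parse.urlencode(parameters)
    if method == "GET":
        # GET: parameters travel in the query string.
        separator = "&" if urllib.parse.urlsplit(url).query else "?"
        return urllib.request.Request(url + separator + encoded, method="GET")
    if method == "POST":
        # POST: parameters are form-encoded into the request body.
        return urllib.request.Request(
            url,
            data=encoded.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
            method="POST",
        )
    raise ValueError("unsupported method: %r" % method)
```

For example, `build_request("http://www.melissadata.com/lookups/GeoCoder.asp", {"InData": "75791", "submit": "Search"})` produces a request whose URL ends in `?InData=75791&submit=Search`; passing it to `urllib.request.urlopen` would send it.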
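The proxy example sets `use_per_proxy_count = 5` without saying what the count controls. One plausible reading, which is an assumption here, is that each proxy serves that many requests before the scraper rotates to the next one. The hypothetical `ProxyRotator` class below sketches that behaviour for a pool shaped like the dictionary in the example's comment (a scheme mapped to a list of proxy URLs).

```python
import itertools


class ProxyRotator:
    """Hand out each proxy for `uses_per_proxy` requests, then move to the next.

    Hypothetical class; it assumes use_per_proxy_count means "requests per
    proxy before rotating", which the documentation does not confirm.
    """

    def __init__(self, pool, uses_per_proxy=5):
        if uses_per_proxy < 1:
            raise ValueError("uses_per_proxy must be at least 1")
        self.uses_per_proxy = uses_per_proxy
        # One endless cycle per scheme; schemes with an empty list are skipped.
        self._cycles = {scheme: itertools.cycle(proxies)
                        for scheme, proxies in pool.items() if proxies}
        self._current = {}
        self._uses = {}

    def next_proxy(self, scheme):
        """Return the proxy URL for the next `scheme` request, or None."""
        cycle = self._cycles.get(scheme)
        if cycle is None:
            return None  # no proxies for this scheme: connect directly
        # Advance to the next proxy every `uses_per_proxy` requests.
        if self._uses.get(scheme, 0) % self.uses_per_proxy == 0:
            self._current[scheme] = next(cycle)
        self._uses[scheme] = self._uses.get(scheme, 0) + 1
        return self._current[scheme]
```

A URL returned by `next_proxy("https")` could then be handed to `urllib.request.ProxyHandler({"https": proxy})` when building an opener.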
Features
Download files
Source distribution: simplewebscraper-1.42rc0.zip (15.4 kB)
Built distribution: simplewebscraper-1.42rc0.win32.exe (211.6 kB)
File details
Details for the file simplewebscraper-1.42rc0.zip.
File metadata
- Download URL: simplewebscraper-1.42rc0.zip
- Upload date:
- Size: 15.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e3c924c891fb1a5938e38564452af076dc4a9b6e3f6b0a4cc882a81ccc9bb59
MD5 | 6ee447d4cbb7a382afa86c5b497ad517
BLAKE2b-256 | 9576bb39ee9ea1723befc1f0d7e69a5d3972fcd11dc6fb57c76d807cb33b4e4f
File details
Details for the file simplewebscraper-1.42rc0.win32.exe.
File metadata
- Download URL: simplewebscraper-1.42rc0.win32.exe
- Upload date:
- Size: 211.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | a86615ea3a2d4076f4c7aaffa550f957d5f7bcf05d47c59d224f01710b626ac6
MD5 | 4bc141b177c8055c11b18165beaa0164
BLAKE2b-256 | a9a866aa5cecbe1b8d057fc31ab584bce889f5d81e543c72673bf7c40f985287
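The hash digests listed for each file let you check a download before installing it. A minimal Python sketch using the standard `hashlib` module, reading the file in chunks; `sha256_of` and `matches_digest` are illustrative names, not part of any packaging tool.

```python
import hashlib


def sha256_of(path, chunk_size=64 * 1024):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's SHA-256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("simplewebscraper-1.42rc0.zip", "3e3c924c891fb1a5938e38564452af076dc4a9b6e3f6b0a4cc882a81ccc9bb59")` should return `True` for an intact copy of the source distribution.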