Skip to main content

Python library that makes web scraping very simple.

Project description

Documentation is hosted at http://learnwebscraping.com/docs. Note: Documentation is currently being written.

Simplewebscraper is a library designed to facilitate webscraping. It has a lot of built in code for standard web requests, proxy usage, browser cookie imports, and file downloads.

Homepage: https://github.com/alexanderward/simplewebscraper

Simple Usage - More details to come once documentation is complete.

from simplewebscraper import Browser, HTTPMethod, Scraper, ProxyPool

if __name__ == "__main__":

    example_GET = True
    example_GET_parameters = True
    example_POST = False
    example_Proxy = False
    example_cookie_import = False

    if example_GET:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.url = "https://myip.dnsdynamic.org"
        print my_scraper.fetch()

    if example_GET_parameters:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.parameters = {'InData': "75791",
                                 "submit": "Search"}
        my_scraper.url = "http://www.melissadata.com/lookups/GeoCoder.asp"
        print my_scraper.fetch()

    if example_POST:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.POST
        my_scraper.parameters = {"email": "example@gmail.com",
                                 "pass": "samplepassword"}
        my_scraper.url = "https://www.dnsdynamic.org/auth.php"
        print my_scraper.fetch()

    if example_Proxy:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.use_per_proxy_count = 5
        my_scraper.proxy_pool = ProxyPool.Hidester  #You can provide a group of proxies like this as well {"https": ["https://212.119.246.138:8080"],"http": []}
        my_scraper.url = "https://myip.dnsdynamic.org"
        print my_scraper.fetch()

    if example_cookie_import:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.cookies = Browser.Chrome  # Chrome or Firefox
        my_scraper.url = "https://myip.dnsdynamic.org"
        print my_scraper.fetch()

Features

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for simplewebscraper, version 1.42a0
Filename, size File type Python version Upload date Hashes
Filename, size simplewebscraper-1.42a0.win32.exe (211.6 kB) File type Windows Installer Python version any Upload date Hashes View
Filename, size simplewebscraper-1.42a0.zip (15.4 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page