A Python library that makes web scraping very simple.

Project description

Documentation is hosted at http://learnwebscraping.com/docs. Note: the documentation is still being written.

Simplewebscraper is a library designed to make web scraping easier. It includes built-in support for standard web requests, proxy usage, browser cookie imports, and file downloads.

Homepage: https://github.com/alexanderward/simplewebscraper

Simple Usage - more details to come once the documentation is complete.

from simplewebscraper import Browser, HTTPMethod, Scraper, ProxyPool

if __name__ == "__main__":

    # Toggle which of the examples below will run.
    example_GET = True
    example_GET_parameters = True
    example_POST = False
    example_Proxy = False
    example_cookie_import = False

    if example_GET:
        # Plain GET request.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())

    if example_GET_parameters:
        # GET request with query parameters.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.parameters = {"InData": "75791",
                                 "submit": "Search"}
        my_scraper.url = "http://www.melissadata.com/lookups/GeoCoder.asp"
        print(my_scraper.fetch())

    if example_POST:
        # POST request with form data.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.POST
        my_scraper.parameters = {"email": "example@gmail.com",
                                 "pass": "samplepassword"}
        my_scraper.url = "https://www.dnsdynamic.org/auth.php"
        print(my_scraper.fetch())

    if example_Proxy:
        # GET request routed through a proxy pool.
        # A group of proxies can also be supplied directly as a dict, e.g.
        # {"https": ["https://212.119.246.138:8080"], "http": []}
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.use_per_proxy_count = 5
        my_scraper.proxy_pool = ProxyPool.Hidester
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())

    if example_cookie_import:
        # GET request using cookies imported from a local browser profile.
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.cookies = Browser.Chrome  # Browser.Chrome or Browser.Firefox
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())
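
The examples above simply print whatever fetch() returns. Below is a minimal follow-up sketch for saving a fetched page to disk, using only the Scraper attributes shown above plus the standard library; it assumes, as the print calls suggest, that fetch() returns the response body as text (the library's own file-download helpers are not shown here because they are not yet documented):

from simplewebscraper import HTTPMethod, Scraper

if __name__ == "__main__":
    my_scraper = Scraper()
    my_scraper.HTTP_mode = HTTPMethod.GET
    my_scraper.url = "https://myip.dnsdynamic.org"

    # Assumption: fetch() returns the response body as text, as implied
    # by the examples above printing it directly.
    body = my_scraper.fetch()

    # Hypothetical output path; adjust as needed.
    with open("response.html", "w") as handle:
        handle.write(body)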

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
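Alternatively, the package can be installed straight from PyPI with pip, e.g. pip install simplewebscraper==1.42rc0 (assuming the release is still available; the version pin matches the files listed below).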

Source Distribution

simplewebscraper-1.42rc0.zip (15.4 kB)


Built Distribution

simplewebscraper-1.42rc0.win32.exe (211.6 kB)


File details

Details for the file simplewebscraper-1.42rc0.zip.


File hashes

Hashes for simplewebscraper-1.42rc0.zip
SHA256: 3e3c924c891fb1a5938e38564452af076dc4a9b6e3f6b0a4cc882a81ccc9bb59
MD5: 6ee447d4cbb7a382afa86c5b497ad517
BLAKE2b-256: 9576bb39ee9ea1723befc1f0d7e69a5d3972fcd11dc6fb57c76d807cb33b4e4f

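To verify a downloaded archive against the hashes listed above, here is a minimal sketch using Python's standard hashlib module (the local filename and path are assumptions):

import hashlib

# Expected SHA256 for simplewebscraper-1.42rc0.zip, taken from the listing above.
EXPECTED_SHA256 = "3e3c924c891fb1a5938e38564452af076dc4a9b6e3f6b0a4cc882a81ccc9bb59"

def sha256_of(path, chunk_size=8192):
    # Compute the SHA256 hex digest of a file, reading it in chunks.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # Assumes the archive was downloaded to the current directory.
    actual = sha256_of("simplewebscraper-1.42rc0.zip")
    print("OK" if actual == EXPECTED_SHA256 else "Hash mismatch: " + actual)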

File details

Details for the file simplewebscraper-1.42rc0.win32.exe.


File hashes

Hashes for simplewebscraper-1.42rc0.win32.exe
SHA256: a86615ea3a2d4076f4c7aaffa550f957d5f7bcf05d47c59d224f01710b626ac6
MD5: 4bc141b177c8055c11b18165beaa0164
BLAKE2b-256: a9a866aa5cecbe1b8d057fc31ab584bce889f5d81e543c72673bf7c40f985287

