
Python library that makes web scraping very simple.

Project description

Documentation is hosted at http://learnwebscraping.com/docs. Note: Documentation is currently being written.

Simplewebscraper is a library designed to simplify web scraping. It includes built-in support for standard web requests, proxy usage, browser cookie imports, and file downloads.

Homepage: https://github.com/alexanderward/simplewebscraper

Simple Usage - More details to come once documentation is complete.

from simplewebscraper import Browser, HTTPMethod, Scraper, ProxyPool

if __name__ == "__main__":

    example_GET = True
    example_GET_parameters = True
    example_POST = False
    example_Proxy = False
    example_cookie_import = False

    if example_GET:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())

    if example_GET_parameters:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.parameters = {'InData': "75791",
                                 "submit": "Search"}
        my_scraper.url = "http://www.melissadata.com/lookups/GeoCoder.asp"
        print(my_scraper.fetch())

    if example_POST:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.POST
        my_scraper.parameters = {"email": "example@gmail.com",
                                 "pass": "samplepassword"}
        my_scraper.url = "https://www.dnsdynamic.org/auth.php"
        print(my_scraper.fetch())

    if example_Proxy:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.use_per_proxy_count = 5
        my_scraper.proxy_pool = ProxyPool.Hidester  # Built-in proxy provider; you can also supply your own proxy group, e.g. {"https": ["https://212.119.246.138:8080"], "http": []} (see the standalone sketch after these examples)
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())

    if example_cookie_import:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.cookies = Browser.Chrome  # Chrome or Firefox
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())
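
The proxy pool does not have to come from a built-in provider such as ProxyPool.Hidester; as noted in the comment above, a dictionary of proxies can be supplied directly. A minimal sketch, assuming proxy_pool accepts the same {"https": [...], "http": []} structure shown in that comment (the proxy address below is only a placeholder):

from simplewebscraper import HTTPMethod, Scraper

if __name__ == "__main__":
    my_scraper = Scraper()
    my_scraper.HTTP_mode = HTTPMethod.GET
    # Assumption: proxy_pool also accepts a user-supplied proxy group
    # in the form shown in the comment above, instead of a ProxyPool provider.
    my_scraper.proxy_pool = {"https": ["https://212.119.246.138:8080"],
                             "http": []}
    my_scraper.use_per_proxy_count = 5  # same setting as in the ProxyPool example above
    my_scraper.url = "https://myip.dnsdynamic.org"
    print(my_scraper.fetch())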




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
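
The package can also be installed from PyPI with pip instead of downloading a file manually:

pip install simplewebscraper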

Source Distribution

simplewebscraper-1.042.zip (11.9 kB)


Built Distribution

simplewebscraper-1.042.win32.exe (210.6 kB)


File details

Details for the file simplewebscraper-1.042.zip.


File hashes

Hashes for simplewebscraper-1.042.zip:
SHA256: bc6bd8d86a15708c9f082870ef005bd38e1920bd560d018cbadda4033fba218c
MD5: 9244f6f9961f107ea14949587a48265b
BLAKE2b-256: 77616f7b59ee025d94e1e30fd5fd0b85263b52a8553fc264ac57a68fbb44148a
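
To verify a downloaded archive against the SHA256 digest listed above, a short sketch using Python's standard hashlib module (it assumes the zip was saved to the current working directory):

import hashlib

expected = "bc6bd8d86a15708c9f082870ef005bd38e1920bd560d018cbadda4033fba218c"

with open("simplewebscraper-1.042.zip", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "Hash mismatch")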


File details

Details for the file simplewebscraper-1.042.win32.exe.


File hashes

Hashes for simplewebscraper-1.042.win32.exe:
SHA256: f085745f7ff60e47fd9b9a19969f688afccbe17c77bbb0efb5666b4bab95ecfe
MD5: 55349a6602fc7bc00b871ac826917972
BLAKE2b-256: 39418705b4ced2781e701328c76e6138eaa7d44edf813908d6f739d39d05f12d

