A Python library that makes web scraping simple.

Project description

Documentation is hosted at http://learnwebscraping.com/docs. Note: the documentation is still being written.

Simplewebscraper is a library designed to simplify web scraping. It includes built-in support for standard web requests, proxy usage, browser cookie imports, and file downloads.

Homepage: https://github.com/alexanderward/simplewebscraper

Simple Usage - more details to come once the documentation is complete.
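
To try the example below, install the package from PyPI first (assuming the distribution name matches the project name, as the files listed further down suggest):

pip install simplewebscraper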

from simplewebscraper import Browser, HTTPMethod, Scraper, ProxyPool

if __name__ == "__main__":

    # Toggle which of the examples below to run.
    example_GET = True
    example_GET_parameters = True
    example_POST = False
    example_Proxy = False
    example_cookie_import = False

    if example_GET:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())

    if example_GET_parameters:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.parameters = {'InData': "75791",
                                 "submit": "Search"}
        my_scraper.url = "http://www.melissadata.com/lookups/GeoCoder.asp"
        print(my_scraper.fetch())

    if example_POST:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.POST
        my_scraper.parameters = {"email": "example@gmail.com",
                                 "pass": "samplepassword"}
        my_scraper.url = "https://www.dnsdynamic.org/auth.php"
        print(my_scraper.fetch())

    if example_Proxy:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.use_per_proxy_count = 5
        my_scraper.proxy_pool = ProxyPool.Hidester  # A built-in pool; you can also supply your own proxies as a dict (see the sketch after these examples)
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())

    if example_cookie_import:
        my_scraper = Scraper()
        my_scraper.HTTP_mode = HTTPMethod.GET
        my_scraper.cookies = Browser.Chrome  # Chrome or Firefox
        my_scraper.url = "https://myip.dnsdynamic.org"
        print(my_scraper.fetch())
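
As noted in the proxy example, ProxyPool.Hidester can be swapped for your own proxy list. A minimal sketch, reusing the dict structure from the inline comment (the proxy address shown is only a placeholder, not a working server):

from simplewebscraper import HTTPMethod, Scraper

my_scraper = Scraper()
my_scraper.HTTP_mode = HTTPMethod.GET
# Keys are URL schemes; values are lists of proxy addresses to rotate through.
my_scraper.proxy_pool = {"https": ["https://212.119.246.138:8080"],
                         "http": []}
my_scraper.url = "https://myip.dnsdynamic.org"
print(my_scraper.fetch())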

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

simplewebscraper-1.42.zip (15.3 kB)

Built Distribution

simplewebscraper-1.42.win32.exe (211.6 kB)

File details

Details for the file simplewebscraper-1.42.zip.

File hashes

Hashes for simplewebscraper-1.42.zip

Algorithm    Hash digest
SHA256       4f0dfe236ff5e5ca81c796fa23784d21f0e94f58cc481884a9127c0a7acec549
MD5          5062a0f433780729836703140458d9ce
BLAKE2b-256  8b62cc379cb3da4dfe3a97150090f0fcec9fb28529204e62e6223d9249ffe73a

See more details on using hashes here.
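
For example, to check the SHA256 digest of the downloaded archive against the value above using only the standard library (a minimal sketch; adjust the path to wherever you saved the file):

import hashlib

with open("simplewebscraper-1.42.zip", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# Prints True if the download matches the published hash.
print(digest == "4f0dfe236ff5e5ca81c796fa23784d21f0e94f58cc481884a9127c0a7acec549")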

File details

Details for the file simplewebscraper-1.42.win32.exe.

File hashes

Hashes for simplewebscraper-1.42.win32.exe

Algorithm    Hash digest
SHA256       8269a7d3847eb95590a4b2de13b25e61b0053fe240d1a078bb3d549bbbdc07c4
MD5          bfe227fe4969476fa3b64a72c8597a82
BLAKE2b-256  eaa38bdc154593bfda03300aae3c1f7c51821659c0e27a0e5c02cf70f99a0e9c

See more details on using hashes here.
