
Python wrapper for Prompt API's Scraper API

Project description


Prompt API - Scraper API - Python Package

pa-scraper is a Python wrapper for Prompt API's Scraper API, with a few extra conveniences on top.

Requirements

  1. Sign up for Prompt API.
  2. Subscribe to the Scraper API; the test drive is free!
  3. After subscribing, set the PROMPTAPI_TOKEN environment variable.

Then install the package:

$ pip install pa-scraper
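
The wrapper reads your token from the environment at runtime. A minimal sanity check before running your scripts (the variable name comes from step 3 above; the error message is just illustrative):

import os

# The Scraper API token must be exported as PROMPTAPI_TOKEN before use.
token = os.environ.get('PROMPTAPI_TOKEN')
if not token:
    raise SystemExit('PROMPTAPI_TOKEN is not set; subscribe and export your token first')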

Example Usage

Examples can be found in the project repository (https://github.com/promptapi/scraper-py).

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)
response = scraper.get()

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code
    # {'error': 'Not Found', 'status': 404}
    print(response)
else:
    result = response['result']

    print(result['headers'])   # returns response headers 
    print(result['data'])      # returns fetched html
    print(result['url'])       # returns fetched url
    print(response['status'])  # returns http status code

    save_result = scraper.save('/tmp/my-html.html')  # save to file
    if save_result.get('error', None):
        # we have save error
        pass
    else:
        print(save_result)    # contains saved file path and file size
        # {'file': '/tmp/my-html.html', 'size': 321322}

You can add URL parameters for extra operations. Valid parameters are:

  • auth_password: HTTP Realm auth password
  • auth_username: HTTP Realm auth username
  • cookie: URL-encoded cookie header
  • country: two-character country code, to scrape from an IP address in a specific country
  • referer: HTTP referer header

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

fetch_params = dict(country='EE')
response = scraper.get(params=fetch_params)

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code
    # {'error': 'Not Found', 'status': 404}
    print(response)
else:
    result = response['result']
    status = response['status']

    print(result['headers'])   # returns response headers
    print(result['data'])      # returns fetched html
    print(result['url'])       # returns fetched url
    print(response['status'])  # returns http status code

    save_result = scraper.save('/tmp/my-html.html')  # save to file
    if save_result.get('error', None):
        # we have save error
        pass
    else:
        print(save_result)    # contains saved file path and file size
        # {'file': '/tmp/my-html.html', 'size': 321322}
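
The parameters above can also be combined in a single request. The following is a minimal sketch, not taken from the package docs: it assumes the target page sits behind HTTP Realm auth and that you also want to send cookie and referer headers; all credential and header values are placeholders.

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

# placeholder values for illustration only
fetch_params = dict(
    auth_username='user',          # HTTP Realm auth username
    auth_password='secret',        # HTTP Realm auth password
    cookie='session%3Dabc123',     # URL encoded cookie header
    referer='https://pypi.org/',   # HTTP referer header
)
response = scraper.get(params=fetch_params)

if response.get('error', None):
    print(response)  # {'error': '...', 'status': ...}
else:
    print(response['result']['url'])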

TODO

  • Add XPath extractor.

License

This project is licensed under the MIT License.


Contributor(s)


Contribute

All PRs are welcome!

  1. Fork the repository (https://github.com/promptapi/scraper-py/fork)
  2. Create your feature branch (git checkout -b my-feature)
  3. Commit your changes (git commit -am 'Add awesome features...')
  4. Push to the branch (git push origin my-feature)
  5. Then create a new Pull Request!

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.




Download files

Download the file for your platform.

Source Distribution

pa-scraper-0.1.2.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

pa_scraper-0.1.2-py3-none-any.whl (5.5 kB)

Uploaded Python 3

File details

Details for the file pa-scraper-0.1.2.tar.gz.

File metadata

  • Download URL: pa-scraper-0.1.2.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for pa-scraper-0.1.2.tar.gz

  • SHA256: 2cb449ade5d2c68774805a1753360b5325bff6c7decd6275a3672b6dc29f0898
  • MD5: f8ef835346e1a1a0f977b79fb42a791d
  • BLAKE2b-256: 7da0cfb5f099cb93994f6a5c03fea32f40ef7641301e59a15330275aa4ad7214

See more details on using hashes here.
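
If you download the sdist by hand, you can verify it against the SHA256 digest above with nothing but the standard library. A minimal sketch, assuming the archive sits in the current working directory:

import hashlib

# Published SHA256 digest for pa-scraper-0.1.2.tar.gz (from the list above).
expected = '2cb449ade5d2c68774805a1753360b5325bff6c7decd6275a3672b6dc29f0898'

# Compute the SHA256 of the downloaded archive and compare.
with open('pa-scraper-0.1.2.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print('OK' if digest == expected else 'hash mismatch')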

File details

Details for the file pa_scraper-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: pa_scraper-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for pa_scraper-0.1.2-py3-none-any.whl

  • SHA256: 2d1212afa74c3ce0fc06298ff18453b701925ae2886b73a80dc6b32d9bba8de6
  • MD5: 0e98ff37b92f1ca10542a77267454b23
  • BLAKE2b-256: 1180f166bfaf52dd7b9e56c88d43145802a480b2b7f79d525d14df2b8d554a13

See more details on using hashes here.
