
Python wrapper for Prompt API's Scraper API


Prompt API - Scraper API - Python Package

pa-scraper is a Python wrapper for the Scraper API, with a few extra conveniences on top.

Requirements

  1. Sign up for Prompt API.
  2. Subscribe to the Scraper API (the test drive is free!).
  3. After subscribing, set the PROMPTAPI_TOKEN environment variable.

Then install the package:

$ pip install pa-scraper
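Since the token comes from the environment, it can help to fail fast when PROMPTAPI_TOKEN is missing, before any request is made. Below is a minimal sketch; the helper name get_promptapi_token is hypothetical and not part of pa-scraper, which is assumed to read the variable on its own.

```python
import os


def get_promptapi_token() -> str:
    """Return the Prompt API token, raising early if it is not set.

    Hypothetical helper: pa-scraper is assumed to read PROMPTAPI_TOKEN
    itself; this only checks for it up front.
    """
    token = os.environ.get('PROMPTAPI_TOKEN', '').strip()
    if not token:
        raise RuntimeError(
            'PROMPTAPI_TOKEN is not set; export it after subscribing '
            'to the Scraper API.'
        )
    return token
```

Call it once at startup (for example right before creating a Scraper) so a missing token shows up as a clear error instead of a failed API call.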

Example Usage

Examples can be found here.

# examples/fetch.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)
response = scraper.get()

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code

    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data) # print fetched html, will be long :)

    print(headers)  # noqa: T001
    # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

    print(status)  # noqa: T001
    # 200

    save_result = scraper.save('/tmp/my-data.html')  # noqa: S108

    if save_result.get('error', None):
        # a save error occurred...
        # add your code here...
        pass

    print(save_result)  # noqa: T001
    # {'file': '/tmp/my-data.html', 'size': 321322}
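The example above branches on response.get('error'). If you prefer exceptions, the same check can be wrapped in a small helper. This is a sketch only: ScraperError and unpack_response are hypothetical names, and the dictionary shapes are assumed from the example output above ({'error': ..., 'status': ...} on failure, a 'result' dict plus 'status' on success).

```python
from typing import Any, Dict, Tuple


class ScraperError(Exception):
    """Hypothetical exception carrying the API's error message and status."""

    def __init__(self, message: str, status: Any = None) -> None:
        super().__init__(f'{message} (status={status})')
        self.message = message
        self.status = status


def unpack_response(response: Dict[str, Any]) -> Tuple[Any, Dict[str, str], Any]:
    """Split a Scraper.get() response into (data, headers, status).

    Assumes the response shapes shown in examples/fetch.py.
    """
    if response.get('error'):
        # Failure: surface the API's message and HTTP status code.
        raise ScraperError(response['error'], response.get('status'))
    result = response['result']
    return result['data'], result['headers'], response['status']
```

With this in place, `data, headers, status = unpack_response(scraper.get())` inside a `try` / `except ScraperError` block replaces the if/else check.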

You can pass URL parameters for extra operations. Valid parameters are:

  • auth_password: password for HTTP realm authentication
  • auth_username: username for HTTP realm authentication
  • cookie: URL-encoded cookie header.
  • country: two-letter country code, if you want to scrape from an IP address in a specific country.
  • referer: HTTP Referer header
  • selector: CSS-style selector such as a.btn div li. When a selector is given, the result is a collection (list) of matched elements and the saved file is written in .json format.
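A small, hypothetical helper can catch typos in parameter names and malformed values before the request is sent. It only knows the six names listed above; the two-letter check on country and the choice to URL-encode cookie with urllib.parse.quote are assumptions based on the descriptions, not documented pa-scraper behavior.

```python
from typing import Dict
from urllib.parse import quote

# Parameter names listed in this README.
VALID_PARAMS = {
    'auth_password', 'auth_username', 'cookie',
    'country', 'referer', 'selector',
}


def build_fetch_params(encode_cookie: bool = True, **params: str) -> Dict[str, str]:
    """Validate params for scraper.get(params=...) (hypothetical helper)."""
    unknown = set(params) - VALID_PARAMS
    if unknown:
        raise ValueError(f'unknown parameter(s): {", ".join(sorted(unknown))}')

    country = params.get('country')
    if country is not None and (len(country) != 2 or not country.isalpha()):
        raise ValueError('country must be a two-letter code, e.g. "EE"')

    # Assumption: the cookie header must be URL-encoded by the caller.
    if encode_cookie and 'cookie' in params:
        params['cookie'] = quote(params['cookie'], safe='')
    return params
```

Usage mirrors the example below: `scraper.get(params=build_fetch_params(country='EE', selector='ul li button[data-clipboard-text]'))`.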

Here is an example using URL parameters and a selector:

# examples/fetch_with_params.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

fetch_params = dict(country='EE', selector='ul li button[data-clipboard-text]')
response = scraper.get(params=fetch_params)

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code

    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data)  # noqa: T001
    # ['<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" ...\n', ]

    print(len(data))  # noqa: T001
    # 734
    # we have a list of matched elements...

    print(headers)  # noqa: T001
    # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

    print(status)  # noqa: T001
    # 200

    save_result = scraper.save('/tmp/my-data.json')  # noqa: S108
    if save_result.get('error', None):
        # a save error occurred...
        # add your code here...
        pass
    print(save_result)  # noqa: T001
    # {'file': '/tmp/my-data.json', 'size': 174449}
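To reuse selector results later, the saved .json file can be read back with the standard json module. This sketch assumes the file holds the same list of matched elements that response['result']['data'] returned; load_selector_results is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any


def load_selector_results(path: str) -> Any:
    """Load a .json file written by scraper.save() after a selector fetch.

    Assumption: the file contains the list of matched elements.
    """
    with Path(path).open(encoding='utf-8') as fh:
        return json.load(fh)
```

For example, `len(load_selector_results('/tmp/my-data.json'))` should match `len(data)` from the fetch, if the assumption about the file contents holds.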

License

This project is licensed under the MIT License.


Contributor(s)


Contribute

All PRs are welcome!

  1. Fork the repository (https://github.com/promptapi/scraper-py/fork)
  2. Create your branch (git checkout -b my-feature)
  3. Commit your changes (git commit -am 'Add awesome features...')
  4. Push your branch (git push origin my-feature)
  5. Then create a new Pull Request!

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pa-scraper-0.2.0.tar.gz (5.2 kB)

Uploaded Source

Built Distribution

pa_scraper-0.2.0-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file pa-scraper-0.2.0.tar.gz.

File metadata

  • Download URL: pa-scraper-0.2.0.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for pa-scraper-0.2.0.tar.gz
Algorithm Hash digest
SHA256 45c1bed98892eeee2a2987b4ac6b7a5b527d865a8d21ac00b50e0cb570d89bf4
MD5 f05867b9e7ff5a7e7441f6e6c3aa5f81
BLAKE2b-256 6b322281caed2ef2996cf5cd2695002f699d44323c458b676c8d67307b6e9d35


File details

Details for the file pa_scraper-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: pa_scraper-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for pa_scraper-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c1d101d3db2eda7664ad6f9a643609d99bd7ae1be9c1af23ba9964b82d307c7c
MD5 0e99136008f53e72421454dcbfcc76bc
BLAKE2b-256 feaac277a07e6965d30a25b66195579b2e4bcbadf759970d01e9bef96f0fc188

