Skip to main content

Python wrapper for Prompt API's Scraper API

Project description

Python Version Code style: black Build Status

Prompt API - Scraper API - Python Package

pa-scraper is a python wrapper for scraper api with few more extra cream and sugar.

Requirements

  1. You need to signup for Prompt API
  2. You need to subscribe scraper api, test drive is free!!!
  3. You need to set PROMPTAPI_TOKEN environment variable after subscription.

then;

$ pip install pa-scraper

Example Usage

Examples can be found here.

# examples/fetch.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)
response = scraper.get()

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code

    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data) # print fetched html, will be long :)

    print(headers)  # noqa: T001
    # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

    print(status)  # noqa: T001
    # 200

    save_result = scraper.save('/tmp/my-data.html')  # noqa: S108

    if save_result.get('error', None):
        # save error occured...
        # add you code here...
        pass

    print(save_result)  # noqa: T001
    # {'file': '/tmp/my-data.html', 'size': 321322}

You can add url parameters for extra operations. Valid parameters are:

  • auth_password: for HTTP Realm auth password
  • auth_username: for HTTP Realm auth username
  • cookie: URL Encoded cookie header.
  • country: 2 character country code. If you wish to scrape from an IP address of a specific country.
  • referer: HTTP referer header
  • selector: CSS style selector path such as a.btn div li. If selector is enabled, returning result will be collection of data and saved file will be in .json format.

Here is an example with using url parameters and selector:

# examples/fetch_with_params.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

fetch_params = dict(country='EE', selector='ul li button[data-clipboard-text]')
response = scraper.get(params=fetch_params)

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code

    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data)  # noqa: T001
    # ['<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" ...\n', ]

    print(len(data))  # noqa: T001
    # 734
    # we have an array...

    print(headers)  # noqa: T001
    # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

    print(status)  # noqa: T001
    # 200

    save_result = scraper.save('/tmp/my-data.json')  # noqa: S108
    if save_result.get('error', None):
        # save error occured...
        # add you code here...
        pass
    print(save_result)  # noqa: T001
    # {'file': '/tmp/my-data.json', 'size': 174449}

License

This project is licensed under MIT


Contributer(s)


Contribute

All PR’s are welcome!

  1. fork (https://github.com/promptapi/scraper-py/fork)
  2. Create your branch (git checkout -b my-feature)
  3. commit yours (git commit -am 'Add awesome features...')
  4. push your branch (git push origin my-feature)
  5. Than create a new Pull Request!

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pa-scraper-0.2.1.tar.gz (5.2 kB view details)

Uploaded Source

Built Distribution

pa_scraper-0.2.1-py3-none-any.whl (5.8 kB view details)

Uploaded Python 3

File details

Details for the file pa-scraper-0.2.1.tar.gz.

File metadata

  • Download URL: pa-scraper-0.2.1.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for pa-scraper-0.2.1.tar.gz
Algorithm Hash digest
SHA256 fb0948fdf72cf28639aaf2da057d7b16dfdd3a926f9d19bfc1caf4e579c8c1fc
MD5 a6c41cbbba2b4745947d52a06e67421e
BLAKE2b-256 6cd88315742abbe880a597befa2a4a897c82c9f636405d4fb322d20e975b22b5

See more details on using hashes here.

File details

Details for the file pa_scraper-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: pa_scraper-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for pa_scraper-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3fa23a28d49fe1b95979d3e796466acc3d9fc5a6e7abc2dbc8f36bb133dfc544
MD5 62186d5f1a4af280d2727416974c9180
BLAKE2b-256 c75d90cdbaa908e47aef32b1df21fbfc02b4d2aa7c8684b091564e7964d0b2b5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page