Python wrapper for Prompt API's Scraper API
Project description
Prompt API - Scraper API - Python Package
pa-scraper is a Python wrapper for the Scraper API, with a little extra cream and sugar.
Requirements
- You need to sign up for Prompt API
- You need to subscribe to the Scraper API; the test drive is free!
- You need to set the PROMPTAPI_TOKEN environment variable after subscription.

Then:
$ pip install pa-scraper
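The wrapper reads your token from the environment at runtime, so it is worth verifying the variable is set before making requests. A minimal sketch (the `check_token` helper is illustrative, not part of pa-scraper):

```python
import os

def check_token(env=None):
    """Return the Prompt API token from the environment, or raise if missing."""
    env = os.environ if env is None else env
    token = env.get('PROMPTAPI_TOKEN')
    if not token:
        raise RuntimeError('PROMPTAPI_TOKEN is not set; export it before using pa-scraper')
    return token

# Demonstrate with a fake environment mapping:
print(check_token({'PROMPTAPI_TOKEN': 'demo-token'}))
```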
Example Usage
Examples can be found here.
from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)
response = scraper.get()

if response.get('error', None):
    # response['error'] returns error message
    # response['status'] returns http status code
    # {'error': 'Not Found', 'status': 404}
    print(response)
else:
    result = response['result']
    print(result['headers'])   # returns response headers
    print(result['data'])      # returns fetched html
    print(result['url'])       # returns fetched url
    print(response['status'])  # returns http status code

    save_result = scraper.save('/tmp/my-html.html')  # save to file
    if save_result.get('error', None):
        # we have a save error
        pass
    else:
        print(save_result)  # contains saved file path and file size
                            # {'file': '/tmp/my-html.html', 'size': 321322}
You can add URL parameters for extra operations. Valid parameters are:
- auth_password: password for HTTP Realm auth
- auth_username: username for HTTP Realm auth
- cookie: URL-encoded cookie header
- country: two-character country code, if you wish to scrape from an IP address of a specific country
- referer: HTTP referer header
from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

fetch_params = dict(country='EE')
response = scraper.get(params=fetch_params)

if response.get('error', None):
    # response['error'] returns error message
    # response['status'] returns http status code
    # {'error': 'Not Found', 'status': 404}
    print(response)
else:
    result = response['result']
    status = response['status']
    print(result['headers'])   # returns response headers
    print(result['data'])      # returns fetched html
    print(result['url'])       # returns fetched url
    print(response['status'])  # returns http status code

    save_result = scraper.save('/tmp/my-html.html')  # save to file
    if save_result.get('error', None):
        # we have a save error
        pass
    else:
        print(save_result)  # contains saved file path and file size
                            # {'file': '/tmp/my-html.html', 'size': 321322}
TODO
- Add xpath extractor.
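Until an xpath extractor lands in the package, you can run XPath expressions over result['data'] yourself. A rough standard-library sketch (ElementTree supports only a limited XPath subset and requires well-formed markup; real-world HTML usually calls for a third-party parser such as lxml):

```python
import xml.etree.ElementTree as ET

# A tiny well-formed snippet standing in for result['data']:
html = '<html><body><ul><li>alpha</li><li>beta</li></ul></body></html>'
root = ET.fromstring(html)

# Limited XPath: collect the text of all <li> elements anywhere in the tree.
items = [li.text for li in root.findall('.//li')]
print(items)  # ['alpha', 'beta']
```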
License
This project is licensed under the MIT license.
Contributor(s)
- Prompt API - Creator, maintainer
Contribute
All PRs are welcome!
- Fork this repository (https://github.com/promptapi/scraper-py/fork)
- Create your feature branch (git checkout -b my-feature)
- Commit your changes (git commit -am 'Add awesome features...')
- Push your branch (git push origin my-feature)
- Then create a new Pull Request!
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
Project details
Download files
Download the file for your platform.
Source Distribution
pa-scraper-0.1.2.tar.gz (4.9 kB)
Built Distribution
pa_scraper-0.1.2-py3-none-any.whl (5.5 kB)
File details
Details for the file pa-scraper-0.1.2.tar.gz.
File metadata
- Download URL: pa-scraper-0.1.2.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2cb449ade5d2c68774805a1753360b5325bff6c7decd6275a3672b6dc29f0898
MD5 | f8ef835346e1a1a0f977b79fb42a791d
BLAKE2b-256 | 7da0cfb5f099cb93994f6a5c03fea32f40ef7641301e59a15330275aa4ad7214
File details
Details for the file pa_scraper-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pa_scraper-0.1.2-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2d1212afa74c3ce0fc06298ff18453b701925ae2886b73a80dc6b32d9bba8de6
MD5 | 0e98ff37b92f1ca10542a77267454b23
BLAKE2b-256 | 1180f166bfaf52dd7b9e56c88d43145802a480b2b7f79d525d14df2b8d554a13
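The digests above can be checked locally after download. A small sketch using hashlib (point it at whichever distribution file you fetched):

```python
import hashlib

def sha256_of(path):
    """Compute the SHA256 hex digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the published digest, e.g. for pa-scraper-0.1.2.tar.gz the
# SHA256 column above lists:
# 2cb449ade5d2c68774805a1753360b5325bff6c7decd6275a3672b6dc29f0898
```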