# ScrapingAnt API client for Python

`scrapingant-client` is the official library for accessing the ScrapingAnt API from your Python applications. It provides useful features such as automatic parameter encoding to improve the ScrapingAnt usage experience. Requires Python 3.6+.
## Quick Start

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

# Scrape the example.com site
result = client.general_request('https://example.com')
print(result.content)
```
## Install

```shell
pip install scrapingant-client
```

If you need async support:

```shell
pip install scrapingant-client[async]
```
## API token

To get an API token, you'll need to register at the ScrapingAnt Service.
## API Reference

All public classes, methods, and their parameters can be inspected in this API reference.

### ScrapingAntClient(token)

Main class of this library.

| Param | Type |
|---|---|
| token | string |
### Common arguments

- ScrapingAntClient.general_request
- ScrapingAntClient.general_request_async
- ScrapingAntClient.markdown_request
- ScrapingAntClient.markdown_request_async

https://docs.scrapingant.com/request-response-format#available-parameters

| Param | Type | Default |
|---|---|---|
| url | string | |
| method | string | GET |
| cookies | List[Cookie] | None |
| headers | Dict[str, str] | None |
| js_snippet | string | None |
| proxy_type | ProxyType | datacenter |
| proxy_country | str | None |
| wait_for_selector | str | None |
| browser | boolean | True |
| return_page_source | boolean | False |
| data | same as requests param 'data' | None |
| json | same as requests param 'json' | None |

**IMPORTANT NOTE:** `js_snippet` will be encoded to Base64 automatically by the ScrapingAnt client library.
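The encoding the client applies here is equivalent to standard Base64 over the UTF-8 bytes of the snippet. A minimal sketch using only the standard library (the helper name is hypothetical and not part of the client API):

```python
import base64


def encode_js_snippet(js_snippet: str) -> str:
    # Hypothetical helper illustrating the Base64 encoding the client
    # performs on js_snippet before sending it to the API.
    return base64.b64encode(js_snippet.encode('utf-8')).decode('ascii')


encoded = encode_js_snippet("document.title = 'Hello';")
print(encoded)
```

You never need to do this yourself when using the library; it only shows what "encoded to Base64 automatically" means.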
### Cookie

Class defining a cookie. Currently it supports only name and value.

| Param | Type |
|---|---|
| name | string |
| value | string |
### Response

Class defining a response from the API.

| Param | Type |
|---|---|
| content | string |
| cookies | List[Cookie] |
| status_code | int |
| text | string |
### Exceptions

`ScrapingantClientException` is the base Exception class, used for all errors.

| Exception | Reason |
|---|---|
| ScrapingantInvalidTokenException | The API token is wrong, or you have exceeded the API calls request limit |
| ScrapingantInvalidInputException | Invalid value provided. Please check the error message for more info |
| ScrapingantInternalException | Something went wrong on the server side. Try again later or contact ScrapingAnt support |
| ScrapingantSiteNotReachableException | The requested URL is not reachable. Please check it locally |
| ScrapingantDetectedException | The anti-bot detection system has detected the request. Please retry or change the request settings |
| ScrapingantTimeoutException | Got a timeout while communicating with ScrapingAnt servers. Check your network connection, try again later, or contact support |
## Examples

### Sending custom cookies

```python
from scrapingant_client import ScrapingAntClient
from scrapingant_client import Cookie

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

result = client.general_request(
    'https://httpbin.org/cookies',
    cookies=[
        Cookie(name='cookieName1', value='cookieVal1'),
        Cookie(name='cookieName2', value='cookieVal2'),
    ]
)
print(result.content)

# Response cookies is a list of Cookie objects
# They can be used in subsequent requests
response_cookies = result.cookies
```
### Executing a custom JS snippet

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

customJsSnippet = """
var str = 'Hello, world!';
var htmlElement = document.getElementsByTagName('html')[0];
htmlElement.innerHTML = str;
"""
result = client.general_request(
    'https://example.com',
    js_snippet=customJsSnippet,
)
print(result.content)
```
### Exception handling and retries

```python
from scrapingant_client import (
    ScrapingAntClient,
    ScrapingantClientException,
    ScrapingantInvalidInputException,
)

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

RETRIES_COUNT = 3


def parse_html(html: str):
    ...  # Implement your data extraction here


parsed_data = None
for retry_number in range(RETRIES_COUNT):
    try:
        scrapingant_response = client.general_request(
            'https://example.com',
        )
    except ScrapingantInvalidInputException as e:
        print(f'Got invalid input exception: {repr(e)}')
        break  # We are not retrying if request params are not valid
    except ScrapingantClientException as e:
        print(f'Got ScrapingAnt exception {repr(e)}')
    except Exception as e:
        print(f'Got unexpected exception {repr(e)}')  # Please report such exceptions by creating a new issue
    else:
        try:
            parsed_data = parse_html(scrapingant_response.content)
            break  # Data is parsed successfully, so we don't need to retry
        except Exception as e:
            print(f'Got exception while parsing data {repr(e)}')

if parsed_data is None:
    print(f'Failed to retrieve and parse data after {RETRIES_COUNT} tries')
    # Can sleep and retry later, or stop the script execution and research the reason
else:
    print(f'Successfully parsed data: {parsed_data}')
```
### Sending custom headers

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

result = client.general_request(
    'https://httpbin.org/headers',
    headers={
        'test-header': 'test-value'
    }
)
print(result.content)

# HTTP basic auth example
result = client.general_request(
    'https://jigsaw.w3.org/HTTP/Basic/',
    headers={'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
)
print(result.content)
```
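The `Authorization` value in the basic auth example is the Base64 encoding of `guest:guest`. Any credential pair can be turned into such a header value with the standard library alone; a small sketch (the helper name is hypothetical, not part of the client API):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    # Hypothetical helper: builds an HTTP Basic auth header value
    # by Base64-encoding "username:password".
    credentials = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


print(basic_auth_header('guest', 'guest'))  # → Basic Z3Vlc3Q6Z3Vlc3Q=
```

The resulting string can be passed directly in the `headers` argument shown above.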
### Simple async example

```python
import asyncio

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')


async def main():
    # Scrape the example.com site
    result = await client.general_request_async('https://example.com')
    print(result.content)

asyncio.run(main())
```
### Sending a POST request

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

# Sending a POST request with JSON data
result = client.general_request(
    url="https://httpbin.org/post",
    method="POST",
    json={"test": "test"},
)
print(result.content)

# Sending a POST request with bytes data
result = client.general_request(
    url="https://httpbin.org/post",
    method="POST",
    data=b'test_bytes',
)
print(result.content)
```
### Receiving markdown

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

# Fetching the page rendered as markdown
result = client.markdown_request(
    url="https://example.com",
)
print(result.markdown)
```