
Official Python client for the ScrapingAnt API.


ScrapingAnt API client for Python


scrapingant-client is the official library for accessing the ScrapingAnt API from your Python applications. It provides useful features, such as parameter encoding, that improve the ScrapingAnt usage experience. Requires Python 3.6+.

Quick Start

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
# Scrape the example.com site.
result = client.general_request('https://example.com')
print(result.content)

Install

pip install scrapingant-client

If you need async support:

pip install "scrapingant-client[async]"

API token

To get an API token, you need to register at the ScrapingAnt Service.

API Reference

All public classes, methods and their parameters can be inspected in this API reference.

ScrapingAntClient(token)

Main class of this library.

Param Type
token string

ScrapingAntClient.general_request and ScrapingAntClient.general_request_async

The parameters are described in detail in the ScrapingAnt documentation: https://docs.scrapingant.com/request-response-format#available-parameters

Param               Type               Default
url                 string             (required)
cookies             List[Cookie]       None
headers             Dict[str, str]     None
js_snippet          string             None
proxy_type          ProxyType          datacenter
proxy_country       string             None
return_text         boolean            False
wait_for_selector   string             None
browser             boolean            True

IMPORTANT NOTE: js_snippet is encoded to Base64 automatically by the ScrapingAnt client library, so pass it as plain JavaScript.
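For reference, Base64-encoding a snippet amounts to roughly the following standard-library sketch. It is an assumption that the library uses the standard (not URL-safe) Base64 alphabet over the snippet's UTF-8 bytes. encode_js_snippet is a hypothetical helper and is not part of scrapingant-client, and you do not need to call anything like it yourself.

```python
import base64


def encode_js_snippet(js_snippet: str) -> str:
    """Hypothetical helper: Base64-encode the UTF-8 bytes of a JS snippet.

    Assumption: this approximates what scrapingant-client does internally
    before sending js_snippet to the API.
    """
    return base64.b64encode(js_snippet.encode('utf-8')).decode('ascii')


snippet = "document.title = 'Hi';"
encoded = encode_js_snippet(snippet)
print(encoded)

# Decoding the result gives back the original snippet.
assert base64.b64decode(encoded).decode('utf-8') == snippet
```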


Cookie

Class defining a cookie. Currently it supports only name and value.

Param Type
name string
value string

Response

Class defining the response from the API.

Param Type
content string
cookies List[Cookie]
status_code int

Exceptions

ScrapingantClientException is the base exception class. All errors raised by the library inherit from it.

Exception                              Reason
ScrapingantInvalidTokenException       The API token is wrong or you have exceeded the API call limit
ScrapingantInvalidInputException       An invalid value was provided. Please look into the error message for more info
ScrapingantInternalException           Something went wrong on the server side. Try again later or contact ScrapingAnt support
ScrapingantSiteNotReachableException   The requested URL is not reachable. Please check it locally
ScrapingantDetectedException           The anti-bot detection system has detected the request. Please retry or change the request settings

Examples

Sending custom cookies

from scrapingant_client import ScrapingAntClient
from scrapingant_client import Cookie

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

result = client.general_request(
    'https://httpbin.org/cookies',
    cookies=[
        Cookie(name='cookieName1', value='cookieVal1'),
        Cookie(name='cookieName2', value='cookieVal2'),
    ]
)
print(result.content)
# Response cookies are a list of Cookie objects;
# they can be passed to subsequent requests
response_cookies = result.cookies
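Response cookies come back as Cookie objects. One way to carry them into follow-up requests is to merge them into a small name-keyed jar, where newer values replace older ones with the same name. This is a sketch under two assumptions. First, the Cookie dataclass below is a stand-in that has only the documented name and value fields; use scrapingant_client.Cookie in real code. Second, merge_cookies is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cookie:
    # Stand-in for scrapingant_client.Cookie (only name and value are documented).
    name: str
    value: str


def merge_cookies(current: List[Cookie], received: List[Cookie]) -> List[Cookie]:
    """Hypothetical helper: combine cookie lists; later values win by name."""
    jar: Dict[str, str] = {}
    for cookie in current + received:
        jar[cookie.name] = cookie.value  # Overwrite any older value with the same name
    return [Cookie(name=name, value=value) for name, value in jar.items()]


sent = [Cookie('session', 'old'), Cookie('lang', 'en')]
received = [Cookie('session', 'new')]
print(merge_cookies(sent, received))
# → [Cookie(name='session', value='new'), Cookie(name='lang', value='en')]
```

The merged list could then be passed as cookies=... to the next general_request call.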

Executing custom JS snippet

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

custom_js_snippet = """
var str = 'Hello, world!';
var htmlElement = document.getElementsByTagName('html')[0];
htmlElement.innerHTML = str;
"""
result = client.general_request(
    'https://example.com',
    js_snippet=custom_js_snippet,
)
print(result.content)

Exception handling and retries

from scrapingant_client import ScrapingAntClient, ScrapingantClientException, ScrapingantInvalidInputException

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

RETRIES_COUNT = 3


def parse_html(html: str):
    ...  # Implement your data extraction here


parsed_data = None
for retry_number in range(RETRIES_COUNT):
    try:
        scrapingant_response = client.general_request(
            'https://example.com',
        )
    except ScrapingantInvalidInputException as e:
        print(f'Got invalid input exception: {repr(e)}')
        break  # We are not retrying if request params are not valid
    except ScrapingantClientException as e:
        print(f'Got ScrapingAnt exception {repr(e)}')
    except Exception as e:
        print(f'Got unexpected exception {repr(e)}')  # Please report exceptions like this by creating a new issue
    else:
        try:
            parsed_data = parse_html(scrapingant_response.content)
            break  # Data is parsed successfully, so we don't need to retry
        except Exception as e:
            print(f'Got exception while parsing data {repr(e)}')

if parsed_data is None:
    print(f'Failed to retrieve and parse data after {RETRIES_COUNT} tries')
    # You can sleep and retry later, or stop the script and investigate the cause
else:
    print(f'Successfully parsed data: {parsed_data}')
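The loop above retries immediately after a failure. A common refinement is to wait between attempts using exponential backoff and to retry only errors that might be transient. The sketch below is generic and self-contained. retry_with_backoff is a hypothetical helper and is not part of scrapingant-client, and flaky is a stand-in for a real request. Which exceptions count as transient is also an assumption. Judging by the Exceptions table, ScrapingantInternalException, ScrapingantSiteNotReachableException and ScrapingantDetectedException are plausible candidates for retry_on. Invalid-input and invalid-token errors are better left to propagate.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call func, retrying on retry_on with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(retries):
        try:
            return func()
        except retry_on:
            if attempt == retries - 1:
                raise  # Out of attempts: re-raise the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError('retries must be at least 1')


# Stand-in for a request that fails twice with a transient error, then succeeds.
calls = {'count': 0}


def flaky() -> str:
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('temporary failure')
    return '<html>ok</html>'


delays = []  # Record the requested waits instead of actually sleeping
result = retry_with_backoff(flaky, retries=3, base_delay=0.5, sleep=delays.append)
print(result)  # → <html>ok</html>
print(delays)  # → [0.5, 1.0]
```

With the real client, the call might look like retry_with_backoff(lambda: client.general_request('https://example.com'), retry_on=(ScrapingantInternalException, ScrapingantDetectedException)). This assumes those classes can be imported from scrapingant_client in the same way as the exceptions in the example above. Injecting sleep keeps the helper testable without real waiting.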

Sending custom headers

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

result = client.general_request(
    'https://httpbin.org/headers',
    headers={
        'test-header': 'test-value'
    }
)
print(result.content)

# HTTP basic auth example ('guest:guest' encoded in Base64)
result = client.general_request(
    'https://jigsaw.w3.org/HTTP/Basic/',
    headers={'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
)
print(result.content)
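The Authorization value in the basic auth example follows the standard HTTP Basic scheme: the string username:password is Base64-encoded and prefixed with "Basic ". The credentials guest:guest produce exactly the value used above. Here is a standard-library sketch. The name basic_auth_header is a hypothetical helper and is not part of scrapingant-client.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Hypothetical helper: build an HTTP Basic Authorization header dict."""
    credentials = f'{username}:{password}'.encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    return {'Authorization': f'Basic {token}'}


print(basic_auth_header('guest', 'guest'))
# → {'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
```

The returned dict has the same shape as the headers argument in the examples above, so it could be passed as headers=... to client.general_request.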

Simple async example

import asyncio

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')


async def main():
    # Scrape the example.com site.
    result = await client.general_request_async('https://example.com')
    print(result.content)


asyncio.run(main())
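The async method is most useful when scraping several pages concurrently. Below is a sketch that bounds concurrency using asyncio.Semaphore and asyncio.gather. So that it runs without an API token, fetch is a hypothetical stand-in coroutine. In real code, its body would be something like (await client.general_request_async(url)).content. The limit of 2 concurrent requests is an arbitrary assumption, so check the limits of your ScrapingAnt plan.

```python
import asyncio
from typing import List


async def fetch(url: str) -> str:
    # Hypothetical stand-in for: (await client.general_request_async(url)).content
    await asyncio.sleep(0.01)  # Simulate network latency
    return f'<html>{url}</html>'


async def scrape_all(urls: List[str], max_concurrency: int = 2) -> List[str]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:  # At most max_concurrency requests in flight
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(url) for url in urls))


urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
pages = asyncio.run(scrape_all(urls))
print(pages)
```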

Download files

Download the file for your platform.

Source Distribution

scrapingant-client-1.0.0.tar.gz (7.5 kB)

Built Distribution

scrapingant_client-1.0.0-py3-none-any.whl (10.5 kB)

File details

Details for the file scrapingant-client-1.0.0.tar.gz.

File metadata

  • Download URL: scrapingant-client-1.0.0.tar.gz
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for scrapingant-client-1.0.0.tar.gz
Algorithm Hash digest
SHA256 7ee2fa5fc425b4f55e42238afaca57c0af00972e8cb1f1834e2996d9efc588c2
MD5 f2d90a2ace162d0c84681ae4dd316752
BLAKE2b-256 b18208dd0ef187b38585976f1b5c3b607a19a533c10bf8e80cf58cc7335572b6


File details

Details for the file scrapingant_client-1.0.0-py3-none-any.whl.

File hashes

Hashes for scrapingant_client-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 54d4e0e4079053bc13bceedbdf4dc9167bbbb33a095033c8fd57836a1a413976
MD5 bf339a7a0779d12c257d5081bb21d827
BLAKE2b-256 795b4e41f7d0da5d88f82a7911d0d2f3a21109ed914568e99bd069419fe9d52d
