
Webscrapper Client API

An asynchronous Python client for the Webscrapper API service. The client provides methods for retrieving web pages through proxies and for checking URLs against the registry of Roskomnadzor (RKN), the Russian internet regulator.

Features

  • Fully asynchronous API built with aiohttp
  • Support for both regular HTTP and Selenium-based web scraping
  • Cookie management for both HTTP and Selenium requests
  • Custom user agent and referer support
  • Mobile and country-specific proxy support
  • RKN checking functionality
  • Context manager support for proper resource management

Installation

pip install webscrapper-client-api

Or install directly from the repository:

pip install git+https://github.com/yourusername/webscrapper-client-api.git

Usage

Basic Example

import asyncio
from webscrapper_client_api import WebscrapperClientAPIAsync

async def main():
    async with WebscrapperClientAPIAsync("your_api_key") as client:
        # Basic page retrieval
        result = await client.get_page(url="https://example.com")
        print(f"Status: {result['status_code']}")
        print(f"Content length: {len(result['html'])}")
        
        # RKN check
        rkn_result = await client.check_rkn(url="https://example.com")
        print(f"RKN check result: {rkn_result}")

if __name__ == "__main__":
    asyncio.run(main())

Using Cookies with Selenium

async with WebscrapperClientAPIAsync("your_api_key") as client:
    # Define cookies for Selenium
    cookies = [
        {"name": "session_id", "value": "abc123"},
        {"name": "user_preferences", "value": "dark_mode=1"}
    ]
    
    # Request with Selenium and cookies
    result = await client.get_page(
        url="https://example.com/login",
        use_selenium=True,
        cookies=cookies
    )

Using Cookies with Regular HTTP

async with WebscrapperClientAPIAsync("your_api_key") as client:
    # Define cookies for HTTP request
    cookies = {
        "session_id": "abc123",
        "user_preferences": "dark_mode=1"
    }
    
    # Request with HTTP and cookies
    result = await client.get_page(
        url="https://example.com/dashboard",
        cookies=cookies,
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        referer="https://example.com/login"
    )
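The two cookie formats above differ only in shape: Selenium requests take a list of {"name", "value"} records, while plain HTTP requests take a name-to-value dict. If you keep cookies in one canonical place, a small converter keeps both request styles in sync. The helper names below are ours, for illustration only, not part of the client:

```python
def cookies_dict_to_selenium(cookies: dict) -> list:
    """Convert a plain name->value dict (HTTP format) to the
    list-of-dicts format used for Selenium requests."""
    return [{"name": name, "value": value} for name, value in cookies.items()]

def cookies_selenium_to_dict(cookies: list) -> dict:
    """Convert Selenium-style cookie records back to a plain dict,
    dropping any extra fields such as domain or path."""
    return {c["name"]: c["value"] for c in cookies}

http_cookies = {"session_id": "abc123", "user_preferences": "dark_mode=1"}
selenium_cookies = cookies_dict_to_selenium(http_cookies)
```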

Manual Session Management

async def example():
    # Create client
    client = WebscrapperClientAPIAsync("https://fetch.webnova.one", "your_api_key")
    
    try:
        # Make requests
        result = await client.get_page(url="https://example.com")
    finally:
        # Always close the session when done
        await client.close()

API Methods

get_page

Retrieves a web page through a proxy.

Parameters:

  • url (str): URL to retrieve
  • use_selenium (bool, optional): Use Selenium for request. Default: False
  • use_mobile (bool, optional): Use mobile proxy. Default: False
  • user_agent (str, optional): Custom User-Agent header
  • referer (str, optional): Custom Referer header (ignored for Selenium requests)
  • method (str, optional): Request method, 'get' or 'head'. Default: 'get'
  • country (int, optional): Proxy country ID
  • cookies (dict or list, optional): Cookies to send with the request

Returns a dictionary with:

  • html: HTML content of the page
  • status_code: HTTP status code
  • url: Final URL (may differ from requested URL after redirects)
  • error: Error message if any
  • selenium: Boolean indicating if Selenium was used (only in Selenium responses)
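Given the documented return keys, a caller usually wants to separate success from failure before touching html. A small guard like the following is one way to centralize that check; it is our own convention (not part of the client) and assumes error is empty or None on success:

```python
def unwrap_page(result: dict) -> str:
    """Return the HTML from a get_page result, raising on failure.

    Assumes the documented keys: 'html', 'status_code', 'url', 'error'.
    """
    if result.get("error"):
        raise RuntimeError(f"fetch failed for {result.get('url')}: {result['error']}")
    if result.get("status_code", 0) >= 400:
        raise RuntimeError(f"HTTP {result['status_code']} for {result.get('url')}")
    return result.get("html", "")
```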

check_rkn

Checks if a domain is blocked by RKN (Russian internet regulator).

Parameters:

  • url (str): URL to check

Returns a dictionary with the RKN check results.
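When screening many domains, check_rkn pairs naturally with asyncio.gather. The sketch below uses a stub client so it runs standalone; with the real library you would substitute WebscrapperClientAPIAsync, and the "blocked" key in the stub's result is an assumption, since the actual response shape is not documented here:

```python
import asyncio

class StubClient:
    """Stand-in for WebscrapperClientAPIAsync, for illustration only."""
    async def check_rkn(self, url: str) -> dict:
        await asyncio.sleep(0)  # pretend to do network I/O
        return {"url": url, "blocked": False}  # assumed result shape

async def check_many(client, urls):
    # Fire all checks concurrently; gather preserves input order.
    return await asyncio.gather(*(client.check_rkn(url=u) for u in urls))

results = asyncio.run(check_many(StubClient(), ["https://a.example", "https://b.example"]))
```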

Exception Handling

The client defines a custom exception, WebscrapperAPIError, for handling API errors:

from webscrapper_client_api import WebscrapperAPIError

try:
    result = await client.get_page(url="https://example.com")
except WebscrapperAPIError as e:
    print(f"API Error: {e.message}, Status code: {e.status_code}")
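Transient failures (proxy churn, upstream timeouts) can be worth retrying. Below is a sketch of a retry wrapper; it defines a local stand-in for the exception so the snippet runs on its own, and with the real client you would catch the library's WebscrapperAPIError instead:

```python
import asyncio

class WebscrapperAPIError(Exception):
    """Local stand-in for the client's exception, for illustration."""
    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.message = message
        self.status_code = status_code

async def get_page_with_retry(fetch, attempts: int = 3, delay: float = 0.0):
    """Call an async fetch() up to `attempts` times, pausing between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return await fetch()
        except WebscrapperAPIError:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            await asyncio.sleep(delay * attempt)

calls = {"n": 0}

async def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    calls["n"] += 1
    if calls["n"] < 3:
        raise WebscrapperAPIError("proxy unavailable", status_code=502)
    return {"status_code": 200, "html": "<p>ok</p>"}

result = asyncio.run(get_page_with_retry(flaky))
```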

License

This project is licensed under the WTFPL.
