
Async client for the Webscrapper API


Webscrapper Client API

An asynchronous Python client for the Webscrapper API service. The client provides methods for retrieving web pages through proxies and for checking whether URLs are blocked by the Russian internet regulator (RKN).

Features

  • Fully asynchronous API built with aiohttp
  • Support for both regular HTTP and Selenium-based web scraping
  • Cookie management for both HTTP and Selenium requests
  • Custom user agent and referer support
  • Mobile and country-specific proxy support
  • RKN checking functionality
  • Context manager support for proper resource management

Installation

pip install webscrapper-client-api

Or install directly from the repository:

pip install git+https://github.com/yourusername/webscrapper-client-api.git

Usage

Basic Example

import asyncio
from webscrapper_client_api import WebscrapperClientAPI

async def main():
    async with WebscrapperClientAPI("https://fetch.webnova.one", "your_api_key") as client:
        # Basic page retrieval
        result = await client.get_page(url="https://example.com")
        print(f"Status: {result['status_code']}")
        print(f"Content length: {len(result['html'])}")
        
        # RKN check
        rkn_result = await client.check_rkn(url="https://example.com")
        print(f"RKN check result: {rkn_result}")

if __name__ == "__main__":
    asyncio.run(main())

Using Cookies with Selenium

async with WebscrapperClientAPI("https://fetch.webnova.one", "your_api_key") as client:
    # Define cookies for Selenium
    cookies = [
        {"name": "session_id", "value": "abc123"},
        {"name": "user_preferences", "value": "dark_mode=1"}
    ]
    
    # Request with Selenium and cookies
    result = await client.get_page(
        url="https://example.com/login",
        use_selenium=True,
        cookies=cookies
    )

Using Cookies with Regular HTTP

async with WebscrapperClientAPI("https://fetch.webnova.one", "your_api_key") as client:
    # Define cookies for HTTP request
    cookies = {
        "session_id": "abc123",
        "user_preferences": "dark_mode=1"
    }
    
    # Request with HTTP and cookies
    result = await client.get_page(
        url="https://example.com/dashboard",
        cookies=cookies,
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        referer="https://example.com/login"
    )

Manual Session Management

async def example():
    # Create client
    client = WebscrapperClientAPI("https://fetch.webnova.one", "your_api_key")
    
    try:
        # Make requests
        result = await client.get_page(url="https://example.com")
    finally:
        # Always close the session when done
        await client.close()

API Methods

get_page

Retrieves a web page through a proxy.

Parameters:

  • url (str): URL to retrieve
  • use_selenium (bool, optional): Use Selenium for the request. Default: False
  • use_mobile (bool, optional): Use a mobile proxy. Default: False
  • user_agent (str, optional): Custom User-Agent header
  • referer (str, optional): Custom Referer header (ignored for Selenium requests)
  • method (str, optional): Request method, 'get' or 'head'. Default: 'get'
  • country (int, optional): Proxy country ID
  • cookies (dict or list, optional): Cookies to send with the request (a dict for HTTP requests; a list of {"name": ..., "value": ...} dicts for Selenium, as shown in the examples above)

Returns a dictionary with:

  • html: HTML content of the page
  • status_code: HTTP status code
  • url: Final URL (may differ from requested URL after redirects)
  • error: Error message if any
  • selenium: Boolean indicating if Selenium was used (only in Selenium responses)
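Because failures are reported in the error field of the returned dictionary (in addition to exceptions for API-level errors), callers may want to inspect the documented fields before using the HTML. A minimal sketch using only the keys listed above (the helper name summarize_result is illustrative, not part of the client):

```python
def summarize_result(result: dict) -> str:
    """Summarize a get_page response using its documented keys."""
    if result.get("error"):
        # The API reports request failures in the "error" field
        return f"failed: {result['error']}"
    html = result.get("html") or ""
    # "url" is the final URL after any redirects
    return f"{result['status_code']} {result['url']} ({len(html)} chars)"
```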

check_rkn

Checks if a domain is blocked by RKN (Russian internet regulator).

Parameters:

  • url (str): URL to check

Returns a dictionary with the RKN check results.

Exception Handling

The client raises a custom exception, WebscrapperAPIError, for API errors:

from webscrapper_client_api import WebscrapperAPIError

try:
    result = await client.get_page(url="https://example.com")
except WebscrapperAPIError as e:
    print(f"API Error: {e.message}, Status code: {e.status_code}")
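Transient failures (proxy errors, timeouts) can be retried around this exception. A generic sketch under the assumption that retrying is safe for your requests; fetch_with_retry is a hypothetical helper, not part of the client:

```python
import asyncio

async def fetch_with_retry(fetch, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Await a zero-argument async callable, retrying with linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            await asyncio.sleep(delay * attempt)

# With the client, for example:
# result = await fetch_with_retry(
#     lambda: client.get_page(url="https://example.com"),
#     retry_on=(WebscrapperAPIError,),
# )
```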

License

This project is licensed under the WTFPL.



Download files

Download the file for your platform.

Source Distribution

webscrapper_client_api-0.1.0.tar.gz (7.4 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

Hashes for webscrapper_client_api-0.1.0.tar.gz:

  • SHA256: 36c4c8924de512fe03bf9b28e09e8231355967f71558a247330c42b81d5a006c
  • MD5: 15cee0d0565332107cacafcb3abd22cf
  • BLAKE2b-256: 40ff131b58f13a3de14e7fcb57245be2705d6fbf2964f02bb7b7f5a857e53fd7

Built Distribution

webscrapper_client_api-0.1.0-py3-none-any.whl (5.0 kB)

  • Tags: Python 3

Hashes for webscrapper_client_api-0.1.0-py3-none-any.whl:

  • SHA256: 61706f6a37c7bba160154558b6fec76f59dc2693394b1dafa9cda49b0e1fa5b2
  • MD5: c0f9b201b8ac00b5bb44c6235227714f
  • BLAKE2b-256: 9b8106bf27412f9e5ab96484f9391f57fc530bc965b911a07399e3d0e037da2b
