Rate Limiting and Caching HTTPX Client

HTTPX Wrapper with Rate Limiting and Caching Transports.

Introduction

The goal of this project is both convenience and a demonstration of how to assemble HTTPX Transports in different combinations.

HTTP request libraries, rate limiting and caching are all topics with deep rabbit holes, both in the technical implementation details and in the decisions an end user has to make. For convenience, this package abstracts some of those decisions away and makes opinionated choices about how caching and rate limiting should be controlled.

This came about while implementing caching and rate limiting for edgartools, where reducing network requests and improving overall performance led to a myriad of decisions. The SEC's Edgar site enforces a strict limit of 10 requests per second while providing not-very-helpful caching headers, so overriding those headers with custom rules is necessary in certain cases.

Caching

This project provides four cache_mode options:

  • Disabled: Rate Limiting only
  • Hishel-File: Cache using Hishel using FileStorage
  • Hishel-S3: Cache using Hishel using S3Storage
  • FileCache: Use a simpler file cache backend that relies on file modified and created times and revalidates only via Last-Modified. Intended for sites that provide a Last-Modified header.

Cache rules are defined as a nested dictionary: each site regular expression maps to a dictionary of URL regular expressions and their cache durations in seconds.

{
    'site_regex': {
        'url_regex': duration,
        'url_regex2': duration,
        '.*': 3600, # cache all paths for this site for an hour
    }
}
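As an illustration only, rules in the shape above could be resolved to a cache duration as follows. The function name lookup_cache_duration, matching the site pattern against the host and the URL pattern against the path with re.search, and the first-match-wins ordering are all assumptions of this sketch, not necessarily how httpxthrottlecache evaluates its rules.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical rules in the documented shape: site regex -> {url regex: seconds}
CACHE_RULES = {
    r"httpbingo\.org": {
        r"/get": 60,    # cache /get responses for a minute
        r".*": 3600,    # cache all other paths for this site for an hour
    }
}


def lookup_cache_duration(url: str, rules: dict) -> Optional[int]:
    """Return the duration (seconds) of the first matching rule, or None.

    Assumption: site patterns are matched against the host, URL patterns
    against the path, and dict insertion order decides which rule wins.
    """
    parts = urlsplit(url)
    for site_regex, path_rules in rules.items():
        if not re.search(site_regex, parts.netloc):
            continue
        for path_regex, duration in path_rules.items():
            if re.search(path_regex, parts.path):
                return duration
    return None


print(lookup_cache_duration("https://httpbingo.org/get", CACHE_RULES))   # → 60
print(lookup_cache_duration("https://httpbingo.org/json", CACHE_RULES))  # → 3600
print(lookup_cache_duration("https://example.com/", CACHE_RULES))        # → None
```

Because the catch-all '.*' pattern matches every path, it is listed last in this sketch so that more specific patterns are tried first.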

Misc Settings:

  • HTTPS_PROXY: if set, the HTTPS_PROXY environment variable is propagated to the HTTPX Transport

Usage: Synchronous Requests

Note that the manager object is intended to be long-lived; it doesn't need to be used as a context manager.

from httpxthrottlecache import HttpxThrottleCache

url = "https://httpbingo.org/get"

with HttpxThrottleCache(cache_mode="Hishel-File",
    cache_dir="_cache",
    rate_limiter_enabled=True,
    request_per_sec_limit=10,
    user_agent="your user agent") as manager:

    # Single synchronous request
    with manager.http_client() as client:
        response = client.get(url)
        print(response.status_code)

Usage: Batch Requests

from httpxthrottlecache import HttpxThrottleCache

url = "https://httpbingo.org/get"

with HttpxThrottleCache(cache_mode="Hishel-File",
    cache_dir="_cache",
    rate_limiter_enabled=True,
    request_per_sec_limit=10,
    user_agent="your user agent") as manager:

    # Batch request (inside the with block, while the manager is open)
    responses = manager.get_batch([url, url])
    print([r[0] for r in responses])

Usage: Retrieve many URLs and write them to files

from pathlib import Path
from httpxthrottlecache import HttpxThrottleCache

with HttpxThrottleCache(cache_mode="Disabled") as mgr:
    # Map each URL to the local file its response is written to
    urls = {f"https://httpbingo.org/get?{i}": Path(f"file{i}") for i in range(10)}
    results = mgr.get_batch(urls=urls)

Usage: Asynchronous

import asyncio

from httpxthrottlecache import HttpxThrottleCache

url = "https://httpbingo.org/get"


async def main():
    with HttpxThrottleCache(cache_mode="Hishel-File",
        cache_dir="_cache",
        rate_limiter_enabled=True,
        request_per_sec_limit=10) as manager:

        # Async requests: await must run inside a coroutine
        async with manager.async_http_client() as client:
            tasks = [client.get(url) for _ in range(2)]
            responses = await asyncio.gather(*tasks)
            print(responses)


asyncio.run(main())

FileCache

The FileCache implementation ignores response caching headers. Instead, it treats data as "fresh" for a client-provided max age, which is set by a cache rule as described above.

Once the max age has expired, the FileCache Transport revalidates the data using the Last-Modified date. TODO: revalidate using ETag as well.
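A minimal sketch of that freshness-then-revalidate decision, assuming the cache entry records when it was stored and which Last-Modified value the server sent. The function plan_request and its three outcomes are hypothetical; the conditional request itself and the handling of a 304 reply are only described in comments.

```python
import time
from typing import Optional


def plan_request(meta: dict, stored_at: float, max_age: float,
                 now: Optional[float] = None) -> tuple:
    """Decide how to serve a cached entry (hypothetical helper).

    Returns ("fresh", {}) while the entry is younger than max_age,
    ("revalidate", {"If-Modified-Since": ...}) once it has expired and a
    Last-Modified value was stored, and ("refetch", {}) otherwise.
    """
    now = time.time() if now is None else now
    if now - stored_at < max_age:
        # Response caching headers are ignored: only the rule's max age counts.
        return "fresh", {}
    last_modified = meta.get("Last-Modified")
    if last_modified:
        # Send a conditional GET; a 304 Not Modified reply would mean the
        # cached bytes can be reused, a 200 would carry a new body.
        return "revalidate", {"If-Modified-Since": last_modified}
    return "refetch", {}


meta = {"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(plan_request(meta, stored_at=1000.0, max_age=3600, now=2000.0))
print(plan_request(meta, stored_at=1000.0, max_age=3600, now=5000.0))
```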

The FileCache implementation stores each file as raw bytes plus a .meta sidecar. The .meta file holds headers, such as Last-Modified, that are used for revalidation. The raw bytes are kept in their native format: binary files as-is, compressed gzip streams as compressed gzip data, and so on.
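The body-plus-sidecar layout can be sketched as below. The sidecar name (the body file name with ".meta" appended) and JSON as the .meta format are assumptions made for illustration; the actual on-disk format used by FileCache may differ.

```python
import json
import tempfile
from pathlib import Path


def write_entry(path: Path, body: bytes, headers: dict) -> None:
    """Store the raw body untouched and its headers in a JSON .meta sidecar."""
    path.write_bytes(body)  # no decoding: gzip stays gzip, binary stays binary
    meta_path = path.with_name(path.name + ".meta")
    meta_path.write_text(json.dumps(headers))


def read_entry(path: Path) -> tuple:
    """Return (raw body bytes, headers dict) for a cached entry."""
    meta_path = path.with_name(path.name + ".meta")
    return path.read_bytes(), json.loads(meta_path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    entry = Path(tmp) / "report.bin"
    write_entry(entry, b"\x1f\x8b raw gzip bytes",
                {"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"})
    body, meta = read_entry(entry)
    print(body[:2], meta["Last-Modified"])
```

Keeping headers in a separate file means the body can be served or inspected directly, while revalidation only needs to read the small sidecar.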

FileCache uses FileLock to ensure that only one writer at a time writes a cached object. As a result, multiple simultaneous cache misses for the same object currently queue up waiting to write the file. This locking is intended mainly to allow multiple processes to share the same cache.
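To show the single-writer idea without the third-party filelock package, here is a simplified stand-in built on an exclusive-create lock file. The helper name exclusive_lock, the ".lock" suffix and the polling loop are assumptions of this sketch. Unlike the OS-level locks FileLock can use, a writer that crashes here leaves a stale lock file behind.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(target: Path, timeout: float = 10.0, poll: float = 0.05):
    """Hold '<target>.lock' while writing; other writers poll until it is gone."""
    lock_path = target.with_name(target.name + ".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process or thread can create it at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {target}")
            time.sleep(poll)  # queue up behind the current writer
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "entry.bin"
    with exclusive_lock(target):
        held = (Path(tmp) / "entry.bin.lock").exists()
        try:
            with exclusive_lock(target, timeout=0.1):  # a second writer waits...
                second_writer_got_lock = True
        except TimeoutError:                           # ...and gives up here
            second_writer_got_lock = False
    released = not (Path(tmp) / "entry.bin.lock").exists()
```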

FileCache first stages data in a .tmp file, then, once the write completes, copies it to the final file.
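A sketch of the staging step follows. Note that it swaps in os.replace, a rename, where the text above describes a copy; the helper name write_staged and the ".tmp" naming are assumptions. On POSIX, a rename within one filesystem is atomic, so readers see either the old final file or the complete new one, never a partial write.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_staged(final_path: Path, chunks: Iterable[bytes]) -> None:
    """Stream data into '<name>.tmp', then move it over the final file."""
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    try:
        with open(tmp_path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        # Rename into place only after every chunk has been written.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written staging file behind on failure.
        tmp_path.unlink(missing_ok=True)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.bin"
    write_staged(target, [b"hello ", b"world"])
    content = target.read_bytes()
    leftover = (Path(tmp) / "data.bin.tmp").exists()

print(content, leftover)  # → b'hello world' False
```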

No cache cleanup is done; that's your problem.
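If you need cleanup, one option is a periodic prune by modification time, sketched below. The helper prune_cache is hypothetical and not part of this package; it deletes body, .meta and any other files alike, and it does not take FileCache's locks, so run it when no writers are active.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def prune_cache(cache_dir: Path, max_age_seconds: float,
                now: Optional[float] = None) -> List[Path]:
    """Delete files not modified within max_age_seconds; return what was removed."""
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions don't disturb the walk.
    for path in list(cache_dir.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old.bin"
    old.write_bytes(b"x")
    new = Path(tmp) / "new.bin"
    new.write_bytes(b"y")
    os.utime(old, (0, 0))  # pretend old.bin was last modified in 1970
    names = sorted(p.name for p in prune_cache(Path(tmp), max_age_seconds=86400))

print(names)  # → ['old.bin']
```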

Rate Limiting

Rate limiting is implemented via pyrate_limiter. This is a leaky bucket implementation that allows a configurable number of requests per time interval.
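To illustrate the leaky bucket idea itself, here is a small stdlib-only leaky bucket (the "meter" variant). This is not pyrate_limiter's API: the class LeakyBucket, its non-blocking try_acquire method and the injectable clock are hypothetical names chosen for this sketch. The bucket holds up to `rate` requests and drains at `rate` per `interval` seconds.

```python
import threading
import time
from typing import Callable


class LeakyBucket:
    """Leaky bucket as a meter: admits a request only if the bucket has room."""

    def __init__(self, rate: int, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = rate                  # bucket size = allowed burst
        self.leak_per_sec = rate / interval   # steady drain rate
        self.level = 0.0
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()          # safe to share across threads

    def try_acquire(self) -> bool:
        with self.lock:
            now = self.clock()
            # Drain whatever has leaked out since the previous call.
            self.level = max(0.0, self.level - (now - self.last) * self.leak_per_sec)
            self.last = now
            if self.level + 1 <= self.capacity:
                self.level += 1
                return True
            return False


fake_now = [0.0]
bucket = LeakyBucket(rate=10, interval=1.0, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(11)]           # 10 admitted, 11th refused
fake_now[0] = 0.1                                           # 0.1 s drains one slot
after_drain = [bucket.try_acquire(), bucket.try_acquire()]  # [True, False]
```

A real client would wait and retry instead of dropping a refused request; pyrate_limiter handles that waiting, plus the alternative backends described below.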

pyrate_limiter supports a variety of backends. The default backend is in-memory, and a single Limiter can be used for both sync and asyncio requests across multiple threads. Alternative backends can be used for multiprocess and distributed rate limiting; see the examples for more.

Download files

Source Distribution

httpxthrottlecache-0.3.0.tar.gz (14.7 kB, source)

Built Distribution

httpxthrottlecache-0.3.0-py3-none-any.whl (13.1 kB, Python 3)

File details

Details for the file httpxthrottlecache-0.3.0.tar.gz.

File metadata

  • Download URL: httpxthrottlecache-0.3.0.tar.gz
  • Size: 14.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for httpxthrottlecache-0.3.0.tar.gz
Algorithm Hash digest
SHA256 5b3ea1c42accd47685820ca1f94eee2df87b7400504e2dc5d22084f68a0ac9b6
MD5 facc66c726dea976e7d48235d8771c9d
BLAKE2b-256 dcafd742a51cf3b3efd2ac91168190cb5ec893c614f1b4f18d615bd2cdc63369

Provenance

The following attestation bundles were made for httpxthrottlecache-0.3.0.tar.gz:

Publisher: build_deploy.yml on paultiq/httpxthrottlecache

File details

Details for the file httpxthrottlecache-0.3.0-py3-none-any.whl.

File hashes

Hashes for httpxthrottlecache-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5774b23f464cebadbbd78286c70df8866bc9396449f53411045f8cae496cab3a
MD5 738ec856d3948177b9e240134cb93c75
BLAKE2b-256 f3d508f7afc4adcef8c5391a6012e0dca44e14df8c7d421b9c8982746a553eb7

Provenance

The following attestation bundles were made for httpxthrottlecache-0.3.0-py3-none-any.whl:

Publisher: build_deploy.yml on paultiq/httpxthrottlecache
