hctef

Python library with helper classes to read files over HTTP using Range requests, with caching.

Overview

hctef provides a file-like interface for reading files over HTTP/HTTPS, using HTTP Range requests to fetch only the data you need. It includes intelligent caching to minimize network requests and supports both synchronous and asynchronous operations.

Features

  • File-like API: Works like a regular Python file object with read(), seek(), and tell() methods
  • Efficient Range Requests: Fetches only the data you need using HTTP Range headers
  • Intelligent Caching: Uses an interval tree to track cached byte ranges and minimize redundant requests
  • Prefetching: Optionally prefetch data from the start or end of the file
  • Sync and Async: Both synchronous and asynchronous implementations available
  • Context Manager Support: Works with the with statement for automatic cleanup

Installation

pip install hctef

To include async support:

pip install hctef[async]

Quick Start

Synchronous Usage

from hctef import HttpFile

url = "https://example.com/large-file.bin"

with HttpFile(url) as f:
    # Read first 100 bytes
    data = f.read(100)

    # Seek to a specific position
    f.seek(1000)

    # Read from current position
    more_data = f.read(50)

    # Get current position
    position = f.tell()

    # Seek relative to the end of the file (whence=2, i.e. os.SEEK_END)
    f.seek(-100, 2)

Asynchronous Usage

The async implementation supports independent cursors for concurrent reads:

import asyncio
from hctef.aio import AsyncHttpFile

url = "https://example.com/large-file.bin"

async with AsyncHttpFile(url) as f:
    # Read first 100 bytes
    data = await f.read(100)

    # Seek to a specific position (synchronous - no I/O)
    f.seek(1000)

    # Read from current position
    more_data = await f.read(50)

Parallel Reads with Multiple Cursors

Create independent cursors to read from different positions concurrently:

import asyncio
from hctef.aio import AsyncHttpFile

url = "https://example.com/large-file.bin"

async with AsyncHttpFile(url) as f:
    # Create independent cursors for parallel reading
    cursor1 = f.clone()
    cursor2 = f.clone()

    # Position each cursor at different locations
    f.seek(0)
    cursor1.seek(1000)
    cursor2.seek(2000)

    # Read from all three positions in parallel
    # All cursors share the same cache and HTTP session
    results = await asyncio.gather(
        f.read(100),        # Read bytes 0-99
        cursor1.read(100),  # Read bytes 1000-1099
        cursor2.read(100),  # Read bytes 2000-2099
    )

    # Each cursor maintains independent position
    print(f.tell())        # 100
    print(cursor1.tell())  # 1100
    print(cursor2.tell())  # 2100

Cursors are lightweight and share:

  • HTTP session (connection pooling)
  • Byte range cache (deduplication of overlapping requests)
  • File metadata
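
Conceptually, clone() hands back a new cursor that points at the same shared state while tracking its own position. A minimal sketch of those semantics (the Cursor class below is illustrative only, not hctef's implementation):

```python
class Cursor:
    def __init__(self, cache=None):
        # Clones pass in the same dict, so fetched data is shared.
        self.cache = cache if cache is not None else {}
        # Position is per-cursor, never shared.
        self.pos = 0

    def clone(self):
        return Cursor(cache=self.cache)

    def seek(self, pos):
        self.pos = pos

f = Cursor()
c1 = f.clone()
c1.seek(1000)
f.cache[0] = b"header"  # cached via the original cursor...
print(c1.cache[0])      # ...visible through the clone: b'header'
print(f.pos, c1.pos)    # 0 1000 -- positions stay independent
```

Because only the position is per-cursor, creating many cursors costs almost nothing while every byte fetched by one benefits all of them.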

Configuration Options

Both HttpFile and AsyncHttpFile accept the following parameters:

HttpFile(
    url,
    minimum_range_request_bytes=8192,  # Minimum bytes per request (default: 8KB)
    prefetch_bytes=1048576,             # Bytes to prefetch on open (default: 1MB)
    prefetch_direction='END'            # 'START' or 'END' (default: 'END')
)
  • minimum_range_request_bytes: The minimum number of bytes to request in a single HTTP Range request (except when filling small cache gaps)
  • prefetch_bytes: How many bytes to fetch immediately when opening the file. Set to 0 to disable prefetching
  • prefetch_direction: Whether to prefetch from the start ('START') or end ('END') of the file
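
To make the effect of minimum_range_request_bytes concrete, here is one plausible way a small read could be widened to the minimum request size, clamped to the file length. This is an illustrative sketch, not necessarily hctef's exact coalescing logic:

```python
def expand_range(start: int, length: int, minimum: int, file_size: int) -> tuple[int, int]:
    """Grow a read of `length` bytes at `start` to at least `minimum`
    bytes, clamped so the range never extends past the end of the file."""
    end = min(start + max(length, minimum), file_size)
    return start, end

# A 100-byte read with an 8 KiB minimum becomes an 8 KiB request:
print(expand_range(5000, 100, 8192, 10_000_000))  # (5000, 13192)
```

Requesting more than strictly needed trades a little bandwidth for far fewer round trips when reads cluster together.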

Requirements

  • Python 3.12 or higher
  • HTTP server must support Range requests
  • For async: aiohttp>=3.13.0
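
A server advertises Range support with the Accept-Ranges response header. The helper below is a hypothetical pre-flight check, not part of hctef; note that some servers honor Range requests without advertising the header, so a 206 Partial Content response is the more reliable signal:

```python
def supports_ranges(headers: dict[str, str]) -> bool:
    """Check whether response headers advertise byte-range support.

    Header names are case-insensitive; a value of "none" explicitly
    disables ranges.
    """
    value = {k.lower(): v for k, v in headers.items()}.get("accept-ranges", "")
    return value.strip().lower() == "bytes"

print(supports_ranges({"Accept-Ranges": "bytes"}))  # True
print(supports_ranges({"Accept-Ranges": "none"}))   # False
```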

How It Works

When you open an HTTP file, hctef:

  1. Sends an initial Range request to determine the file size and verify Range support
  2. Optionally prefetches data from the start or end of the file
  3. Maintains an in-memory cache of fetched byte ranges (not suitable for downloading complete large files)
  4. On read(), checks the cache first and only fetches missing data from the server
  5. Combines multiple small requests into larger ones based on minimum_range_request_bytes

This approach minimizes HTTP requests while providing efficient random access to remote files.
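
Step 4 can be modeled simply: given the byte ranges already cached, compute the gaps that still need to be fetched. hctef tracks ranges with an interval tree; this sketch uses a plain sorted list for clarity and is not the library's actual code:

```python
def missing_gaps(cached: list[tuple[int, int]], start: int, end: int) -> list[tuple[int, int]]:
    """Return the sub-ranges of [start, end) not covered by any
    (start, end) interval in `cached`."""
    gaps = []
    pos = start
    for c_start, c_end in sorted(cached):
        if c_end <= pos:      # cached range is entirely before our position
            continue
        if c_start >= end:    # cached range is entirely past the request
            break
        if c_start > pos:     # uncovered gap before this cached range
            gaps.append((pos, c_start))
        pos = max(pos, c_end)
    if pos < end:             # uncovered tail after the last cached range
        gaps.append((pos, end))
    return gaps

# Bytes 0-100 and 500-600 are cached; a read of 50-550 fetches only 100-500:
print(missing_gaps([(0, 100), (500, 600)], 50, 550))  # [(100, 500)]
```

Each gap then becomes (at most) one Range request, widened as needed by minimum_range_request_bytes.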

Error Handling

hctef defines custom exceptions:

  • HctefError: Base exception class
  • HctefNetworkError: Raised for network-related errors (inherits from IOError)
  • HctefUrlError: Raised for invalid URLs (inherits from ValueError)

from hctef import HttpFile
from hctef.exceptions import HctefNetworkError, HctefUrlError

try:
    with HttpFile("https://example.com/file.bin") as f:
        data = f.read(100)
except HctefNetworkError as e:
    print(f"Network error: {e}")
except HctefUrlError as e:
    print(f"Invalid URL: {e}")

Development

To set up for development:

# Clone the repository
git clone https://github.com/jkeifer/hctef
cd hctef

# Install dependencies
uv sync --all-extras --dev

# Setup pre-commit
pre-commit install

# Run tests
pytest

# Run all checks with pre-commit
pre-commit run --all-files

Future Ideas

  • Consolidate sync/async implementations
  • Allow an uncached "cursor" for reading a large file segment
  • Cursors with separate caches (to allow clearing memory when done)
    • would allow cursor-based access with non-async implementation

License

Apache License 2.0

What is hctef?

It's the HTTP Client That Eats Files, obviously.
