Helper classes to read files over HTTP using Range requests, with caching
hctef
Python library with helper classes to read files over HTTP using Range requests, with caching.
Overview
hctef provides a file-like interface for reading files over HTTP/HTTPS, using
HTTP Range requests to fetch only the data you need. It includes intelligent
caching to minimize network requests and supports both synchronous and
asynchronous operations.
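For reference, one byte-range read is an ordinary GET carrying a Range header. A server that supports it answers 206 Partial Content with only those bytes. The sketch below uses the standard library's urllib and is not hctef's code. The helper names range_header and build_range_request are hypothetical. Note that HTTP byte ranges are inclusive on both ends.

```python
from urllib.request import Request


def range_header(start: int, length: int) -> str:
    """Return a Range header value for `length` bytes starting at `start`.

    HTTP byte ranges are inclusive on both ends, so reading 100 bytes
    from offset 0 is "bytes=0-99".
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def build_range_request(url: str, start: int, length: int) -> Request:
    """Build (but do not send) a GET request for a single byte range."""
    return Request(url, headers={"Range": range_header(start, length)})


req = build_range_request("https://example.com/large-file.bin", 1000, 50)
print(req.get_header("Range"))  # bytes=1000-1049
```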
Features
- File-like API: Works like a regular Python file object with read(), seek(), and tell() methods
- Efficient Range Requests: Fetches only the data you need using HTTP Range headers
- Intelligent Caching: Uses an interval tree to track cached byte ranges and minimize redundant requests
- Prefetching: Optionally prefetch data from the start or end of the file
- Sync and Async: Both synchronous and asynchronous implementations available
- Context Manager Support: Use in a with statement for automatic cleanup
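The caching feature depends on two operations: recording a fetched byte range (merging it with neighbours) and asking which parts of a requested range are still missing. hctef describes its structure as an interval tree. The sketch below substitutes a plain sorted list of disjoint half-open ranges, which answers the same questions with O(n) inserts. RangeSet is a hypothetical name and is not hctef's class.

```python
import bisect


class RangeSet:
    """Sorted, non-overlapping half-open byte ranges [start, end).

    Hypothetical stand-in for an interval tree: a sorted list gives the
    same answers for small numbers of ranges, with O(n) inserts.
    """

    def __init__(self) -> None:
        self._ranges: list[tuple[int, int]] = []

    def add(self, start: int, end: int) -> None:
        """Mark [start, end) as cached, merging overlapping or adjacent ranges."""
        if start >= end:
            return
        merged: list[tuple[int, int]] = []
        for s, e in self._ranges:
            if e < start or s > end:
                merged.append((s, e))  # disjoint and not touching: keep as-is
            else:
                start, end = min(s, start), max(e, end)  # absorb into the new range
        bisect.insort(merged, (start, end))
        self._ranges = merged

    def missing(self, start: int, end: int) -> list[tuple[int, int]]:
        """Return the sub-ranges of [start, end) that are not cached."""
        gaps: list[tuple[int, int]] = []
        pos = start
        for s, e in self._ranges:
            if e <= pos:
                continue  # cached range lies entirely before the cursor
            if s >= end:
                break  # cached range lies entirely after the request
            if s > pos:
                gaps.append((pos, s))  # hole before this cached range
            pos = max(pos, e)
        if pos < end:
            gaps.append((pos, end))  # uncached tail of the request
        return gaps


cache = RangeSet()
cache.add(0, 100)
cache.add(200, 300)
print(cache.missing(50, 250))  # [(100, 200)]
cache.add(100, 200)  # fills the hole, so all ranges merge into (0, 300)
print(cache.missing(0, 300))  # []
```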
Installation
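```shell
pip install hctef
```
To include async support (the quotes stop shells such as zsh from treating the brackets as a glob):
```shell
pip install "hctef[async]"
```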
Quick Start
Synchronous Usage
```python
from hctef import HttpFile

url = "https://example.com/large-file.bin"

with HttpFile(url) as f:
    # Read first 100 bytes
    data = f.read(100)

    # Seek to a specific position
    f.seek(1000)

    # Read from current position
    more_data = f.read(50)

    # Get current position
    position = f.tell()

    # Seek relative to end of file (whence=2, i.e. os.SEEK_END)
    f.seek(-100, 2)
```
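In the example above, f.seek(-100, 2) uses whence=2 (Python's os.SEEK_END), so the target offset depends on the total file size, which hctef learns from its initial Range request. The helper below sketches the standard Python io seek arithmetic. resolve_seek is a hypothetical name, not hctef's code. Whether hctef rejects, clamps, or allows positions past the end is an assumption this sketch does not settle: it only rejects negative positions.

```python
import os


def resolve_seek(offset: int, whence: int, position: int, size: int) -> int:
    """Compute the new absolute position for seek(offset, whence).

    Follows Python io semantics: SEEK_SET is absolute, SEEK_CUR is relative
    to the current position, SEEK_END is relative to the file size.
    Assumption: only negative results are rejected; seeking past the end
    is allowed here, as it is for regular Python files.
    """
    if whence == os.SEEK_SET:
        new = offset
    elif whence == os.SEEK_CUR:
        new = position + offset
    elif whence == os.SEEK_END:
        new = size + offset
    else:
        raise ValueError(f"invalid whence: {whence}")
    if new < 0:
        raise ValueError("negative seek position")
    return new


print(resolve_seek(-100, os.SEEK_END, position=1050, size=5000))  # 4900
```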
Asynchronous Usage
The async implementation supports independent cursors for concurrent reads:
```python
import asyncio

from hctef.aio import AsyncHttpFile

url = "https://example.com/large-file.bin"


async def main():
    async with AsyncHttpFile(url) as f:
        # Read first 100 bytes
        data = await f.read(100)
        # Seek to a specific position (synchronous - no I/O)
        f.seek(1000)
        # Read from current position
        more_data = await f.read(50)


asyncio.run(main())
```
Parallel Reads with Multiple Cursors
Create independent cursors to read from different positions concurrently:
```python
import asyncio

from hctef.aio import AsyncHttpFile

url = "https://example.com/large-file.bin"


async def main():
    async with AsyncHttpFile(url) as f:
        # Create independent cursors for parallel reading
        cursor1 = f.clone()
        cursor2 = f.clone()

        # Position each cursor at a different location
        f.seek(0)
        cursor1.seek(1000)
        cursor2.seek(2000)

        # Read from all three positions in parallel;
        # all cursors share the same cache and HTTP session
        results = await asyncio.gather(
            f.read(100),        # bytes 0-99
            cursor1.read(100),  # bytes 1000-1099
            cursor2.read(100),  # bytes 2000-2099
        )

        # Each cursor maintains its own position
        print(f.tell())        # 100
        print(cursor1.tell())  # 1100
        print(cursor2.tell())  # 2100


asyncio.run(main())
```
Cursors are lightweight and share:
- HTTP session (connection pooling)
- Byte range cache (deduplication of overlapping requests)
- File metadata
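To make this sharing concrete, here is a toy in-memory model: each cursor holds only its own position, while the cache and a table of in-flight fetches live in one shared object, so two cursors asking for the same bytes at the same moment trigger a single fetch. SharedState, Cursor, and fetch_range are hypothetical names, and the "network" is just a slice of a bytes object. This toy only de-duplicates identical ranges; it does not attempt the overlap handling the README attributes to hctef's byte-range cache.

```python
import asyncio


class SharedState:
    """State shared by every cursor: a byte cache and in-flight fetches.

    Hypothetical stand-in: fetch_range simulates a network call against
    an in-memory payload instead of issuing an HTTP request.
    """

    def __init__(self, payload: bytes) -> None:
        self.payload = payload
        self.cache: dict[tuple[int, int], bytes] = {}
        self.inflight: dict[tuple[int, int], asyncio.Task[bytes]] = {}
        self.fetch_count = 0

    async def fetch_range(self, start: int, end: int) -> bytes:
        await asyncio.sleep(0.01)  # pretend network latency
        self.fetch_count += 1
        return self.payload[start:end]

    async def get(self, start: int, end: int) -> bytes:
        key = (start, end)
        if key in self.cache:
            return self.cache[key]
        task = self.inflight.get(key)
        if task is None:
            # First requester starts the fetch; later ones await the same task.
            task = asyncio.create_task(self.fetch_range(start, end))
            self.inflight[key] = task
        try:
            data = await task
        finally:
            self.inflight.pop(key, None)
        self.cache[key] = data
        return data


class Cursor:
    """Independent position over shared state (in the spirit of clone())."""

    def __init__(self, state: SharedState, position: int = 0) -> None:
        self._state = state
        self._pos = position

    def clone(self) -> "Cursor":
        return Cursor(self._state, self._pos)

    def seek(self, offset: int) -> int:
        self._pos = offset  # no I/O: only the position changes
        return self._pos

    def tell(self) -> int:
        return self._pos

    async def read(self, n: int) -> bytes:
        data = await self._state.get(self._pos, self._pos + n)
        self._pos += len(data)
        return data


async def main() -> None:
    state = SharedState(bytes(range(256)) * 20)  # 5120 bytes of test data
    f = Cursor(state)
    a, b = f.clone(), f.clone()
    a.seek(1000)
    b.seek(1000)  # same range as `a`, so both share one fetch
    results = await asyncio.gather(f.read(100), a.read(100), b.read(100))
    print([len(r) for r in results], f.tell(), a.tell(), b.tell())
    print("fetches:", state.fetch_count)  # 2, not 3


asyncio.run(main())
```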
Configuration Options
Both HttpFile and AsyncHttpFile accept the following parameters:
```python
HttpFile(
    url,
    minimum_range_request_bytes=8192,  # Minimum bytes per request (default: 8KB)
    prefetch_bytes=1048576,  # Bytes to prefetch on open (default: 1MB)
    prefetch_direction='END',  # 'START' or 'END' (default: 'END')
)
```
- minimum_range_request_bytes: The minimum number of bytes to request in a single HTTP Range request (except when filling small cache gaps)
- prefetch_bytes: How many bytes to fetch immediately when opening the file. Set to 0 to disable prefetching
- prefetch_direction: Whether to prefetch from the start ('START') or end ('END') of the file
Requirements
- Python 3.12 or higher
- HTTP server must support Range requests
- For async: aiohttp>=3.13.0
How It Works
When you open an HTTP file, hctef:
- Sends an initial Range request to determine the file size and verify Range support
- Optionally prefetches data from the start or end of the file
- Maintains an in-memory cache of fetched byte ranges (not suitable for downloading complete large files)
- On read(), checks the cache first and only fetches missing data from the server
- Combines multiple small requests into larger ones based on minimum_range_request_bytes
This approach minimizes HTTP requests while providing efficient random access to remote files.
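The last two steps can be sketched as a small planning function. Given the uncached gaps for a read as half-open [start, end) pairs, it stretches each short gap up to minimum_range_request_bytes (clamped to the file size) and merges requests that end up overlapping or touching. plan_fetches is a hypothetical helper, not hctef's implementation. In particular, it ignores the documented exception for small gaps between cached ranges, so it may re-request bytes that are already cached.

```python
def plan_fetches(
    gaps: list[tuple[int, int]], minimum: int, size: int
) -> list[tuple[int, int]]:
    """Turn uncached gaps [start, end) into the ranges to request.

    Each gap shorter than `minimum` is extended forward (clamped to the
    file size), then overlapping or adjacent requests are merged.
    """
    planned: list[tuple[int, int]] = []
    for start, end in sorted(gaps):
        if end - start < minimum:
            end = min(start + minimum, size)
        if planned and start <= planned[-1][1]:
            # Overlaps or touches the previous request: merge them.
            prev_start, prev_end = planned[-1]
            planned[-1] = (prev_start, max(prev_end, end))
        else:
            planned.append((start, end))
    return planned


# Two small gaps 900 bytes apart collapse into one request.
print(plan_fetches([(0, 100), (1000, 1100)], minimum=8192, size=1_000_000))
# [(0, 9192)]
```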
Error Handling
hctef defines custom exceptions:
- HctefError: Base exception class
- HctefNetworkError: Raised for network-related errors (inherits from IOError)
- HctefUrlError: Raised for invalid URLs (inherits from ValueError)
```python
from hctef import HttpFile
from hctef.exceptions import HctefNetworkError, HctefUrlError

try:
    with HttpFile("https://example.com/file.bin") as f:
        data = f.read(100)
except HctefNetworkError as e:
    print(f"Network error: {e}")
except HctefUrlError as e:
    print(f"Invalid URL: {e}")
```
Development
To set up for development:
```shell
# Clone the repository
git clone https://github.com/jkeifer/hctef
cd hctef

# Install dependencies
uv sync --all-extras --dev

# Set up pre-commit
pre-commit install

# Run tests
pytest

# Run all checks with pre-commit
pre-commit run --all-files
```
Future Ideas
- Consolidate sync/async implementations
- Allow an uncached "cursor" for reading a large file segment
- Cursors with separate caches (to allow clearing memory when done)
  - Would allow cursor-based access with the non-async implementation
License
Apache License 2.0
What is hctef?
It's the HTTP Client That Eats Files, obviously.
Download files
File details
Details for the file hctef-0.1.1.tar.gz.
File metadata
- Download URL: hctef-0.1.1.tar.gz
- Upload date:
- Size: 77.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c720930343c031e50ff97280dcc2c9f8d22db8aa6b2b900784b6a6a29ed39032 |
| MD5 | 930bba03e231010710d507e7682f14af |
| BLAKE2b-256 | 1d8c9d4871a2fb7715dd6081386c5ffd8d08c9fed26a5ff38bc00a74b171bd3e |
Provenance
The following attestation bundles were made for hctef-0.1.1.tar.gz:
Publisher: release.yml on jkeifer/hctef
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hctef-0.1.1.tar.gz
- Subject digest: c720930343c031e50ff97280dcc2c9f8d22db8aa6b2b900784b6a6a29ed39032
- Sigstore transparency entry: 702681917
- Sigstore integration time:
- Permalink: jkeifer/hctef@ef46ef44484fde4621becd33f92b21a2d3a786db
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/jkeifer
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ef46ef44484fde4621becd33f92b21a2d3a786db
- Trigger Event: release
File details
Details for the file hctef-0.1.1-py3-none-any.whl.
File metadata
- Download URL: hctef-0.1.1-py3-none-any.whl
- Upload date:
- Size: 15.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf833f23c596be844fc673dc414ce1b63e977bbb506ff757576b9746173b4e00 |
| MD5 | 9dd76bc9c834b7d83905053756027d28 |
| BLAKE2b-256 | aead3666c2f966a429a9c1268967260ece0dd94bdcb31b29fa62fba479358f46 |
Provenance
The following attestation bundles were made for hctef-0.1.1-py3-none-any.whl:
Publisher: release.yml on jkeifer/hctef
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hctef-0.1.1-py3-none-any.whl
- Subject digest: cf833f23c596be844fc673dc414ce1b63e977bbb506ff757576b9746173b4e00
- Sigstore transparency entry: 702681947
- Sigstore integration time:
- Permalink: jkeifer/hctef@ef46ef44484fde4621becd33f92b21a2d3a786db
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/jkeifer
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ef46ef44484fde4621becd33f92b21a2d3a786db
- Trigger Event: release