Track daily availability of Binance USDT perpetual futures from Vision repository
Binance Futures Availability Database
Track daily availability of ALL USDT perpetual futures from Binance Vision (2019-09-25 to present)
Symbol count is dynamic (~327 currently) - we discover and track all perpetual instruments available on each historical date.
Overview
A standalone DuckDB database that tracks the historical availability of Binance USDT-Margined (UM) perpetual futures contracts in the Binance Vision S3 repository. It answers "which symbols were available on date X?" in well under a second, with automated daily updates and volume metrics.
Key Features
- Complete Historical Data: 2019-09-25 (first UM-futures launch) to present (~2240 days)
- Dynamic Symbol Discovery: ✅ Auto-updated daily - Tracks all perpetual USDT contracts (~327 currently, varies by date)
- Fast Queries: <1ms snapshot queries, <10ms timelines
- Volume Metrics: Track file size growth and S3 freshness over time
- Small Footprint: 50-150MB database (compressed columnar)
- Automated Updates: ✅ GitHub Actions - daily 3AM UTC, zero infrastructure
- High Reliability: Strict error handling, comprehensive validation checks
Quick Start
Option 1: GitHub Actions (Recommended) ✅
Production-ready automated updates with zero infrastructure overhead.
1. Initial Database Creation
# Trigger historical backfill via GitHub Actions (one-time setup)
# Note: date -d "yesterday" is GNU date syntax; on macOS/BSD use $(date -v-1d +%Y-%m-%d) instead
gh workflow run update-database.yml \
--field update_mode=backfill \
--field start_date=2019-09-25 \
--field end_date=$(date -d "yesterday" +%Y-%m-%d)
# Monitor progress (estimated 25-60 minutes)
gh run watch
# Verify database created
gh release view latest
2. Download Database
# Download from GitHub Releases
gh release download latest --pattern "availability.duckdb.gz"
gunzip availability.duckdb.gz
3. Automated Daily Updates
No action needed - workflow runs automatically daily at 3:00 AM UTC. Download latest database anytime from GitHub Releases.
Option 2: Local Development
# Install package
cd ~/eon/binance-futures-availability
uv pip install -e ".[dev]"
# Run local backfill
uv run python scripts/operations/backfill.py
Query Database
# CLI queries
uv run binance-futures-availability query snapshot 2024-01-15
uv run binance-futures-availability query timeline BTCUSDT
uv run binance-futures-availability query range 2024-01-01 2024-03-31
# Python API
python -c "
from binance_futures_availability.queries import AvailabilityQueries
q = AvailabilityQueries()
print(q.get_available_symbols_on_date('2024-01-15'))
"
Volume Rankings Archive (ADR-0013)
NEW: Daily volume rankings time-series in Parquet format, published alongside the database.
Overview
Single cumulative file containing historical daily rankings of all symbols by 24-hour trading volume (quote_volume_usdt), with rank change tracking across 1d, 7d, 14d, and 30d windows.
File: volume-rankings-timeseries.parquet (20 MB, ~733K rows)
Format: Parquet (columnar, SNAPPY compressed)
Grain: One row per (date, symbol) combination
Updates: Automated daily at 3:00 AM UTC (incremental append)
Quick Start
Option 1: Remote Query (Recommended) - Zero download, zero local storage:
# Install: pip install duckdb pandas (fetchdf() returns a pandas DataFrame)
import duckdb
url = "https://github.com/terryli/binance-futures-availability/releases/download/latest/volume-rankings-timeseries.parquet"
# Top 10 symbols (1-3 second query)
result = duckdb.execute(f"""
SELECT symbol, rank, quote_volume_usdt, rank_change_7d
FROM '{url}'
WHERE date = '2025-11-16' -- Replace with desired date
ORDER BY rank LIMIT 10
""").fetchdf()
print(result)
Option 2: Local Download - For offline use:
# Download from GitHub Releases
gh release download latest --pattern "volume-rankings-timeseries.parquet"
# Query with DuckDB
python -c "
import duckdb
result = duckdb.execute('''
SELECT symbol, rank, quote_volume_usdt, rank_change_7d
FROM \"volume-rankings-timeseries.parquet\"
WHERE date = (SELECT MAX(date) FROM \"volume-rankings-timeseries.parquet\")
ORDER BY rank LIMIT 10
''').fetchdf()
print(result)
"
Schema (13 Columns)
| Column | Type | Description |
|---|---|---|
| date | date32 | Trading date |
| symbol | string | Futures symbol |
| rank | uint16 | Volume rank (1=highest) |
| quote_volume_usdt | float64 | 24h volume (USDT) |
| trade_count | uint64 | Number of trades |
| rank_change_1d/7d/14d/30d | int16 | Rank delta (negative=improved) |
| percentile | float32 | Volume percentile (0-100) |
| market_share_pct | float32 | % of total market volume |
| days_available | uint8 | Days available in last 30d |
| generation_timestamp | timestamp[us] | File generation time |
Use Cases:
- Portfolio universe selection (top N by volume)
- Trend analysis (rank changes over time)
- Survivorship bias elimination
- Market share analysis
See: Using Volume Rankings Guide for query examples and advanced usage.
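The rank and rank-change columns can be illustrated with a small standalone sketch. It is not the project's implementation: the helper names, the tie-breaking rule, the use of None for symbols missing on one date, and the volumes are all assumptions made for illustration.

```python
from typing import Dict, Optional


def rank_by_volume(volumes: Dict[str, float]) -> Dict[str, int]:
    """Rank symbols by volume, 1 = highest. Ties are broken by symbol name (assumption)."""
    ordered = sorted(volumes, key=lambda s: (-volumes[s], s))
    return {symbol: position for position, symbol in enumerate(ordered, start=1)}


def rank_change(today: Dict[str, int], earlier: Dict[str, int], symbol: str) -> Optional[int]:
    """Rank delta versus an earlier date; negative means the symbol moved up."""
    if symbol not in today or symbol not in earlier:
        return None  # assumption: no delta when the symbol is missing on either date
    return today[symbol] - earlier[symbol]


# Made-up quote volumes for two dates, for illustration only.
volumes_earlier = {"BTCUSDT": 900.0, "ETHUSDT": 500.0, "SOLUSDT": 100.0}
volumes_today = {"BTCUSDT": 950.0, "ETHUSDT": 300.0, "SOLUSDT": 400.0, "XRPUSDT": 50.0}

ranks_earlier = rank_by_volume(volumes_earlier)
ranks_today = rank_by_volume(volumes_today)

# Universe selection: keep the top 2 symbols by volume on the later date.
top_2 = [s for s, r in sorted(ranks_today.items(), key=lambda kv: kv[1]) if r <= 2]
print(top_2)
print(rank_change(ranks_today, ranks_earlier, "SOLUSDT"))
```

Because the ranking is built from every symbol that traded on each date, including ones later delisted, selecting a universe per date this way avoids looking only at today's survivors.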
Architecture
Database Schema
Single table with volume metrics (ADR-0006):
daily_availability(date, symbol, available, file_size_bytes, last_modified, url, status_code, probe_timestamp)
Primary Key: (date, symbol)
Indexes:
- idx_symbol_date (symbol, date) - fast timeline queries
- idx_available_date (available, date) - fast symbol listings
Volume Metrics:
- file_size_bytes: ZIP file size from S3 (enables trend analysis)
- last_modified: S3 upload timestamp (enables freshness monitoring)
Storage: ~/.cache/binance-futures/availability.duckdb
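The table and indexes above map onto the snapshot and timeline queries roughly as follows. This is a sketch only: it uses the standard-library sqlite3 module as a stand-in for DuckDB so it runs without installing anything, the column types are assumptions (the source lists only column names), and the rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for the DuckDB file; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE daily_availability (
        date TEXT NOT NULL,
        symbol TEXT NOT NULL,
        available INTEGER NOT NULL,
        file_size_bytes INTEGER,
        last_modified TEXT,
        url TEXT,
        status_code INTEGER,
        probe_timestamp TEXT,
        PRIMARY KEY (date, symbol)
    )
    """
)
conn.execute("CREATE INDEX idx_symbol_date ON daily_availability (symbol, date)")
conn.execute("CREATE INDEX idx_available_date ON daily_availability (available, date)")

# Made-up sample rows: (date, symbol, available, file_size_bytes, status_code).
sample_rows = [
    ("2024-01-14", "BTCUSDT", 1, 8_000_000, 200),
    ("2024-01-15", "BTCUSDT", 1, 8_100_000, 200),
    ("2024-01-15", "ETHUSDT", 1, 7_500_000, 200),
    ("2024-01-15", "NEWUSDT", 0, None, 404),
]
conn.executemany(
    "INSERT INTO daily_availability (date, symbol, available, file_size_bytes, status_code) "
    "VALUES (?, ?, ?, ?, ?)",
    sample_rows,
)


def symbols_on(date: str) -> list:
    """Snapshot query: which symbols were available on a given date?"""
    cur = conn.execute(
        "SELECT symbol FROM daily_availability "
        "WHERE date = ? AND available = 1 ORDER BY symbol",
        (date,),
    )
    return [row[0] for row in cur]


def timeline(symbol: str) -> list:
    """Timeline query: availability of one symbol across dates."""
    cur = conn.execute(
        "SELECT date, available FROM daily_availability WHERE symbol = ? ORDER BY date",
        (symbol,),
    )
    return [(d, bool(a)) for d, a in cur]


print(symbols_on("2024-01-15"))
print(timeline("BTCUSDT"))
```

The snapshot query filters on (available, date) and the timeline query on (symbol, date), which is the access pattern the two indexes are described as serving.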
Data Collection (Hybrid Strategy)
Binance Vision S3: https://data.binance.vision/data/futures/um/daily/klines/
Historical Backfill (Bulk Operations):
- Method: AWS CLI S3 listing (aws s3 ls --no-sign-request)
- Performance: 327 symbols × 4.5 sec = ~25 minutes
- Use case: One-time historical data collection
Daily Updates (Incremental Operations):
- Method: HTTP HEAD requests (parallel batch probing)
- Performance: ~327 symbols in ~1.5 seconds (150 parallel workers, empirically optimized)
- Use case: Automated daily updates via GitHub Actions (3 AM UTC)
- Benchmark: Worker Count Optimization
See: ADR-0005: AWS CLI for Bulk Operations
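The daily HEAD-request probing described above could look roughly like the sketch below. It is a minimal illustration, not the project's code: the function names are hypothetical, and the per-symbol file layout under the base URL (symbol/interval/symbol-interval-date.zip) is an assumption that should be checked against the bucket listing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
import urllib.error
import urllib.request

BASE_URL = "https://data.binance.vision/data/futures/um/daily/klines"


def kline_url(symbol: str, date: str, interval: str = "1m") -> str:
    # Assumed file layout under the base URL; verify before relying on it.
    return f"{BASE_URL}/{symbol}/{interval}/{symbol}-{interval}-{date}.zip"


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send one HTTP HEAD request and return the status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 404 when the file does not exist


def probe_all(
    symbols: Iterable[str],
    date: str,
    probe: Callable[[str], int] = head_status,
    max_workers: int = 150,
) -> Dict[str, int]:
    """Probe every symbol for one date in parallel; returns symbol -> status code."""
    symbols = list(symbols)
    urls = [kline_url(s, date) for s in symbols]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(probe, urls))  # map preserves input order
    return dict(zip(symbols, statuses))


def fake_probe(url: str) -> int:
    # Offline stand-in for head_status so the example runs without network access.
    return 200 if "BTCUSDT" in url else 404


result = probe_all(["BTCUSDT", "FAKEUSDT"], "2024-01-15", probe=fake_probe, max_workers=2)
print(result)
```

Network errors other than HTTP status errors (timeouts, DNS failures) are deliberately left to propagate, in keeping with the raise-on-failure policy. Passing the probe as a parameter keeps the parallel logic testable offline.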
Error Handling
Policy: Strict raise-on-failure (ADR-0003)
- No retries (workflow retries next scheduled cycle)
- No fallbacks (no default values)
- No silent failures (all errors logged)
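A raise-on-failure policy like the one above can be sketched as follows. The exception class, function name, and the mapping of 200 to available and 404 to unavailable are assumptions for illustration; the source does not specify them.

```python
import logging

logger = logging.getLogger("availability")


class ProbeError(RuntimeError):
    """Hypothetical exception type for a failed availability probe."""


def classify_status(symbol: str, date: str, status_code: int) -> bool:
    """Map an HTTP status to availability; anything unexpected raises."""
    if status_code == 200:
        return True  # assumption: 200 means the daily file exists
    if status_code == 404:
        return False  # assumption: 404 means the symbol has no file for this date
    # No retry and no default value: log the full context, then raise.
    logger.error("probe failed: symbol=%s date=%s status=%s", symbol, date, status_code)
    raise ProbeError(f"{symbol} {date}: unexpected HTTP status {status_code}")


print(classify_status("BTCUSDT", "2024-01-15", 200))
print(classify_status("NEWUSDT", "2024-01-15", 404))
```

An unexpected status such as 503 raises ProbeError and fails the run, leaving the next scheduled workflow cycle to try again instead of recording a guessed value.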
Documentation
- Project Memory: CLAUDE.md - AI context and patterns
- SSoT Plan: docs/development/plan/v1.0.0-implementation-plan.yaml
- Schema: docs/schema/availability-database.schema.json
- MADRs: docs/architecture/decisions/
Guides:
Operations:
Development
Run Tests
# Unit tests only (fast, no network)
pytest -m "not integration"
# All tests including integration (slow, requires network)
pytest
# With coverage report
pytest --cov --cov-report=html
open htmlcov/index.html
Linting & Formatting
# Format code
ruff format src/ tests/
# Check linting
ruff check src/ tests/
# Fix auto-fixable issues
ruff check --fix src/ tests/
Architecture Decisions
All decisions documented as MADRs:
- ADR-0001: Daily table pattern (not range table)
- ADR-0002: DuckDB for storage
- ADR-0003: Strict error handling
- ADR-0004: APScheduler for automation (superseded by ADR-0009)
- ADR-0005: AWS CLI for bulk operations
- ADR-0006: Volume metrics collection
- ADR-0009: ✅ GitHub Actions automation (production)
- ADR-0010: ✅ Dynamic symbol discovery (daily S3 auto-update)
- ADR-0013: ✅ Volume rankings time-series archive (Parquet)
SLOs (Service Level Objectives)
- Availability: 95% of daily updates complete successfully
- Correctness: >95% match with Binance exchangeInfo API
- Observability: All failures logged with full context
- Maintainability: 80%+ test coverage, all functions documented
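One way to read the correctness objective is as a set-overlap percentage between the symbols the database reports and a reference list. The sketch below assumes that definition; the source does not say how "match" is computed, and the function name and symbol sets are hypothetical.

```python
from typing import Set


def match_rate(tracked: Set[str], reference: Set[str]) -> float:
    """Percentage of reference symbols that are also present in the tracked set."""
    if not reference:
        raise ValueError("reference symbol set is empty")
    return 100.0 * len(tracked & reference) / len(reference)


# Made-up symbol sets for illustration only.
tracked = {"BTCUSDT", "ETHUSDT", "SOLUSDT"}
reference = {"BTCUSDT", "ETHUSDT", "SOLUSDT", "XRPUSDT"}

rate = match_rate(tracked, reference)
print(rate, rate > 95.0)
```

With these sample sets, three of four reference symbols are tracked, so the check against the 95% threshold fails.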
Related Projects
- gapless-crypto-data: Spot OHLCV collection (similar ValidationStorage pattern)
- vision-futures-explorer: Initial futures discovery (source of probe functions)
License
MIT License
Contributing
This is a specialized internal tool. For major changes, please open an issue first to discuss what you would like to change.
Ensure tests pass and coverage remains ≥80%:
pytest --cov --cov-fail-under=80
Support
- Documentation: See CLAUDE.md for complete project context
- Issues: File issues in project repository
- Questions: Consult docs/guides/ for common scenarios