
Track daily availability of Binance USDT perpetual futures from the Binance Vision repository


Binance Futures Availability Database

Track daily availability of ALL USDT perpetual futures from Binance Vision (2019-09-25 to present)

The symbol count is dynamic (~327 currently): every perpetual instrument available on each historical date is discovered and tracked.
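As a rough illustration of per-date discovery, the sketch below filters a Binance Vision prefix listing down to USDT perpetual symbols. It assumes the listing comes from a command such as `aws s3 ls --no-sign-request s3://data.binance.vision/data/futures/um/daily/klines/`, where the AWS CLI prints each sub-prefix as `PRE NAME/`. The filter rule is also an assumption and may not match the project's exact logic: perpetuals end in `USDT` and contain no `_`, while dated delivery contracts look like `BTCUSDT_240329`.

```python
import re

# `aws s3 ls` prints each common prefix as "PRE <name>/" (leading spaces vary).
_PREFIX_LINE = re.compile(r"^\s*PRE\s+(\S+)/\s*$")


def perpetual_usdt_symbols(listing: str) -> list[str]:
    """Return sorted USDT perpetual symbols from an `aws s3 ls` prefix listing.

    Assumption: perpetuals end in "USDT" and have no "_" suffix, so dated
    delivery contracts (e.g. "BTCUSDT_240329") are excluded.
    """
    symbols = set()
    for line in listing.splitlines():
        match = _PREFIX_LINE.match(line)
        if not match:
            continue  # skip blank lines and object (non-prefix) entries
        name = match.group(1)
        if name.endswith("USDT") and "_" not in name:
            symbols.add(name)
    return sorted(symbols)


# Illustrative listing, not real output.
sample = """\
                           PRE BTCUSDT/
                           PRE BTCUSDT_240329/
                           PRE ETHUSDT/
                           PRE ETHBUSD/
"""
print(perpetual_usdt_symbols(sample))  # ['BTCUSDT', 'ETHUSDT']
```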

Requires Python 3.12+. MIT License.

Overview

A standalone DuckDB database that tracks the historical availability of Binance USDT-Margined (UM) perpetual futures contracts in the Binance Vision S3 repository. It answers "which symbols were available on date X?" in under a second and includes automated daily updates and volume metrics.

Key Features

  • Complete Historical Data: 2019-09-25 (first UM-futures launch) to present (~2240 days)
  • Dynamic Symbol Discovery: ✅ Auto-updated daily; tracks all perpetual USDT contracts (~327 currently, varies by date)
  • Fast Queries: <1ms snapshot queries, <10ms timelines
  • Volume Metrics: Track file size growth and S3 freshness over time
  • Small Footprint: 50-150MB database (compressed columnar)
  • Automated Updates: ✅ GitHub Actions, daily at 3:00 AM UTC, zero infrastructure
  • High Reliability: Strict error handling, comprehensive validation checks
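The "~2240 days" figure is the inclusive span from 2019-09-25 to yesterday. The small stdlib sketch below enumerates that range and also works as a portable alternative to GNU `date -d yesterday`. The helper name `backfill_dates` is hypothetical and not part of the package.

```python
from datetime import date, timedelta

FIRST_UM_DATE = date(2019, 9, 25)  # first date tracked, per this README


def backfill_dates(start: date, end: date) -> list[date]:
    """Every calendar date from start to end, inclusive (hypothetical helper)."""
    if end < start:
        raise ValueError(f"end {end} is before start {start}")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


yesterday = date.today() - timedelta(days=1)
print(len(backfill_dates(FIRST_UM_DATE, yesterday)), "dates to probe")
```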

Quick Start

Option 1: GitHub Actions (Recommended) ✅

Production-ready automated updates with zero infrastructure overhead.

1. Initial Database Creation

# Trigger historical backfill via GitHub Actions (one-time setup)
gh workflow run update-database.yml \
  --field update_mode=backfill \
  --field start_date=2019-09-25 \
  --field end_date=$(date -d "yesterday" +%Y-%m-%d)  # GNU date; on macOS/BSD use $(date -v-1d +%Y-%m-%d)

# Monitor progress (estimated 25-60 minutes)
gh run watch

# Verify database created
gh release view latest

2. Download Database

# Download from GitHub Releases
gh release download latest --pattern "availability.duckdb.gz"
gunzip availability.duckdb.gz
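On machines without `gunzip` (for example, plain Windows), the decompression step can be done in Python instead. This is a minimal stdlib sketch. Unlike `gunzip`, the hypothetical helper below keeps the `.gz` file.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path) -> Path:
    """Decompress e.g. availability.duckdb.gz to availability.duckdb (keeps the .gz)."""
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dest = src.with_suffix("")  # strips only the final ".gz"
    with gzip.open(src, "rb") as f_in, dest.open("wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # streams in chunks instead of reading it all
    return dest
```

Usage: `gunzip(Path("availability.duckdb.gz"))` returns `Path("availability.duckdb")`.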

3. Automated Daily Updates

No action needed: the workflow runs automatically every day at 3:00 AM UTC. Download the latest database from GitHub Releases at any time.
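To work out when the next refresh is due, the sketch below computes the next 03:00 UTC from any timezone-aware time. This only models the nominal schedule (in cron syntax it would be `0 3 * * *`). GitHub may start scheduled workflows late under load, so a new release can appear somewhat after 03:00.

```python
from datetime import datetime, time, timedelta, timezone

RUN_AT = time(3, 0, tzinfo=timezone.utc)  # daily schedule stated in this README


def next_run(now: datetime) -> datetime:
    """Next 03:00 UTC at or after `now` (`now` must be timezone-aware)."""
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), RUN_AT)
    if candidate < now:
        candidate += timedelta(days=1)  # today's run already passed
    return candidate
```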

Option 2: Local Development

# Install package
cd binance-futures-availability  # root of your local clone
uv pip install -e ".[dev]"

# Run local backfill
uv run python scripts/operations/backfill.py

Query Database

# CLI queries
uv run binance-futures-availability query snapshot 2024-01-15
uv run binance-futures-availability query timeline BTCUSDT
uv run binance-futures-availability query range 2024-01-01 2024-03-31

# Python API
python -c "
from binance_futures_availability.queries import AvailabilityQueries
q = AvailabilityQueries()
print(q.get_available_symbols_on_date('2024-01-15'))
"

Volume Rankings Archive (ADR-0013)

NEW: Daily volume rankings time-series in Parquet format, published alongside the database.

Overview

A single cumulative file ranks every symbol on every date by 24-hour trading volume (quote_volume_usdt) and tracks rank changes over 1d, 7d, 14d, and 30d windows.

  • File: volume-rankings-timeseries.parquet (20 MB, ~733K rows)
  • Format: Parquet (columnar, SNAPPY compressed)
  • Grain: One row per (date, symbol) combination
  • Updates: Automated daily at 3:00 AM UTC (incremental append)

Quick Start

Option 1: Remote Query (Recommended) - Zero download, zero local storage:

# Install: pip install duckdb
import duckdb

url = "https://github.com/terryli/binance-futures-availability/releases/download/latest/volume-rankings-timeseries.parquet"

# Top 10 symbols (1-3 second query)
result = duckdb.execute(f"""
    SELECT symbol, rank, quote_volume_usdt, rank_change_7d
    FROM '{url}'
    WHERE date = '2025-11-16'  -- Replace with desired date
    ORDER BY rank LIMIT 10
""").fetchdf()

print(result)

Option 2: Local Download - For offline use:

# Download from GitHub Releases
gh release download latest --pattern "volume-rankings-timeseries.parquet"

# Query with DuckDB
python -c "
import duckdb
result = duckdb.execute('''
    SELECT symbol, rank, quote_volume_usdt, rank_change_7d
    FROM \"volume-rankings-timeseries.parquet\"
    WHERE date = (SELECT MAX(date) FROM \"volume-rankings-timeseries.parquet\")
    ORDER BY rank LIMIT 10
''').fetchdf()
print(result)
"

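Schema (13 Columns)

| Column                                                  | Type          | Description                                        |
|---------------------------------------------------------|---------------|----------------------------------------------------|
| date                                                    | date32        | Trading date                                       |
| symbol                                                  | string        | Futures symbol                                     |
| rank                                                    | uint16        | Volume rank (1 = highest)                          |
| quote_volume_usdt                                       | float64       | 24h volume (USDT)                                  |
| trade_count                                             | uint64        | Number of trades                                   |
| rank_change_1d, rank_change_7d, rank_change_14d, rank_change_30d | int16 | Rank delta over each window (negative = improved) |
| percentile                                              | float32       | Volume percentile (0-100)                          |
| market_share_pct                                        | float32       | % of total market volume                           |
| days_available                                          | uint8         | Days available in the last 30 days                 |
| generation_timestamp                                    | timestamp[us] | File generation time                               |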

Use Cases:

  • Portfolio universe selection (top N by volume)
  • Trend analysis (rank changes over time)
  • Survivorship bias elimination
  • Market share analysis
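For universe selection without survivorship bias, pick the top N symbols on each date from that date's own rankings, not today's list. Symbols that later disappeared then stay in the earlier universes. The stdlib sketch below works on `(date, symbol, rank)` tuples. One possible source is `duckdb.execute("SELECT date, symbol, rank FROM 'volume-rankings-timeseries.parquet'").fetchall()`, but that source is assumed here, not exercised. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def top_n_by_date(rows, n: int) -> dict[date, list[str]]:
    """Map each date to its top-n symbols by rank, using only that date's rows."""
    by_date = defaultdict(list)
    for day, symbol, rank in rows:
        by_date[day].append((rank, symbol))
    return {
        day: [symbol for _, symbol in sorted(entries)[:n]]
        for day, entries in sorted(by_date.items())
    }
```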

See: Using Volume Rankings Guide for query examples and advanced usage.

Architecture

Database Schema

Single table with volume metrics (ADR-0006):

daily_availability(date, symbol, available, file_size_bytes, last_modified, url, status_code, probe_timestamp)

Primary Key: (date, symbol)

Indexes:

  • idx_symbol_date (symbol, date) - fast timeline queries
  • idx_available_date (available, date) - fast symbol listings
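The sketch below mirrors this table and its two indexes, together with the snapshot and timeline queries they serve. Only the column names come from this README; the column types are assumptions. It uses the stdlib `sqlite3` module as a stand-in for DuckDB so it runs anywhere. The SQL is plain enough that it should carry over, but it has not been checked against the project's actual DDL.

```python
import sqlite3

# Column names from this README; types are assumptions. sqlite3 stands in for DuckDB.
DDL = """
CREATE TABLE daily_availability (
    date            DATE    NOT NULL,
    symbol          TEXT    NOT NULL,
    available       BOOLEAN NOT NULL,
    file_size_bytes BIGINT,
    last_modified   TIMESTAMP,
    url             TEXT,
    status_code     INTEGER,
    probe_timestamp TIMESTAMP,
    PRIMARY KEY (date, symbol)
);
CREATE INDEX idx_symbol_date ON daily_availability (symbol, date);
CREATE INDEX idx_available_date ON daily_availability (available, date);
"""

# "Which symbols were available on date X?"
SNAPSHOT = "SELECT symbol FROM daily_availability WHERE date = ? AND available ORDER BY symbol"
# Availability history for one symbol.
TIMELINE = "SELECT date, available FROM daily_availability WHERE symbol = ? ORDER BY date"

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany(
    "INSERT INTO daily_availability (date, symbol, available, status_code) VALUES (?, ?, ?, ?)",
    [  # illustrative rows, not real data
        ("2024-01-15", "BTCUSDT", True, 200),
        ("2024-01-15", "ETHUSDT", True, 200),
        ("2024-01-16", "ETHUSDT", False, 404),
    ],
)
snapshot = [s for (s,) in conn.execute(SNAPSHOT, ("2024-01-15",))]
print(snapshot)  # ['BTCUSDT', 'ETHUSDT']
```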

Volume Metrics:

  • file_size_bytes: ZIP file size from S3 (enables trend analysis)
  • last_modified: S3 upload timestamp (enables freshness monitoring)

Storage: ~/.cache/binance-futures/availability.duckdb

Data Collection (Hybrid Strategy)

Binance Vision S3: https://data.binance.vision/data/futures/um/daily/klines/

Historical Backfill (Bulk Operations):

  • Method: AWS CLI S3 listing (aws s3 ls --no-sign-request)
  • Performance: 327 symbols × 4.5 sec = ~25 minutes
  • Use case: One-time historical data collection
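The bulk path needs roughly one listing per symbol, which is where "327 symbols × 4.5 sec" comes from. Each object line of `aws s3 ls` has the form `<date> <time> <size> <key>`, which is enough to fill `available`, `file_size_bytes`, and `last_modified` for every date. A minimal parsing sketch follows. It assumes a listing such as `aws s3 ls --no-sign-request s3://data.binance.vision/data/futures/um/daily/klines/BTCUSDT/1m/`; the `1m` interval is an assumption. Timestamps are kept naive, exactly as the CLI prints them.

```python
import re
from datetime import date, datetime

# One object line of `aws s3 ls`: "<upload date> <upload time> <size> <key>".
_OBJECT_LINE = re.compile(
    r"^(?P<modified>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<size>\d+)\s+"
    r"[A-Z0-9]+-\w+-(?P<day>\d{4}-\d{2}-\d{2})\.zip$"
)


def parse_listing(listing: str) -> dict[date, tuple[int, datetime]]:
    """Map each trading date to (file_size_bytes, last_modified) for its .zip file.

    `.zip.CHECKSUM` companions and any other lines do not match and are skipped.
    """
    found = {}
    for line in listing.splitlines():
        m = _OBJECT_LINE.match(line.strip())
        if m:
            found[date.fromisoformat(m["day"])] = (
                int(m["size"]),
                datetime.fromisoformat(m["modified"]),
            )
    return found
```

Dates missing from the result are the days on which the symbol had no file.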

Daily Updates (Incremental Operations):

  • Method: HTTP HEAD requests (parallel batch probing)
  • Performance: ~327 symbols in ~1.5 seconds (150 parallel workers, empirically optimized)
  • Use case: Automated daily updates via GitHub Actions (3 AM UTC)
  • Benchmark: Worker Count Optimization
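A sketch of the daily HEAD-probe approach with a thread pool follows. The Binance Vision path layout `<SYMBOL>/<interval>/<SYMBOL>-<interval>-<YYYY-MM-DD>.zip` is the public one. The `1m` interval and the function names are assumptions. The `probe` argument can be swapped for a stub, which lets the fan-out logic be exercised without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from urllib.error import HTTPError
from urllib.request import Request, urlopen

BASE = "https://data.binance.vision/data/futures/um/daily/klines"
INTERVAL = "1m"  # assumption: the interval the project probes is not stated here


def kline_url(symbol: str, day: date) -> str:
    return f"{BASE}/{symbol}/{INTERVAL}/{symbol}-{INTERVAL}-{day.isoformat()}.zip"


def head_status(url: str, timeout: float = 10.0) -> int:
    """HTTP status of a HEAD request; 4xx/5xx are returned, network errors propagate."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code


def probe_day(symbols, day: date, probe=head_status, max_workers: int = 150) -> dict[str, int]:
    """Probe every symbol's file for `day` concurrently; maps symbol -> HTTP status."""
    urls = {s: kline_url(s, day) for s in symbols}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = pool.map(probe, urls.values())  # preserves input order
        return dict(zip(urls.keys(), statuses))
```

With mostly I/O-bound waits, a large pool such as the 150 workers mentioned above mainly hides round-trip latency. The best value depends on the network, which is presumably why it was tuned empirically.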

See: ADR-0005: AWS CLI for Bulk Operations

Error Handling

Policy: Strict raise-on-failure (ADR-0003)

  • No retries (a failed update is retried at the next scheduled run)
  • No fallbacks (no default values)
  • No silent failures (all errors logged)
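In a probe, the policy above could look like the sketch below. It assumes that missing files answer 404. Only 200 and 404 count as definite answers; any other status is logged and raised. There is no retry and no default value. `ProbeError` and `classify` are hypothetical names, not the package's API.

```python
import logging

log = logging.getLogger(__name__)


class ProbeError(RuntimeError):
    """Raised for any probe outcome other than a clear yes/no (hypothetical name)."""


def classify(symbol: str, status: int) -> bool:
    """200 -> available, 404 -> not available; anything else fails loudly."""
    if status == 200:
        return True
    if status == 404:
        return False
    # No retry and no fallback value: log full context, then raise.
    log.error("probe failed: symbol=%s status=%s", symbol, status)
    raise ProbeError(f"{symbol}: unexpected HTTP status {status}")
```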

Documentation

  • Project Memory: CLAUDE.md (AI context and patterns)
  • SSoT Plan: docs/development/plan/v1.0.0-implementation-plan.yaml
  • Schema: docs/schema/availability-database.schema.json
  • MADRs: docs/architecture/decisions/


Development

Run Tests

# Unit tests only (fast, no network)
pytest -m "not integration"

# All tests including integration (slow, requires network)
pytest

# With coverage report
pytest --cov --cov-report=html
open htmlcov/index.html  # macOS; use xdg-open on Linux

Linting & Formatting

# Format code
ruff format src/ tests/

# Check linting
ruff check src/ tests/

# Fix auto-fixable issues
ruff check --fix src/ tests/

Architecture Decisions

All decisions are documented as MADRs:

  • ADR-0001: Daily table pattern (not range table)
  • ADR-0002: DuckDB for storage
  • ADR-0003: Strict error handling
  • ADR-0004: APScheduler for automation (superseded by ADR-0009)
  • ADR-0005: AWS CLI for bulk operations
  • ADR-0006: Volume metrics collection
  • ADR-0009: ✅ GitHub Actions automation (production)
  • ADR-0010: ✅ Dynamic symbol discovery (daily S3 auto-update)
  • ADR-0013: ✅ Volume rankings time-series archive (Parquet)

SLOs (Service Level Objectives)

  • Availability: 95% of daily updates complete successfully
  • Correctness: >95% match with Binance exchangeInfo API
  • Observability: All failures logged with full context
  • Maintainability: 80%+ test coverage, all functions documented
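The correctness SLO needs a concrete match metric. A sketch under assumptions follows; the project's actual metric is not specified here. It extracts USDT perpetuals from a parsed exchangeInfo response using the `symbols[].symbol`, `contractType`, and `quoteAsset` fields of the USDⓈ-M futures endpoint, then reports the percentage of them that the database's latest snapshot also lists. Fetching the response is left out.

```python
def perpetual_usdt_from_exchange_info(info: dict) -> set[str]:
    """USDT perpetual symbols from a parsed USDS-M futures exchangeInfo response."""
    return {
        s["symbol"]
        for s in info["symbols"]
        if s.get("contractType") == "PERPETUAL" and s.get("quoteAsset") == "USDT"
    }


def match_pct(db_symbols: set[str], exchange_symbols: set[str]) -> float:
    """Assumed metric: % of exchangeInfo symbols that the database also lists."""
    if not exchange_symbols:
        raise ValueError("exchangeInfo returned no symbols")
    return 100.0 * len(db_symbols & exchange_symbols) / len(exchange_symbols)
```

A check would then be `match_pct(latest_snapshot, perpetual_usdt_from_exchange_info(info)) > 95.0`.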


License

MIT License

Contributing

This is a specialized internal tool. For major changes, please open an issue first to discuss what you would like to change.

Ensure tests pass and coverage remains ≥80%:

pytest --cov --cov-fail-under=80

Support

  • Documentation: See CLAUDE.md for complete project context
  • Issues: File issues in the project repository
  • Questions: Consult docs/guides/ for common scenarios



Download files


Source Distribution

binance_futures_availability-1.1.0.tar.gz (427.6 kB)

Built Distribution


binance_futures_availability-1.1.0-py3-none-any.whl (41.5 kB, Python 3)

File details

Details for the file binance_futures_availability-1.1.0.tar.gz.


File hashes

Hashes for binance_futures_availability-1.1.0.tar.gz
Algorithm Hash digest
SHA256 aea77274998c60edd2f5f9e0245bf9321a8797827e0e709cd12694d3634ea0a3
MD5 9ce0ac0b981fee7015e04b8b07a460a8
BLAKE2b-256 cc57ebf6144893cb1c53445c357525527a95fc6ac1c0e6f590b8516961bf181e


File details

Details for the file binance_futures_availability-1.1.0-py3-none-any.whl.


File hashes

Hashes for binance_futures_availability-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 793579f7649ca6f77e5ce4bc9b92077387f79b02f3c9c82c79d43524a5c2a387
MD5 1d19f4243b1a6d5f053065a4f2e2b23f
BLAKE2b-256 118a36a7a067d8606064a135eb667acb4f6e625e7499c1e718eccc28b3a3b63a

