Track daily availability of Binance USDT perpetual futures from Vision repository

Binance Futures Availability Database

Track daily availability of ALL USDT perpetual futures from Binance Vision (2019-09-25 to present)

Symbol count is dynamic (~327 currently) - we discover and track all perpetual instruments available on each historical date.

Requires Python 3.12+. Licensed under MIT.

Overview

A standalone DuckDB database that tracks the historical availability of Binance USDT-Margined (UM) perpetual futures contracts, built from the Binance Vision S3 repository. It answers "which symbols were available on date X?" in under a second and ships with automated daily updates and volume metrics.

Key Features

  • Complete Historical Data: 2019-09-25 (first UM-futures launch) to present (~2240 days)
  • Dynamic Symbol Discovery: ✅ Auto-updated daily - Tracks all perpetual USDT contracts (~327 currently, varies by date)
  • Fast Queries: <1ms snapshot queries, <10ms timelines
  • Volume Metrics: Track file size growth and S3 freshness over time
  • Small Footprint: 50-150MB database (compressed columnar)
  • Automated Updates: ✅ GitHub Actions - daily 3AM UTC, zero infrastructure
  • High Reliability: Strict error handling, comprehensive validation checks

Quick Start

Option 1: GitHub Actions (Recommended) ✅

Production-ready automated updates with zero infrastructure overhead.

1. Initial Database Creation

# Trigger historical backfill via GitHub Actions (one-time setup)
# Note: `date -d "yesterday"` is GNU date syntax; on macOS/BSD use `date -v-1d +%Y-%m-%d`
gh workflow run update-database.yml \
  --field update_mode=backfill \
  --field start_date=2019-09-25 \
  --field end_date=$(date -d "yesterday" +%Y-%m-%d)

# Monitor progress (estimated 25-60 minutes)
gh run watch

# Verify database created
gh release view latest

2. Download Database

# Download from GitHub Releases
gh release download latest --pattern "availability.duckdb.gz"
gunzip availability.duckdb.gz

3. Automated Daily Updates

No action needed: the workflow runs automatically every day at 3:00 AM UTC. Download the latest database from GitHub Releases at any time.

Option 2: Local Development

# Install package
cd ~/eon/binance-futures-availability
uv pip install -e ".[dev]"

# Run local backfill
uv run python scripts/operations/backfill.py

Query Database

# CLI queries
uv run binance-futures-availability query snapshot 2024-01-15
uv run binance-futures-availability query timeline BTCUSDT
uv run binance-futures-availability query range 2024-01-01 2024-03-31

# Python API
python -c "
from binance_futures_availability.queries import AvailabilityQueries
q = AvailabilityQueries()
print(q.get_available_symbols_on_date('2024-01-15'))
"

Volume Rankings Archive (ADR-0013)

NEW: Daily volume rankings time-series in Parquet format, published alongside the database.

Overview

Single cumulative file containing historical daily rankings of all symbols by 24-hour trading volume (quote_volume_usdt), with rank change tracking across 1d, 7d, 14d, and 30d windows.

  • File: volume-rankings-timeseries.parquet (20 MB, ~733K rows)
  • Format: Parquet (columnar, SNAPPY compressed)
  • Grain: One row per (date, symbol) combination
  • Updates: Automated daily at 3:00 AM UTC (incremental append)

Quick Start

Option 1: Remote Query (Recommended) - Zero download, zero local storage:

# Install: pip install duckdb pandas  (fetchdf() returns a pandas DataFrame)
import duckdb

url = "https://github.com/terryli/binance-futures-availability/releases/download/latest/volume-rankings-timeseries.parquet"

# Top 10 symbols (1-3 second query)
result = duckdb.execute(f"""
    SELECT symbol, rank, quote_volume_usdt, rank_change_7d
    FROM '{url}'
    WHERE date = '2025-11-16'  -- Replace with desired date
    ORDER BY rank LIMIT 10
""").fetchdf()

print(result)

Option 2: Local Download - For offline use:

# Download from GitHub Releases
gh release download latest --pattern "volume-rankings-timeseries.parquet"

# Query with DuckDB
python -c "
import duckdb
result = duckdb.execute('''
    SELECT symbol, rank, quote_volume_usdt, rank_change_7d
    FROM \"volume-rankings-timeseries.parquet\"
    WHERE date = (SELECT MAX(date) FROM \"volume-rankings-timeseries.parquet\")
    ORDER BY rank LIMIT 10
''').fetchdf()
print(result)
"

Schema (13 Columns)

| Column | Type | Description |
|---|---|---|
| date | date32 | Trading date |
| symbol | string | Futures symbol |
| rank | uint16 | Volume rank (1 = highest) |
| quote_volume_usdt | float64 | 24h volume (USDT) |
| trade_count | uint64 | Number of trades |
| rank_change_1d / 7d / 14d / 30d | int16 | Rank delta (negative = improved) |
| percentile | float32 | Volume percentile (0-100) |
| market_share_pct | float32 | % of total market volume |
| days_available | uint8 | Days available in last 30d |
| generation_timestamp | timestamp[us] | File generation time |
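
To make the rank, rank_change and market_share_pct columns concrete, here is a minimal sketch of how such values could be derived from one day's volumes. The function names, the tie-breaking rule (alphabetical by symbol) and the use of None for symbols with no earlier rank are assumptions for illustration; the source does not say how the archive handles those cases. The sample volumes are made up.

```python
def rank_by_volume(volumes):
    """Rank symbols by volume, 1 = highest.

    Ties are broken alphabetically by symbol (an assumption, not from the source).
    """
    ordered = sorted(volumes.items(), key=lambda kv: (-kv[1], kv[0]))
    return {symbol: i for i, (symbol, _) in enumerate(ordered, start=1)}


def rank_change(current, previous):
    """Rank today minus rank N days ago; negative means the symbol moved up.

    Symbols with no earlier rank get None (an assumption).
    """
    return {s: (r - previous[s]) if s in previous else None for s, r in current.items()}


def market_share_pct(volumes):
    """Each symbol's share of the day's total volume, in percent."""
    total = sum(volumes.values())
    return {s: 100.0 * v / total for s, v in volumes.items()}


# Made-up volumes for two dates, for illustration only.
earlier = {"BTCUSDT": 900.0, "ETHUSDT": 500.0, "SOLUSDT": 100.0}
today = {"BTCUSDT": 800.0, "SOLUSDT": 600.0, "ETHUSDT": 400.0, "NEWUSDT": 200.0}

ranks_earlier = rank_by_volume(earlier)
ranks_today = rank_by_volume(today)
changes = rank_change(ranks_today, ranks_earlier)
shares = market_share_pct(today)

print(ranks_today)
print(changes)
```

In this sample SOLUSDT moves from rank 3 to rank 2, so its change is -1, which matches the "negative = improved" convention in the table.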

Use Cases:

  • Portfolio universe selection (top N by volume)
  • Trend analysis (rank changes over time)
  • Survivorship bias elimination
  • Market share analysis
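
As a rough illustration of the first and third use cases, the sketch below builds a point-in-time top-N universe per date from (date, symbol, rank) rows. Because membership is decided separately for each date, a symbol that later disappears still shows up on the dates when it ranked, which is how survivorship bias is avoided. The function name and the sample rows are hypothetical; in practice the rows would come from the Parquet file.

```python
from collections import defaultdict


def top_n_by_date(rows, n):
    """Return {date: [symbols ordered by rank]} keeping only rank <= n.

    `rows` is an iterable of (date, symbol, rank) tuples.
    """
    by_date = defaultdict(list)
    for date, symbol, rank in rows:
        if rank <= n:
            by_date[date].append((rank, symbol))
    return {d: [s for _, s in sorted(pairs)] for d, pairs in by_date.items()}


# Made-up rows: OLDUSDT ranks on the first date and is absent on the second.
rows = [
    ("2024-01-01", "BTCUSDT", 1),
    ("2024-01-01", "OLDUSDT", 2),
    ("2024-01-01", "ETHUSDT", 3),
    ("2024-06-01", "BTCUSDT", 1),
    ("2024-06-01", "ETHUSDT", 2),
    ("2024-06-01", "SOLUSDT", 3),
]

universe = top_n_by_date(rows, n=2)
print(universe["2024-01-01"])  # ['BTCUSDT', 'OLDUSDT']
print(universe["2024-06-01"])  # ['BTCUSDT', 'ETHUSDT']
```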

See: Using Volume Rankings Guide for query examples and advanced usage.

Architecture

Database Schema

Single table with volume metrics (ADR-0006):

daily_availability(date, symbol, available, file_size_bytes, last_modified, url, status_code, probe_timestamp)

Primary Key: (date, symbol)

Indexes:

  • idx_symbol_date (symbol, date) - fast timeline queries
  • idx_available_date (available, date) - fast symbol listings

Volume Metrics:

  • file_size_bytes: ZIP file size from S3 (enables trend analysis)
  • last_modified: S3 upload timestamp (enables freshness monitoring)

Storage: ~/.cache/binance-futures/availability.duckdb
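
The table layout and the two query shapes it is indexed for (snapshot and timeline) can be sketched as follows. To keep the example dependency-free it uses the standard-library sqlite3 module as a stand-in for DuckDB; the column types, the helper names snapshot and timeline, and the sample rows are assumptions, not the project's actual DDL or API.

```python
import sqlite3

# In-memory SQLite stand-in for availability.duckdb (the project itself uses DuckDB).
con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE daily_availability (
        date            TEXT    NOT NULL,  -- column types are assumptions
        symbol          TEXT    NOT NULL,
        available       INTEGER NOT NULL,
        file_size_bytes INTEGER,
        last_modified   TEXT,
        url             TEXT,
        status_code     INTEGER,
        probe_timestamp TEXT,
        PRIMARY KEY (date, symbol)
    )
    """
)
con.execute("CREATE INDEX idx_symbol_date ON daily_availability (symbol, date)")
con.execute("CREATE INDEX idx_available_date ON daily_availability (available, date)")

# Made-up sample rows for illustration only.
sample = [
    ("2024-01-14", "BTCUSDT", 1, 1000, None, None, 200, None),
    ("2024-01-15", "BTCUSDT", 1, 1100, None, None, 200, None),
    ("2024-01-15", "ETHUSDT", 1, 900, None, None, 200, None),
    ("2024-01-15", "OLDUSDT", 0, None, None, None, 404, None),
]
con.executemany("INSERT INTO daily_availability VALUES (?, ?, ?, ?, ?, ?, ?, ?)", sample)


def snapshot(con, date):
    """Symbols available on one date."""
    cur = con.execute(
        "SELECT symbol FROM daily_availability "
        "WHERE date = ? AND available = 1 ORDER BY symbol",
        (date,),
    )
    return [row[0] for row in cur]


def timeline(con, symbol):
    """(date, available) pairs for one symbol, oldest first."""
    cur = con.execute(
        "SELECT date, available FROM daily_availability WHERE symbol = ? ORDER BY date",
        (symbol,),
    )
    return [(d, bool(a)) for d, a in cur]


print(snapshot(con, "2024-01-15"))  # ['BTCUSDT', 'ETHUSDT']
print(timeline(con, "BTCUSDT"))
```

The snapshot query filters on (available, date) and the timeline query on (symbol, date), which is the access pattern the two indexes above are described as serving.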

Data Collection (Hybrid Strategy)

Binance Vision S3: https://data.binance.vision/data/futures/um/daily/klines/

Historical Backfill (Bulk Operations):

  • Method: AWS CLI S3 listing (aws s3 ls --no-sign-request)
  • Performance: 327 symbols × 4.5 sec = ~25 minutes
  • Use case: One-time historical data collection
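
A bulk listing yields the file size and upload timestamp for every file in one pass, which is where file_size_bytes and last_modified can come from during backfill. The sketch below parses text in the usual `aws s3 ls` output shape (date, time, size, key). The file-name pattern SYMBOL-INTERVAL-DATE.zip, the parse_listing helper and the sample lines are assumptions for illustration, not the project's actual parser.

```python
import re
from datetime import datetime

# One listing line: "<date> <time> <size> <file name>".
# The SYMBOL-INTERVAL-DATE.zip file-name pattern is an assumption.
LINE_RE = re.compile(
    r"^(?P<modified>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<size>\d+)\s+"
    r"(?P<symbol>[A-Z0-9]+)-(?P<interval>\w+)-(?P<date>\d{4}-\d{2}-\d{2})\.zip$"
)


def parse_listing(text):
    """Turn listing text into one record per data file."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a data file (e.g. checksum files or prefix lines)
        records.append(
            {
                "symbol": match["symbol"],
                "date": match["date"],
                "file_size_bytes": int(match["size"]),
                "last_modified": datetime.strptime(match["modified"], "%Y-%m-%d %H:%M:%S"),
            }
        )
    return records


# Made-up listing output for illustration only.
listing = """\
2024-01-16 08:12:34    1234567 BTCUSDT-1m-2024-01-15.zip
2024-01-16 08:12:34         92 BTCUSDT-1m-2024-01-15.zip.CHECKSUM
2024-01-17 08:10:01    1300000 BTCUSDT-1m-2024-01-16.zip
"""

records = parse_listing(listing)
print(len(records))  # 2
print(records[0]["date"], records[0]["file_size_bytes"])
```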

Daily Updates (Incremental Operations):

  • Method: HTTP HEAD requests (parallel batch probing)
  • Performance: ~327 symbols in ~1.5 seconds (150 parallel workers, empirically optimized)
  • Use case: Automated daily updates via GitHub Actions (3 AM UTC)
  • Benchmark: Worker Count Optimization
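
The daily probing step can be sketched with the standard library alone: one HEAD request per symbol, fanned out over a thread pool. This is not the project's implementation. The path layout below the base URL, the 1m interval, the function names and the rule that only status 200 means "available" are all assumptions; only the base URL and the worker count of 150 come from the text above.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE = "https://data.binance.vision/data/futures/um/daily/klines"


def kline_url(symbol, date, interval="1m"):
    # ASSUMPTION: path layout below the base URL; not stated in this README.
    return f"{BASE}/{symbol}/{interval}/{symbol}-{interval}-{date}.zip"


def head_probe(url, timeout=10.0):
    """Send one HEAD request and return (status_code, content_length or None)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            length = response.headers.get("Content-Length")
            return response.status, (int(length) if length else None)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions; surface the status code instead.
        return exc.code, None


def probe_all(symbols, date, probe=head_probe, workers=150):
    """Probe every symbol in parallel; `probe` is injectable for offline testing."""
    urls = [kline_url(symbol, date) for symbol in symbols]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(probe, urls))
    return {
        symbol: {
            "available": status == 200,  # ASSUMPTION: only 200 counts as available
            "status_code": status,
            "file_size_bytes": size,
        }
        for symbol, (status, size) in zip(symbols, results)
    }


# Usage (performs real network requests):
#   probe_all(["BTCUSDT", "ETHUSDT"], "2024-01-15")
```

Connection-level failures (urllib.error.URLError, timeouts) are deliberately not caught here, so they propagate to the caller rather than being turned into a default value.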

See: ADR-0005: AWS CLI for Bulk Operations

Error Handling

Policy: Strict raise-on-failure (ADR-0003)

  • No retries (workflow retries next scheduled cycle)
  • No fallbacks (no default values)
  • No silent failures (all errors logged)
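
One way to read this policy in code: map the outcomes you understand, then log and raise on everything else, with no retry loop and no default return value. The sketch below is an illustration only. ProbeError, classify_status and the 200/404 mapping are hypothetical names and assumptions, not taken from ADR-0003.

```python
import logging

logger = logging.getLogger("availability")


class ProbeError(RuntimeError):
    """Hypothetical exception type for an unexpected probe outcome."""


def classify_status(symbol, date, status_code):
    """Map an HTTP status to availability, raising on anything unexpected.

    ASSUMPTION: 200 means the file exists, 404 means it does not.
    """
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    # No retry and no fallback value: log with context, then fail loudly.
    logger.error("probe failed: symbol=%s date=%s status=%s", symbol, date, status_code)
    raise ProbeError(f"{symbol} {date}: unexpected HTTP status {status_code}")


print(classify_status("BTCUSDT", "2024-01-15", 200))  # True
print(classify_status("BTCUSDT", "2019-01-01", 404))  # False
```

A 503, for example, raises ProbeError instead of being recorded as "unavailable", so the failed run is visible and the next scheduled cycle picks the date up again.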

Documentation

  • Project Memory: CLAUDE.md - AI context and patterns
  • SSoT Plan: docs/development/plan/v1.0.0-implementation-plan.yaml
  • Schema: docs/schema/availability-database.schema.json
  • MADRs: docs/architecture/decisions/

Development

Run Tests

# Unit tests only (fast, no network)
pytest -m "not integration"

# All tests including integration (slow, requires network)
pytest

# With coverage report
pytest --cov --cov-report=html
open htmlcov/index.html  # macOS; on Linux use xdg-open

Linting & Formatting

# Format code
ruff format src/ tests/

# Check linting
ruff check src/ tests/

# Fix auto-fixable issues
ruff check --fix src/ tests/

Architecture Decisions

All decisions documented as MADRs:

  • ADR-0001: Daily table pattern (not range table)
  • ADR-0002: DuckDB for storage
  • ADR-0003: Strict error handling
  • ADR-0004: APScheduler for automation (superseded by ADR-0009)
  • ADR-0005: AWS CLI for bulk operations
  • ADR-0006: Volume metrics collection
  • ADR-0009: ✅ GitHub Actions automation (production)
  • ADR-0010: ✅ Dynamic symbol discovery (daily S3 auto-update)
  • ADR-0013: ✅ Volume rankings time-series archive (Parquet)

SLOs (Service Level Objectives)

  • Availability: 95% of daily updates complete successfully
  • Correctness: >95% match with Binance exchangeInfo API
  • Observability: All failures logged with full context
  • Maintainability: 80%+ test coverage, all functions documented
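
The correctness objective compares the database's symbol set with what the exchangeInfo API reports. The source does not define how "match" is measured, so the sketch below assumes one plausible definition, the share of symbols common to both sets out of all symbols seen in either. The function name and the sample sets are hypothetical.

```python
def match_pct(db_symbols, api_symbols):
    """Percentage of symbols present in both sets, out of all symbols seen.

    ASSUMPTION: overlap / union is used as the match definition.
    """
    db, api = set(db_symbols), set(api_symbols)
    union = db | api
    if not union:
        raise ValueError("both symbol sets are empty")
    return 100.0 * len(db & api) / len(union)


# Made-up symbol sets for illustration only.
db_side = {"BTCUSDT", "ETHUSDT", "SOLUSDT"}
api_side = {"ETHUSDT", "SOLUSDT", "NEWUSDT"}

print(match_pct(db_side, api_side))  # 50.0
print(match_pct(db_side, db_side))   # 100.0
```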

License

MIT License

Contributing

This is a specialized internal tool. For major changes, please open an issue first to discuss what you would like to change.

Ensure tests pass and coverage remains ≥80%:

pytest --cov --cov-fail-under=80

Support

  • Documentation: See CLAUDE.md for complete project context
  • Issues: File issues in the project repository
  • Questions: Consult docs/guides/ for common scenarios
