Gapless Crypto Data

Cryptocurrency OHLCV data collection with gap-free guarantee. Retrieves microstructure-enriched kline data from Binance Public Data Repository with automatic gap detection and filling.

Installation

# UV (recommended)
uv add gapless-crypto-data

# pip
pip install gapless-crypto-data

Quick Start

import gapless_crypto_data as gcd

# Fetch historical data
df = gcd.download("BTCUSDT", timeframe="1h", start="2024-01-01", end="2024-06-30")

# Fetch recent data with limit
df = gcd.fetch_data("ETHUSDT", timeframe="4h", limit=1000)

# Get available symbols and timeframes
symbols = gcd.get_supported_symbols()
timeframes = gcd.get_supported_timeframes()

# Fill gaps in existing data directory
results = gcd.fill_gaps("./data")

Data Format

Returns pandas DataFrames with microstructure columns:

| Column | Type | Description |
| --- | --- | --- |
| date | datetime64 | Period open timestamp |
| open, high, low, close | float64 | OHLC prices |
| volume | float64 | Base asset volume |
| close_time | datetime64 | Period close timestamp |
| quote_asset_volume | float64 | Quote asset volume |
| number_of_trades | int64 | Trade count |
| taker_buy_base_asset_volume | float64 | Taker buy volume (base) |
| taker_buy_quote_asset_volume | float64 | Taker buy volume (quote) |

See Data Format Specification for column semantics and constraints.
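
As a rough illustration of what the columns above imply for a single row, the sketch below checks a few generic OHLCV invariants using only the standard library. The `Kline` dataclass and `check_kline` helper are hypothetical names for this example, not part of the package, and the checks are common-sense consistency rules rather than the constraints from the Data Format Specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kline:
    """One row, with field names matching the price and volume columns above."""
    open: float
    high: float
    low: float
    close: float
    volume: float
    quote_asset_volume: float
    number_of_trades: int
    taker_buy_base_asset_volume: float
    taker_buy_quote_asset_volume: float


def check_kline(row: Kline) -> list[str]:
    """Return a description of every generic invariant the row violates."""
    problems = []
    if row.high < max(row.open, row.close, row.low):
        problems.append("high is not the maximum price")
    if row.low > min(row.open, row.close, row.high):
        problems.append("low is not the minimum price")
    if row.volume < 0 or row.quote_asset_volume < 0 or row.number_of_trades < 0:
        problems.append("negative volume or trade count")
    # Taker buy volume is a subset of total volume, so it cannot exceed it.
    if row.taker_buy_base_asset_volume > row.volume:
        problems.append("taker buy base volume exceeds total base volume")
    if row.taker_buy_quote_asset_volume > row.quote_asset_volume:
        problems.append("taker buy quote volume exceeds total quote volume")
    return problems


# Synthetic rows for illustration only (not real market data).
good = Kline(100.0, 110.0, 95.0, 105.0, 12.5, 1300.0, 42, 6.0, 630.0)
bad = Kline(100.0, 99.0, 95.0, 105.0, 12.5, 1300.0, 42, 20.0, 630.0)

print(check_kline(good))
print(check_kline(bad))
```

The same checks could be applied column-wise to a DataFrame; the row-level form is used here only to keep the example dependency-free.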

Supported Timeframes

Binance spot kline intervals from 1 second to 1 day: 1s, 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d
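
When working with these interval strings it is often handy to turn them into durations, for example to compute how many bars a complete day should contain. The helper names below (`timeframe_to_timedelta`, `bars_per_day`) are hypothetical and not part of the package; this is a minimal sketch that only understands the `s`, `m`, `h`, and `d` suffixes listed above.

```python
from datetime import timedelta

# Seconds per unit suffix used by the interval strings listed above.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timeframe_to_timedelta(timeframe: str) -> timedelta:
    """Convert an interval string such as "15m" into a timedelta."""
    if len(timeframe) < 2:
        raise ValueError(f"unrecognised timeframe: {timeframe!r}")
    count, unit = timeframe[:-1], timeframe[-1]
    if unit not in UNIT_SECONDS or not count.isdecimal() or int(count) < 1:
        raise ValueError(f"unrecognised timeframe: {timeframe!r}")
    return timedelta(seconds=int(count) * UNIT_SECONDS[unit])


def bars_per_day(timeframe: str) -> int:
    """Number of bars of this timeframe in one gap-free 24-hour day."""
    return timedelta(days=1) // timeframe_to_timedelta(timeframe)


for tf in ("1s", "15m", "4h", "1d"):
    print(tf, timeframe_to_timedelta(tf), bars_per_day(tf))
```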

API Reference

Function-based API

import gapless_crypto_data as gcd

# Primary collection functions
df = gcd.download(symbol, timeframe, start, end)
df = gcd.fetch_data(symbol, timeframe, limit=None, start=None, end=None)

# Gap filling
results = gcd.fill_gaps(directory, symbols=None)

# Discovery
symbols = gcd.get_supported_symbols()
timeframes = gcd.get_supported_timeframes()

Class-based API

from gapless_crypto_data import BinancePublicDataCollector, UniversalGapFiller

# Data collection with full control
collector = BinancePublicDataCollector(
    symbol="BTCUSDT",
    start_date="2024-01-01",
    end_date="2024-12-31"
)
result = collector.collect_timeframe_data("1h")
df = result["dataframe"]

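# Gap detection and filling
# (csv_file is the path to a previously collected CSV file; timeframe is an interval string such as "1h")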
gap_filler = UniversalGapFiller()
gaps = gap_filler.detect_all_gaps(csv_file, timeframe)
result = gap_filler.process_file(csv_file, timeframe)

Full API documentation: Python API Reference
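
To make the idea of gap detection concrete, here is a minimal standard-library sketch. It is not the `UniversalGapFiller` implementation; `find_gaps` is a hypothetical helper that assumes a sorted list of period-open timestamps and a fixed bar interval, and reports each run of missing bars as a (first missing open, last missing open) pair.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval):
    """Return (first_missing, last_missing) pairs for sorted open timestamps."""
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        # Consecutive bars should be exactly one interval apart.
        if current - previous > interval:
            gaps.append((previous + interval, current - interval))
    return gaps


# Synthetic hourly series with the 03:00 and 04:00 bars missing.
opens = [datetime(2024, 1, 1, hour) for hour in (0, 1, 2, 5, 6)]
gaps = find_gaps(opens, timedelta(hours=1))

for first_missing, last_missing in gaps:
    print(f"missing bars from {first_missing} to {last_missing}")
```

Each reported range is the kind of window a gap filler would then request from an API and splice back into the file.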

Data Sources

| Source | Method | Use Case |
| --- | --- | --- |
| Binance Public Data Repository | Monthly/daily ZIP archives | Historical bulk collection |
| Binance REST API | Per-request klines | Gap filling, recent data |

Collection strategy: Repository archives for bulk historical data, API for gaps and recent periods. See Data Collection Guide.
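
The split between archives and API can be sketched as a simple per-month plan. Everything here is illustrative: `plan_collection` and `months_between` are hypothetical helpers, the archive URL layout is an assumption about how the Binance Public Data Repository names monthly spot kline files, and treating every finished calendar month as "available as an archive" is a simplification of the real strategy described in the Data Collection Guide.

```python
from datetime import date

# Assumption: URL layout for monthly spot kline archives in the public repository.
ARCHIVE_URL = (
    "https://data.binance.vision/data/spot/monthly/klines/"
    "{symbol}/{timeframe}/{symbol}-{timeframe}-{year}-{month:02d}.zip"
)


def months_between(start: date, end: date):
    """Yield (year, month) for every calendar month touched by [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def plan_collection(symbol: str, timeframe: str, start: date, end: date, today: date):
    """Map each month to an archive URL, or to "api" if the month is not finished."""
    plan = []
    for year, month in months_between(start, end):
        if (year, month) < (today.year, today.month):
            source = ARCHIVE_URL.format(
                symbol=symbol, timeframe=timeframe, year=year, month=month
            )
        else:
            source = "api"
        plan.append(((year, month), source))
    return plan


plan = plan_collection(
    "BTCUSDT", "1h", date(2024, 11, 15), date(2025, 1, 10), today=date(2025, 1, 20)
)
for period, source in plan:
    print(period, source)
```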

AI Agent Integration

Programmatic discovery via __probe__ module:

import gapless_crypto_data
probe = gapless_crypto_data.__probe__

# API discovery
probe.discover_api()
probe.get_capabilities()
probe.get_task_graph()

See Probe Usage for AI agent integration patterns.

Development

Setup

git clone https://github.com/terrylica/gapless-crypto-data.git
cd gapless-crypto-data
uv venv && source .venv/bin/activate
uv sync --dev
uv run pre-commit install

Commands

| Task | Command |
| --- | --- |
| Run tests | uv run pytest |
| Format | uv run ruff format . |
| Lint | uv run ruff check --fix . |
| Type check | uv run mypy src/ |
| Build | uv build |

Project Structure

src/gapless_crypto_data/
├── __init__.py          # Package exports
├── api.py               # Function-based API
├── __probe__.py         # AI agent discovery
├── collectors/          # Data collection
├── gap_filling/         # Gap detection/filling
└── validation/          # Data validation

Full development guide: Development Setup

Architecture

  • BinancePublicDataCollector: Bulk data retrieval from public repository
  • UniversalGapFiller: Gap detection and API-based filling
  • AtomicCSVOperations: Corruption-proof file operations
  • ValidationStorage: DuckDB-backed validation persistence

Architecture documentation: Overview
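
The usual technique behind atomic file operations is to write to a temporary file in the same directory and then rename it over the target, so readers see either the old file or the complete new one. The sketch below shows that general pattern with the standard library; `atomic_write_csv` is a hypothetical helper and is not the `AtomicCSVOperations` implementation.

```python
import csv
import os
import tempfile


def atomic_write_csv(path, header, rows):
    """Write a CSV to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # atomic when both paths are on one filesystem
    except BaseException:
        # Never leave a half-written temp file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "BTCUSDT-1h.csv")
    atomic_write_csv(target, ["date", "close"], [["2024-01-01 00:00:00", 42000.0]])
    with open(target, newline="") as handle:
        written = list(csv.reader(handle))
    leftovers = os.listdir(workdir)

print(written)
print(leftovers)
```

Creating the temporary file in the same directory as the target matters: a rename across filesystems is not atomic.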

License

MIT License - see LICENSE
