Gapless Crypto Data

Cryptocurrency OHLCV data collection with a gap-free guarantee. Retrieves microstructure-enriched kline data from the Binance Public Data Repository, with automatic gap detection and filling.

Installation

# UV (recommended)
uv add gapless-crypto-data

# pip
pip install gapless-crypto-data

Quick Start

import gapless_crypto_data as gcd

# Fetch historical data
df = gcd.download("BTCUSDT", timeframe="1h", start="2024-01-01", end="2024-06-30")

# Fetch recent data with limit
df = gcd.fetch_data("ETHUSDT", timeframe="4h", limit=1000)

# Get available symbols and timeframes
symbols = gcd.get_supported_symbols()
timeframes = gcd.get_supported_timeframes()

# Fill gaps in existing data directory
results = gcd.fill_gaps("./data")

Data Format

The collection functions return pandas DataFrames with OHLCV and microstructure columns:

Column                          Type        Description
date                            datetime64  Period open timestamp
open, high, low, close          float64     OHLC prices
volume                          float64     Base asset volume
close_time                      datetime64  Period close timestamp
quote_asset_volume              float64     Quote asset volume
number_of_trades                int64       Trade count
taker_buy_base_asset_volume     float64     Taker buy volume (base)
taker_buy_quote_asset_volume    float64     Taker buy volume (quote)

See Data Format Specification for column semantics and constraints.
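To make the column semantics concrete, here is a minimal, library-independent sketch that maps one raw Binance kline record onto the column names in the table above and runs a few sanity checks. The helper names `parse_kline` and `check_row` are hypothetical, the checks are generic OHLCV invariants rather than the package's own validation rules, and the sample values are illustrative, not real market data. It assumes millisecond UTC timestamps.

```python
from datetime import datetime, timezone

# Column names from the table above, in the order Binance emits kline fields.
COLUMNS = [
    "date", "open", "high", "low", "close", "volume", "close_time",
    "quote_asset_volume", "number_of_trades",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume",
]

def parse_kline(raw):
    """Map one raw kline (sequence of 11+ fields, ms timestamps) to a typed dict."""
    row = {}
    for name, value in zip(COLUMNS, raw):  # trailing extra fields are ignored
        if name in ("date", "close_time"):
            row[name] = datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
        elif name == "number_of_trades":
            row[name] = int(value)
        else:
            row[name] = float(value)
    return row

def check_row(row):
    """Return a list of violated sanity checks (an empty list means none failed)."""
    problems = []
    if row["high"] < max(row["open"], row["close"], row["low"]):
        problems.append("high is not the maximum price")
    if row["low"] > min(row["open"], row["close"], row["high"]):
        problems.append("low is not the minimum price")
    if row["close_time"] <= row["date"]:
        problems.append("close_time does not follow date")
    if row["taker_buy_base_asset_volume"] > row["volume"]:
        problems.append("taker buy volume exceeds total volume")
    if row["number_of_trades"] < 0 or row["volume"] < 0:
        problems.append("negative trade count or volume")
    return problems

# Illustrative values only (not real market data).
raw = [1704067200000, "42283.58", "42554.57", "42261.02", "42475.23",
       "1271.68", 1704070799999, "53957248.97", 47134,
       "682.57", "28957416.81", "0"]
row = parse_kline(raw)
print(row["date"].isoformat(), check_row(row))
```

The authoritative constraints are those in the Data Format Specification; the checks here only illustrate the kind of relationship the columns are expected to satisfy.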

Supported Timeframes

All Binance spot kline intervals are supported. Query the list at runtime:

import gapless_crypto_data as gcd
print(gcd.get_supported_timeframes())
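Timeframe strings such as "1h" or "4h" encode a period length, which is what gap detection compares against. A minimal sketch of converting them to a `timedelta`, assuming the strings follow Binance's interval notation (a count followed by `s`, `m`, `h`, `d` or `w`); the helper `interval_to_timedelta` is hypothetical and not part of the package:

```python
from datetime import timedelta

# Seconds per unit for fixed-length interval suffixes.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def interval_to_timedelta(interval):
    """Convert e.g. '15m' or '4h' to a timedelta.

    Calendar-month intervals ('1M') have no fixed length and raise ValueError.
    """
    count, unit = int(interval[:-1]), interval[-1]
    if unit not in UNIT_SECONDS:
        raise ValueError(f"no fixed duration for interval {interval!r}")
    return timedelta(seconds=count * UNIT_SECONDS[unit])

print(interval_to_timedelta("4h"))
```

Note that the suffix is case-sensitive in this notation: `m` means minutes, while `M` means months.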

API Reference

Function-based API

import gapless_crypto_data as gcd

# Primary collection functions (symbol, timeframe, start, end are placeholders)
df = gcd.download(symbol, timeframe, start, end)
df = gcd.fetch_data(symbol, timeframe, limit=None, start=None, end=None)

# Gap filling
results = gcd.fill_gaps(directory, symbols=None)

# Discovery
symbols = gcd.get_supported_symbols()
timeframes = gcd.get_supported_timeframes()
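Filling gaps presupposes finding them. As a rough illustration of the idea (not the package's implementation), the sketch below scans sorted period-open timestamps and reports every run of missing periods. The function name `find_gaps` is hypothetical.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, step):
    """Return (first_missing, last_missing) pairs for sorted period-open times.

    A gap exists wherever two consecutive timestamps are more than one
    step apart.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > step:
            gaps.append((prev + step, curr - step))
    return gaps

# Hourly series with the 02:00 and 03:00 candles missing.
opens = [datetime(2024, 1, 1, hour) for hour in (0, 1, 4, 5)]
gaps = find_gaps(opens, timedelta(hours=1))
print(gaps)
```

Each returned pair gives the open time of the first and last missing period, which is the range a filler would need to request.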

Class-based API

from gapless_crypto_data import BinancePublicDataCollector, UniversalGapFiller

# Data collection with full control
collector = BinancePublicDataCollector(
    symbol="BTCUSDT",
    start_date="2024-01-01",
    end_date="2024-12-31"
)
result = collector.collect_timeframe_data("1h")
df = result["dataframe"]

# Gap detection and filling
# csv_file and timeframe are placeholders: the path to a collected CSV and its interval
gap_filler = UniversalGapFiller()
gaps = gap_filler.detect_all_gaps(csv_file, timeframe)
fill_result = gap_filler.process_file(csv_file, timeframe)
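Once rows for a missing range have been fetched, they have to be merged back into the existing series without creating duplicates. A minimal sketch of such a merge, assuming rows are dicts keyed by the `date` column; `merge_rows` is a hypothetical helper, and the choice to let existing rows win on conflict is an assumption of this sketch, not documented behavior of `UniversalGapFiller`:

```python
from datetime import datetime

def merge_rows(existing, fetched, key="date"):
    """Combine two row lists into one row per timestamp, sorted by time."""
    by_time = {row[key]: row for row in existing}
    for row in fetched:
        # Keep the existing row if the timestamp is already present.
        by_time.setdefault(row[key], row)
    return [by_time[stamp] for stamp in sorted(by_time)]

existing = [
    {"date": datetime(2024, 1, 1, 0), "close": 100.0},
    {"date": datetime(2024, 1, 1, 2), "close": 102.0},
]
fetched = [
    {"date": datetime(2024, 1, 1, 1), "close": 101.0},
    {"date": datetime(2024, 1, 1, 2), "close": 999.0},  # duplicate timestamp
]
merged = merge_rows(existing, fetched)
print([row["close"] for row in merged])
```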

Full API documentation: Python API Reference

Data Sources

Source                          Method                      Use Case
Binance Public Data Repository  Monthly/daily ZIP archives  Historical bulk collection
Binance REST API                Per-request klines          Gap filling, recent data

Collection strategy: Repository archives for bulk historical data, API for gaps and recent periods. See Data Collection Guide.
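For orientation, the sketch below builds the addresses behind the two sources: monthly archive URLs in the layout used by Binance's public data site, and a query URL for the REST klines endpoint. It performs no network access. The helper names are hypothetical, and how the package itself constructs its requests is not shown in this README.

```python
from datetime import date
from urllib.parse import urlencode

ARCHIVE_BASE = "https://data.binance.vision/data/spot/monthly/klines"
REST_KLINES = "https://api.binance.com/api/v3/klines"

def monthly_archive_urls(symbol, interval, start, end):
    """List monthly ZIP URLs for every month touched by [start, end]."""
    urls = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        name = f"{symbol}-{interval}-{year}-{month:02d}.zip"
        urls.append(f"{ARCHIVE_BASE}/{symbol}/{interval}/{name}")
        # Advance one calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls

def rest_klines_url(symbol, interval, start_ms, end_ms, limit=1000):
    """Build a klines query URL; 1000 is the endpoint's maximum rows per request."""
    query = urlencode({
        "symbol": symbol,
        "interval": interval,
        "startTime": start_ms,
        "endTime": end_ms,
        "limit": limit,
    })
    return f"{REST_KLINES}?{query}"

urls = monthly_archive_urls("BTCUSDT", "1h", date(2024, 11, 15), date(2025, 1, 3))
print(urls[0])
print(rest_klines_url("BTCUSDT", "1h", 1704067200000, 1704070800000))
```

Archives cover whole months in one download, which suits bulk history; the REST endpoint returns a bounded number of rows per call, which suits short ranges such as individual gaps.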

AI Agent Integration

Programmatic discovery via __probe__ module:

import gapless_crypto_data
probe = gapless_crypto_data.__probe__

# API discovery
probe.discover_api()
probe.get_capabilities()
probe.get_task_graph()

See Probe Usage for AI agent integration patterns.

Development

Setup

git clone https://github.com/terrylica/gapless-crypto-data.git
cd gapless-crypto-data
uv venv && source .venv/bin/activate
uv sync --dev
uv run pre-commit install

Commands

Task        Command
Run tests   uv run pytest
Format      uv run ruff format .
Lint        uv run ruff check --fix .
Type check  uv run mypy src/
Build       uv build

Project Structure

src/gapless_crypto_data/
├── __init__.py          # Package exports
├── api.py               # Function-based API
├── __probe__.py         # AI agent discovery
├── collectors/          # Data collection
├── gap_filling/         # Gap detection/filling
└── validation/          # Data validation

Full development guide: Development Setup

Architecture

  • BinancePublicDataCollector: Bulk data retrieval from public repository
  • UniversalGapFiller: Gap detection and API-based filling
  • AtomicCSVOperations: Corruption-proof file operations
  • ValidationStorage: DuckDB-backed validation persistence
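A common way to protect a data file against partial writes is to write to a temporary file and then rename it over the target. The sketch below shows that general pattern with the standard library; it is an illustration of the technique, not the code of `AtomicCSVOperations`, and `write_csv_atomically` is a hypothetical name.

```python
import csv
import os
import tempfile

def write_csv_atomically(path, header, rows):
    """Write a CSV to a temp file in the target directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Remove the temp file if it was not renamed, then re-raise.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "BTCUSDT-1h.csv")
    write_csv_atomically(target, ["date", "close"], [["2024-01-01 00:00:00", "100.0"]])
    with open(target, newline="") as handle:
        written = list(csv.reader(handle))
    leftovers = os.listdir(workdir)
print(written)
```

If the process is interrupted mid-write, only the temporary file is affected and the previous version of the target remains intact.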

Architecture documentation: Overview

License

MIT License - see LICENSE
