
Python library for acquiring, storing, transforming, and validating market data

Project description

Data Layer (qldata)


Data Layer (qldata) is a high-performance, production-grade Python library designed for acquiring, storing, transforming, and validating financial market data. It provides a unified interface for interacting with various data sources, including crypto exchanges (Binance, Bybit) and local storage (DuckDB, Parquet).

🚀 Key Features

  • Unified Data Interface: Seamlessly switch between live exchange feeds and historical data.
  • High Performance: Built on pandas, numpy, and pyarrow for efficient data manipulation.
  • Storage Optimized: Integrated with DuckDB for fast, analytical SQL queries on large datasets.
  • Exchange Support: Native adapters for Binance and Bybit with shared rate limiting and retry logic.
  • Automatic Chunking: Transparently handles multi-year historical data requests by auto-splitting into optimal chunks.
  • Smart Error Handling: Specific exception types (RateLimitError, NetworkError, ServerError) with automatic retry mechanisms.
  • Metadata Tracking: Automatic tracking of dataset freshness, coverage, and quality for smart caching.
  • Live Streaming: Robust WebSocket support with automatic reconnection and error handling.
  • Type Safe: Fully typed codebase using modern Python type hinting.
  • Production Ready: Comprehensive error handling, logging, and retry mechanisms via tenacity.

🛠️ Installation

Requires Python 3.10+.

pip install qldata

For a minimal install (core data structures only):

pip install qldata[minimal]

For development dependencies:

pip install qldata[dev]
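
Note: some shells (zsh in particular) treat square brackets as glob patterns, so quote the extras:

pip install "qldata[dev]"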

⚡ Quick Start

Fetching Historical Data

The primary entry point for historical data is qd.data().

import qldata as qd

# Fetch last 30 hours of 1-hour klines for BTCUSDT from Binance
df = qd.data("BTCUSDT", source="binance").last(30).resolution("1h").get()
print(df.head())

# Fetch multi-year data - automatically chunked!
# Two years of 1-minute bars (>1M rows) are fetched seamlessly
df_long = (
    qd.data("BTCUSDT", source="binance")
    .between("2023-01-01", "2025-01-01")
    .resolution("1m")
    .get()
)
print(f"Fetched {len(df_long)} bars")

Loading from Local Storage

Local stores use the source="local" alias and work with naive timestamps for convenience.

import qldata as qd

# Point storage to a directory (Parquet by default)
qd.config(data_dir="./data", store_type="parquet")

# Load previously stored bars
local_df = qd.data("BTCUSDT", source="local").resolution("1h").last(48).get()
print(local_df.tail())
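
Because local reads return naive timestamps, localize them yourself if downstream code needs timezone-aware data. This is plain pandas rather than a qldata API, and it assumes the bars were stored in UTC:

# Attach a UTC timezone to the naive DatetimeIndex
local_df.index = local_df.index.tz_localize("UTC")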

Working with Metadata

Check dataset information and freshness:

from qldata.stores.files import ParquetStore

store = ParquetStore("./data")

# List all tracked datasets
for meta in store.list_metadata():
    print(f"{meta.symbol} ({meta.timeframe}): {meta.record_count} bars")
    print(f"  Range: {meta.first_timestamp} to {meta.last_timestamp}")
    print(f"  Stale: {meta.is_stale(max_age_hours=24)}")

Handling Errors

import qldata as qd
from qldata.errors import RateLimitError, NetworkError

try:
    df = qd.data("BTCUSDT", source="binance").last(100).resolution("1m").get()
except RateLimitError:
    print("Rate limited - automatic retry will handle this")
except NetworkError as e:
    print(f"Network issue: {e}")

Streaming Live Data

For real-time data, use qd.stream().

import qldata as qd
import asyncio

async def handler(msg):
    print(msg)

# Stream live ticks
stream = qd.stream(["BTCUSDT"], source="binance").resolution("tick").on_data(handler).get()

# Note: In a real async application, you would await the stream session
# await stream.start()
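
A minimal end-to-end sketch of running the session inside an event loop, assuming stream.start() is the awaitable noted in the comment above:

async def main():
    stream = (
        qd.stream(["BTCUSDT"], source="binance")
        .resolution("tick")
        .on_data(handler)
        .get()
    )
    await stream.start()

asyncio.run(main())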

🏗️ Architecture

qldata is built with a modular architecture:

  • Core Models: Fundamental data structures and types.
  • Adapters: Exchange-specific broker adapters (qldata/adapters/brokers/*.py) that share rate-limiters/clients.
  • Stores: Persistence layer for files/DBs with metadata sidecars and deduplication.
  • API/Queries: qd.data() / qd.stream() builders that route through adapters or local stores.
  • Resilience/Transforms: Retry, chunking, validation, and cleaning utilities used by adapters and queries.
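
Concretely, these layers map onto the package roughly as follows. Only the brokers path is stated explicitly above; the remaining locations are inferred from the imports used in this README:

qldata/
├── adapters/brokers/    # Binance and Bybit adapters, shared rate limiting
├── stores/files         # ParquetStore and metadata sidecars
├── errors               # RateLimitError, NetworkError, ServerError
└── config               # configuration backing qd.config()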

🤝 Development

This is an internal project. Please follow the guidelines below for development.

  1. Environment: Ensure you are using the correct Python version (3.10+).
  2. Testing: Run pytest before pushing any changes.
  3. Linting: Use ruff and black to maintain code quality.
  4. Documentation: Update docstrings and mkdocs files when modifying APIs.
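
The standard invocations for these tools:

pip install "qldata[dev]"   # development dependencies
pytest                      # run the test suite
ruff check .                # lint
black .                     # format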

See the Developer Guide for more details.

Download files

Download the file for your platform.

Source Distribution

qldata-0.2.0.tar.gz (85.4 kB)


Built Distribution


qldata-0.2.0-py3-none-any.whl (110.4 kB)


File details

Details for the file qldata-0.2.0.tar.gz.

File metadata

  • Download URL: qldata-0.2.0.tar.gz
  • Upload date:
  • Size: 85.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for qldata-0.2.0.tar.gz

  • SHA256: f773bc343e78bb89224ef84d0542ba9b37f7bc95512d8851d439db9a7d409b01
  • MD5: 9cabe5fb8fc5a2c16a30c05b871840f3
  • BLAKE2b-256: c6d93b9c6fa1581f35ab55bf9abd7abce601317dd37bf4adafe45b66ed0ef1d4


File details

Details for the file qldata-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: qldata-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 110.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for qldata-0.2.0-py3-none-any.whl

  • SHA256: 0d2172060701fe642f9faa922d5090299c6c2d613f06baa67c90826b10a2153d
  • MD5: a972f13fb7e2b7e464568e542089572e
  • BLAKE2b-256: 717263e017a06ea4f04a94b6dd03b660b849a9ddd8dca94412ab015a7ff7125f

