Efficient bid-ask spread estimator from OHLC prices
Project description
QuantJourney Bid-Ask Spread Estimator
The quantjourney-bidask library provides an efficient estimator for calculating bid-ask spreads from open, high, low, and close (OHLC) prices, based on the methodology described in:
Ardia, D., Guidotti, E., Kroencke, T.A. (2024). Efficient Estimation of Bid-Ask Spreads from Open, High, Low, and Close Prices. Journal of Financial Economics, 161, 103916. doi:10.1016/j.jfineco.2024.103916
This library is designed for quantitative finance professionals, researchers, and traders who need accurate and computationally efficient spread estimates for equities, cryptocurrencies, and other assets.
๐ Part of the QuantJourney ecosystem - The framework with advanced quantitative finance tools and insights!
Features
- Efficient Spread Estimation: Implements the EDGE estimator for single, rolling, and expanding windows.
- Real-Time Data: Websocket support for live cryptocurrency data from Binance and other exchanges.
- Data Integration: Fetch OHLC data from Yahoo Finance and generate synthetic data for testing.
- Live Monitoring: Real-time spread monitoring with animated visualizations.
- Local Development: Works completely locally without cloud dependencies.
- Robust Handling: Supports missing values, non-positive prices, and various data frequencies.
- Comprehensive Tests: Extensive unit tests with known test cases from the original paper.
- Clear Documentation: Detailed docstrings and usage examples.
Examples and Visualizations
The package includes comprehensive examples with beautiful visualizations:
Spread Monitor Results
Basic Data Analysis
Crypto Spread Comparison
FAQ
What exactly does the estimator compute?
The estimator returns the root mean square effective spread over the sample period. This quantifies the average transaction cost implied by bid-ask spreads, based on open, high, low, and close (OHLC) prices.
What is unique about this implementation?
This package provides a highly optimized and robust implementation of the EDGE estimator. Beyond a direct translation of the paper's formula, it features:
- A Hybrid, High-Performance Engine: The core logic leverages fast, vectorized NumPy operations for data preparation and calls a specialized, JIT-compiled kernel via Numba for the computationally intensive GMM calculations.
- HFT-Ready Version (edge_hft.py): An included, hyper-optimized function that uses fastmath compilation for the absolute lowest latency, designed for production HFT pipelines where every microsecond matters.
- Robust Data Handling: Gracefully manages missing values (NaN) and non-positive prices to prevent crashes.
- Advanced Windowing Functions: Efficient and correct edge_rolling and edge_expanding functions that are fully compatible with the powerful features of pandas, including custom step sizes.
What's the difference between the edge functions?
The library provides a tiered set of functions for different needs:
- edge(): The core function. It's fast, robust, and computes a single spread estimate for a given sample of data. This is the building block for all other functions.
- edge_hft(): A specialized version of edge() for HFT users. It's the fastest possible implementation but requires perfectly clean input data (no NaNs) to achieve its speed.
- edge_rolling(): Computes the spread on a rolling window over a time series. It's perfect for seeing how the spread evolves over time. It is highly optimized and accepts all arguments from pandas.DataFrame.rolling() (like window and step).
- edge_expanding(): Computes the spread on an expanding (cumulative) window. This is useful for analyzing how the spread estimate converges or changes as more data becomes available.
What is the minimum number of observations?
At least 3 valid observations are required.
How should I choose the window size or frequency?
Short windows (e.g. a few days) reflect local spread conditions but may be noisy. Longer windows (e.g. 1 year) reduce variance but smooth over changes. For intraday use, minute-level frequency is recommended if the asset trades frequently.
Rule of thumb: ensure on average โฅ2 trades per interval.
Can I use intraday or tick data?
Yes โ the estimator supports intraday OHLC data directly. For tick data, resample into OHLC format first (e.g., using pandas.resample).
What if I get NaN results?
The estimator may return NaN if:
- Input prices are inconsistent (e.g. high < low)
- There are too many missing or invalid values
- Probability thresholds are not met (e.g. insufficient variance in prices)
- Spread variance is non-positive
In these cases, re-examine your input or adjust the sampling frequency.
Installation
Install the library via pip:
pip install quantjourney-bidask
For development (local setup):
git clone https://github.com/QuantJourneyOrg/quantjourney-bidask
cd quantjourney-bidask
pip install -e .
Quick Start
Basic Usage
from quantjourney_bidask import edge
# Example OHLC data (as lists or numpy arrays)
open_prices = [100.0, 101.5, 99.8, 102.1, 100.9]
high_prices = [102.3, 103.0, 101.2, 103.5, 102.0]
low_prices = [99.5, 100.8, 98.9, 101.0, 100.1]
close_prices = [101.2, 100.2, 101.8, 100.5, 101.5]
# Calculate bid-ask spread
spread = edge(open_prices, high_prices, low_prices, close_prices)
print(f"Estimated bid-ask spread: {spread:.6f}")
Rolling Window Analysis
from quantjourney_bidask import edge_rolling
import pandas as pd
# Create DataFrame with OHLC data
df = pd.DataFrame({
'open': open_prices,
'high': high_prices,
'low': low_prices,
'close': close_prices
})
# Calculate rolling spreads with a 20-period window
rolling_spreads = edge_rolling(df, window=20)
print(f"Rolling spreads: {rolling_spreads}")
Data Fetching Integration
from data.fetch import get_stock_data, get_crypto_data
from quantjourney_bidask import edge_rolling
import asyncio
# Fetch stock data
stock_df = get_stock_data("PL", period="1mo", interval="1d")
stock_spreads = edge_rolling(stock_df, window=20)
print(f"PL average spread: {stock_spreads.mean():.6f}")
# Fetch crypto data (async)
async def get_crypto_spreads():
crypto_df = await get_crypto_data("BTC/USDT", "binance", "1h", 168)
crypto_spreads = edge_rolling(crypto_df, window=24)
return crypto_spreads.mean()
crypto_avg_spread = asyncio.run(get_crypto_spreads())
print(f"BTC average spread: {crypto_avg_spread:.6f}")
Real-time Data Streaming
from data.fetch import DataFetcher
import asyncio
async def stream_btc_spreads():
fetcher = DataFetcher()
# Stream BTC data for 60 seconds
btc_stream = await fetcher.get_btc_1m_websocket(duration_seconds=60)
# Calculate spread from real-time data
if not btc_stream.empty:
avg_spread_pct = (btc_stream['spread'] / btc_stream['price']).mean() * 100
print(f"Real-time BTC average spread: {avg_spread_pct:.4f}%")
asyncio.run(stream_btc_spreads())
Real-Time Spread Monitoring
from data.fetch import create_spread_monitor
# Create real-time spread monitor
monitor = create_spread_monitor(["BTCUSDT", "ETHUSDT"], window=20)
# Add callback for spread updates
def print_spread_update(spread_data):
print(f"{spread_data['symbol']}: {spread_data['spread_bps']:.2f} bps")
monitor.add_spread_callback(print_spread_update)
# Start monitoring (uses websockets for live data)
monitor.start_monitoring("1m")
Animated Real-Time Dashboard
# Run the real-time dashboard
python examples/websocket_realtime_demo.py --mode dashboard
# Or console mode
python examples/websocket_realtime_demo.py --mode console
# Quick 30-second BTC websocket demo
python examples/animated_spread_monitor.py
Project Structure
quantjourney_bidask/
โโโ quantjourney_bidask/ # Main library code
โ โโโ __init__.py
โ โโโ edge.py # Core EDGE estimator
โ โโโ edge_hft.py # EDGE estimator optimised HFT-version
โ โโโ edge_rolling.py # Rolling window estimation
โ โโโ edge_expanding.py # Expanding window estimation
โโโ data/
โ โโโ fetch.py # Simplified data fetcher for examples
โโโ examples/ # Comprehensive usage examples
โ โโโ simple_data_example.py # Basic usage demonstration
โ โโโ basic_spread_estimation.py # Core spread estimation examples
โ โโโ animated_spread_monitor.py # Animated visualizations
โ โโโ crypto_spread_comparison.py # Crypto spread analysis
โ โโโ liquidity_risk_monitor.py # Risk monitoring
โ โโโ websocket_realtime_demo.py # Live websocket monitoring demo
โ โโโ threshold_alert_monitor.py # Threshold-based spread alerts
โโโ tests/ # Unit tests (GitHub only)
โ โโโ test_edge.py
โ โโโ test_edge_rolling.py
โ โโโ test_edge_expanding.py
โ โโโ test_data_fetcher.py
โ โโโ test_estimators.py
โโโ _output/ # Example output images
โโโ simple_data_example.png
โโโ crypto_spread_comparison.png
โโโ spread_estimator_results.png
Running Examples
After installing via pip, examples are included in the package:
import quantjourney_bidask
from pathlib import Path
# Find package location
pkg_path = Path(quantjourney_bidask.__file__).parent
examples_path = pkg_path.parent / 'examples'
print(f"Examples located at: {examples_path}")
# List available examples
for example in examples_path.glob('*.py'):
print(f"๐ {example.name}")
Or clone the repository for full access to examples and tests:
git clone https://github.com/QuantJourneyOrg/quantjourney-bidask
cd quantjourney-bidask
python examples/simple_data_example.py
python examples/basic_spread_estimation.py
python examples/animated_spread_monitor.py # 30s real BTC websocket demo
python examples/crypto_spread_comparison.py
Available Examples
simple_data_example.py- Basic usage with stock and crypto databasic_spread_estimation.py- Core spread estimation functionalityanimated_spread_monitor.py- Real-time animated visualizations with 30s websocket democrypto_spread_comparison.py- Multi-asset crypto analysis and comparisonliquidity_risk_monitor.py- Risk monitoring and alertswebsocket_realtime_demo.py- Live websocket monitoring dashboardthreshold_alert_monitor.py- Threshold-based spread alerts and monitoring
Testing and Development
Unit Tests
The package includes comprehensive unit tests (available in the GitHub repository):
test_edge.py- Core EDGE estimator tests with known values from the academic papertest_edge_rolling.py- Rolling window estimation teststest_edge_expanding.py- Expanding window estimation teststest_data_fetcher.py- Data fetching functionality teststest_estimators.py- Integration tests for all estimators
Tests verify accuracy against the original paper's test cases and handle edge cases like missing data, non-positive prices, and various market conditions.
Development and Testing
For full development access including tests:
# Clone the repository
git clone https://github.com/QuantJourneyOrg/quantjourney-bidask
cd quantjourney-bidask
# Install in development mode
pip install -e .
# Run tests
python -m pytest tests/ -v
# Run specific test files
python -m pytest tests/test_edge.py -v
python -m pytest tests/test_data_fetcher.py -v
# Run examples
python examples/simple_data_example.py
python examples/basic_spread_estimation.py
python examples/animated_spread_monitor.py # Real BTC websocket demo
Package vs Repository
- PyPI Package (
pip install quantjourney-bidask): Includes core library, examples, and documentation - GitHub Repository: Full development environment with tests, development tools, and additional documentation
API Reference
Core Functions
edge(open, high, low, close, sign=False): Single-period spread estimationedge_rolling(df, window, min_periods=None): Rolling window estimationedge_expanding(df, min_periods=3): Expanding window estimation
Data Fetching (data/fetch.py) - Examples & Demos
DataFetcher(): Simplified data fetcher class for examplesget_stock_data(ticker, period, interval): Fetch stock data from Yahoo Financeget_crypto_data(symbol, exchange, timeframe, limit): Fetch crypto data via CCXT (async)stream_btc_data(duration_seconds): Stream BTC data via websocket (async)DataFetcher.get_btc_1m_websocket(): Stream BTC 1-minute dataDataFetcher.get_historical_crypto_data(): Get historical crypto OHLCV dataDataFetcher.save_data()/DataFetcher.load_data(): Save/load data to CSV
Real-Time Classes
RealTimeDataStream: Websocket data streaming for live market dataRealTimeSpreadMonitor: Real-time spread calculation and monitoringAnimatedSpreadMonitor: Animated real-time visualization
Requirements
- Python >= 3.11
- numpy >= 1.20
- pandas >= 1.5
- requests >= 2.28
- yfinance >= 0.2
- matplotlib >= 3.5
- websocket-client >= 1.0
WebSocket Support
The library supports real-time data via websockets:
- Binance:
wss://stream.binance.com:9443/ws/(cryptocurrency data) - Fallback: Synthetic data generation for testing when websockets unavailable
Real-time features:
- Live spread calculation
- Animated visualizations
- Threshold alerts
- Multi-symbol monitoring
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
git clone https://github.com/QuantJourneyOrg/quantjourney-bidask
cd quantjourney-bidask
pip install -e ".[dev]"
# Run tests
pytest
# Run examples
python examples/animated_spread_monitor.py # 30s real BTC websocket demo
python examples/websocket_realtime_demo.py # Full dashboard
Support
- Documentation: GitHub Repository
- Issues: Bug Tracker
- Contact: jakub@quantjourney.pro
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file quantjourney_bidask-1.0.2.tar.gz.
File metadata
- Download URL: quantjourney_bidask-1.0.2.tar.gz
- Upload date:
- Size: 50.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7391746afa1f8ea4d2698ae08cc3ebd6f9b915be762bee4cb66cddd2651a7b9b
|
|
| MD5 |
d58d88a77e3cf0d1664fd99ec2c44b38
|
|
| BLAKE2b-256 |
8ca1edeb65df88d9651f7c0bd5695757a0d1145bff3e856862d1c536c90b33f0
|
File details
Details for the file quantjourney_bidask-1.0.2-py3-none-any.whl.
File metadata
- Download URL: quantjourney_bidask-1.0.2-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4788bb236859beae6f7dfdc342eed3a2a8da8a52e62e6c369ef8e51da59eae68
|
|
| MD5 |
49c38dbebf9d4349032625783bfa8e20
|
|
| BLAKE2b-256 |
f3ea37643716f9d51f0d2381f28bc74b98bcea565df55993b73a092dd316e786
|