Binance Syncer
Binance Syncer is a high-performance synchronization tool for downloading and managing Binance historical data (klines, trades, etc.) asynchronously. It supports both local and S3 storage with intelligent data optimization (monthly vs daily preference).
Features
- Asynchronous synchronization: Concurrent downloading optimized for large volumes
- Multi-market support: SPOT, FUTURES (UM/CM), and OPTIONS markets
- Various data types: KLINES, TRADES, AGG_TRADES, BOOK_DEPTH, METRICS, and more
- Flexible storage: Local filesystem or cloud (AWS S3)
- Intelligent optimization: Automatic preference for monthly data over daily files
- CLI interface: Complete Click-based CLI with dry-run mode
- DuckDB-powered loader: Fast data loading with predicate pushdown
- Error handling: Automatic retry and robust SSL management
- Progress tracking: Rich progress bars with detailed statistics
- Advanced logging: Structured logs with automatic rotation
Installation
From pip
pip install binance-syncer
From uv
uv pip install binance-syncer
From source
git clone https://github.com/caymaar/binance-syncer.git
cd binance-syncer
pip install .
Configuration
Automatic configuration
On first launch, the syncer automatically creates:
~/utilities/config/binance_syncer.ini
~/utilities/logs/binance_syncer/
Manual configuration
Edit ~/utilities/config/binance_syncer.ini:
[LOCAL]
PATH = ~/binance-vision
[S3]
BUCKET = my-binance-data-bucket
PREFIX = binance-vision
[SETTINGS]
MAX_CONCURRENT_DOWNLOADS = 100
SYMBOL_CONCURRENCY = 10
BATCH_SIZE_SYNC = 20
BATCH_SIZE_DELETE = 1000
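Because the file is plain INI, its values can be inspected with Python's standard `configparser`. The sketch below parses the sample above from a string. The `load_settings` helper is hypothetical, and how binance-syncer itself reads the file is not covered here.

```python
import configparser

# Same content as the sample configuration above.
SAMPLE = """
[LOCAL]
PATH = ~/binance-vision

[S3]
BUCKET = my-binance-data-bucket
PREFIX = binance-vision

[SETTINGS]
MAX_CONCURRENT_DOWNLOADS = 100
SYMBOL_CONCURRENCY = 10
BATCH_SIZE_SYNC = 20
BATCH_SIZE_DELETE = 1000
"""


def load_settings(text: str) -> dict:
    """Parse INI text and return the [SETTINGS] values as integers (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # configparser stores option names lowercased; upper-case them again for display.
    return {key.upper(): parser.getint("SETTINGS", key) for key in parser["SETTINGS"]}


settings = load_settings(SAMPLE)
print(settings["MAX_CONCURRENT_DOWNLOADS"], settings["SYMBOL_CONCURRENCY"])
```

To check a real file, read `~/utilities/config/binance_syncer.ini` with `parser.read(...)` instead of `read_string`.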
AWS environment variables (for S3)
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1
Usage
Command Line Interface
The CLI is built with Click and provides a user-friendly interface for data synchronization.
Basic commands
# Sync all SPOT KLINES 1d data
binance-syncer --market-type spot --data-type klines --interval 1d
# Specific symbols with progress bar
binance-syncer --market-type spot --data-type klines --interval 1d \
--symbols BTCUSDT --symbols ETHUSDT --symbols ADAUSDT --progress
# S3 storage
binance-syncer --market-type spot --data-type klines --interval 1d --s3 --progress
# Dry-run mode (shows what would be synced without downloading)
binance-syncer --market-type spot --data-type klines --interval 1d --dry-run
# Trade data (no interval required)
binance-syncer --market-type spot --data-type trades
Advanced options
# Futures with 4h interval
binance-syncer --market-type futures/um --data-type klines --interval 4h --progress
# Multiple symbols with S3 storage
binance-syncer --market-type spot --data-type klines --interval 1d \
--symbols BTCUSDT --symbols ETHUSDT --s3 --progress
# Disable progress bar
binance-syncer --market-type spot --data-type klines --interval 1d --no-progress
CLI Options
| Option | Type | Required | Description |
|---|---|---|---|
| --market-type | Choice | Yes | Market type: spot, futures/um, futures/cm, option |
| --data-type | Choice | Yes | Data type: klines, trades, aggTrades, etc. |
| --interval | Choice | Conditional | Kline interval (required for klines): 1s, 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 1w |
| --symbols | Multiple | No | Specific symbols to sync (can be repeated). If not provided, all symbols are synced |
| --progress/--no-progress | Flag | No | Show/hide progress bar (default: --progress) |
| --dry-run | Flag | No | Show what would be synced without downloading |
| --s3 | Flag | No | Use S3 storage instead of local |
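When the CLI is driven from a Python script, the option rules in the table can be encoded in a small helper that builds the argument list. This is only a sketch: `build_command` is a hypothetical helper, not part of binance-syncer, and it uses only the flags listed in the table.

```python
import shlex
from typing import Optional, Sequence


def build_command(
    market_type: str,
    data_type: str,
    interval: Optional[str] = None,
    symbols: Sequence[str] = (),
    s3: bool = False,
    dry_run: bool = False,
) -> list:
    """Build a binance-syncer argument list from the documented options (hypothetical helper)."""
    if data_type == "klines" and interval is None:
        raise ValueError("--interval is required when --data-type is klines")
    cmd = ["binance-syncer", "--market-type", market_type, "--data-type", data_type]
    if interval is not None:
        cmd += ["--interval", interval]
    for symbol in symbols:
        # --symbols is repeated once per symbol.
        cmd += ["--symbols", symbol]
    if s3:
        cmd.append("--s3")
    if dry_run:
        cmd.append("--dry-run")
    return cmd


cmd = build_command("spot", "klines", "1d", symbols=["BTCUSDT", "ETHUSDT"], dry_run=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Starting with `dry_run=True` previews the work before anything is downloaded.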
Programmatic Usage
Syncer API
import asyncio
import binance_syncer as bs


async def main():
    # Configure syncer
    syncer = bs.Syncer(
        market_type=bs.MarketType.SPOT,
        data_type=bs.DataType.KLINES,
        interval=bs.KlineInterval.D1,
        progress=True,
        s3=False  # Set to True for S3 storage
    )
    # Sync specific symbols
    await syncer.sync(["BTCUSDT", "ETHUSDT"])
    # Or sync all available symbols
    all_symbols = await syncer.list_remote_symbols()
    print(f"Found {len(all_symbols)} symbols")
    await syncer.sync(all_symbols)


# Execute
asyncio.run(main())
Loader API
The Loader uses DuckDB for fast parquet file loading with predicate pushdown:
import binance_syncer as bs

# Initialize loader
loader = bs.Loader(
    market_type=bs.MarketType.SPOT,
    data_type=bs.DataType.KLINES,
    interval=bs.KlineInterval.D1,
    s3=False  # Set to True for S3
)
# Load data for a specific symbol and date range
df = loader.load(
    symbol="BTCUSDT",
    start="2023-01-01",
    end="2023-06-01"
)
print(df)
Complete workflow example
import asyncio
import binance_syncer as bs
from utilities import LoggingConfigurator


async def sync_and_load():
    # Configure logging
    LoggingConfigurator.configure(project="my_project", level="INFO")
    # 1. Sync data
    syncer = bs.Syncer(
        market_type=bs.MarketType.SPOT,
        data_type=bs.DataType.KLINES,
        interval=bs.KlineInterval.D1,
        progress=True,
        s3=False
    )
    await syncer.sync(["BTCUSDT"])
    # 2. Load data
    loader = bs.Loader(
        market_type=bs.MarketType.SPOT,
        data_type=bs.DataType.KLINES,
        interval=bs.KlineInterval.D1
    )
    df = loader.load("BTCUSDT", start="2024-01-01", end="2024-12-31")
    print(f"Loaded {len(df)} rows")
    print(df.head())


asyncio.run(sync_and_load())
Batch processing multiple markets
import asyncio
import binance_syncer as bs


async def sync_all_markets():
    """Sync the first symbol of every market type and data type."""
    for market_type, data_types in bs.SCHEMA.items():
        print(f"Market Type: {market_type}")
        for data_type in data_types:
            print(f" Data Type: {data_type}")
            # Configure syncer
            syncer = bs.Syncer(
                market_type,
                data_type,
                bs.KlineInterval.D1 if data_type == bs.DataType.KLINES else None,
                progress=True
            )
            # Get available symbols
            symbols = await syncer.list_remote_symbols()
            print(f" Available Symbols: {len(symbols)}")
            # Sync first symbol only
            if symbols:
                symbol = symbols[0]
                print(f" Syncing: {symbol}")
                await syncer.sync([symbol])


# Warning: some markets and data types can be very large
asyncio.run(sync_all_markets())
Data Structure
Local storage
~/binance-vision/
└── data/
    ├── spot/
    │   ├── daily/
    │   │   ├── klines/
    │   │   │   └── BTCUSDT/
    │   │   │       └── 1d/
    │   │   │           └── 2024-02-15.parquet
    │   │   └── trades/
    │   │       └── BTCUSDT/
    │   │           └── 2024-02-15.parquet
    │   └── monthly/
    │       └── klines/
    │           └── BTCUSDT/
    │               └── 1d/
    │                   └── 2024-01.parquet
    ├── futures/
    │   └── um/
    │       └── daily/
    │           └── klines/
    └── option/
        └── daily/
            └── BVOLIndex/
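The layout above follows a regular pattern: root, `data`, market type, frequency, data type, symbol, an optional interval directory, then one file per period. A minimal sketch of that pattern is shown here; `parquet_path` is a hypothetical helper written from the tree, not a function exported by binance-syncer.

```python
from pathlib import PurePosixPath
from typing import Optional


def parquet_path(
    root: str,
    market_type: str,
    frequency: str,
    data_type: str,
    symbol: str,
    period: str,
    interval: Optional[str] = None,
) -> PurePosixPath:
    """Return root/data/market/frequency/data_type/symbol/[interval/]period.parquet."""
    parts = [root, "data", market_type, frequency, data_type, symbol]
    if interval is not None:
        # Kline-style data adds one directory level per interval.
        parts.append(interval)
    return PurePosixPath(*parts) / f"{period}.parquet"


daily = parquet_path("~/binance-vision", "spot", "daily", "klines", "BTCUSDT", "2024-02-15", "1d")
monthly = parquet_path("~/binance-vision", "spot", "monthly", "klines", "BTCUSDT", "2024-01", "1d")
print(daily)
print(monthly)
```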
S3 storage
s3://your-bucket/
└── binance-vision/
    └── data/
        ├── spot/
        │   ├── daily/
        │   │   └── klines/
        │   │       └── BTCUSDT/
        │   └── monthly/
        │       └── klines/
        │           └── BTCUSDT/
        └── futures/
Supported Data Types
Market Type: SPOT
| Data Type | Schema Columns | Interval Required |
|---|---|---|
| klines | open_time, open, high, low, close, volume, close_time, quote_volume, count, taker_buy_volume, taker_buy_quote_volume, ignore | Yes |
| trades | id, price, qty, base_qty, time, is_buyer_maker, is_best_match | No |
| aggTrades | agg_trade_id, price, quantity, first_trade_id, last_trade_id, transact_time, is_buyer_maker, is_best_match | No |
Market Type: FUTURES_UM / FUTURES_CM
| Data Type | Schema Columns | Interval Required |
|---|---|---|
| klines | open_time, open, high, low, close, volume, close_time, quote_volume... | Yes |
| trades | id, price, qty, quote_qty/base_qty, time, is_buyer_maker | No |
| aggTrades | agg_trade_id, price, quantity, first_trade_id, last_trade_id, transact_time, is_buyer_maker | No |
| bookDepth | timestamp, percentage, depth, notional | No |
| bookTicker | update_id, best_bid_price, best_bid_qty, best_ask_price, best_ask_qty, transaction_time, event_time | No |
| indexPriceKlines | open_time, open, high, low, close, volume... | Yes |
| markPriceKlines | open_time, open, high, low, close, volume... | Yes |
| premiumIndexKlines | open_time, open, high, low, close, volume... | Yes |
| liquidationSnapshot | time, side, order_type, time_in_force, original_quantity, price, average_price, order_status... | No |
| metrics | create_time, symbol, sum_open_interest, sum_open_interest_value... | No |
Market Type: OPTION
| Data Type | Schema Columns | Interval Required |
|---|---|---|
| BVOLIndex | calc_time, symbol, base_asset, quote_asset, index_value | No |
| EOHSummary | date, hour, symbol, underlying, type, strike, open, high, low, close, volume_contracts, volume_usdt... | No |
Supported Intervals
| Category | Values |
|---|---|
| Seconds | 1s |
| Minutes | 1m, 3m, 5m, 15m, 30m |
| Hours | 1h, 2h, 4h, 6h, 8h, 12h |
| Days | 1d |
| Weeks | 1w |
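Each interval string is a number followed by a unit letter, so it can be converted to a `timedelta` for quick sanity checks, such as how many candles a complete daily file should hold. The helpers below are hypothetical and assume a full 24-hour day with no gaps.

```python
from datetime import timedelta

# Unit letter -> timedelta keyword argument.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def interval_to_timedelta(interval: str) -> timedelta:
    """Convert an interval such as '15m' or '4h' into a timedelta."""
    value, unit = int(interval[:-1]), interval[-1]
    return timedelta(**{_UNITS[unit]: value})


def candles_per_day(interval: str) -> float:
    """Number of candles of this interval that fit in 24 hours."""
    return timedelta(days=1) / interval_to_timedelta(interval)


for name in ("1s", "5m", "4h", "1d"):
    print(name, candles_per_day(name))
```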
Enums Reference
from binance_syncer import MarketType, DataType, KlineInterval
# Market Types
MarketType.SPOT # "spot"
MarketType.FUTURES_UM # "futures/um"
MarketType.FUTURES_CM # "futures/cm"
MarketType.OPTION # "option"
# Data Types
DataType.KLINES # "klines"
DataType.TRADES # "trades"
DataType.AGG_TRADES # "aggTrades"
DataType.BOOK_DEPTH # "bookDepth"
DataType.BOOK_TICKER # "bookTicker"
DataType.INDEX_PRICE_KLINES # "indexPriceKlines"
DataType.MARK_PRICE_KLINES # "markPriceKlines"
DataType.PREMIUM_INDEX_KLINES # "premiumIndexKlines"
DataType.LIQUIDATION_SNAPSHOT # "liquidationSnapshot"
DataType.METRICS # "metrics"
DataType.BVOL_INDEX # "BVOLIndex"
DataType.EOH_SUMMARY # "EOHSummary"
# Kline Intervals
KlineInterval.S1 # "1s"
KlineInterval.M1 # "1m"
KlineInterval.M3 # "3m"
KlineInterval.M5 # "5m"
KlineInterval.M15 # "15m"
KlineInterval.M30 # "30m"
KlineInterval.H1 # "1h"
KlineInterval.H2 # "2h"
KlineInterval.H4 # "4h"
KlineInterval.H6 # "6h"
KlineInterval.H8 # "8h"
KlineInterval.H12 # "12h"
KlineInterval.D1 # "1d"
KlineInterval.W1 # "1w"
Logging and Monitoring
Log configuration
Logs are managed by the utilities-toolkit package and are created automatically:
~/utilities/logs/binance_syncer/
├── binance_syncer_2024-02-15.log
├── binance_syncer_2024-02-14.log
└── ...
Log levels
from utilities import LoggingConfigurator

# Configure logging for your project
LoggingConfigurator.configure(
    project="binance_syncer",
    level="INFO",  # DEBUG, INFO, WARNING, ERROR
    retention_days=7  # Automatic cleanup after 7 days
)
Automatic features
- Rotation: Daily log rotation
- Retention: Configurable retention period (default: 7 days)
- Format: Structured JSON-like format with timestamps
- Levels: DEBUG, INFO, WARNING, ERROR with color coding
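For readers who want the same rotation and retention behaviour without utilities-toolkit, the standard library offers a rough equivalent. The sketch below is not how `LoggingConfigurator` is implemented. It only illustrates daily rotation with a 7-file retention using `TimedRotatingFileHandler`, and `make_rotating_logger` is a hypothetical helper.

```python
import logging
import tempfile
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def make_rotating_logger(log_dir: Path, name: str = "binance_syncer", retention_days: int = 7) -> logging.Logger:
    """Create a logger that rotates its file at midnight and keeps `retention_days` old files."""
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = TimedRotatingFileHandler(
        log_dir / f"{name}.log",
        when="midnight",             # rotate once per day
        backupCount=retention_days,  # older rotated files are deleted
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# A temporary directory stands in for ~/utilities/logs/binance_syncer/.
logger = make_rotating_logger(Path(tempfile.mkdtemp()))
logger.info("sync started")
```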
Performance and Optimizations
Concurrency settings
Default settings (configurable in ~/utilities/config/binance_syncer.ini):
[SETTINGS]
# Concurrent downloads per symbol
MAX_CONCURRENT_DOWNLOADS = 100
# Symbols processed in parallel
SYMBOL_CONCURRENCY = 10
# Files processed per batch
BATCH_SIZE_SYNC = 20
# Files deleted per batch
BATCH_SIZE_DELETE = 1000
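A common way to enforce limits such as `SYMBOL_CONCURRENCY` in asyncio code is a semaphore that caps how many coroutines run at once. The sketch below shows that general pattern with a fake worker. It is an illustration under that assumption, not the syncer's actual scheduling code, and `process_all` is a hypothetical name.

```python
import asyncio

SYMBOL_CONCURRENCY = 10  # value taken from the [SETTINGS] example


async def process_all(symbols, worker, limit=SYMBOL_CONCURRENCY):
    """Run worker(symbol) for every symbol with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(symbol):
        async with semaphore:
            return await worker(symbol)

    # gather preserves the input order of the results.
    return await asyncio.gather(*(bounded(s) for s in symbols))


async def demo():
    running = 0
    peak = 0

    async def fake_worker(symbol):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stands in for real download work
        running -= 1
        return symbol

    results = await process_all([f"SYM{i}" for i in range(25)], fake_worker, limit=3)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), "symbols processed, peak concurrency", peak)
```

Lower limits reduce memory use and the number of open connections at the cost of throughput.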
Network optimizations
- SSL: Robust configuration with certifi fallback
- Timeouts:
  - Connection: 30s
  - Read: 120s
  - Total: 300s
- Retry logic: Automatic exponential backoff
- Chunked reading: 8KB chunks for large files
- Connection pooling: Reused HTTP connections via aiohttp
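Retrying with exponential backoff, as listed above, generally means doubling the wait after each failed attempt up to a cap. The following is a generic sketch of that technique with a simulated flaky operation. The parameter names and default values are assumptions, not the syncer's real settings.

```python
import asyncio
import random


async def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Await operation(); on a network-style error wait base_delay * 2**attempt (capped, jittered) and retry."""
    for attempt in range(max_attempts):
        try:
            return await operation()
        except (OSError, asyncio.TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many concurrent downloads.
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))


async def demo():
    calls = 0

    async def flaky():
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("simulated network failure")
        return "ok"

    result = await retry_with_backoff(flaky, base_delay=0.01)
    return result, calls


result, calls = asyncio.run(demo())
print(result, "after", calls, "attempts")
```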
Data optimization
- Monthly preference: Automatically prefers monthly files over daily when available
- Deduplication: Skips already downloaded files
- Compression: Parquet format with Snappy compression
- Predicate pushdown: DuckDB pushdown for efficient date filtering
Storage optimization
The syncer intelligently manages data by:
- Checking for monthly files: Downloads monthly aggregated data when available
- Removing redundant daily files: Deletes daily files when monthly data covers the period
- Incremental updates: Only downloads missing or updated files
Example optimization:
Before: BTCUSDT-1d-2024-01-01.parquet to BTCUSDT-1d-2024-01-31.parquet (31 daily files for January)
After: BTCUSDT-1d-2024-01.parquet (1 monthly file)
Result: 31 daily files replaced by 1 monthly file (30 fewer files), faster loading
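The monthly-over-daily preference can be expressed as simple set logic: download any missing monthly file, download daily files only for months that have no monthly file, and delete local daily files that a monthly file now covers. The sketch below is an assumption about how such a plan could be computed, not the syncer's actual code. The `plan_sync` name is hypothetical, and the result keys mirror those printed in the troubleshooting example (`M_DL`, `D_DL`, `D_RM`).

```python
def plan_sync(remote_months, remote_days, local_months, local_days):
    """Decide what to download and what to delete.

    Months are "YYYY-MM" strings and days are "YYYY-MM-DD" strings.
    """
    remote_months = set(remote_months)
    # Monthly files that are not stored locally yet.
    months_to_download = sorted(remote_months - set(local_months))
    # Daily files are only needed when their month has no monthly file.
    days_to_download = sorted(
        d for d in set(remote_days) - set(local_days) if d[:7] not in remote_months
    )
    # Local daily files covered by a monthly file are redundant.
    days_to_remove = sorted(d for d in local_days if d[:7] in remote_months)
    return {"M_DL": months_to_download, "D_DL": days_to_download, "D_RM": days_to_remove}


january = [f"2024-01-{day:02d}" for day in range(1, 32)]
plan = plan_sync(
    remote_months=["2024-01"],
    remote_days=january + ["2024-02-01", "2024-02-02"],
    local_months=[],
    local_days=["2024-01-01", "2024-01-02", "2024-02-01"],
)
print(plan)
```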
Troubleshooting
Common SSL issues
macOS Certificate Error
# Install Python certificates
/Applications/Python\ 3.11/Install\ Certificates.command
# Or via pip
pip install --upgrade certifi
# Verify SSL configuration
python -c "import ssl, certifi; print(certifi.where())"
SSL Diagnostic Mode
The CLI automatically performs SSL diagnostics on startup:
binance-syncer --market-type spot --data-type klines --interval 1d
# Output will show:
# === SSL Diagnostic ===
# Certifi path: /path/to/cacert.pem
# SSL default paths: ...
# ✅ SSL test connection successful
S3 Configuration Issues
Check AWS credentials
# List configuration
aws configure list
# Test S3 access
aws s3 ls s3://your-bucket/
# Check environment variables
echo $AWS_ACCESS_KEY_ID
echo $AWS_SECRET_ACCESS_KEY
echo $AWS_DEFAULT_REGION
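The same check can be done from Python without printing secret values to the terminal. This sketch only looks at the three variables listed in this README. AWS tooling may also pick up credentials from other sources, which this hypothetical `missing_aws_vars` helper does not inspect.

```python
import os

# The three variables this README asks you to export.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(environ=os.environ):
    """Return the names of the listed AWS variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not environ.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All AWS variables are set")
```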
Verify S3 bucket configuration
# Test with dry-run first
binance-syncer --market-type spot --data-type klines --interval 1d --s3 --dry-run
# Check bucket exists and is accessible
aws s3 ls s3://my-binance-data-bucket/binance-vision/
Connection Timeout Issues
If experiencing timeout issues:
# Increase timeout in your code
import aiohttp

timeout = aiohttp.ClientTimeout(
    total=600,  # 10 minutes total
    connect=60,  # 1 minute to connect
    sock_read=300  # 5 minutes to read
)
Memory Issues
For large datasets:
import binance_syncer as bs

# Configure DuckDB memory limit
loader = bs.Loader(
    market_type=bs.MarketType.SPOT,
    data_type=bs.DataType.KLINES,
    interval=bs.KlineInterval.D1,
    memory_limit="4GB",  # Limit DuckDB memory usage
    threads=4  # Limit thread count
)
Debugging Tips
Enable debug logging
from utilities import LoggingConfigurator

LoggingConfigurator.configure(
    project="binance_syncer",
    level="DEBUG"  # Show all debug information
)
Check file integrity
import pandas as pd
# Verify parquet file
df = pd.read_parquet("path/to/file.parquet")
print(df.info())
print(df.head())
Test individual components
import asyncio
import binance_syncer as bs


async def test_connection():
    syncer = bs.Syncer(
        market_type=bs.MarketType.SPOT,
        data_type=bs.DataType.KLINES,
        interval=bs.KlineInterval.D1
    )
    # Test listing symbols
    symbols = await syncer.list_remote_symbols()
    print(f"Found {len(symbols)} symbols")
    # Test dates computation for one symbol
    if symbols:
        dates = await syncer.compute_dates_cover(symbols[0])
        print(f"Symbol: {symbols[0]}")
        print(f"Months to download: {len(dates['M_DL'])}")
        print(f"Days to download: {len(dates['D_DL'])}")
        print(f"Days to remove: {len(dates['D_RM'])}")


asyncio.run(test_connection())
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Data provided by Binance Public Data
- Built with aiohttp, DuckDB, Click, and Rich
- Configuration management via utilities-toolkit
Support
- Issues: GitHub Issues
- Documentation: This README
- Examples: See examples/examples.ipynb
Changelog
v3.0.0
- Migrated CLI from argparse to Click
- Added --s3 flag for simplified S3 storage mode
- Improved SSL diagnostics and error handling
- Enhanced DuckDB-based Loader with predicate pushdown
- Better progress tracking with Rich
- Updated dependencies and Python 3.9+ requirement
v2.x
- Initial public release
- Basic sync and load functionality
- S3 and local storage support