Ethereum blockchain network metrics collection infrastructure with zero-gap guarantee (multi-chain support planned)
Gapless Network Data
Ethereum blockchain network metrics for financial time series forecasting.
Quick Start
import gapless_network_data as gmd
# Fetch latest blocks (recommended for live trading)
df = gmd.fetch_blocks(limit=1000)
# Compute block utilization (alpha feature #2)
df['utilization'] = df['gas_used'] / df['gas_limit']
# Date range query
df = gmd.fetch_blocks(start='2024-01-01', end='2024-01-31')
Installation
pip install gapless-network-data
AI Agent Discoverability
This package is optimized for Claude Code CLI and AI coding agents.
Probe Module
import gapless_network_data as gmd
# Get ranked features for ML pipelines
features = gmd.probe.get_alpha_features()
for f in features[:5]:
    print(f"{f.rank}. {f.name} ({f.importance})")
# Get protocol era boundaries
eras = gmd.probe.get_protocol_eras()
# Get setup workflow
workflow = gmd.probe.get_setup_workflow()
# Get example code
print(gmd.probe.get_quick_start())
Alpha Feature Rankings
AI agents should prioritize these features for ML pipelines:
| Rank | Feature | Importance | Formula |
|---|---|---|---|
| 1 | base_fee_per_gas | critical | raw |
| 2 | block_utilization | critical | gas_used / gas_limit |
| 3 | transaction_count | high | raw |
| 4 | timestamp | high | raw |
| 5 | number | high | raw |
| 6 | size | medium | raw |
| 7 | blob_gas_used | medium | raw (post-EIP4844) |
| 8 | excess_blob_gas | low | raw (post-EIP4844) |
| 9 | gas_limit | low | raw |
| 10 | gas_used | low | raw |
Get rankings programmatically: gmd.probe.get_alpha_features()
Protocol Era Boundaries
Filter data appropriately based on protocol changes:
- EIP-1559 (block 12,965,000, Aug 2021): base_fee_per_gas introduced
- The Merge (block 15,537,394, Sep 2022): difficulty is 0 from this block onward
- EIP-4844 (block 19,426,587, Mar 2024): blob_gas fields introduced
Get eras programmatically: gmd.probe.get_protocol_eras()
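The boundaries above can also be applied by hand when the probe module is not at hand. The following is a minimal sketch using only the block numbers listed in this section; the names `ERA_BOUNDARIES`, `era_for_block`, and `has_blob_fields` are illustrative and are not part of the package API.

```python
from bisect import bisect_right

# Activation blocks copied from the list above.
# The labels are hypothetical names chosen for this sketch.
ERA_BOUNDARIES = [
    (12_965_000, "eip1559"),
    (15_537_394, "merge"),
    (19_426_587, "eip4844"),
]
_STARTS = [start for start, _ in ERA_BOUNDARIES]


def era_for_block(number: int) -> str:
    """Return the label of the most recent era active at block `number`."""
    idx = bisect_right(_STARTS, number)
    return "pre_eip1559" if idx == 0 else ERA_BOUNDARIES[idx - 1][1]


def has_blob_fields(number: int) -> bool:
    """True when blob gas fields are expected to be populated for this block."""
    return number >= 19_426_587
```

A filter such as `df[df['number'] >= 12_965_000]` follows the same idea on a DataFrame, keeping only blocks where base_fee_per_gas is defined.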
API Reference
fetch_blocks()
gmd.fetch_blocks(
    start: str | None = None,          # ISO 8601 date
    end: str | None = None,            # ISO 8601 date
    limit: int | None = None,          # Max blocks
    include_deprecated: bool = False   # Include difficulty fields
) -> pd.DataFrame
Returns pandas DataFrame with columns:
- timestamp (datetime64[ns, UTC])
- number (uint64)
- gas_limit, gas_used, base_fee_per_gas, transaction_count, size (uint64)
- blob_gas_used, excess_blob_gas (Int64, nullable; pd.NA for blocks before EIP-4844)
Deprecated Fields
Excluded by default (use include_deprecated=True for pre-Merge analysis):
- difficulty: Always 0 post-Merge (Sep 2022)
- total_difficulty: Frozen post-Merge
Setup
Credentials via .env file (simplest), Doppler (recommended for teams), or environment variables.
Environment Variables
| Variable | Description |
|---|---|
| CLICKHOUSE_HOST_READONLY | ClickHouse Cloud hostname |
| CLICKHOUSE_USER_READONLY | Read-only username |
| CLICKHOUSE_PASSWORD_READONLY | Password |
# Option 1: .env file (simplest for small teams)
# Create .env in your project root:
CLICKHOUSE_HOST_READONLY=<host>
CLICKHOUSE_USER_READONLY=<user>
CLICKHOUSE_PASSWORD_READONLY=<password>
# Option 2: Doppler (recommended for production)
doppler configure set token <token_from_1password>
doppler setup --project gapless-network-data --config prd
# Option 3: Environment variables
export CLICKHOUSE_HOST_READONLY=<host>
export CLICKHOUSE_USER_READONLY=<user>
export CLICKHOUSE_PASSWORD_READONLY=<password>
Get setup instructions: gmd.probe.get_setup_workflow()
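Before the first query it can help to confirm that all three variables are actually set. This is a small sketch, not the package's own credential resolution: `missing_credentials` is a hypothetical helper, and it inspects only the process environment, so values kept in a .env file are visible to it only once something has loaded them.

```python
import os

# Variable names taken from the table above.
REQUIRED_VARS = (
    "CLICKHOUSE_HOST_READONLY",
    "CLICKHOUSE_USER_READONLY",
    "CLICKHOUSE_PASSWORD_READONLY",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; pass a dict to check other sources.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All ClickHouse read-only credentials are set.")
```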
Data Coverage
- Blocks: 23.87M Ethereum blocks (2015-2025)
- Update frequency: Real-time (~12 second intervals)
- Storage: ClickHouse Cloud (AWS)
- Deduplication: Automatic via ReplacingMergeTree
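Since Ethereum block numbers are consecutive, a fetched range can be spot-checked for gaps on the client side. The helper below is a sketch under that assumption; `find_gaps` is a hypothetical name and does not come from the package.

```python
def find_gaps(block_numbers):
    """Return (first_missing, last_missing) ranges in a collection of block numbers.

    Duplicates are ignored and input order does not matter.
    """
    ordered = sorted(set(block_numbers))
    gaps = []
    # Compare each block number with its successor in sorted order.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps
```

With a DataFrame returned by fetch_blocks(), a call along the lines of `find_gaps(df['number'].tolist())` should return an empty list when the range is complete.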
Exceptions
All exceptions include structured context (timestamp, endpoint, HTTP status):
- CredentialException: Credential resolution failed
- DatabaseException: ClickHouse query failed
- MempoolException: Base exception class
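When handling these errors, the more specific exceptions should be caught before the base class. The sketch below uses local stand-in classes so that it runs without the package installed. It assumes that CredentialException and DatabaseException derive from MempoolException, which the description "base exception class" suggests but does not confirm, and it does not show the import path of the real classes.

```python
# Stand-ins for the exceptions named above.
# ASSUMPTION: the real classes share this hierarchy; verify against the package.
class MempoolException(Exception):
    """Base exception class."""


class CredentialException(MempoolException):
    """Credential resolution failed."""


class DatabaseException(MempoolException):
    """ClickHouse query failed."""


def describe_failure(action):
    """Run `action` and return a short description of how it ended."""
    try:
        action()
    except CredentialException as exc:
        # Usually a configuration problem: retrying will not help.
        return f"credentials: {exc}"
    except DatabaseException as exc:
        # Possibly transient: a caller might retry with backoff.
        return f"database: {exc}"
    except MempoolException as exc:
        # Any other error raised under the base class.
        return f"other package error: {exc}"
    return "ok"
```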
Feature Engineering Integration
Combine with OHLCV price data:
import gapless_crypto_data as gcd
import gapless_network_data as gmd
# Fetch both data sources
df_ohlcv = gcd.get_data(symbol="ETHUSDT", timeframe="1m", start_date="2024-01-01")
df_blocks = gmd.fetch_blocks(start="2024-01-01", end="2024-01-02")
# Temporal alignment (forward-fill prevents data leakage)
df_blocks_aligned = df_blocks.set_index('timestamp').reindex(
    df_ohlcv.index, method='ffill'
)
# Join and engineer features
df = df_ohlcv.join(df_blocks_aligned)
df['gas_pressure'] = df['base_fee_per_gas'] / df['base_fee_per_gas'].rolling(60).median()
df['block_utilization'] = df['gas_used'] / df['gas_limit']
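The reindex(..., method='ffill') call above gives each OHLCV row the most recent block whose timestamp is at or before that row's index value, so no row sees a block from its future. The same as-of lookup can be sketched in plain Python to make the rule explicit; `asof_value` is a hypothetical helper written for illustration and assumes `block_times` is sorted in ascending order.

```python
from bisect import bisect_right


def asof_value(block_times, block_values, bar_time):
    """Return the value of the latest block with timestamp <= bar_time.

    Returns None when no block exists at or before bar_time, which mirrors
    the missing values that forward-filling leaves at the start of a range.
    """
    idx = bisect_right(block_times, bar_time)
    return block_values[idx - 1] if idx else None


# Blocks at t=0, 12, 24 seconds with illustrative base fee values.
times = [0, 12, 24]
fees = [10, 11, 12]
for bar in (-1, 11, 12, 13, 100):
    print(bar, asof_value(times, fees, bar))
```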
Infrastructure (Reference)
Dual-pipeline architecture for production reliability:
| Component | Purpose | Technology |
|---|---|---|
| BigQuery Sync | Hourly batch from public dataset | Cloud Run Job |
| Real-Time Collector | Block-level streaming | e2-micro VM |
| Database | Storage with deduplication | ClickHouse Cloud |
| Monitoring | Dead Man's Switch | Healthchecks.io |
Related Projects
- gapless-crypto-data - OHLCV data collection
- BigQuery Ethereum Dataset