Ethereum blockchain network metrics collection infrastructure with zero-gap guarantee (multi-chain support planned)

Gapless Network Data

Ethereum blockchain network metrics for financial time series forecasting.

Quick Start

import gapless_network_data as gmd

# Fetch latest blocks (recommended for live trading)
df = gmd.fetch_blocks(limit=1000)

# Compute block utilization (alpha feature #2)
df['utilization'] = df['gas_used'] / df['gas_limit']

# Date range query
df = gmd.fetch_blocks(start='2024-01-01', end='2024-01-31')

Installation

pip install gapless-network-data

AI Agent Discoverability

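This package exposes a probe module that AI coding agents (for example Claude Code CLI) can query for feature rankings, protocol era boundaries, setup steps, and example code.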

Probe Module

import gapless_network_data as gmd

# Get ranked features for ML pipelines
features = gmd.probe.get_alpha_features()
for f in features[:5]:
    print(f"{f.rank}. {f.name} ({f.importance})")

# Get protocol era boundaries
eras = gmd.probe.get_protocol_eras()

# Get setup workflow
workflow = gmd.probe.get_setup_workflow()

# Get example code
print(gmd.probe.get_quick_start())

Alpha Feature Rankings

AI agents should prioritize these features for ML pipelines:

Rank  Feature            Importance  Formula
1     base_fee_per_gas   critical    raw
2     block_utilization  critical    gas_used / gas_limit
3     transaction_count  high        raw
4     timestamp          high        raw
5     number             high        raw
6     size               medium      raw
7     blob_gas_used      medium      raw (post-EIP4844)
8     excess_blob_gas    low         raw (post-EIP4844)
9     gas_limit          low         raw
10    gas_used           low         raw

Get rankings programmatically: gmd.probe.get_alpha_features()

Protocol Era Boundaries

Filter data appropriately based on protocol changes:

  • EIP-1559 (block 12,965,000, Aug 2021): base_fee_per_gas introduced
  • The Merge (block 15,537,394, Sep 2022): difficulty is always 0 from this block onward
  • EIP-4844 (block 19,426,587, Mar 2024): blob_gas fields introduced

Get eras programmatically: gmd.probe.get_protocol_eras()
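
A minimal sketch of era-based filtering, assuming each block number listed above is the first block of its era and that the DataFrame has the number column described under fetch_blocks(). The ERA_START_BLOCK mapping and the filter_from_era helper are hypothetical names for illustration, not part of the package, and the DataFrame is a synthetic stand-in for real output.

```python
import pandas as pd

# Activation blocks copied from the era list above.
ERA_START_BLOCK = {
    "eip1559": 12_965_000,
    "merge": 15_537_394,
    "eip4844": 19_426_587,
}


def filter_from_era(df: pd.DataFrame, era: str) -> pd.DataFrame:
    """Keep only blocks at or after the activation block of `era` (hypothetical helper)."""
    return df[df["number"] >= ERA_START_BLOCK[era]]


# Synthetic stand-in for the output of fetch_blocks().
blocks = pd.DataFrame({
    "number": [12_964_999, 12_965_000, 15_537_394, 19_426_587],
    "gas_used": [1, 2, 3, 4],
})

post_merge = filter_from_era(blocks, "merge")
```

Filtering on number rather than timestamp avoids any ambiguity about the exact activation time of a fork.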

API Reference

fetch_blocks()

gmd.fetch_blocks(
    start: str | None = None,     # ISO 8601 date
    end: str | None = None,       # ISO 8601 date
    limit: int | None = None,     # Max blocks
    include_deprecated: bool = False  # Include difficulty fields
) -> pd.DataFrame

Returns a pandas DataFrame with columns:

  • timestamp (datetime64[ns, UTC])
  • number (uint64)
  • gas_limit, gas_used, base_fee_per_gas, transaction_count, size (uint64)
  • blob_gas_used, excess_blob_gas (Int64, nullable - pd.NA for pre-EIP4844)
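
Because the blob fields are nullable, it can help to separate "no blob data exists" (pd.NA) from "zero blob gas used". The sketch below uses a small synthetic DataFrame in place of real fetch_blocks() output; the variable names are illustrative only.

```python
import pandas as pd

# Synthetic stand-in: one pre-EIP4844 block followed by two post-EIP4844 blocks.
blocks = pd.DataFrame({
    "number": [19_426_586, 19_426_587, 19_426_588],
    "blob_gas_used": pd.array([pd.NA, 131072, 0], dtype="Int64"),
})

# pd.NA marks blocks that predate the blob fields; a real 0 is kept as data.
has_blobs = blocks["blob_gas_used"].notna()
post_4844 = blocks[has_blobs]

# Average over blocks that actually carry the field.
mean_blob_gas = post_4844["blob_gas_used"].mean()
```

Replacing pd.NA with 0 before aggregating would mix pre-EIP4844 blocks into the statistic, so masking with notna() is usually the safer default.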

Deprecated Fields

Excluded by default (use include_deprecated=True for pre-Merge analysis):

  • difficulty: Always 0 post-Merge (Sep 2022)
  • total_difficulty: Frozen post-Merge

Setup

Provide read-only ClickHouse credentials through a .env file (simplest), Doppler (recommended for teams and production), or environment variables.

Environment Variables

Variable                      Description
CLICKHOUSE_HOST_READONLY      ClickHouse Cloud hostname
CLICKHOUSE_USER_READONLY      Read-only username
CLICKHOUSE_PASSWORD_READONLY  Password

# Option 1: .env file (simplest for small teams)
# Create .env in your project root:
CLICKHOUSE_HOST_READONLY=<host>
CLICKHOUSE_USER_READONLY=<user>
CLICKHOUSE_PASSWORD_READONLY=<password>

# Option 2: Doppler (recommended for production)
doppler configure set token <token_from_1password>
doppler setup --project gapless-network-data --config prd

# Option 3: Environment variables
export CLICKHOUSE_HOST_READONLY=<host>
export CLICKHOUSE_USER_READONLY=<user>
export CLICKHOUSE_PASSWORD_READONLY=<password>

Get setup instructions: gmd.probe.get_setup_workflow()
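
Before calling fetch_blocks(), a quick check that the three variables are present can give a clearer error than a failed connection. This is a sketch using only the standard library; missing_credentials is a hypothetical helper, and it inspects the process environment only, so it does not tell you whether a .env file would be picked up by the package.

```python
import os

REQUIRED_VARS = (
    "CLICKHOUSE_HOST_READONLY",
    "CLICKHOUSE_USER_READONLY",
    "CLICKHOUSE_PASSWORD_READONLY",
)


def missing_credentials(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
```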

Data Coverage

  • Blocks: 23.87M Ethereum blocks (2015-2025)
  • Update frequency: Real-time (~12 second intervals)
  • Storage: ClickHouse Cloud (AWS)
  • Deduplication: Automatic via ReplacingMergeTree
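
To spot-check continuity on a fetched range, one option is to look for jumps in consecutive block numbers. The find_gaps function below is a hypothetical client-side helper, not part of the package; it accepts any iterable of block numbers, such as df['number'].

```python
def find_gaps(numbers):
    """Return (first_missing, last_missing) ranges between consecutive block numbers."""
    # Deduplicate and sort so the check does not depend on input order.
    ordered = sorted({int(n) for n in numbers})
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps


print(find_gaps([100, 101, 104, 105, 107]))
```

An empty list means every block between the smallest and largest number in the input is present.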

Exceptions

All exceptions include structured context (timestamp, endpoint, HTTP status):

  • CredentialException: Credential resolution failed
  • DatabaseException: ClickHouse query failed
  • MempoolException: Base exception class

Feature Engineering Integration

Combine with OHLCV price data:

import gapless_crypto_data as gcd
import gapless_network_data as gmd

# Fetch both data sources
df_ohlcv = gcd.get_data(symbol="ETHUSDT", timeframe="1m", start_date="2024-01-01")
df_blocks = gmd.fetch_blocks(start="2024-01-01", end="2024-01-02")

# Temporal alignment: forward-fill gives each bar the latest block at or before
# its timestamp, so no future block data leaks into a bar
df_blocks_aligned = df_blocks.set_index('timestamp').reindex(
    df_ohlcv.index, method='ffill'
)

# Join and engineer features
df = df_ohlcv.join(df_blocks_aligned)
df['gas_pressure'] = df['base_fee_per_gas'] / df['base_fee_per_gas'].rolling(60).median()
df['block_utilization'] = df['gas_used'] / df['gas_limit']
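
The alignment step can be checked on synthetic data. The sketch below assumes both frames use UTC-aware timestamps and that the OHLCV frame is indexed by bar time; the values are made up for illustration.

```python
import pandas as pd

# Three one-minute bars, indexed by UTC bar time.
minute_index = pd.date_range("2024-01-01 00:00", periods=3, freq="1min", tz="UTC")
ohlcv = pd.DataFrame({"close": [100.0, 101.0, 102.0]}, index=minute_index)

# Three blocks with irregular timestamps, one of them before the first bar.
blocks = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-12-31 23:59:50", "2024-01-01 00:00:30", "2024-01-01 00:01:45"],
        utc=True,
    ),
    "base_fee_per_gas": [10, 20, 30],
})

# Each bar receives the most recent block at or before its own timestamp.
aligned = blocks.set_index("timestamp").reindex(ohlcv.index, method="ffill")
joined = ohlcv.join(aligned)
```

Here the 00:01 bar picks up the block from 00:00:30 rather than the one from 00:01:45, which is the behavior that keeps later blocks out of earlier bars. Note that reindex with method='ffill' requires the block timestamps to be sorted in increasing order.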

Infrastructure (Reference)

Dual-pipeline architecture for production reliability:

Component            Purpose                           Technology
BigQuery Sync        Hourly batch from public dataset  Cloud Run Job
Real-Time Collector  Block-level streaming             e2-micro VM
Database             Storage with deduplication        ClickHouse Cloud
Monitoring           Dead Man's Switch                 Healthchecks.io

License

MIT License
