Ethereum blockchain network metrics collection infrastructure with zero-gap guarantee (multi-chain support planned)

Gapless Network Data

Ethereum blockchain network metrics for financial time series forecasting.

Quick Start

import gapless_network_data as gmd

# Fetch latest blocks (recommended for live trading)
df = gmd.fetch_blocks(limit=1000)

# Compute block utilization (alpha feature #2)
df['utilization'] = df['gas_used'] / df['gas_limit']

# Date range query (inclusive [start, end])
# Returns all blocks from Jan 1 through Jan 31 (both dates included)
df = gmd.fetch_blocks(start='2024-01-01', end='2024-01-31')

# Start-only: all blocks from Jan 1 to most recent
df = gmd.fetch_blocks(start='2024-01-01')

# End-only: all blocks from genesis to Jan 31
df = gmd.fetch_blocks(end='2024-01-31')

# Same-day query (returns all blocks on March 13)
df = gmd.fetch_blocks(start='2024-03-13', end='2024-03-13')

# Second-precision query (explicit times supported)
df = gmd.fetch_blocks(start='2024-03-13 12:00:00', end='2024-03-13 12:05:00')

Installation

pip install gapless-network-data

AI Agent Discoverability

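This package is designed to be discoverable by AI coding agents such as the Claude Code CLI: the probe module exposes feature rankings, protocol era boundaries, setup steps, and example code programmatically.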

Probe Module

import gapless_network_data as gmd

# Get ranked features for ML pipelines
features = gmd.probe.get_alpha_features()
for f in features[:5]:
    print(f"{f.rank}. {f.name} ({f.importance})")

# Get protocol era boundaries
eras = gmd.probe.get_protocol_eras()

# Get setup workflow
workflow = gmd.probe.get_setup_workflow()

# Get example code
print(gmd.probe.get_quick_start())

Alpha Feature Rankings

AI agents should prioritize these features for ML pipelines:

| Rank | Feature | Importance | Formula |
|------|---------|------------|---------|
| 1 | base_fee_per_gas | critical | raw |
| 2 | block_utilization | critical | gas_used / gas_limit |
| 3 | transaction_count | high | raw |
| 4 | timestamp | high | raw |
| 5 | number | high | raw |
| 6 | size | medium | raw |
| 7 | blob_gas_used | medium | raw (post-EIP4844) |
| 8 | excess_blob_gas | low | raw (post-EIP4844) |
| 9 | gas_limit | low | raw |
| 10 | gas_used | low | raw |

Get rankings programmatically: gmd.probe.get_alpha_features()
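
To illustrate how the ranking might be consumed, here is a minimal sketch that encodes the table as plain data and picks out the higher-priority features. The `AlphaFeature` dataclass and `select_features` helper are hypothetical stand-ins, not the objects returned by gmd.probe.get_alpha_features(); only the ranks, names, importance tiers, and formulas are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlphaFeature:
    """Hypothetical record mirroring one row of the ranking table."""
    rank: int
    name: str
    importance: str
    formula: str


# Rows copied from the ranking table.
FEATURES = [
    AlphaFeature(1, "base_fee_per_gas", "critical", "raw"),
    AlphaFeature(2, "block_utilization", "critical", "gas_used / gas_limit"),
    AlphaFeature(3, "transaction_count", "high", "raw"),
    AlphaFeature(4, "timestamp", "high", "raw"),
    AlphaFeature(5, "number", "high", "raw"),
    AlphaFeature(6, "size", "medium", "raw"),
    AlphaFeature(7, "blob_gas_used", "medium", "raw (post-EIP4844)"),
    AlphaFeature(8, "excess_blob_gas", "low", "raw (post-EIP4844)"),
    AlphaFeature(9, "gas_limit", "low", "raw"),
    AlphaFeature(10, "gas_used", "low", "raw"),
]


def select_features(features, tiers=("critical", "high")):
    """Return feature names, in rank order, whose importance is in `tiers`."""
    ordered = sorted(features, key=lambda f: f.rank)
    return [f.name for f in ordered if f.importance in tiers]


selected = select_features(FEATURES)
print(selected)
```

Note that block_utilization is derived rather than raw, so a pipeline using this selection would still need gas_used and gas_limit to compute it.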

Protocol Era Boundaries

Filter data appropriately based on protocol changes:

  • EIP-1559 (block 12,965,000, Aug 2021): base_fee_per_gas introduced
  • The Merge (block 15,537,394, Sep 2022): difficulty=0 forever
  • EIP-4844 (block 19,426,587, Mar 2024): blob_gas fields introduced

Get eras programmatically: gmd.probe.get_protocol_eras()
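
The boundaries above can be turned into a simple lookup. This is a minimal sketch, assuming each listed block number is the first block of its era; the `era_for_block` helper and the era labels are hypothetical and are not part of the package.

```python
from bisect import bisect_right

# Activation blocks from the list above; block 0 starts the pre-EIP-1559 era.
ERA_STARTS = [0, 12_965_000, 15_537_394, 19_426_587]
ERA_NAMES = ["pre-EIP-1559", "EIP-1559", "The Merge", "EIP-4844"]


def era_for_block(number: int) -> str:
    """Return the label of the most recent era that starts at or before `number`."""
    if number < 0:
        raise ValueError("block number must be non-negative")
    # bisect_right counts how many era starts are <= number.
    return ERA_NAMES[bisect_right(ERA_STARTS, number) - 1]


print(era_for_block(12_964_999))  # pre-EIP-1559
print(era_for_block(12_965_000))  # EIP-1559
```

A lookup like this can be used to drop rows from eras where a feature does not exist, for example excluding blocks before 12,965,000 when modelling base_fee_per_gas.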

API Reference

fetch_blocks()

gmd.fetch_blocks(
    start: str | None = None,     # ISO 8601 date (inclusive)
    end: str | None = None,       # ISO 8601 date (inclusive for date-only)
    limit: int | None = None,     # Max blocks (0 = empty DataFrame)
    include_deprecated: bool = False  # Include difficulty fields
) -> pd.DataFrame

Date Range Semantics (inclusive [start, end]):

  • Date-only inputs include the entire day: end='2024-03-13' includes all of March 13
  • Explicit times are preserved: end='2024-03-13 12:00:00' excludes blocks after noon
  • Same-day queries work: start='2024-03-13', end='2024-03-13' returns all blocks on March 13
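
One way to picture these semantics is a pair of helpers that turn the string inputs into concrete bounds. This is a sketch only, not the package's implementation: `resolve_start` and `resolve_end` are hypothetical names, inputs are assumed to be UTC, and the end of a date-only day is assumed to be 23:59:59.999 to match millisecond storage.

```python
from datetime import datetime, timedelta, timezone


def _is_date_only(value: str) -> bool:
    """True for strings shaped like 'YYYY-MM-DD' (no time component)."""
    return len(value) == 10


def resolve_start(start: str) -> datetime:
    """Date-only and explicit-time starts are both used as given (midnight for dates)."""
    return datetime.fromisoformat(start).replace(tzinfo=timezone.utc)


def resolve_end(end: str) -> datetime:
    """Expand a date-only end to the last millisecond of that day; keep explicit times."""
    parsed = datetime.fromisoformat(end).replace(tzinfo=timezone.utc)
    if _is_date_only(end):
        parsed += timedelta(days=1) - timedelta(milliseconds=1)
    return parsed


# Same-day query: the bounds span the whole of March 13.
lo = resolve_start("2024-03-13")
hi = resolve_end("2024-03-13")
print(lo.isoformat(), hi.isoformat())
```

With bounds resolved this way, an inclusive filter `lo <= timestamp <= hi` covers every block on the day, while an explicit end such as '2024-03-13 12:00:00' stays at noon.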

Parameter Requirements:

  • At least one of start, end, or limit must be specified
  • Empty strings ("") are rejected — use None to omit
  • start must be ≤ end if both provided
  • limit=0 returns empty DataFrame (0 rows, not entire blockchain)
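
The rules above can be sketched as a standalone validator. `validate_fetch_args` is a hypothetical helper written for illustration; the package's actual checks and error messages may differ, and the date-only expansion of end is an assumption based on the documented range semantics.

```python
from datetime import datetime, timedelta


def validate_fetch_args(start=None, end=None, limit=None) -> None:
    """Raise ValueError if the arguments break the documented requirements."""
    # Empty strings are rejected; None is the way to omit a bound.
    for name, value in (("start", start), ("end", end)):
        if value == "":
            raise ValueError(f"{name} must not be an empty string; use None to omit it")

    # At least one of the three parameters is required (limit=0 counts as specified).
    if start is None and end is None and limit is None:
        raise ValueError("specify at least one of start, end, or limit")

    # start must not be later than end.
    if start is not None and end is not None:
        start_dt = datetime.fromisoformat(start)
        end_dt = datetime.fromisoformat(end)
        if len(end) == 10:  # date-only end covers the whole day (assumed)
            end_dt += timedelta(days=1) - timedelta(milliseconds=1)
        if start_dt > end_dt:
            raise ValueError("start must be <= end")


validate_fetch_args(limit=0)                                # accepted
validate_fetch_args(start="2024-03-13", end="2024-03-13")   # accepted
```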

Returns pandas DataFrame with columns:

  • timestamp (datetime64[ns, UTC])
  • number (uint64)
  • gas_limit, gas_used, base_fee_per_gas, transaction_count, size (uint64)
  • blob_gas_used, excess_blob_gas (Int64, nullable - pd.NA for pre-EIP4844)

Deprecated Fields

Excluded by default (use include_deprecated=True for pre-Merge analysis):

  • difficulty: Always 0 post-Merge (Sep 2022)
  • total_difficulty: Frozen post-Merge

Setup

Provide credentials via a .env file (simplest), Doppler (recommended for teams), or environment variables.

Environment Variables

| Variable | Description |
|----------|-------------|
| CLICKHOUSE_HOST_READONLY | ClickHouse Cloud hostname |
| CLICKHOUSE_USER_READONLY | Read-only username |
| CLICKHOUSE_PASSWORD_READONLY | Password |

# Option 1: .env file (simplest for small teams)
# Create .env in your project root:
CLICKHOUSE_HOST_READONLY=<host>
CLICKHOUSE_USER_READONLY=<user>
CLICKHOUSE_PASSWORD_READONLY=<password>

# Option 2: Doppler (recommended for production)
doppler configure set token <token_from_1password>
doppler setup --project gapless-network-data --config prd

# Option 3: Environment variables
export CLICKHOUSE_HOST_READONLY=<host>
export CLICKHOUSE_USER_READONLY=<user>
export CLICKHOUSE_PASSWORD_READONLY=<password>

Get setup instructions: gmd.probe.get_setup_workflow()
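
Before calling the package, it can help to confirm that the three variables are visible to the current process. The following is a minimal sketch; `load_clickhouse_credentials` is a hypothetical helper, not part of the package, and it only inspects the environment mapping it is given (it does not read a .env file or talk to Doppler).

```python
import os

REQUIRED_VARS = (
    "CLICKHOUSE_HOST_READONLY",
    "CLICKHOUSE_USER_READONLY",
    "CLICKHOUSE_PASSWORD_READONLY",
)


def load_clickhouse_credentials(environ=None) -> dict:
    """Return the three credential values, or raise if any is missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real environment.
example_env = {
    "CLICKHOUSE_HOST_READONLY": "example-host",
    "CLICKHOUSE_USER_READONLY": "reader",
    "CLICKHOUSE_PASSWORD_READONLY": "secret",
}
creds = load_clickhouse_credentials(example_env)
```

Passing a mapping explicitly keeps the check testable; calling it with no argument falls back to os.environ.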

Time Precision

  • Timestamp storage: Millisecond precision (DateTime64(3))
  • Block granularity: ~12 second intervals (Ethereum block time)
  • Query precision: Second-level supported for start/end parameters

Supported timestamp formats:

| Format | Example | Behavior |
|--------|---------|----------|
| Date-only | '2024-03-13' | Expands to include full day |
| Date + time | '2024-03-13 12:30:45' | Preserved exactly |
| ISO 8601 | '2024-03-13T12:30:45' | Preserved exactly |
| With milliseconds | '2024-03-13 12:30:45.123' | Preserved (truncated to 3 digits) |

Data Coverage

  • Blocks: 23.87M Ethereum blocks (2015-2025)
  • Update frequency: Real-time (~12 second intervals)
  • Storage: ClickHouse Cloud (AWS)
  • Deduplication: Automatic via ReplacingMergeTree
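
Readers who want to check continuity themselves can scan the block numbers of a fetched range. This is a minimal sketch using a hypothetical `find_gaps` helper that is not part of the package; it treats repeated numbers as a single block and reports any missing ranges.

```python
def find_gaps(block_numbers):
    """Return (first_missing, last_missing) tuples for holes in a set of block numbers."""
    unique = sorted(set(block_numbers))  # duplicates collapse to one entry
    gaps = []
    for prev, curr in zip(unique, unique[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps


# Synthetic example: 103-104 and 107-109 are missing, 101 appears twice.
print(find_gaps([100, 101, 101, 102, 105, 106, 110]))  # [(103, 104), (107, 109)]
```

On a DataFrame returned by fetch_blocks, the same check could be run on the number column; an empty result means the range is contiguous.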

Exceptions

Custom exceptions include structured context (timestamp, endpoint, HTTP status); parameter validation errors are raised as the built-in ValueError:

Credential & Database:

  • CredentialException: Credential resolution failed
  • DatabaseException: ClickHouse query failed

Parameter Validation (fetch_blocks):

  • ValueError: Empty string for start/end (use None to omit)
  • ValueError: No parameters specified (must have start, end, or limit)
  • ValueError: Reversed date range (start > end)

Feature Engineering Integration

Combine with OHLCV price data:

import gapless_crypto_data as gcd
import gapless_network_data as gmd

# Fetch both data sources
df_ohlcv = gcd.get_data(symbol="ETHUSDT", timeframe="1m", start_date="2024-01-01")
df_blocks = gmd.fetch_blocks(start="2024-01-01", end="2024-01-02")

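# Temporal alignment: forward-fill carries the most recent block at or before
# each bar timestamp, so no future block data leaks into a bar.
# Assumes df_ohlcv has a sorted, UTC-aware DatetimeIndex comparable to df_blocks['timestamp'].
df_blocks_aligned = df_blocks.set_index('timestamp').reindex(
    df_ohlcv.index, method='ffill'
)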

# Join and engineer features
df = df_ohlcv.join(df_blocks_aligned)
df['gas_pressure'] = df['base_fee_per_gas'] / df['base_fee_per_gas'].rolling(60).median()
df['block_utilization'] = df['gas_used'] / df['gas_limit']
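
The alignment step can also be tried in isolation on synthetic data. The sketch below swaps in pandas.merge_asof with direction="backward" instead of reindex with ffill; both pick, for each bar, the latest block at or before the bar timestamp. The bar and block values are made up for illustration and do not come from either package.

```python
import pandas as pd

# Synthetic 1-minute bars (stand-in for OHLCV data), keyed by bar timestamp.
bars = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 00:00:00", "2024-01-01 00:01:00", "2024-01-01 00:02:00"],
        utc=True,
    ),
    "close": [2300.0, 2301.5, 2299.0],
})

# Synthetic blocks; real blocks arrive roughly every 12 seconds.
blocks = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-12-31 23:59:50", "2024-01-01 00:00:40", "2024-01-01 00:01:55"],
        utc=True,
    ),
    "gas_used": [15_000_000, 18_000_000, 12_000_000],
    "gas_limit": [30_000_000, 30_000_000, 30_000_000],
})

# Backward as-of join: each bar receives the last block with timestamp <= bar timestamp.
# Both frames must be sorted by the join key.
aligned = pd.merge_asof(bars, blocks, on="timestamp", direction="backward")
aligned["block_utilization"] = aligned["gas_used"] / aligned["gas_limit"]
print(aligned[["timestamp", "close", "gas_used", "block_utilization"]])
```

The 00:01:00 bar is matched with the 00:00:40 block rather than the 00:01:55 block, which is the property that keeps future information out of each row.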

Infrastructure (Reference)

Dual-pipeline architecture for production reliability:

| Component | Purpose | Technology |
|-----------|---------|------------|
| BigQuery Sync | Hourly batch from public dataset | Cloud Run Job |
| Real-Time Collector | Block-level streaming | e2-micro VM |
| Database | Storage with deduplication | ClickHouse Cloud |
| Monitoring | Dead Man's Switch | Healthchecks.io |

License

MIT License
