
Ethereum blockchain network metrics collection infrastructure with zero-gap guarantee (multi-chain support planned)


Gapless Network Data

Ethereum blockchain network metrics for financial time series forecasting.

Quick Start

import gapless_network_data as gmd

# Fetch latest blocks (recommended for live trading)
df = gmd.fetch_blocks(limit=1000)

# Compute block utilization (alpha feature #2)
df['utilization'] = df['gas_used'] / df['gas_limit']

# Date range query (half-open interval [start, end))
# Returns all blocks from Jan 1 00:00:00 to Jan 31 23:59:59
df = gmd.fetch_blocks(start='2024-01-01', end='2024-02-01')

# Same-day query (returns all blocks on March 13)
df = gmd.fetch_blocks(start='2024-03-13', end='2024-03-13')

Installation

pip install gapless-network-data

AI Agent Discoverability

This package is optimized for Claude Code CLI and AI coding agents.

Probe Module

import gapless_network_data as gmd

# Get ranked features for ML pipelines
features = gmd.probe.get_alpha_features()
for f in features[:5]:
    print(f"{f.rank}. {f.name} ({f.importance})")

# Get protocol era boundaries
eras = gmd.probe.get_protocol_eras()

# Get setup workflow
workflow = gmd.probe.get_setup_workflow()

# Get example code
print(gmd.probe.get_quick_start())

Alpha Feature Rankings

AI agents should prioritize these features for ML pipelines:

Rank  Feature            Importance  Formula
1     base_fee_per_gas   critical    raw
2     block_utilization  critical    gas_used / gas_limit
3     transaction_count  high        raw
4     timestamp          high        raw
5     number             high        raw
6     size               medium      raw
7     blob_gas_used      medium      raw (post-EIP-4844)
8     excess_blob_gas    low         raw (post-EIP-4844)
9     gas_limit          low         raw
10    gas_used           low         raw

Get rankings programmatically: gmd.probe.get_alpha_features()
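
As a minimal sketch of how the ranking might be applied, the snippet below derives block_utilization from its formula and keeps only the critical and high importance columns. The toy DataFrame stands in for the output of fetch_blocks(); its values, the helper add_block_utilization, and the CRITICAL_AND_HIGH list are illustrative assumptions, not part of the package.

```python
import pandas as pd

# Toy stand-in for a fetch_blocks() result; all values are made up.
blocks = pd.DataFrame({
    "number": [19_000_000, 19_000_001, 19_000_002],
    "base_fee_per_gas": [30_000_000_000, 32_000_000_000, 29_000_000_000],
    "gas_used": [15_000_000, 29_000_000, 12_000_000],
    "gas_limit": [30_000_000, 30_000_000, 30_000_000],
    "transaction_count": [150, 310, 120],
})

# Hypothetical list mirroring ranks 1-5 of the table above.
CRITICAL_AND_HIGH = [
    "base_fee_per_gas",
    "block_utilization",
    "transaction_count",
    "timestamp",
    "number",
]


def add_block_utilization(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with block_utilization = gas_used / gas_limit."""
    out = df.copy()
    out["block_utilization"] = out["gas_used"] / out["gas_limit"]
    return out


features = add_block_utilization(blocks)
# Keep only the ranked columns that are present in this frame.
selected = features[[c for c in CRITICAL_AND_HIGH if c in features.columns]]
print(selected)
```

The column filter tolerates missing fields (here, timestamp), so the same selection works on frames that carry only a subset of the ranked features.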

Protocol Era Boundaries

Filter data appropriately based on protocol changes:

  • EIP-1559 (block 12,965,000, Aug 2021): base_fee_per_gas introduced
  • The Merge (block 15,537,394, Sep 2022): difficulty=0 forever
  • EIP-4844 (block 19,426,587, Mar 2024): blob_gas fields introduced

Get eras programmatically: gmd.probe.get_protocol_eras()
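
The boundaries above can be turned into a simple block-number filter. This is a sketch only: the ERA_START_BLOCK mapping and the helpers era_of and blocks_since are hypothetical names, and it assumes each listed block is the first block of its era (inclusive boundary). The structure returned by gmd.probe.get_protocol_eras() may differ.

```python
import pandas as pd

# Block numbers copied from the list above; treated as inclusive era starts.
ERA_START_BLOCK = {
    "eip1559": 12_965_000,
    "merge": 15_537_394,
    "eip4844": 19_426_587,
}


def era_of(block_number: int) -> str:
    """Name the most recent era whose start block is <= block_number."""
    if block_number >= ERA_START_BLOCK["eip4844"]:
        return "eip4844"
    if block_number >= ERA_START_BLOCK["merge"]:
        return "merge"
    if block_number >= ERA_START_BLOCK["eip1559"]:
        return "eip1559"
    return "pre_eip1559"


def blocks_since(df: pd.DataFrame, era: str) -> pd.DataFrame:
    """Keep only rows at or after the start block of the given era."""
    return df[df["number"] >= ERA_START_BLOCK[era]]


# Toy frame with one block on each side of the EIP-1559 boundary.
toy = pd.DataFrame({"number": [12_964_999, 12_965_000, 19_426_587]})
print(toy["number"].map(era_of).tolist())
print(len(blocks_since(toy, "eip1559")))
```

Filtering this way keeps models from training on rows where a field such as base_fee_per_gas did not yet exist.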

API Reference

fetch_blocks()

gmd.fetch_blocks(
    start: str | None = None,     # ISO 8601 date
    end: str | None = None,       # ISO 8601 date
    limit: int | None = None,     # Max blocks
    include_deprecated: bool = False  # Include difficulty fields
) -> pd.DataFrame

Returns pandas DataFrame with columns:

  • timestamp (datetime64[ns, UTC])
  • number (uint64)
  • gas_limit, gas_used, base_fee_per_gas, transaction_count, size (uint64)
  • blob_gas_used, excess_blob_gas (Int64, nullable - pd.NA for pre-EIP4844)
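
Because the blob fields use the nullable Int64 dtype, pre-EIP-4844 rows carry pd.NA rather than 0. The sketch below shows a few ways of handling that with standard pandas calls on a toy frame (values are made up); it does not call the package itself.

```python
import pandas as pd

# Toy frame: the first row mimics a pre-EIP-4844 block with no blob data.
blocks = pd.DataFrame({
    "number": pd.array([19_426_586, 19_426_587, 19_426_588], dtype="uint64"),
    "blob_gas_used": pd.array([pd.NA, 131_072, 393_216], dtype="Int64"),
})

# Option 1: keep only rows where blob data exists.
has_blob_data = blocks["blob_gas_used"].notna()
post_4844 = blocks[has_blob_data]

# Option 2: aggregate directly; pandas skips pd.NA by default.
mean_blob_gas = blocks["blob_gas_used"].mean()

# Option 3: fill with 0 when "no blobs" and "field absent" can be treated alike.
blocks["blob_gas_used_filled"] = blocks["blob_gas_used"].fillna(0)

print(len(post_4844), mean_blob_gas, blocks["blob_gas_used_filled"].dtype)
```

Filling with 0 erases the distinction between a pre-EIP-4844 block and a post-EIP-4844 block that used no blob gas, so filtering is usually the safer default for era-sensitive features.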

Deprecated Fields

Excluded by default (use include_deprecated=True for pre-Merge analysis):

  • difficulty: Always 0 post-Merge (Sep 2022)
  • total_difficulty: Frozen post-Merge

Setup

Supply credentials via a .env file (simplest), Doppler (recommended for teams), or environment variables.

Environment Variables

Variable                      Description
CLICKHOUSE_HOST_READONLY      ClickHouse Cloud hostname
CLICKHOUSE_USER_READONLY      Read-only username
CLICKHOUSE_PASSWORD_READONLY  Password

# Option 1: .env file (simplest for small teams)
# Create .env in your project root:
CLICKHOUSE_HOST_READONLY=<host>
CLICKHOUSE_USER_READONLY=<user>
CLICKHOUSE_PASSWORD_READONLY=<password>

# Option 2: Doppler (recommended for production)
doppler configure set token <token_from_1password>
doppler setup --project gapless-network-data --config prd

# Option 3: Environment variables
export CLICKHOUSE_HOST_READONLY=<host>
export CLICKHOUSE_USER_READONLY=<user>
export CLICKHOUSE_PASSWORD_READONLY=<password>

Get setup instructions: gmd.probe.get_setup_workflow()
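
Before the first query it can help to confirm that all three variables are visible to the Python process. The check below is a minimal sketch using only the standard library; missing_credentials is a hypothetical helper, and the package's own credential resolution (including how it reads .env or Doppler) may behave differently.

```python
import os

# Variable names taken from the table above.
REQUIRED_VARS = (
    "CLICKHOUSE_HOST_READONLY",
    "CLICKHOUSE_USER_READONLY",
    "CLICKHOUSE_PASSWORD_READONLY",
)


def missing_credentials(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All ClickHouse credentials are set.")
```

Note that values placed in a .env file only appear in os.environ if something loads the file, so this check reflects the process environment rather than the file contents.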

Data Coverage

  • Blocks: 23.87M Ethereum blocks (2015-2025)
  • Update frequency: Real-time (~12 second intervals)
  • Storage: ClickHouse Cloud (AWS)
  • Deduplication: Automatic via ReplacingMergeTree
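
A client-side sanity check of the zero-gap property on any fetched range could look like the sketch below. It relies only on block numbers being consecutive integers; find_gaps is a hypothetical helper, not part of the package, and the toy input is made up.

```python
import pandas as pd


def find_gaps(numbers: pd.Series) -> list[tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges of block numbers."""
    ordered = (
        numbers.drop_duplicates()
        .sort_values()
        .astype("int64")
        .reset_index(drop=True)
    )
    diffs = ordered.diff()
    gaps = []
    # A difference greater than 1 means at least one block number is missing.
    for i in diffs[diffs > 1].index:
        gaps.append((int(ordered[i - 1]) + 1, int(ordered[i]) - 1))
    return gaps


# Toy series with a duplicate (105) and two holes (102-103 and 106).
toy_numbers = pd.Series([100, 101, 104, 105, 105, 107], dtype="uint64")
print(find_gaps(toy_numbers))
```

Duplicates are dropped first so that rows not yet merged by the storage engine do not register as anomalies in the check.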

Exceptions

All exceptions include structured context (timestamp, endpoint, HTTP status):

  • CredentialException: Credential resolution failed
  • DatabaseException: ClickHouse query failed
  • MempoolException: Base exception class

Feature Engineering Integration

Combine with OHLCV price data:

import gapless_crypto_data as gcd
import gapless_network_data as gmd

# Fetch both data sources
df_ohlcv = gcd.get_data(symbol="ETHUSDT", timeframe="1m", start_date="2024-01-01")
df_blocks = gmd.fetch_blocks(start="2024-01-01", end="2024-01-02")

# Temporal alignment (forward-fill prevents data leakage)
df_blocks_aligned = df_blocks.set_index('timestamp').reindex(
    df_ohlcv.index, method='ffill'
)

# Join and engineer features
df = df_ohlcv.join(df_blocks_aligned)
df['gas_pressure'] = df['base_fee_per_gas'] / df['base_fee_per_gas'].rolling(60).median()
df['block_utilization'] = df['gas_used'] / df['gas_limit']
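
To see why forward-fill avoids look-ahead, the sketch below runs the same reindex pattern on a tiny synthetic example (timestamps and fees are made up). Each one-minute bar receives the most recent block at or before its timestamp and never a later one. It assumes both indexes are timezone-aware UTC; a naive OHLCV index would need localizing first.

```python
import pandas as pd

# Three one-minute bar timestamps, standing in for df_ohlcv.index.
bar_index = pd.date_range("2024-01-01 00:00", periods=3, freq="1min", tz="UTC")

# Three synthetic blocks that arrive between the bar timestamps.
blocks = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 00:00:11", "2024-01-01 00:00:47", "2024-01-01 00:01:23"],
        utc=True,
    ),
    "base_fee_per_gas": [30, 31, 33],
})

# Same pattern as above: take the last block at or before each bar timestamp.
aligned = blocks.set_index("timestamp").reindex(bar_index, method="ffill")
print(aligned)
```

The 00:00 bar is NaN because no block precedes it, the 00:01 bar takes the 00:00:47 block, and the 00:02 bar takes the 00:01:23 block. Similarly, rolling(60).median() in the gas_pressure feature yields NaN for the first 59 rows, so those rows are typically dropped before training.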

Infrastructure (Reference)

Dual-pipeline architecture for production reliability:

Component            Purpose                           Technology
BigQuery Sync        Hourly batch from public dataset  Cloud Run Job
Real-Time Collector  Block-level streaming             e2-micro VM
Database             Storage with deduplication        ClickHouse Cloud
Monitoring           Dead Man's Switch                 Healthchecks.io

Related Projects

Documentation

License

MIT License
