
Ethereum blockchain network metrics collection infrastructure with zero-gap guarantee (multi-chain support planned)


Gapless Network Data

Ethereum blockchain network metrics collection infrastructure with dual-pipeline architecture.

Installation

pip install gapless-network-data

Overview

Collects Ethereum block-level network metrics, updated in real time as each block arrives (approximately every 12 seconds). Data is stored in ClickHouse Cloud with automatic deduplication.

Architecture: BigQuery hourly batch + Alchemy real-time WebSocket

Data Range: Genesis block (2015) to present

Cost: All components operate within free tier limits
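
Because the hourly batch and the real-time stream cover overlapping block ranges, the same block can arrive twice. The sketch below illustrates what deduplication by block number means in plain Python. It is not the ClickHouse mechanism itself, which runs server-side; the function name and the rule that the later-arriving row wins are assumptions made for illustration.

```python
def merge_by_block_number(batch_rows, realtime_rows):
    """Combine rows from both pipelines, keeping one row per block number.

    Assumption: when both pipelines report the same block, the row seen
    last (here, the real-time one) replaces the earlier one.
    """
    merged = {}
    for row in list(batch_rows) + list(realtime_rows):
        merged[row["number"]] = row  # later rows overwrite earlier ones
    return [merged[n] for n in sorted(merged)]


batch = [{"number": 100, "gas_used": 10}, {"number": 101, "gas_used": 11}]
realtime = [{"number": 101, "gas_used": 11}, {"number": 102, "gas_used": 12}]

blocks = merge_by_block_number(batch, realtime)
print([b["number"] for b in blocks])  # [100, 101, 102]
```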

Data Access

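import os

import clickhouse_connect

# The password is read from the environment
# (CLICKHOUSE_PASSWORD is an assumed variable name)
client = clickhouse_connect.get_client(
    host='your-host.aws.clickhouse.cloud',
    port=8443,
    username='default',
    password=os.environ['CLICKHOUSE_PASSWORD'],
    secure=True
)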

# Query latest blocks
result = client.query_df("""
    SELECT
        timestamp,
        number,
        base_fee_per_gas,
        gas_used,
        gas_limit,
        transaction_count
    FROM ethereum_mainnet.blocks FINAL
    ORDER BY number DESC
    LIMIT 10
""")
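
Once block numbers are in hand, contiguity can also be checked on the client side. The following is a minimal sketch of such a check; `find_gaps` is a hypothetical helper, not part of this package.

```python
def find_gaps(block_numbers):
    """Return (first_missing, last_missing) ranges between observed block numbers."""
    ordered = sorted(set(block_numbers))
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps


print(find_gaps([100, 101, 104, 105, 107]))  # [(102, 103), (106, 106)]
```

With a DataFrame like the one returned by the query above, the call would be `find_gaps(result["number"].tolist())`; an empty list means the sampled range is contiguous.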

Schema

11 columns for ML feature engineering:

| Column | Type | Description |
|---|---|---|
| timestamp | TIMESTAMP | UTC timestamp |
| number | BIGINT | Block number (PRIMARY KEY) |
| gas_limit | BIGINT | Block gas limit |
| gas_used | BIGINT | Total gas used |
| base_fee_per_gas | BIGINT | EIP-1559 base fee (wei) |
| transaction_count | BIGINT | Number of transactions |
| difficulty | HUGEINT | Proof-of-work mining difficulty (0 after the Merge) |
| total_difficulty | HUGEINT | Cumulative chain work |
| size | BIGINT | Block size (bytes) |
| blob_gas_used | BIGINT | EIP-4844 blob gas used |
| excess_blob_gas | BIGINT | EIP-4844 excess blob gas |

Schema excludes cryptographic hashes and Merkle roots (non-predictive fields).

Feature Engineering

Combine with OHLCV price data for cross-domain features:

import clickhouse_connect
import gapless_crypto_data as gcd
import pandas as pd

# Fetch OHLCV data
df_ohlcv = gcd.get_data(
    symbol="ETHUSDT",
    timeframe="1m",
    start_date="2024-01-01",
    end_date="2024-01-02"
)

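# Query Ethereum blocks (pass the same connection arguments as in Data Access)
client = clickhouse_connect.get_client(...)
df_eth = client.query_df("""
    SELECT timestamp, base_fee_per_gas, gas_used, gas_limit, transaction_count
    FROM ethereum_mainnet.blocks FINAL
    WHERE timestamp BETWEEN '2024-01-01' AND '2024-01-02'
    ORDER BY timestamp
""")

# Temporal alignment: each bar takes the most recent block at or before its
# timestamp, so no future block data leaks into a bar. reindex(method='ffill')
# needs a sorted index, hence the ORDER BY above; df_ohlcv is assumed to be
# indexed by timestamp.
df_eth['timestamp'] = pd.to_datetime(df_eth['timestamp'])
df_eth = df_eth.set_index('timestamp')
df_eth_aligned = df_eth.reindex(df_ohlcv.index, method='ffill')

# Join and engineer features
df = df_ohlcv.join(df_eth_aligned)
# Base fee relative to its rolling 60-bar (one hour of 1m bars) median
df['gas_pressure'] = df['base_fee_per_gas'] / df['base_fee_per_gas'].rolling(60).median()
# Share of the block gas limit that was consumed, in percent
df['block_utilization'] = (df['gas_used'] / df['gas_limit']) * 100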
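
The forward-fill alignment can be tried without any database or market data. The sketch below uses small synthetic frames (all values are made up) to show that each bar only sees the latest block at or before its own timestamp.

```python
import pandas as pd

# Synthetic 1-minute bars, indexed by bar timestamp
bars = pd.DataFrame(
    {"close": [2300.0, 2301.5, 2299.0]},
    index=pd.to_datetime(
        ["2024-01-01 00:01:00", "2024-01-01 00:02:00", "2024-01-01 00:03:00"]
    ),
)

# Synthetic blocks, sorted by timestamp (reindex with ffill needs a sorted index)
blocks = pd.DataFrame(
    {
        "gas_used": [10_000_000, 20_000_000, 30_000_000],
        "gas_limit": [30_000_000, 30_000_000, 30_000_000],
    },
    index=pd.to_datetime(
        ["2024-01-01 00:00:47", "2024-01-01 00:01:59", "2024-01-01 00:02:11"]
    ),
)

# Each bar receives the last block at or before its timestamp.
# The 00:02:11 block is not visible to the 00:02:00 bar.
aligned = blocks.reindex(bars.index, method="ffill")

joined = bars.join(aligned)
joined["block_utilization"] = joined["gas_used"] / joined["gas_limit"] * 100
print(joined)
```

`pd.merge_asof` with `direction="backward"` is an alternative that gives the same backward-looking match and additionally accepts a `tolerance` to drop matches that are too old.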

Infrastructure

Pipeline Components

| Component | Purpose | Technology |
|---|---|---|
| BigQuery Sync | Hourly batch from public dataset | Cloud Run Job |
| Real-Time Collector | Block-level streaming | e2-micro VM |
| Database | Storage with deduplication | ClickHouse Cloud |
| Monitoring | Dead Man's Switch | Healthchecks.io |
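
A dead man's switch inverts the usual alerting model: the job pings a check after every successful run, and the monitoring service raises an alert when pings stop arriving. The sketch below shows one way a job could send such pings to Healthchecks.io using only the standard library. The check UUID, the function names, and the choice to swallow network errors are assumptions, not this project's actual code.

```python
import urllib.request

HC_BASE = "https://hc-ping.com"


def ping_url(check_uuid: str, status: str = "success") -> str:
    """Build the ping URL; Healthchecks.io uses /start and /fail suffixes."""
    suffix = {"success": "", "start": "/start", "fail": "/fail"}[status]
    return f"{HC_BASE}/{check_uuid}{suffix}"


def send_ping(check_uuid: str, status: str = "success", timeout: float = 10.0) -> bool:
    """Send one ping. A failed ping must never crash the pipeline itself."""
    try:
        with urllib.request.urlopen(ping_url(check_uuid, status), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False


# Hypothetical usage at the end of a sync run:
#   send_ping("your-check-uuid")           # run finished
#   send_ping("your-check-uuid", "fail")   # run raised an error
```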

Deployment Structure

deployment/
├── cloud-run/       # BigQuery sync job
├── vm/              # Real-time collector
└── backfill/        # Historical data loading

Operations

# Verify pipeline health
gcloud run jobs executions list --job eth-md-updater --region us-central1

# View real-time collector logs
gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo journalctl -u eth-collector -f'

# Verify database state
uv run scripts/clickhouse/verify_blocks.py

Data Sources

| Source | Purpose | Method |
|---|---|---|
| BigQuery | Historical blocks | Hourly sync |
| Alchemy | Real-time blocks | WebSocket |
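
For the WebSocket source, new blocks are typically received through the standard Ethereum JSON-RPC `eth_subscribe` method with the `newHeads` subscription. The sketch below covers only message handling: it builds the subscribe request and converts a header notification's hex-encoded fields into integer columns named as in the schema. It opens no connection, the sample message is synthetic, and the mapping is an assumption about how a collector might work rather than this project's implementation. Fields that are not part of a block header, such as `transaction_count`, are left out.

```python
import json

# Request sent once after the WebSocket connection is open
SUBSCRIBE_REQUEST = json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "eth_subscribe", "params": ["newHeads"]}
)

# JSON-RPC header field -> schema column (assumed mapping)
HEX_FIELDS = {
    "number": "number",
    "timestamp": "timestamp",
    "gasLimit": "gas_limit",
    "gasUsed": "gas_used",
    "baseFeePerGas": "base_fee_per_gas",
    "difficulty": "difficulty",
    "blobGasUsed": "blob_gas_used",
    "excessBlobGas": "excess_blob_gas",
}


def parse_new_head(message: str) -> dict:
    """Convert one eth_subscription notification into a row of integers."""
    header = json.loads(message)["params"]["result"]
    row = {}
    for rpc_name, column in HEX_FIELDS.items():
        value = header.get(rpc_name)
        # Quantities are hex strings; fields absent from the header stay None
        row[column] = int(value, 16) if value is not None else None
    return row


# Synthetic notification for illustration
sample = json.dumps({
    "jsonrpc": "2.0",
    "method": "eth_subscription",
    "params": {
        "subscription": "0x1",
        "result": {
            "number": "0x10",
            "timestamp": "0x65920080",
            "gasLimit": "0x1c9c380",
            "gasUsed": "0xe4e1c0",
            "baseFeePerGas": "0x3b9aca00",
        },
    },
})
row = parse_new_head(sample)
print(row["number"], row["gas_used"], row["gas_limit"])  # 16 15000000 30000000
```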

Related Projects

Documentation

License

MIT License
