Python SDK for accessing Solana blockchain data from solarchive.org - free Parquet datasets of transactions, accounts, and tokens
solarchive
Python SDK for accessing Solana blockchain data from solarchive.org.
solarchive is a project to archive Solana's public transaction data and make it freely accessible in ergonomic formats (Apache Parquet) for developers, researchers, and the entire Solana community.
Features
- Download Solana transaction, account, and token data in Parquet format
- Access historical data from 2020 to present
- No API keys or rate limits - direct HTTP access to data files
- Query data with DuckDB, pandas, Spark, or any Parquet-compatible tool
- Browse available datasets via index files
- Licensed under CC-BY-4.0
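As a rough illustration of the "any Parquet-compatible tool" point, the sketch below counts rows across downloaded files with DuckDB. It assumes the third-party `duckdb` package is installed and that the Parquet files sit directly under the directory you downloaded into. The helper names `row_count_query` and `count_rows` are hypothetical and not part of the solarchive SDK.

```python
def row_count_query(parquet_glob: str) -> str:
    """Build a DuckDB SQL statement that counts rows across matching Parquet files."""
    # Double any single quotes so the path is safe inside a SQL string literal.
    escaped = parquet_glob.replace("'", "''")
    return f"SELECT COUNT(*) AS n FROM read_parquet('{escaped}')"


def count_rows(parquet_glob: str) -> int:
    """Run the count with DuckDB (third-party dependency: pip install duckdb)."""
    import duckdb  # imported lazily so row_count_query works without DuckDB installed

    return duckdb.sql(row_count_query(parquet_glob)).fetchone()[0]


# Hypothetical usage, assuming files were downloaded to ./data/txs:
# print(count_rows("./data/txs/*.parquet"))
```

If the files end up in nested subdirectories instead, a recursive glob such as `./data/txs/**/*.parquet` may be needed.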
Installation
pip install solarchive
Usage
from solarchive import SolArchive
# Initialize the client
client = SolArchive()
# List available transaction dates
dates = client.list_transaction_dates()
print(f"Available dates: {dates[:5]}...")
# Get index for a specific date
index = client.get_transaction_index("2025-11-01")
print(f"Files available: {len(index['files'])}")
# Download transaction data for a date
# Downloads all Parquet files for that date
client.download_transactions("2025-11-01", output_dir="./data/txs")
# Get account snapshots for a month
client.download_accounts("2025-12", output_dir="./data/accounts")
# Get token snapshots for a month
client.download_tokens("2024-09", output_dir="./data/tokens")
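To download several days at once, one option is to filter the list of available dates before calling the download method. The sketch below assumes `list_transaction_dates()` returns `YYYY-MM-DD` strings, which the usage example suggests but does not confirm. `dates_in_range` is a hypothetical helper, not part of the SDK.

```python
from datetime import date


def dates_in_range(available, start, end):
    """Return the ISO dates in `available` that fall within [start, end], sorted."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    if lo > hi:
        raise ValueError("start must not be after end")
    # ISO dates sort chronologically as plain strings, so sorted() is enough.
    return sorted(d for d in available if lo <= date.fromisoformat(d) <= hi)


# Hypothetical usage with the client shown above (requires network access):
# from solarchive import SolArchive
# client = SolArchive()
# for day in dates_in_range(client.list_transaction_dates(), "2025-11-01", "2025-11-07"):
#     client.download_transactions(day, output_dir=f"./data/txs/{day}")
```

Writing each day to its own output directory keeps the local layout close to the day-partitioned layout of the archive.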
Data Schema
The archive contains three main datasets:
- Transactions - All non-vote transactions with signatures, fees, account changes, and token balances
  - Schema: https://data.solarchive.org/schemas/solana/transactions.json
  - Partitioned by day: txs/YYYY-MM-DD/*.parquet
- Accounts - Periodic snapshots of account states including balances and program data
  - Schema: https://data.solarchive.org/schemas/solana/accounts.json
  - Partitioned by month: accounts/YYYY-MM/*.parquet
- Tokens - Metadata snapshots for fungible and non-fungible tokens
  - Schema: https://data.solarchive.org/schemas/solana/tokens.json
  - Partitioned by month: tokens/YYYY-MM/*.parquet
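The layout above can be captured in a few lines of standard-library Python. This is a minimal sketch: the schema URLs and partition patterns come from the list above, while the helper names (`schema_url`, `partition_glob`, `fetch_schema`) are hypothetical and not part of the SDK. The partition patterns are returned as relative paths because the root URL for the data files is not stated here.

```python
import json
from datetime import date
from urllib.request import urlopen

SCHEMA_BASE = "https://data.solarchive.org/schemas/solana"

# Dataset name -> (path prefix, strftime pattern for the partition key).
PARTITIONS = {
    "transactions": ("txs", "%Y-%m-%d"),  # partitioned by day
    "accounts": ("accounts", "%Y-%m"),    # partitioned by month
    "tokens": ("tokens", "%Y-%m"),        # partitioned by month
}


def schema_url(dataset: str) -> str:
    """Return the JSON schema URL for one of the three datasets."""
    if dataset not in PARTITIONS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return f"{SCHEMA_BASE}/{dataset}.json"


def partition_glob(dataset: str, day: date) -> str:
    """Return the relative glob pattern for the partition containing `day`."""
    if dataset not in PARTITIONS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    prefix, pattern = PARTITIONS[dataset]
    return f"{prefix}/{day.strftime(pattern)}/*.parquet"


def fetch_schema(dataset: str, timeout: float = 30.0) -> dict:
    """Download and parse a schema document (requires network access)."""
    with urlopen(schema_url(dataset), timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical usage:
# print(partition_glob("transactions", date(2025, 11, 1)))
# print(fetch_schema("tokens"))
```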
Development
This package requires Python 3.11 or higher.
# Install dependencies
pip install -e .
# Run the example
python main.py
About solarchive.org
Visit solarchive.org to explore the data directly in your browser using DuckDB-WASM, or to support the project. Hosting hundreds of terabytes costs nearly $10,000/year!
License
MIT
Source Distribution
File details
Details for the file solarchive-0.0.1.tar.gz.
File metadata
- Download URL: solarchive-0.0.1.tar.gz
- Upload date:
- Size: 3.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 547e0639cefabdb59321f0698d99e7dd2b9fe3a1c09870f805ceaec5947de92d |
| MD5 | f6dd23115f4797f8ede3d77c893beb32 |
| BLAKE2b-256 | e126f89558e0a69bcfa1b4701228310b2bf149f89e7e0d4800458499bf193be1 |