A Python client and CLI for the Metabase API with automatic chunking for large dataset exports
Project description
PyMetabase
A Python library for exporting data from Metabase with automatic chunking for large datasets.
Features
- Automatic Chunking: Uses ROW_NUMBER() for efficient export of millions of rows
- Multiple Formats: Export to JSONL, JSON, or CSV
- Resume Support: Checkpoint files allow resuming interrupted exports
- Retry Logic: Automatic retry with exponential backoff
- CLI Tool: Command-line interface for quick exports
- SQL Parsing: Uses sqlglot for proper handling of CTEs and complex queries
- Flexible Configuration: Environment variables, config files, or direct parameters
Installation
pip install pymetabase
Quick Start
Python API
from pymetabase import Metabase
# Connect to Metabase
mb = Metabase(
url="https://metabase.example.com",
username="user@example.com",
password="password"
)
# Export query results
with mb:
result = mb.export(
database="MyDatabase",
query="SELECT * FROM users",
output="users.jsonl"
)
print(f"Exported {result.total_rows:,} rows in {result.duration_seconds:.1f}s")
Using Credentials File
from pymetabase import Metabase
mb = Metabase(credentials_file="credentials.json")
with mb:
result = mb.export(
database="Club",
query="SELECT * FROM auth_level",
output="export.jsonl"
)
Command Line
# Export a query
pymetabase export \
-c credentials.json \
--database Club \
--query "SELECT * FROM users" \
-o users.jsonl
# Export with SQL from file (supports multi-line queries)
pymetabase export \
-c credentials.json \
--database Club \
--query-file query.sql \
-o users.jsonl
# Export with custom chunk size
pymetabase export \
-c credentials.json \
--database Club \
--query "SELECT * FROM users" \
-o users.jsonl \
--chunk-size 100000
# Export a table
pymetabase export-table \
-c credentials.json \
--database Club \
--table users \
--where "active = true" \
-o active_users.csv
# Export table with custom chunk size
pymetabase export-table \
-c credentials.json \
--database Club \
--table users \
-o users.csv \
--chunk-size 50000
# List databases
pymetabase list-databases -c credentials.json
# List tables
pymetabase list-tables -c credentials.json --database Club
Remote Management (rclone-style)
PyMetabase supports rclone-style remote configuration for managing multiple Metabase servers.
Configure a Remote
# Interactive configuration
pymetabase config
# This opens a menu:
# 1) Add new remote
# 2) Edit existing remote
# 3) Delete remote
List Remotes
pymetabase listremotes
# Output:
# Configured remotes (2):
# - production (default): https://metabase.example.com
# - staging: https://staging.metabase.com
Use a Remote
# Use specific remote
pymetabase -r production export -d Club -q "SELECT * FROM users" -o users.jsonl
# Use default remote (no -r needed)
pymetabase export -d Club -q "SELECT * FROM users" -o users.jsonl
Select Default Remote
pymetabase selectremote
# Interactive selection:
# Select default remote:
# 1) production (current)
# 2) staging
# Enter number:
Test Connection
# Test specific remote
pymetabase testremote production
# Test default remote
pymetabase testremote
Show/Delete Remote
# Show remote config (password hidden)
pymetabase showremote production
# Delete remote
pymetabase deleteremote staging
First-time Setup
When no remotes are configured and you run an export command, PyMetabase will prompt you to set up a remote:
$ pymetabase export -d Club -q "SELECT * FROM users" -o users.jsonl
No remotes configured. Set up now? [Y/n] y
Remote name: production
Server URL: https://metabase.example.com
Username: user@example.com
Password: ****
Set as default? [Y/n] y
Remote 'production' created and set as default.
Config File Location
Remotes are stored in:
- macOS/Linux:
~/.config/pymetabase/config.json - Windows:
%APPDATA%\pymetabase\config.json
Using Remotes in Python
from pymetabase import Metabase, RemoteManager
# Load from configured remote
manager = RemoteManager()
remote = manager.get_remote("production")
mb = Metabase(
url=remote.url,
username=remote.username,
password=remote.password
)
with mb:
result = mb.export(
database="Club",
query="SELECT * FROM users",
output="users.jsonl"
)
Configuration
Credentials File (credentials.json)
[
{
"SERVER_NAME": "https://metabase.example.com",
"USERNAME": "user@example.com",
"PASSWORD": "password"
}
]
Config File (config.yaml)
metabase:
url: https://metabase.example.com
username: user@example.com
password: password
defaults:
chunk_size: 500000
auto_chunk_threshold: 1000000
format: jsonl
retry:
max_retries: 3
delay: 1.0
timeout: 600
logging:
level: INFO
file: export.log
Environment Variables
export METABASE_URL=https://metabase.example.com
export METABASE_USERNAME=user@example.com
export METABASE_PASSWORD=password
export METABASE_CHUNK_SIZE=500000
Advanced Usage
SQL Query Files
You can store complex, multi-line SQL queries in .sql or .txt files and reference them in your exports:
query.sql:
-- Complex query with CTEs
WITH active_users AS (
SELECT
id,
name,
email,
created_at
FROM users
WHERE active = true
),
recent_orders AS (
SELECT
user_id,
COUNT(*) as order_count,
SUM(total) as total_spent
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY user_id
)
SELECT
u.id,
u.name,
u.email,
COALESCE(o.order_count, 0) as orders,
COALESCE(o.total_spent, 0) as spent
FROM active_users u
LEFT JOIN recent_orders o ON u.id = o.user_id
ORDER BY o.total_spent DESC
Usage:
# Use -Q or --query-file
pymetabase export -d Club -Q query.sql -o results.jsonl
# Works with remotes too
pymetabase -r production export -d Club --query-file query.sql -o results.jsonl
Custom Chunk Sizes
Control the number of rows per chunk for optimal performance:
# Small chunks for memory-constrained environments
pymetabase export -d Club -q "SELECT * FROM big_table" -o data.jsonl --chunk-size 50000
# Large chunks for better performance on fast networks
pymetabase export -d Club -q "SELECT * FROM users" -o data.jsonl --chunk-size 1000000
In Python:
with mb:
result = mb.export(
database="Club",
query="SELECT * FROM big_table",
output="data.jsonl",
chunk_size=100000 # Custom chunk size
)
Export with Progress Callback
def progress(current, total, rate):
pct = current / total * 100
print(f"Progress: {pct:.1f}% ({current:,}/{total:,}) - {rate:.0f} rows/sec")
with mb:
result = mb.export(
database="Club",
query="SELECT * FROM large_table",
output="export.jsonl",
progress_callback=progress
)
Resume Interrupted Export
with mb:
result = mb.export(
database="Club",
query="SELECT * FROM large_table",
output="export.jsonl",
checkpoint_file="export_progress.json"
)
If the export is interrupted, run the same command again and it will resume from where it left off.
Export Table with Filtering
with mb:
result = mb.export_table(
database="Club",
table="users",
output="active_users.csv",
columns=["id", "name", "email"],
where="active = true AND created_at > '2024-01-01'",
order_by="created_at DESC",
limit=10000
)
Complex Queries (CTEs)
PyMetabase uses sqlglot to properly handle complex queries:
query = """
WITH active_users AS (
SELECT * FROM users WHERE active = true
),
user_orders AS (
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
)
SELECT u.*, o.order_count
FROM active_users u
LEFT JOIN user_orders o ON u.id = o.user_id
"""
with mb:
result = mb.export(
database="MyDB",
query=query,
output="user_orders.jsonl"
)
Output Formats
JSONL (JSON Lines) - Default
Each row is a separate JSON object on its own line:
{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}
JSON
Standard JSON array:
[
{"id": 1, "name": "Alice", "email": "alice@example.com"},
{"id": 2, "name": "Bob", "email": "bob@example.com"}
]
CSV
Standard CSV with headers:
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
How It Works
Automatic Chunking
For large datasets (>1M rows by default), PyMetabase automatically chunks the export using ROW_NUMBER():
WITH _numbered AS (
SELECT *, ROW_NUMBER() OVER () AS _row_num
FROM (your_query) AS _base
)
SELECT * FROM _numbered
WHERE _row_num >= 1 AND _row_num <= 500000
This approach:
- Works on ANY query (CTEs, JOINs, UNIONs)
- No special column required
- Even chunk sizes
- No missing or duplicate rows
Retry Logic
Failed chunks are automatically retried with exponential backoff:
- Attempt 1: Immediate
- Attempt 2: Wait 1 second
- Attempt 3: Wait 2 seconds
API Reference
Metabase Class
Metabase(
url: str = None,
username: str = None,
password: str = None,
config_file: str = None,
credentials_file: str = None,
**kwargs
)
Methods
connect()- Connect and authenticatedisconnect()- Disconnect from Metabaselist_databases()- List available databaseslist_tables(database)- List tables in a databaseexport(query, output, ...)- Export query resultsexport_table(table, output, ...)- Export entire table
ExportResult
@dataclass
class ExportResult:
total_rows: int
chunks: int
duration_seconds: float
output_file: str
rate_per_second: float
format: str
Requirements
- Python 3.8+
- requests
- sqlglot
- pyyaml
License
MIT License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pymetabase-0.1.0.tar.gz.
File metadata
- Download URL: pymetabase-0.1.0.tar.gz
- Upload date:
- Size: 122.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f4c32b2bde168293179472a649d4e4103123551a4de07479e7d38009c64e1ad3
|
|
| MD5 |
22f2635d23e4deeb029eed18779ad18c
|
|
| BLAKE2b-256 |
fee41cc35876d5dc377d3ba9075c31a4e121a68ac049a26a3b56d89e5a37cdd0
|
File details
Details for the file pymetabase-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pymetabase-0.1.0-py3-none-any.whl
- Upload date:
- Size: 31.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4b90908007e72625c99497aa95e4c05b808052d6897bd4c354ed6ccdde30d4b9
|
|
| MD5 |
63072b37d396eb6a9161ddfee9a5b48e
|
|
| BLAKE2b-256 |
43254f45884ae89be9f65058575c09ca359be6fde73f3ac9e6ea73745320fa90
|