Skip to main content

A Python client and CLI for the Metabase API with automatic chunking for large dataset exports

Project description

PyMetabase

A Python library for exporting data from Metabase with automatic chunking for large datasets.

Features

  • Automatic Chunking: Uses ROW_NUMBER() for efficient export of millions of rows
  • Multiple Formats: Export to JSONL, JSON, or CSV
  • Resume Support: Checkpoint files allow resuming interrupted exports
  • Retry Logic: Automatic retry with exponential backoff
  • CLI Tool: Command-line interface for quick exports
  • SQL Parsing: Uses sqlglot for proper handling of CTEs and complex queries
  • Flexible Configuration: Environment variables, config files, or direct parameters

Installation

pip install pymetabase

Quick Start

Python API

from pymetabase import Metabase

# Connect to Metabase
mb = Metabase(
    url="https://metabase.example.com",
    username="user@example.com",
    password="password"
)

# Export query results
with mb:
    result = mb.export(
        database="MyDatabase",
        query="SELECT * FROM users",
        output="users.jsonl"
    )

print(f"Exported {result.total_rows:,} rows in {result.duration_seconds:.1f}s")

Using Credentials File

from pymetabase import Metabase

mb = Metabase(credentials_file="credentials.json")

with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM auth_level",
        output="export.jsonl"
    )

Command Line

# Export a query
pymetabase export \
    -c credentials.json \
    --database Club \
    --query "SELECT * FROM users" \
    -o users.jsonl

# Export with SQL from file (supports multi-line queries)
pymetabase export \
    -c credentials.json \
    --database Club \
    --query-file query.sql \
    -o users.jsonl

# Export with custom chunk size
pymetabase export \
    -c credentials.json \
    --database Club \
    --query "SELECT * FROM users" \
    -o users.jsonl \
    --chunk-size 100000

# Export a table
pymetabase export-table \
    -c credentials.json \
    --database Club \
    --table users \
    --where "active = true" \
    -o active_users.csv

# Export table with custom chunk size
pymetabase export-table \
    -c credentials.json \
    --database Club \
    --table users \
    -o users.csv \
    --chunk-size 50000

# List databases
pymetabase list-databases -c credentials.json

# List tables
pymetabase list-tables -c credentials.json --database Club

Remote Management (rclone-style)

PyMetabase supports rclone-style remote configuration for managing multiple Metabase servers.

Configure a Remote

# Interactive configuration
pymetabase config

# This opens a menu:
#   1) Add new remote
#   2) Edit existing remote
#   3) Delete remote

List Remotes

pymetabase listremotes

# Output:
# Configured remotes (2):
#   - production (default): https://metabase.example.com
#   - staging: https://staging.metabase.com

Use a Remote

# Use specific remote
pymetabase -r production export -d Club -q "SELECT * FROM users" -o users.jsonl

# Use default remote (no -r needed)
pymetabase export -d Club -q "SELECT * FROM users" -o users.jsonl

Select Default Remote

pymetabase selectremote

# Interactive selection:
# Select default remote:
#   1) production (current)
#   2) staging
# Enter number:

Test Connection

# Test specific remote
pymetabase testremote production

# Test default remote
pymetabase testremote

Show/Delete Remote

# Show remote config (password hidden)
pymetabase showremote production

# Delete remote
pymetabase deleteremote staging

First-time Setup

When no remotes are configured and you run an export command, PyMetabase will prompt you to set up a remote:

$ pymetabase export -d Club -q "SELECT * FROM users" -o users.jsonl
No remotes configured. Set up now? [Y/n] y
Remote name: production
Server URL: https://metabase.example.com
Username: user@example.com
Password: ****
Set as default? [Y/n] y
Remote 'production' created and set as default.

Config File Location

Remotes are stored in:

  • macOS/Linux: ~/.config/pymetabase/config.json
  • Windows: %APPDATA%\pymetabase\config.json

Using Remotes in Python

from pymetabase import Metabase, RemoteManager

# Load from configured remote
manager = RemoteManager()
remote = manager.get_remote("production")

mb = Metabase(
    url=remote.url,
    username=remote.username,
    password=remote.password
)

with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM users",
        output="users.jsonl"
    )

Configuration

Credentials File (credentials.json)

[
  {
    "SERVER_NAME": "https://metabase.example.com",
    "USERNAME": "user@example.com",
    "PASSWORD": "password"
  }
]

Config File (config.yaml)

metabase:
  url: https://metabase.example.com
  username: user@example.com
  password: password

defaults:
  chunk_size: 500000
  auto_chunk_threshold: 1000000
  format: jsonl

retry:
  max_retries: 3
  delay: 1.0
  timeout: 600

logging:
  level: INFO
  file: export.log

Environment Variables

export METABASE_URL=https://metabase.example.com
export METABASE_USERNAME=user@example.com
export METABASE_PASSWORD=password
export METABASE_CHUNK_SIZE=500000

Advanced Usage

SQL Query Files

You can store complex, multi-line SQL queries in .sql or .txt files and reference them in your exports:

query.sql:

-- Complex query with CTEs
WITH active_users AS (
    SELECT
        id,
        name,
        email,
        created_at
    FROM users
    WHERE active = true
),
recent_orders AS (
    SELECT
        user_id,
        COUNT(*) as order_count,
        SUM(total) as total_spent
    FROM orders
    WHERE created_at >= '2024-01-01'
    GROUP BY user_id
)
SELECT
    u.id,
    u.name,
    u.email,
    COALESCE(o.order_count, 0) as orders,
    COALESCE(o.total_spent, 0) as spent
FROM active_users u
LEFT JOIN recent_orders o ON u.id = o.user_id
ORDER BY o.total_spent DESC

Usage:

# Use -Q or --query-file
pymetabase export -d Club -Q query.sql -o results.jsonl

# Works with remotes too
pymetabase -r production export -d Club --query-file query.sql -o results.jsonl

Custom Chunk Sizes

Control the number of rows per chunk for optimal performance:

# Small chunks for memory-constrained environments
pymetabase export -d Club -q "SELECT * FROM big_table" -o data.jsonl --chunk-size 50000

# Large chunks for better performance on fast networks
pymetabase export -d Club -q "SELECT * FROM users" -o data.jsonl --chunk-size 1000000

In Python:

with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM big_table",
        output="data.jsonl",
        chunk_size=100000  # Custom chunk size
    )

Export with Progress Callback

def progress(current, total, rate):
    pct = current / total * 100
    print(f"Progress: {pct:.1f}% ({current:,}/{total:,}) - {rate:.0f} rows/sec")

with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM large_table",
        output="export.jsonl",
        progress_callback=progress
    )

Resume Interrupted Export

with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM large_table",
        output="export.jsonl",
        checkpoint_file="export_progress.json"
    )

If the export is interrupted, run the same command again and it will resume from where it left off.

Export Table with Filtering

with mb:
    result = mb.export_table(
        database="Club",
        table="users",
        output="active_users.csv",
        columns=["id", "name", "email"],
        where="active = true AND created_at > '2024-01-01'",
        order_by="created_at DESC",
        limit=10000
    )

Complex Queries (CTEs)

PyMetabase uses sqlglot to properly handle complex queries:

query = """
WITH active_users AS (
    SELECT * FROM users WHERE active = true
),
user_orders AS (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    GROUP BY user_id
)
SELECT u.*, o.order_count
FROM active_users u
LEFT JOIN user_orders o ON u.id = o.user_id
"""

with mb:
    result = mb.export(
        database="MyDB",
        query=query,
        output="user_orders.jsonl"
    )

Output Formats

JSONL (JSON Lines) - Default

Each row is a separate JSON object on its own line:

{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}

JSON

Standard JSON array:

[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "email": "bob@example.com"}
]

CSV

Standard CSV with headers:

id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com

How It Works

Automatic Chunking

For large datasets (>1M rows by default), PyMetabase automatically chunks the export using ROW_NUMBER():

WITH _numbered AS (
    SELECT *, ROW_NUMBER() OVER () AS _row_num
    FROM (your_query) AS _base
)
SELECT * FROM _numbered
WHERE _row_num >= 1 AND _row_num <= 500000

This approach:

  • Works on ANY query (CTEs, JOINs, UNIONs)
  • No special column required
  • Even chunk sizes
  • No missing or duplicate rows

Retry Logic

Failed chunks are automatically retried with exponential backoff:

  • Attempt 1: Immediate
  • Attempt 2: Wait 1 second
  • Attempt 3: Wait 2 seconds

API Reference

Metabase Class

Metabase(
    url: str = None,
    username: str = None,
    password: str = None,
    config_file: str = None,
    credentials_file: str = None,
    **kwargs
)

Methods

  • connect() - Connect and authenticate
  • disconnect() - Disconnect from Metabase
  • list_databases() - List available databases
  • list_tables(database) - List tables in a database
  • export(query, output, ...) - Export query results
  • export_table(table, output, ...) - Export entire table

ExportResult

@dataclass
class ExportResult:
    total_rows: int
    chunks: int
    duration_seconds: float
    output_file: str
    rate_per_second: float
    format: str

Requirements

  • Python 3.8+
  • requests
  • sqlglot
  • pyyaml

License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pymetabase-0.1.0.tar.gz (122.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pymetabase-0.1.0-py3-none-any.whl (31.6 kB view details)

Uploaded Python 3

File details

Details for the file pymetabase-0.1.0.tar.gz.

File metadata

  • Download URL: pymetabase-0.1.0.tar.gz
  • Upload date:
  • Size: 122.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.11

File hashes

Hashes for pymetabase-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f4c32b2bde168293179472a649d4e4103123551a4de07479e7d38009c64e1ad3
MD5 22f2635d23e4deeb029eed18779ad18c
BLAKE2b-256 fee41cc35876d5dc377d3ba9075c31a4e121a68ac049a26a3b56d89e5a37cdd0

See more details on using hashes here.

File details

Details for the file pymetabase-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pymetabase-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 31.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.11

File hashes

Hashes for pymetabase-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4b90908007e72625c99497aa95e4c05b808052d6897bd4c354ed6ccdde30d4b9
MD5 63072b37d396eb6a9161ddfee9a5b48e
BLAKE2b-256 43254f45884ae89be9f65058575c09ca359be6fde73f3ac9e6ea73745320fa90

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page