SQLAlchemy dialect for Arc time-series database with Apache Arrow support for Apache Superset

These details have not been verified by PyPI

Project links

Project description

Arc + Apache Superset Integration (Arrow Edition)

This is the Apache Arrow-powered SQLAlchemy dialect for integrating Apache Superset with the Arc time-series data warehouse. This version uses Arrow IPC for dramatically faster query performance compared to JSON-based queries.

Features

✅ Custom SQLAlchemy dialect for Arc API with Apache Arrow IPC
✅ 28-75% faster queries compared to JSON format (depending on dataset size)
✅ Lower memory overhead with zero-copy Arrow buffers
✅ Authentication via API keys (set and forget)
✅ Full SQL support for dashboard creation
✅ Auto-discovery of Arc tables and measurements
✅ Docker Compose integration

Why Arrow?

This dialect uses Apache Arrow IPC (Inter-Process Communication) instead of JSON for query results, providing:

Performance: 28-75% faster query execution
Efficiency: Zero-copy data transfer with columnar format
Type Safety: Preserve native data types (timestamps, integers, floats)
Scalability: Better performance for large result sets

Quick Start

Option 1: Install in Existing Superset

# Install the Arc Arrow dialect package
pip install arc-superset-arrow

# Restart Superset
superset run -h 0.0.0.0 -p 8088

Option 2: Docker Image (Recommended for Production)

Use the included Dockerfile to build a Superset image with Arc Arrow dialect pre-installed:

# Build the image
docker build -f Dockerfile.superset -t superset-arc-arrow:latest .

# Run Superset with Arc Arrow support
docker run -d \
  -p 8088:8088 \
  -v superset_home:/app/superset_home \
  --name superset-arc \
  superset-arc-arrow:latest

# Check logs
docker logs -f superset-arc

The Dockerfile includes:

Arc Arrow dialect pre-installed in Superset's venv
Pip properly configured in the virtual environment
Uses Superset's default initialization

Connect to Arc

Access Superset: http://localhost:8088 (admin/admin)
Create Arc Connection:
- Go to Settings → Database Connections
- Click + Database
- Select Other as database type
- Use this connection string:
```
arc+arrow://YOUR_API_KEY@arc-api:8000/default
```
Replace YOUR_API_KEY with your Arc token (see below)

Getting Your API Key

Arc creates an initial admin token on first startup. Check the logs:

# Docker
docker logs arc-api | grep "Initial admin token"

# Native/systemd
journalctl -u arc | grep "Initial admin token"

Or create a new token using the Arc CLI or Python:

# Using Python directly
DB_PATH="./data/arc.db" python3 -c "
from api.auth import AuthManager
auth = AuthManager(db_path='./data/arc.db')
token = auth.create_token('superset-integration', description='Superset dashboard access')
print(f'Token: {token}')
"

Save the returned token - it's only shown once!

Multi-Database Support

Arc supports multiple databases (namespaces) within a single instance. In Superset, databases are exposed as schemas:

Database Structure

Schema: default
  ├── cpu (CPU metrics)
  ├── mem (Memory metrics)
  └── disk (Disk metrics)

Schema: production
  ├── cpu
  ├── mem
  └── disk

Schema: staging
  ├── cpu
  ├── mem
  └── disk

Using Databases in Superset

View All Databases (Schemas):
- When creating a dataset, select the schema (database) from the dropdown
- Each database appears as a separate schema in Superset

Query Specific Database:

-- Query default database
SELECT * FROM cpu WHERE timestamp > NOW() - INTERVAL 1 HOUR

-- Query specific database
SELECT * FROM production.cpu WHERE timestamp > NOW() - INTERVAL 1 HOUR

-- Cross-database joins
SELECT
    p.timestamp,
    p.usage_idle as prod_cpu,
    s.usage_idle as staging_cpu
FROM production.cpu p
JOIN staging.cpu s ON p.timestamp = s.timestamp
WHERE p.timestamp > NOW() - INTERVAL 1 HOUR

Available Commands

-- List all databases
SHOW DATABASES;

-- List all tables in current database
SHOW TABLES;

Available Tables

The dialect auto-discovers all tables using SHOW TABLES. Common examples:

cpu - CPU metrics
mem - Memory metrics
disk - Disk metrics
Any custom measurements you've ingested

Example Queries

Basic Queries

-- Recent CPU metrics (use 'timestamp' column)
SELECT timestamp, host, usage_idle, usage_user
FROM cpu
WHERE timestamp > NOW() - INTERVAL 1 HOUR
ORDER BY timestamp DESC
LIMIT 100;

-- Average CPU usage by host
SELECT host, AVG(usage_idle) as avg_idle
FROM cpu
WHERE timestamp > NOW() - INTERVAL 24 HOUR
GROUP BY host
ORDER BY avg_idle DESC;

Time-Series Aggregation

Arc supports DuckDB's powerful time functions:

-- Time bucket aggregation (5-minute intervals)
SELECT
    time_bucket(INTERVAL '5 minutes', timestamp) as bucket,
    host,
    AVG(usage_idle) as avg_idle,
    MAX(usage_user) as max_user
FROM cpu
WHERE timestamp > NOW() - INTERVAL 6 HOUR
GROUP BY bucket, host
ORDER BY bucket DESC;

-- Daily aggregation with DATE_TRUNC
SELECT
    DATE_TRUNC('day', timestamp) as day,
    host,
    AVG(usage_idle) as avg_cpu_idle,
    COUNT(*) as samples
FROM cpu
WHERE timestamp > NOW() - INTERVAL 7 DAY
GROUP BY day, host
ORDER BY day DESC;

Join Queries

Join multiple measurements for correlated analysis:

-- Correlate CPU and Memory usage
SELECT
    c.timestamp,
    c.host,
    c.usage_idle as cpu_idle,
    m.used_percent as mem_used
FROM cpu c
JOIN mem m ON c.timestamp = m.timestamp AND c.host = m.host
WHERE c.timestamp > NOW() - INTERVAL 10 MINUTE
ORDER BY c.timestamp DESC
LIMIT 1000;

Window Functions

-- Moving average over last 6 data points
SELECT
    timestamp,
    host,
    usage_idle,
    AVG(usage_idle) OVER (
        PARTITION BY host
        ORDER BY timestamp
        ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) as moving_avg
FROM cpu
ORDER BY timestamp DESC
LIMIT 100;

Architecture

Superset → Arrow Dialect → Arrow IPC → Arc API → DuckDB → Parquet Files → MinIO/S3
    ↓           ↓            ↓           ↓         ↓           ↓              ↓
Dashboard    SQL Query   Zero-Copy   API Key    Query    Columnar    Compacted    Object
                         Buffers      Auth      Engine    Storage      Files      Storage

Arrow IPC Flow:

Superset sends SQL query via dialect
Arc executes query with DuckDB
Results serialized to Arrow IPC format (columnar, zero-copy)
Dialect deserializes Arrow buffers directly to Python objects
28-75% faster than JSON serialization/deserialization

Connection String Format

arc+arrow://API_KEY@HOST:PORT/DATABASE

Examples:

# Local development
arc+arrow://your-api-key@localhost:8000/default

# Docker Compose
arc+arrow://your-api-key@arc-api:8000/default

# Remote server
arc+arrow://your-api-key@arc.example.com:8000/default

Note: The +arrow suffix indicates this dialect uses Apache Arrow IPC format for data transfer.

Troubleshooting

Connection Issues

Verify your API key is correct: Test with curl -H "Authorization: Bearer YOUR_KEY" http://host:8000/health
Check that Arc API is running: docker logs arc-api or check systemd logs
Ensure network connectivity: Superset and Arc must be on the same network or accessible via hostname

Query Issues

Check Arc API logs: docker logs arc-api to see query execution
Verify table names exist: Use SHOW TABLES in Arc to list available measurements
Use LIMIT clauses: Always limit large queries for performance
Add time filters: Time-series queries should filter by time range

Performance Tips

Always use LIMIT for exploratory queries to avoid loading millions of rows
Add time range filters for time-series data: WHERE timestamp > NOW() - INTERVAL 1 HOUR
Use column names correctly: Arc stores timestamps in timestamp column (not time)
Leverage query cache: Arc caches query results for 60 seconds by default
Use compacted partitions: Arc automatically merges small files for 10-50x faster queries
Optimize aggregations: Use time_bucket() for time-series bucketing instead of DATE_TRUNC when possible
Partition filtering: Include host or other tag filters to reduce data scanned

Package Contents

arc_dialect.py - Custom SQLAlchemy dialect for Arc API
setup.py - PyPI package configuration
MANIFEST.in - Package file manifest
README.md - This documentation
PUBLISHING.md - Publishing guide for PyPI

Security Notes

API keys are stored encrypted in Superset's database
All queries go through Arc's authentication system
Change the default Superset admin password in production
Set a strong SUPERSET_SECRET_KEY in production
Use HTTPS for production deployments

Advanced Features

Schema Support

Arc supports multiple schemas/databases:

-- Query specific database
SELECT * FROM my_database.cpu LIMIT 10;

-- List all databases
SHOW DATABASES;

Advanced DuckDB Features

Arc leverages DuckDB's full SQL capabilities:

-- List all measurements (tables)
SHOW TABLES;

-- Get table schema
DESCRIBE cpu;

-- Count records by measurement
SELECT COUNT(*) as total_records FROM cpu;

-- Complex aggregations with FILTER
SELECT
    host,
    COUNT(*) FILTER (WHERE usage_idle > 90) as idle_count,
    COUNT(*) FILTER (WHERE usage_idle < 50) as busy_count,
    AVG(usage_idle) as avg_idle
FROM cpu
WHERE timestamp > NOW() - INTERVAL 1 HOUR
GROUP BY host;

Development

To modify or extend the dialect:

Clone the repository:

git clone https://github.com/basekick-labs/arc-superset-arrow.git
cd arc-superset-arrow

Edit arc_dialect.py
Install locally for testing:
```
pip install -e .
```
Test with Superset:
```
superset run -h 0.0.0.0 -p 8088
```
Submit a PR if you add improvements!

License

Same license as Arc Core (AGPL-3.0)

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.3.3

Oct 22, 2025

1.3.2

Oct 22, 2025

1.3.1

Oct 22, 2025

1.3.0

Oct 21, 2025

1.0.1

Oct 16, 2025

1.0.0

Oct 16, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

arc_superset_arrow-1.3.3.tar.gz (18.4 kB view details)

Uploaded Oct 22, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

arc_superset_arrow-1.3.3-py3-none-any.whl (11.3 kB view details)

Uploaded Oct 22, 2025 Python 3

File details

Details for the file arc_superset_arrow-1.3.3.tar.gz.

File metadata

Download URL: arc_superset_arrow-1.3.3.tar.gz
Upload date: Oct 22, 2025
Size: 18.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for arc_superset_arrow-1.3.3.tar.gz
Algorithm	Hash digest
SHA256	`15dbb71af7681c6fd06030a5e9dc4bae4417e4c04aae398ad67dc0bc5a1a22b8`
MD5	`2c2ff48a2c77fcfa0eb4b5963fdd1c6a`
BLAKE2b-256	`f2ad32cba283b2d2c4e76f36119a88263ccf6b5d70afb1c31b2398a98a78569e`

See more details on using hashes here.

File details

Details for the file arc_superset_arrow-1.3.3-py3-none-any.whl.

File metadata

Download URL: arc_superset_arrow-1.3.3-py3-none-any.whl
Upload date: Oct 22, 2025
Size: 11.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for arc_superset_arrow-1.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a205c9392715193992b00cf8fafb2bdc2620694c7c51bf658aa50a61f7378de2`
MD5	`fc6a3c83bd99a207770a2f95a9ee5b30`
BLAKE2b-256	`2eb92de3828a678fc22fa92ddb61dd37a5722bf7bcb4853c621b2c945421dbdf`

See more details on using hashes here.

arc-superset-arrow 1.3.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Arc + Apache Superset Integration (Arrow Edition)

Features

Why Arrow?

Quick Start

Option 1: Install in Existing Superset

Option 2: Docker Image (Recommended for Production)

Connect to Arc

Getting Your API Key

Multi-Database Support

Database Structure

Using Databases in Superset

Available Commands

Available Tables

Example Queries

Basic Queries

Time-Series Aggregation

Join Queries

Window Functions

Architecture

Connection String Format

Troubleshooting

Connection Issues

Query Issues

Performance Tips

Package Contents

Security Notes

Advanced Features

Schema Support

Advanced DuckDB Features

Development

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes