SQLAlchemy dialect for Arc time-series database with multi-database support for Apache Superset
Project description
Arc + Apache Superset Integration
This directory contains the custom SQLAlchemy dialect and configuration for integrating Apache Superset with the Arc time-series data warehouse.
Features
- ✅ Custom SQLAlchemy dialect for Arc API
- ✅ Authentication via API keys (set and forget)
- ✅ Full SQL support for dashboard creation
- ✅ Auto-discovery of Arc tables and measurements
- ✅ Docker Compose integration
Quick Start
Option 1: Install in Existing Superset
# Install the Arc dialect package
pip install arc-superset-dialect
# Restart Superset
superset run -h 0.0.0.0 -p 8088
Option 2: Docker Image (Recommended for Production)
Use the included Dockerfile to build a Superset image with Arc JSON dialect pre-installed:
# Build the image
docker build -t superset-arc-json:latest .
# Run Superset with Arc JSON support
docker run -d \
-p 8088:8088 \
-v superset_home:/app/superset_home \
--name superset-arc \
superset-arc-json:latest
# Check logs
docker logs -f superset-arc
The Dockerfile includes:
- Arc JSON dialect pre-installed in Superset's venv
- Pip properly configured in the virtual environment
- Superset's default initialization
Connect to Arc
1. Access Superset: http://localhost:8088 (admin/admin)
2. Create Arc Connection:
   - Go to Settings → Database Connections
   - Click + Database
   - Select Other as database type
   - Use this connection string:
     arc+json://YOUR_API_KEY@arc-api:8000/default
3. Replace YOUR_API_KEY with your Arc token (see below)
Getting Your API Key
Arc creates an initial admin token on first startup. Check the logs:
# Docker
docker logs arc-api | grep "Initial admin token"
# Native/systemd
journalctl -u arc | grep "Initial admin token"
Or create a new token using the Arc CLI or Python:
# Using Python directly
DB_PATH="./data/arc.db" python3 -c "
from api.auth import AuthManager
auth = AuthManager(db_path='./data/arc.db')
token = auth.create_token('superset-integration', description='Superset dashboard access')
print(f'Token: {token}')
"
Save the returned token - it's only shown once!
Multi-Database Support
Arc supports multiple databases (namespaces) within a single instance. In Superset, databases are exposed as schemas:
Database Structure
Schema: default
├── cpu (CPU metrics)
├── mem (Memory metrics)
└── disk (Disk metrics)
Schema: production
├── cpu
├── mem
└── disk
Schema: staging
├── cpu
├── mem
└── disk
Using Databases in Superset
-
View All Databases (Schemas):
- When creating a dataset, select the schema (database) from the dropdown
- Each database appears as a separate schema in Superset
-
Query Specific Database:
-- Query default database
SELECT * FROM cpu WHERE timestamp > NOW() - INTERVAL 1 HOUR

-- Query specific database
SELECT * FROM production.cpu WHERE timestamp > NOW() - INTERVAL 1 HOUR

-- Cross-database joins
SELECT
    p.timestamp,
    p.usage_idle as prod_cpu,
    s.usage_idle as staging_cpu
FROM production.cpu p
JOIN staging.cpu s ON p.timestamp = s.timestamp
WHERE p.timestamp > NOW() - INTERVAL 1 HOUR
Available Commands
-- List all databases
SHOW DATABASES;
-- List all tables in current database
SHOW TABLES;
Available Tables
The dialect auto-discovers all tables using SHOW TABLES. Common examples:
- cpu - CPU metrics
- mem - Memory metrics
- disk - Disk metrics
- Any custom measurements you've ingested
Example Queries
Basic Queries
-- Recent CPU metrics (use 'timestamp' column)
SELECT timestamp, host, usage_idle, usage_user
FROM cpu
WHERE timestamp > NOW() - INTERVAL 1 HOUR
ORDER BY timestamp DESC
LIMIT 100;
-- Average CPU usage by host
SELECT host, AVG(usage_idle) as avg_idle
FROM cpu
WHERE timestamp > NOW() - INTERVAL 24 HOUR
GROUP BY host
ORDER BY avg_idle DESC;
Time-Series Aggregation
Arc supports DuckDB's powerful time functions:
-- Time bucket aggregation (5-minute intervals)
SELECT
time_bucket(INTERVAL '5 minutes', timestamp) as bucket,
host,
AVG(usage_idle) as avg_idle,
MAX(usage_user) as max_user
FROM cpu
WHERE timestamp > NOW() - INTERVAL 6 HOUR
GROUP BY bucket, host
ORDER BY bucket DESC;
-- Daily aggregation with DATE_TRUNC
SELECT
DATE_TRUNC('day', timestamp) as day,
host,
AVG(usage_idle) as avg_cpu_idle,
COUNT(*) as samples
FROM cpu
WHERE timestamp > NOW() - INTERVAL 7 DAY
GROUP BY day, host
ORDER BY day DESC;
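Conceptually, time_bucket floors each timestamp to the start of its interval-sized bucket. A minimal Python sketch of that behavior (a hypothetical helper for illustration, not part of the dialect; DuckDB's time_bucket also supports an explicit origin argument not modeled here):

```python
from datetime import datetime, timedelta

def time_bucket(interval: timedelta, ts: datetime) -> datetime:
    """Floor ts to the start of its interval-sized bucket,
    mirroring time_bucket(INTERVAL, timestamp) for sub-day intervals."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    offset = (ts - epoch) % interval
    return ts - offset

# 12:03:17 falls in the 12:00-12:05 bucket
print(time_bucket(timedelta(minutes=5), datetime(2024, 1, 1, 12, 3, 17)))
# 2024-01-01 12:00:00
```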
Join Queries
Join multiple measurements for correlated analysis:
-- Correlate CPU and Memory usage
SELECT
c.timestamp,
c.host,
c.usage_idle as cpu_idle,
m.used_percent as mem_used
FROM cpu c
JOIN mem m ON c.timestamp = m.timestamp AND c.host = m.host
WHERE c.timestamp > NOW() - INTERVAL 10 MINUTE
ORDER BY c.timestamp DESC
LIMIT 1000;
Window Functions
-- Moving average over last 6 data points
SELECT
timestamp,
host,
usage_idle,
AVG(usage_idle) OVER (
PARTITION BY host
ORDER BY timestamp
ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) as moving_avg
FROM cpu
ORDER BY timestamp DESC
LIMIT 100;
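The frame ROWS BETWEEN 5 PRECEDING AND CURRENT ROW averages each row with up to five earlier rows in the same partition, i.e. a trailing window of at most six points. A small Python sketch of the same computation:

```python
from collections import deque

def trailing_avg(values, window=6):
    """Trailing moving average over at most `window` points,
    matching ROWS BETWEEN 5 PRECEDING AND CURRENT ROW (window=6)."""
    buf = deque(maxlen=window)  # automatically drops the oldest point
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

print(trailing_avg([90, 80, 70]))  # [90.0, 85.0, 80.0]
```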
Architecture
Superset (dashboard)
  → Custom Dialect (SQL)
  → HTTP Requests (API key auth)
  → Arc API (query)
  → DuckDB (query engine)
  → Parquet Files (columnar, compacted)
  → MinIO/S3 (object storage)
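In other words, every Superset query becomes an authenticated HTTP request to the Arc API. The sketch below builds (without sending) that kind of request; the `/query` path and JSON payload shape are assumptions for illustration — check `arc_dialect.py` for the actual endpoint and body:

```python
import json
import urllib.request

def build_query_request(base_url: str, api_key: str, sql: str):
    """Build (without sending) an authenticated query request.
    The /query path and {"query": ...} payload are illustrative
    assumptions, not the dialect's confirmed wire format."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_query_request("http://arc-api:8000", "YOUR_API_KEY", "SHOW TABLES")
print(req.full_url, req.get_header("Authorization"))
```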
Connection String Format
arc+json://API_KEY@HOST:PORT/DATABASE
Examples:
# Local development
arc+json://your-api-key@localhost:8000/default
# Docker Compose
arc+json://your-api-key@arc-api:8000/default
# Remote server
arc+json://your-api-key@arc.example.com:8000/default
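Because the connection string is an ordinary URL, its parts can be pulled apart with standard tooling — handy when debugging a misconfigured connection. A minimal sketch using Python's stdlib (the helper name is illustrative):

```python
from urllib.parse import urlsplit

def parse_arc_url(url: str):
    """Split an arc+json:// connection string into its parts.
    Assumes the documented format arc+json://API_KEY@HOST:PORT/DATABASE."""
    parts = urlsplit(url)
    return {
        "api_key": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

print(parse_arc_url("arc+json://your-api-key@arc-api:8000/default"))
# {'api_key': 'your-api-key', 'host': 'arc-api', 'port': 8000, 'database': 'default'}
```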
Troubleshooting
Connection Issues
- Verify your API key is correct: test with curl -H "Authorization: Bearer YOUR_KEY" http://host:8000/health
- Check that Arc API is running: docker logs arc-api, or check the systemd logs
- Ensure network connectivity: Superset and Arc must be on the same network or accessible via hostname
Query Issues
- Check Arc API logs: docker logs arc-api to see query execution
- Verify table names exist: use SHOW TABLES in Arc to list available measurements
- Use LIMIT clauses: always limit large queries for performance
- Add time filters: time-series queries should filter by time range
Performance Tips
- Always use LIMIT for exploratory queries to avoid loading millions of rows
- Add time range filters for time-series data: WHERE timestamp > NOW() - INTERVAL 1 HOUR
- Use column names correctly: Arc stores timestamps in the timestamp column (not time)
- Leverage the query cache: Arc caches query results for 60 seconds by default
- Use compacted partitions: Arc automatically merges small files for 10-50x faster queries
- Optimize aggregations: use time_bucket() for time-series bucketing instead of DATE_TRUNC when possible
- Partition filtering: include host or other tag filters to reduce data scanned
Package Contents
- arc_dialect.py - Custom SQLAlchemy dialect for Arc API
- setup.py - PyPI package configuration
- MANIFEST.in - Package file manifest
- README.md - This documentation
- PUBLISHING.md - Publishing guide for PyPI
Security Notes
- API keys are stored encrypted in Superset's database
- All queries go through Arc's authentication system
- Change the default Superset admin password in production
- Set a strong SUPERSET_SECRET_KEY in production
- Use HTTPS for production deployments
Advanced Features
Schema Support
Arc supports multiple schemas/databases:
-- Query specific database
SELECT * FROM my_database.cpu LIMIT 10;
-- List all databases
SHOW DATABASES;
Advanced DuckDB Features
Arc leverages DuckDB's full SQL capabilities:
-- List all measurements (tables)
SHOW TABLES;
-- Get table schema
DESCRIBE cpu;
-- Count records by measurement
SELECT COUNT(*) as total_records FROM cpu;
-- Complex aggregations with FILTER
SELECT
host,
COUNT(*) FILTER (WHERE usage_idle > 90) as idle_count,
COUNT(*) FILTER (WHERE usage_idle < 50) as busy_count,
AVG(usage_idle) as avg_idle
FROM cpu
WHERE timestamp > NOW() - INTERVAL 1 HOUR
GROUP BY host;
Development
To modify or extend the dialect:
1. Clone the repository:
   git clone https://github.com/basekick-labs/arc-superset-dialect.git
   cd arc-superset-dialect
2. Edit arc_dialect.py
3. Install locally for testing:
   pip install -e .
4. Test with Superset:
   superset run -h 0.0.0.0 -p 8088
5. Submit a PR if you add improvements!
License
Same license as Arc Core (AGPL-3.0)
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file arc_superset_dialect-1.3.3.tar.gz.
File metadata
- Download URL: arc_superset_dialect-1.3.3.tar.gz
- Upload date:
- Size: 16.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25e257711a79e66c617620953b02d858c5cd7d1632c17647bc709ab159403067 |
| MD5 | fc9eb06cbedbfecae31bf38b34adc297 |
| BLAKE2b-256 | ec6ef6cfe2ac1a7a8c525ee7ed744e9cfb27d813531f361dcb57798d6d12a895 |
|
File details
Details for the file arc_superset_dialect-1.3.3-py3-none-any.whl.
File metadata
- Download URL: arc_superset_dialect-1.3.3-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 74614fe5b260779762cea792e869c0a653b9bfe04b77fb14c2eb4023366dd4d8 |
| MD5 | 7a0f9ce371c28a42fc0893deaf47be5e |
| BLAKE2b-256 | b9d1ee9a3b6ea2010ec6f737db14d497f38685cf5ac873fdb065d37b6eb5a07e |