Skip to main content

Modern data platform - PostgreSQL admin + Analytics engine

Project description

Tusk

Modern Data Platform - SQL Client, PostgreSQL Admin, Analytics Engine & Distributed Query Processing

โš ๏ธ Experimental Project: This is an experimental project built for learning and exploration purposes. Use at your own risk in production environments.

๐Ÿค– Built with Claude: This project was developed with significant assistance from Claude (Anthropic's AI assistant), demonstrating the potential of human-AI collaboration in software development.

Features

SQL Client (Studio)

  • Multi-connection support (PostgreSQL, SQLite, DuckDB)
  • Tabbed SQL editor with CodeMirror 6
  • Syntax highlighting and autocomplete
  • Schema browser with FK/PK indicators
  • Query history with persistence
  • Saved queries with folders
  • Results grid with sortable, resizable columns
  • Export to CSV/JSON
  • Keyboard shortcuts (Ctrl+Enter, Ctrl+S, etc.)
  • Light/Dark theme toggle

PostgreSQL Admin

  • Server statistics dashboard with auto-refresh
  • Active queries monitor with kill button
  • Locks monitor with blocking visualization
  • Backup/Restore with pg_dump/pg_restore
  • Table maintenance (VACUUM, ANALYZE, REINDEX)
  • Extensions manager (install/uninstall)
  • Roles & Users management (create, edit, delete roles)
  • Database settings viewer (important pg_settings)

Analytics Engine (DuckDB)

  • DuckDB integration for analytical queries
  • File browser for data files
  • Parquet, CSV, JSON, SQLite support
  • Drag & drop file loading
  • Export results to Parquet
  • Engine selector (PostgreSQL/DuckDB)

Data/ETL (Polars)

  • Visual transform pipeline builder
  • OSM/PBF file support
  • 8 transform types (filter, select, sort, group by, rename, drop nulls, limit, join)
  • Auto-generated Polars code
  • Export to CSV/Parquet
  • Import to DuckDB/PostgreSQL
  • Drag & drop file upload

Geo Integration

  • Auto-detect geometry columns
  • Map visualization with MapLibre GL
  • Points, lines, polygons rendering
  • Feature popups on click
  • Export to GeoJSON

Cluster Mode

  • Distributed query processing
  • Scheduler + Worker architecture
  • Arrow Flight for data transfer
  • DataFusion query execution
  • Real-time cluster dashboard
  • Connect to remote schedulers from UI
  • Start/Stop local cluster from UI (single-node mode)

User Management

  • Single mode (no auth) and Multi-user mode
  • User authentication with sessions
  • 24 permissions across 6 categories
  • Default groups (Administrators, Data Engineers, Analysts, Viewers)
  • User management UI for admins
  • Profile page for users to manage their account
  • Group assignment UI with checkboxes
  • CLI commands for user management

Installation

# Core only
pip install tuskdata

# With PostgreSQL support
pip install tuskdata[postgres]

# With full web UI (recommended)
pip install tuskdata[studio]

# Everything (studio + admin + cluster)
pip install tuskdata[all]

Or install from source:

git clone https://github.com/tuskdata/tuskdata.git
cd tuskdata
pip install -e ".[all]"

Quick Start

Start the Web Studio

tusk studio
# Open http://127.0.0.1:8000

Start with Options

tusk studio --host 0.0.0.0 --port 3000

Start Cluster (Dev Mode)

# Start scheduler + 3 workers
tusk cluster --workers 3

Start Components Separately

# Terminal 1: Scheduler
tusk scheduler --port 8814

# Terminal 2: Worker
tusk worker --scheduler localhost:8814 --port 8815

# Terminal 3: Another Worker
tusk worker --scheduler localhost:8814 --port 8816

CLI Commands

tusk studio [options]     # Start the web studio
tusk config [options]     # Manage configuration
tusk scheduler [options]  # Start the cluster scheduler
tusk worker [options]     # Start a cluster worker
tusk cluster [options]    # Start local cluster (dev mode)
tusk users [subcommand]   # User management
tusk auth [subcommand]    # Authentication management
tusk version              # Show version
tusk help                 # Show help

Studio Options

--host HOST           Host to bind to (default: 127.0.0.1)
--port, -p PORT       Port to bind to (default: 8000)
--pg-bin-path PATH    Path to PostgreSQL binaries

Scheduler Options

--host HOST           Host to bind to (default: 0.0.0.0)
--port, -p PORT       Port to bind to (default: 8814)

Worker Options

--scheduler HOST:PORT Scheduler address (default: localhost:8814)
--host HOST           Host to bind to (default: 0.0.0.0)
--port, -p PORT       Port to bind to (default: 8815)

Cluster Options (Dev Mode)

--workers, -w N       Number of workers (default: 3)

Authentication

Tusk supports two modes:

Single Mode (Default)

No authentication required. All features accessible.

Multi-User Mode

Enable multi-user authentication:

# Enable auth mode
tusk auth enable

# Initialize (create admin user and default groups)
tusk auth init

# Start studio
tusk studio

Default credentials: admin / admin

User Management CLI

tusk users list                     # List all users
tusk users create john --admin      # Create admin user
tusk users create jane              # Create regular user
tusk users reset-password john      # Reset password
tusk users delete john              # Delete user

Configuration

Configuration files are stored in ~/.tusk/:

~/.tusk/
โ”œโ”€โ”€ config.toml       # Global settings
โ”œโ”€โ”€ connections.toml  # Saved connections
โ”œโ”€โ”€ history.db        # Query history (SQLite)
โ”œโ”€โ”€ auth.db           # Users/groups (SQLite, multi-user mode)
โ””โ”€โ”€ backups/          # Database backups

View Configuration

tusk config show

Set Configuration

tusk config set pg_bin_path /usr/local/pgsql/bin
tusk config set port 3000
tusk config set auth_mode multi

Usage

Data Page - Creating Pipelines

  1. Select a Data Source: Click "Select Data File" to browse and select a CSV, Parquet, JSON, or OSM/PBF file. You can also drag & drop files directly.

  2. Add Transforms: Click "Add Transform" to add operations like filter, sort, group by, etc.

  3. Preview Results: See real-time preview of your data with applied transforms.

  4. Export: Export results to CSV, Parquet, or import directly to DuckDB/PostgreSQL.

  5. View Code: Click "View Code" to see the auto-generated Polars code.

Studio Page - Running Queries

  1. Add a Connection: Click "+" in the Connections sidebar to add PostgreSQL, SQLite, or DuckDB connections.

  2. Browse Schema: Expand tables in the Schema panel to see columns, types, and keys.

  3. Write Queries: Use the SQL editor with autocomplete (Ctrl+Space).

  4. Execute: Press Ctrl+Enter or click "Run" to execute queries.

  5. Export: Export results to CSV or JSON.

Admin Page - PostgreSQL Management

  1. Select a Server: Choose a PostgreSQL connection from the sidebar.

  2. Monitor: View real-time stats, active queries, and locks.

  3. Maintain: Run VACUUM, ANALYZE, or REINDEX on tables.

  4. Manage Extensions: Install or uninstall PostgreSQL extensions.

  5. Manage Roles: Create, edit, or delete database roles.

Architecture

Project Structure

src/tusk/
โ”œโ”€โ”€ cli.py           # CLI entry point
โ”œโ”€โ”€ core/            # Core functionality
โ”‚   โ”œโ”€โ”€ config.py    # Global configuration
โ”‚   โ”œโ”€โ”€ connection.py# Connection registry
โ”‚   โ”œโ”€โ”€ auth.py      # Authentication system
โ”‚   โ”œโ”€โ”€ files.py     # File scanning
โ”‚   โ”œโ”€โ”€ geo.py       # GeoJSON/WKT utilities
โ”‚   โ”œโ”€โ”€ history.py   # Query history
โ”‚   โ”œโ”€โ”€ logging.py   # Structlog setup
โ”‚   โ””โ”€โ”€ result.py    # QueryResult dataclass
โ”œโ”€โ”€ engines/         # Query engines
โ”‚   โ”œโ”€โ”€ duckdb_engine.py  # DuckDB
โ”‚   โ”œโ”€โ”€ polars_engine.py  # Polars ETL
โ”‚   โ”œโ”€โ”€ postgres.py       # PostgreSQL
โ”‚   โ””โ”€โ”€ sqlite.py         # SQLite
โ”œโ”€โ”€ admin/           # PostgreSQL admin
โ”‚   โ”œโ”€โ”€ stats.py     # Server stats
โ”‚   โ”œโ”€โ”€ processes.py # Active queries
โ”‚   โ”œโ”€โ”€ backup.py    # Backup/restore
โ”‚   โ”œโ”€โ”€ extensions.py# Extensions
โ”‚   โ”œโ”€โ”€ roles.py     # Role management
โ”‚   โ”œโ”€โ”€ settings.py  # Settings viewer
โ”‚   โ””โ”€โ”€ maintenance.py # Table maintenance
โ”œโ”€โ”€ cluster/         # Distributed processing
โ”‚   โ”œโ”€โ”€ models.py    # Job/Worker models
โ”‚   โ”œโ”€โ”€ scheduler.py # Arrow Flight scheduler
โ”‚   โ””โ”€โ”€ worker.py    # Arrow Flight worker
โ””โ”€โ”€ studio/          # Web UI
    โ”œโ”€โ”€ app.py       # Litestar app
    โ”œโ”€โ”€ routes/      # API endpoints
    โ”œโ”€โ”€ static/      # Static files (JS)
    โ””โ”€โ”€ templates/   # HTML templates

Technologies

  • Web Framework: Litestar + Granian
  • Database: PostgreSQL (psycopg3), SQLite, DuckDB
  • ETL: Polars
  • Distributed: Arrow Flight + DataFusion
  • Frontend: TailwindCSS, CodeMirror 6, MapLibre GL, Lucide Icons
  • Serialization: msgspec
  • Logging: structlog

Dependencies

litestar >= 2.0
granian >= 2.0
psycopg[binary] >= 3.0
msgspec >= 0.18
duckdb >= 1.0
polars >= 1.0
pyarrow >= 17.0
datafusion >= 51.0
structlog >= 24.0
psutil >= 5.9

Development

# Clone the repository
git clone https://github.com/tuskdata/tuskdata.git
cd tuskdata

# Install in development mode with all features
pip install -e ".[all]"

# Run studio
tusk studio

# Or directly
python -m tusk.cli studio

Keyboard Shortcuts

Shortcut Action
Ctrl+Enter Execute query
Ctrl+S Save query
Ctrl+N / Ctrl+T New tab
Ctrl+W Close tab
Ctrl+Space Autocomplete
F5 Refresh schema
Escape Cancel query

Known Limitations

  1. Cluster Mode: Requires scheduler/workers to be running. The "Start Local Cluster" spawns a subprocess which may have permission issues on some systems.

  2. Auth System: In multi-user mode, sessions are stored in SQLite. Server restart does not invalidate sessions.

  3. State Persistence: Data page state uses localStorage which is browser-specific.

  4. Large Files: Performance may degrade with files larger than 500MB. Use appropriate limit settings.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tuskdata-0.1.2-py3-none-any.whl (183.0 kB view details)

Uploaded Python 3

File details

Details for the file tuskdata-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: tuskdata-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 183.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tuskdata-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 67eb9641abb452150f3e61bd84a8aeb4cbda85a4ad80b3eade80cba19a846334
MD5 cfffdc5ff937fe8a247dab8b13de3b57
BLAKE2b-256 b6eca28cb08518280abc5a7e1b128f4b5bb4a00c892bc5ea481b949c2b6f02c0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page