Modern web-based data analysis tool - process CSV/JSON/EXCEL/PARQUET files locally with SQL

DataKit

Modern web-based data analysis tool

Process CSV/JSON/XLSX/PARQUET files locally with complete privacy. No data ever leaves your machine.

Quick Start

# Install DataKit
pip install datakit-local

# Start DataKit (opens browser automatically)
datakit

# Or start server without opening browser
datakit serve --no-open

Features

  • Complete Privacy: All data processing happens locally
  • Large Files: Process CSV/JSON files up to 4-5GB
  • Fast Analysis: DuckDB-powered SQL engine via WebAssembly
  • Modern Interface: React-based web UI
  • Visualizations: Built-in charts and data exploration
  • Advanced Queries: Full SQL support with auto-completion

Installation

Requirements

  • Python 3.8 or higher
  • Modern web browser (Chrome, Firefox, Safari, Edge)

Install from PyPI

pip install datakit-local

Usage

Basic Commands

# Start DataKit (default behavior)
datakit

# Start server only
datakit serve

# Start and open browser explicitly
datakit open

# Start on custom port
datakit serve --port 8080

# Start on custom host (network accessible)
datakit serve --host 0.0.0.0 --port 3000

# Start without opening browser
datakit serve --no-open
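
The same commands can be driven from a Python script. The sketch below only assembles the documented `datakit serve` flags (`--host`, `--port`, `--no-open`) into an argument list and hands it to `subprocess`. The helper names `build_serve_command` and `start_datakit` are hypothetical and not part of DataKit.

```python
import shutil
import subprocess
from typing import List, Optional


def build_serve_command(host: Optional[str] = None,
                        port: Optional[int] = None,
                        open_browser: bool = True) -> List[str]:
    """Build the argv list for `datakit serve` from the documented flags."""
    cmd = ["datakit", "serve"]
    if host is not None:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    if not open_browser:
        cmd.append("--no-open")
    return cmd


def start_datakit(**kwargs) -> subprocess.Popen:
    """Start DataKit as a child process (hypothetical helper)."""
    if shutil.which("datakit") is None:
        raise FileNotFoundError(
            "datakit CLI not found; run `pip install datakit-local` first"
        )
    return subprocess.Popen(build_serve_command(**kwargs))
```

Calling `start_datakit(port=8080, open_browser=False)` would run `datakit serve --port 8080 --no-open`; call `.terminate()` on the returned process to stop the server.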

Information Commands

# Show version and features
datakit version

# Show system information
datakit info

# Check for updates
datakit update

Options

Option        Description                        Default
-p, --port    Specify port number                Auto-detect (3000-3100)
-h, --host    Specify host address               127.0.0.1
--no-open     Don't open browser automatically   Opens browser
--reload      Enable auto-reload (development)   Disabled
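
To illustrate what port auto-detection over 3000-3100 could look like, here is a minimal sketch that tries to bind each port in turn and returns the first one that succeeds. This is an assumption about the general technique, not DataKit's actual implementation; `find_port_in_range` is a hypothetical name.

```python
import socket
from typing import Optional


def find_port_in_range(host: str = "127.0.0.1",
                       start: int = 3000,
                       end: int = 3100) -> Optional[int]:
    """Return the first port in [start, end] that can be bound, else None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            return port
    return None
```

The probe socket is closed before the function returns, so another process could still claim the port before the server binds it. A real server would typically handle that bind error and retry.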

Advanced Usage

Custom Configuration

from datakit import create_app, find_free_port
import uvicorn

# Create custom app
app = create_app()

# Find available port
port = find_free_port()

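# Run with custom settings; host "0.0.0.0" listens on all interfaces,
# so the server is reachable from other machines on the network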
uvicorn.run(app, host="0.0.0.0", port=port)

Programmatic Usage

import datakit

# Start server programmatically
datakit.run_server(host="localhost", port=3000)

Use Cases

Perfect for:

  • Data Scientists: Analyze datasets without cloud dependencies
  • Privacy-Conscious Users: Process sensitive data locally
  • Enterprise Environments: No data leaves your network
  • Large File Analysis: Handle multi-GB files efficiently
  • SQL Analysis: Query your data with full SQL support

Security & Privacy

  • Local Processing: All computation happens in your browser
  • No Data Upload: Files never leave your machine
  • No Internet Required: Works offline after installation
  • Enterprise-Safe: Perfect for sensitive data analysis

Supported File Formats

  • CSV: Comma-separated values with auto-detection
  • JSON: Nested JSON files with flattening support
  • Large Files: Optimized for files up to 4-5GB
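
As a rough illustration of what flattening nested JSON means, the sketch below turns nested objects and arrays into a single level of dotted column names. The naming scheme (`user.name`, `user.tags.0`) and the `flatten` helper are assumptions for illustration; DataKit's own flattening rules may differ.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts/lists into a single-level dict of column -> value."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}  # leaf value: emit it under the accumulated name
    flat: Dict[str, Any] = {}
    for key, value in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(value, name, sep))
    return flat


record = {"id": 1, "user": {"name": "Ada", "tags": ["a", "b"]}}
print(flatten(record))
# {'id': 1, 'user.name': 'Ada', 'user.tags.0': 'a', 'user.tags.1': 'b'}
```

Note that in this sketch empty objects and empty arrays produce no columns at all.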

Comparison with Other Tools

Feature             DataKit        Pandas           Excel      Cloud Tools
File Size Limit     Up to 4-5 GB   Memory limited   1M rows    Varies
Privacy             Complete       Complete         Complete   Limited
SQL Support         Full           Limited          None       Varies
Setup Time          1 command      Code required    Manual     Account setup
Browser Interface
Offline Use

Related Packages

  • Node.js: npm install -g datakit-cli
  • Docker: docker run -p 8080:80 datakit/app
  • Homebrew: brew install datakit (coming soon)

Examples

Analyze Sales Data

# Start DataKit
datakit

# Upload your sales.csv file
# Write SQL queries like:
# SELECT product, SUM(revenue) FROM sales GROUP BY product
# Create visualizations with built-in charts
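
To show what the query above returns, here is a small self-contained sketch. It uses Python's built-in `sqlite3` purely as a stand-in engine (DataKit itself runs DuckDB in the browser), and the sample rows are made up for illustration. An `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3

# Made-up sample rows standing in for the contents of sales.csv
rows = [("widget", 10.0), ("gadget", 5.0), ("widget", 2.5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Same aggregation as in the example, with ORDER BY for stable output
query = (
    "SELECT product, SUM(revenue) FROM sales "
    "GROUP BY product ORDER BY product"
)
totals = conn.execute(query).fetchall()
conn.close()

print(totals)  # [('gadget', 5.0), ('widget', 12.5)]
```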

Process Large Datasets

# DataKit handles large files efficiently
datakit serve

# Load multi-GB files with streaming processing
# Query with pagination for smooth performance

License

AGPL-3.0-only License - see LICENSE file for details.

Acknowledgments

Built with:

  • FastAPI - Modern Python web framework
  • Click - Command line interface
  • DuckDB - High-performance analytical database
  • React - User interface library

DataKit - Bringing powerful data analysis to your local environment with complete privacy and security.

Download files

Source Distribution

datakit_local-0.1.5.tar.gz (20.0 kB)

Uploaded Source

Built Distribution

datakit_local-0.1.5-py3-none-any.whl (20.1 kB)

Uploaded Python 3

File details

Details for the file datakit_local-0.1.5.tar.gz.

File metadata

  • Download URL: datakit_local-0.1.5.tar.gz
  • Upload date:
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for datakit_local-0.1.5.tar.gz
Algorithm Hash digest
SHA256 1b8d2a9770057fd54d804cbae12aba49d444b156764f24028032017b438e9e7b
MD5 9d8999522d964b72e380d833c98bca70
BLAKE2b-256 530cd43c4bba47ef877d89566415cd99f8808df2b4e8e861b543aae70b5885a2

File details

Details for the file datakit_local-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: datakit_local-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 20.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for datakit_local-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 5a134ecaaa184447cb17d0f5714d94d368238db69acb8f636e74d7d9491e9a8a
MD5 2729985f810778d6cce57db870f2d7ce
BLAKE2b-256 0801589f817441c2cad8302095893355abad9f24a5042e03c42bd2337aa49ada
