Modern web-based data analysis tool - process CSV/JSON/Excel/Parquet files locally with SQL

DataKit

Modern web-based data analysis tool for Python users

Process CSV/JSON/XLSX/Parquet files locally with complete privacy. No data ever leaves your machine.

🚀 Quick Start

# Install DataKit
pip install datakit-local

# Start DataKit (opens browser automatically)
datakit

# Or start server without opening browser
datakit serve --no-open

✨ Features

  • 🔒 Complete Privacy: All data processing happens locally
  • 📊 Large Files: Process CSV/JSON files up to 4-5GB
  • 🚀 Fast Analysis: DuckDB-powered SQL engine via WebAssembly
  • 🌐 Modern Interface: React-based web UI
  • 📈 Visualizations: Built-in charts and data exploration
  • 🔍 Advanced Queries: Full SQL support with auto-completion

🛠️ Installation

Requirements

  • Python 3.8 or higher
  • Modern web browser (Chrome, Firefox, Safari, Edge)

Install from PyPI

pip install datakit-local

📖 Usage

Basic Commands

# Start DataKit (default behavior)
datakit

# Start server only
datakit serve

# Start and open browser explicitly
datakit open

# Start on custom port
datakit serve --port 8080

# Start on custom host (network accessible)
datakit serve --host 0.0.0.0 --port 3000

# Start without opening browser
datakit serve --no-open

Information Commands

# Show version and features
datakit version

# Show system information
datakit info

# Check for updates
datakit update

Options

Option       Description                        Default
-p, --port   Specify port number                Auto-detect (3000-3100)
-h, --host   Specify host address               127.0.0.1
--no-open    Don't open browser automatically   Opens browser
--reload     Enable auto-reload (development)   Disabled
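
When DataKit is launched from a script, the options above can be assembled into an argument list before the process starts. The following is a minimal sketch; `build_serve_command` is a hypothetical helper written for this example and is not part of DataKit. It only uses the flags listed in the table.

```python
from typing import List, Optional


def build_serve_command(
    port: Optional[int] = None,
    host: Optional[str] = None,
    open_browser: bool = True,
    reload: bool = False,
) -> List[str]:
    """Build an argument list for `datakit serve` (hypothetical helper)."""
    cmd = ["datakit", "serve"]
    if port is not None:
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
        cmd += ["--port", str(port)]
    if host is not None:
        cmd += ["--host", host]
    if not open_browser:
        cmd.append("--no-open")
    if reload:
        cmd.append("--reload")
    return cmd


if __name__ == "__main__":
    cmd = build_serve_command(port=8080, open_browser=False)
    print(" ".join(cmd))
    # To actually launch the server (requires DataKit to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting issues if it is later handed to `subprocess.run`.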

🔧 Advanced Usage

Custom Configuration

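from datakit import create_app, find_free_port
import uvicorn

# Create the DataKit app
app = create_app()

# Find an available port
port = find_free_port()

# Run with custom settings.
# host="0.0.0.0" makes the server reachable from other machines on the network;
# use host="127.0.0.1" to keep it local-only.
uvicorn.run(app, host="0.0.0.0", port=port)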
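
The source does not show how `find_free_port` works. As an illustration only, a port scan over the documented auto-detect range (3000-3100) could look like the sketch below. The function name `first_free_port` and its parameters are assumptions made for this example and may differ from DataKit's actual implementation.

```python
import socket
from typing import Optional


def first_free_port(
    host: str = "127.0.0.1", start: int = 3000, end: int = 3100
) -> Optional[int]:
    """Return the first port in [start, end] that can be bound on host, or None."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            return port  # the socket is closed on exit, releasing the port
    return None


if __name__ == "__main__":
    print(first_free_port())
```

A check like this is inherently racy: another process may claim the port between the probe and the server start, so the caller should still handle a bind failure.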

Programmatic Usage

import datakit

# Start server programmatically
datakit.run_server(host="localhost", port=3000)
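
A script that starts the server in a separate process or thread usually needs to know when it is ready to accept connections. The sketch below polls a TCP port using only the standard library; `wait_for_port` is a hypothetical helper and does not depend on DataKit. Whether `run_server` blocks the calling thread is not stated in the source, so this assumes the server was started elsewhere.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 10.0, interval: float = 0.2
) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not accepting connections yet; retry
    return False


if __name__ == "__main__":
    # Example: after launching `datakit serve --port 3000 --no-open` elsewhere
    print(wait_for_port("127.0.0.1", 3000, timeout=2.0))
```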

🎯 Use Cases

Perfect for:

  • Data Scientists: Analyze datasets without cloud dependencies
  • Privacy-Conscious Users: Process sensitive data locally
  • Enterprise Environments: No data leaves your network
  • Large File Analysis: Handle multi-GB files efficiently
  • SQL Analysis: Query your data with full SQL support

🔐 Security & Privacy

  • Local Processing: All computation happens in your browser
  • No Data Upload: Files never leave your machine
  • No Internet Required: Works offline after installation
  • Enterprise-Safe: Perfect for sensitive data analysis

📊 Supported File Formats

  • CSV: Comma-separated values with auto-detection
  • JSON: Nested JSON files with flattening support
  • Large Files: Optimized for files up to 4-5GB
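
To make "flattening" concrete: nested JSON objects are typically turned into flat columns so they can be queried as a table. The sketch below shows one common convention (dotted key names). It illustrates the general idea and is not DataKit's actual flattening logic, which the source does not describe.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into a single level with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


if __name__ == "__main__":
    row = {"id": 1, "customer": {"name": "Ada", "address": {"city": "London"}}}
    print(flatten(row))
    # {'id': 1, 'customer.name': 'Ada', 'customer.address.city': 'London'}
```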

🤝 Comparison with Other Tools

Feature             DataKit        Pandas           Excel      Cloud Tools
File Size Limit     Up to 4-5 GB   Memory-limited   1M rows    Varies
Privacy             Complete       Complete         Complete   Limited
SQL Support         Full           Limited          None       Varies
Setup Time          1 command      Code required    Manual     Account setup
Browser Interface   ✅             ❌               ❌         ✅
Offline Use         ✅             ✅               ✅         ❌

🔗 Related Packages

  • Node.js: npm install -g datakit-cli
  • Docker: docker run -p 8080:80 datakit/app
  • Homebrew: brew install datakit (coming soon)

🚀 Examples

Analyze Sales Data

# Start DataKit
datakit

# Upload your sales.csv file
# Write SQL queries like:
# SELECT product, SUM(revenue) FROM sales GROUP BY product
# Create visualizations with built-in charts
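
To show what the aggregate query above returns, the same SQL can be run outside DataKit on a few rows. This sketch uses Python's built-in `sqlite3` rather than DuckDB so that it runs without extra dependencies. The sample rows are made up for illustration, and `total_revenue_by_product` is a hypothetical helper.

```python
import sqlite3
from typing import List, Tuple


def total_revenue_by_product(rows: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Load (product, revenue) rows into an in-memory table and aggregate them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        query = """
            SELECT product, SUM(revenue) AS total_revenue
            FROM sales
            GROUP BY product
            ORDER BY total_revenue DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("widget", 10.0), ("gadget", 4.5), ("widget", 2.5)]  # made-up rows
    print(total_revenue_by_product(sample))
    # [('widget', 12.5), ('gadget', 4.5)]
```

The `ORDER BY` clause is an addition to the query in the example above; without it, the row order of a `GROUP BY` result is not guaranteed.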

Process Large Datasets

# DataKit handles large files efficiently
datakit serve

# Load multi-GB files with streaming processing
# Query with pagination for smooth performance
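
Pagination here means fetching results one page at a time instead of all at once. In SQL this is commonly done with `LIMIT` and `OFFSET`. The sketch below demonstrates the pattern with the standard-library `sqlite3` module; the `events` table and the `iter_pages` helper are hypothetical, and DataKit's own pagination mechanism may work differently.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_pages(conn: sqlite3.Connection, page_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield successive pages of rows from the (hypothetical) events table."""
    offset = 0
    while True:
        page = conn.execute(
            "SELECT id, value FROM events ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:
            return  # no more rows
        yield page
        offset += page_size


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO events (value) VALUES (?)", [(f"row-{i}",) for i in range(5)]
    )
    for page in iter_pages(conn, page_size=2):
        print(len(page), "rows")
    conn.close()
```

A stable `ORDER BY` is needed for `LIMIT`/`OFFSET` pages to be consistent between queries.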

📄 License

MIT License - see LICENSE file for details.

🆘 Support

🙏 Acknowledgments

Built with:

  • FastAPI - Modern Python web framework
  • Click - Command line interface
  • DuckDB - High-performance analytical database
  • React - User interface library

DataKit - Bringing powerful data analysis to your local environment with complete privacy and security.

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

datakit_local-0.1.0-py3-none-any.whl (8.4 kB)

Uploaded Python 3

File details

Details for the file datakit_local-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: datakit_local-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.12

File hashes

Hashes for datakit_local-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        f2b5e371da6cb8c084d0e61971d5804d07daaed342d1ab9ae9000cea36d07b58
MD5           717ae89b5d99165746951aa41d162125
BLAKE2b-256   59c39f0fd858eac80e27e2624bc158509b6e0dbb38ef20475186837c276779b8
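
A downloaded wheel can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only the standard library; `sha256_of` and `verify` are hypothetical helper names, and the file path in the usage example assumes the wheel was saved to the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for datakit_local-0.1.0-py3-none-any.whl
EXPECTED_SHA256 = "f2b5e371da6cb8c084d0e61971d5804d07daaed342d1ab9ae9000cea36d07b58"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path("datakit_local-0.1.0-py3-none-any.whl")  # assumed download location
    if wheel.exists():
        print("hash OK" if verify(wheel) else "hash MISMATCH")
    else:
        print(f"{wheel} not found")
```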
