Tiny Data Warehouse

Python 3.12+ · License: MIT

A lightweight, file-based data warehouse for Python projects. Store and query your data locally using Parquet files with zero configuration.

✨ Features

  • Zero Configuration — Works out of the box with sensible defaults
  • Parquet Storage — Efficient columnar storage format for fast reads
  • Pandas Integration — Seamless integration with pandas DataFrames
  • Schema Validation — Optional schema-based tables with type checking
  • CLI Support — Command-line interface for quick data inspection
  • Backup & Restore — Built-in backup functionality for data safety
  • Duplicate Prevention — Optional duplicate detection on specified columns

📦 Installation

pip install tiny-data-warehouse

🚀 Quick Start

Writing Events

from tiny_data_warehouse import DataWarehouse

tdw = DataWarehouse()
tdw.write_event('users', {'name': 'Alice', 'age': 30, 'city': 'Berlin'})
tdw.write_event('users', {'name': 'Bob', 'age': 25, 'city': 'London'})

Reading Events

from tiny_data_warehouse import DataWarehouse

tdw = DataWarehouse()
df = tdw.event('users')
print(df)
#     name  age    city              tdw_timestamp                            tdw_uuid
# 0  Alice   30  Berlin  2024-01-15 10:30:00.123456  550e8400-e29b-41d4-a716-446655440000
# 1    Bob   25  London  2024-01-15 10:30:01.234567  6fa459ea-ee8a-3ca4-894e-db77e160355e

Every event automatically includes:

  • tdw_timestamp — When the event was recorded
  • tdw_uuid — Unique identifier for each event
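
To make the two automatic fields concrete, here is a minimal sketch of how an event could be enriched before it is stored. It is not the library's implementation: `with_metadata` is a hypothetical helper, and the use of a random (version 4) UUID and a local `datetime.now()` timestamp are assumptions.

```python
import uuid
from datetime import datetime


def with_metadata(event: dict) -> dict:
    """Return a copy of `event` with tdw_timestamp and tdw_uuid added.

    Hypothetical helper for illustration only.
    """
    enriched = dict(event)  # copy so the caller's dict is not mutated
    enriched["tdw_timestamp"] = datetime.now()  # assumption: local wall-clock time
    enriched["tdw_uuid"] = str(uuid.uuid4())  # assumption: random version 4 UUID
    return enriched


record = with_metadata({"name": "Alice", "age": 30, "city": "Berlin"})
print(sorted(record))
```

Because the identifier is generated per call, two events with identical payloads still end up with different `tdw_uuid` values.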

📋 Schema-Based Tables

For more structured data, use BaseTable to define schemas with validation:

from tiny_data_warehouse.base_table import BaseTable

class UserTable(BaseTable):
    table_name = "users"
    schema = {
        'name': {'type': 'string'},
        'email': {'type': 'string'},
        'active': {'type': 'boolean'},
    }

# Create and use the table
users = UserTable()
users.add(name="Alice", email="alice@example.com", active=True)

# Read all records
df = users.read()

# Query specific records
active_users = users.load_with_value('active', True)

# Get the most recent entry
latest = users.last()

# Update existing records
users.update(by_key='email', by_value='alice@example.com', new_values={'active': False})

# Delete records
users.delete_by(column='email', value='alice@example.com')
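
As a rough illustration of what type checking against such a schema can look like, the sketch below validates a record against the schema dict shown above. `validate_record` and `TYPE_MAP` are hypothetical names, and the mapping from `'string'`/`'boolean'` to Python types is an assumption; BaseTable's actual rules may differ.

```python
# Assumption: schema type names correspond to these Python types.
TYPE_MAP = {"string": str, "boolean": bool}


def validate_record(schema: dict, record: dict) -> None:
    """Raise if `record` has columns outside `schema` or wrongly typed values."""
    unknown = set(record) - set(schema)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    for column, value in record.items():
        expected = TYPE_MAP[schema[column]["type"]]
        if not isinstance(value, expected):
            raise TypeError(
                f"{column!r} expects {expected.__name__}, got {type(value).__name__}"
            )


user_schema = {
    "name": {"type": "string"},
    "email": {"type": "string"},
    "active": {"type": "boolean"},
}

# A well-typed record passes silently.
validate_record(user_schema, {"name": "Alice", "email": "alice@example.com", "active": True})
```

Passing `active="yes"` to this sketch raises `TypeError`, which is the kind of mistake schema validation is meant to catch before data reaches disk.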

BaseTable API

Method                                          Description
add(**kwargs)                                   Add a new record (returns tdw_uuid)
read(recent_first=False)                        Read all records as DataFrame
load_with_value(column, value)                  Filter records by column value
last()                                          Get the most recent record
len() / length()                                Count total records
is_empty()                                      Check if table has no records
update(by_key, by_value, new_values)            Update existing record
update_or_create(by_key, by_value, new_values)  Update or insert record
delete_by(column, value)                        Delete records matching criteria
reset(dry_run=True)                             Clear all records
replace(df, dry_run=True)                       Replace entire table with DataFrame
add_column(column_name, default_value)          Add new column to existing data

🔧 Configuration

Custom Storage Location

from tiny_data_warehouse import DataWarehouse

# Default: ~/.tinyws/events/
tdw = DataWarehouse(events_folder='/path/to/your/data')

Duplicate Prevention

Prevent duplicate entries based on a specific column:

from tiny_data_warehouse import DataWarehouse

tdw = DataWarehouse(events_config={
    'users': {
        'prevent_duplicates_col': 'email'
    }
})

tdw.write_event('users', {'email': 'alice@example.com', 'name': 'Alice'})
tdw.write_event('users', {'email': 'alice@example.com', 'name': 'Alice 2'})  # Raises ValueError
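
Since a duplicate raises `ValueError`, callers that re-ingest the same data may prefer to skip duplicates rather than crash. The wrapper below is one possible pattern, not part of the library: `write_if_new` is a hypothetical helper, and `fake_write_event` is a stand-in so the sketch runs without the package installed.

```python
from typing import Callable


def write_if_new(write_event: Callable[[str, dict], object], name: str, event: dict) -> bool:
    """Call write_event(name, event); return False instead of raising on a duplicate."""
    try:
        write_event(name, event)
    except ValueError:
        # Assumption: ValueError means "duplicate". Any other ValueError
        # raised by the writer is swallowed here as well.
        return False
    return True


# Stand-in writer that mimics duplicate detection on the 'email' column.
_seen_emails = set()


def fake_write_event(name: str, event: dict) -> None:
    if event["email"] in _seen_emails:
        raise ValueError("duplicate email")
    _seen_emails.add(event["email"])


first = write_if_new(fake_write_event, "users", {"email": "alice@example.com", "name": "Alice"})
second = write_if_new(fake_write_event, "users", {"email": "alice@example.com", "name": "Alice 2"})
print(first, second)
```

In real use, `tdw.write_event` would be passed in place of the stand-in.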

Environment Variables

Variable                            Description
TINYWS_CREATE_EMPTY_WHEN_NOT_FOUND  Return an empty DataFrame instead of raising an error for missing events
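
The variable can also be set from Python, for example only for the duration of a test. The context manager below is a generic standard-library sketch: `temporary_env` is a hypothetical helper, the value `"1"` is an assumption (the accepted values are not documented here), and it only has an effect if the library reads the variable at call time rather than at import time.

```python
import os
from contextlib import contextmanager

VAR = "TINYWS_CREATE_EMPTY_WHEN_NOT_FOUND"


@contextmanager
def temporary_env(name: str, value: str):
    """Set an environment variable inside a `with` block, then restore it."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)  # variable was unset before: unset it again
        else:
            os.environ[name] = previous  # restore the earlier value


with temporary_env(VAR, "1"):  # "1" is an assumed value
    # Code that reads a possibly missing event would go here.
    print(os.environ[VAR])
```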

💻 Command Line Interface

Tiny Data Warehouse includes a CLI for quick data operations:

# List all stored events
tdw list_stored_events

# Print event data
tdw print_event users

# Create a backup
tdw backup_all

# List available backups
tdw backups_list

# Restore from backup
tdw backup_restore 2024-01-15_10-30-00

🔄 Backup & Restore

from tiny_data_warehouse import DataWarehouse

tdw = DataWarehouse()

# Create a timestamped backup
tdw.backup_all()

# List available backups
tdw.backups_list()

# Restore from a specific backup
tdw.backup_restore('2024-01-15_10-30-00', dry_run=False)
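
When restoring, you often want the newest backup. The sketch below picks it from a list of backup names. It assumes names follow the `YYYY-MM-DD_HH-MM-SS` pattern seen in the example above and that the names are available as a list of strings (the return type of `backups_list()` is not documented here). `latest_backup` is a hypothetical helper.

```python
from datetime import datetime

# Assumption: format inferred from the example name '2024-01-15_10-30-00'.
BACKUP_NAME_FORMAT = "%Y-%m-%d_%H-%M-%S"


def latest_backup(names: list[str]) -> str:
    """Return the backup name with the most recent timestamp."""
    if not names:
        raise ValueError("no backups available")
    # Parsing (rather than comparing strings) also rejects malformed names.
    return max(names, key=lambda name: datetime.strptime(name, BACKUP_NAME_FORMAT))


names = ["2024-01-15_10-30-00", "2024-02-01_08-00-00", "2023-12-31_23-59-59"]
print(latest_backup(names))
```

The chosen name could then be passed to `backup_restore`, ideally with the default dry run first.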

📁 Data Storage

By default, data is stored in ~/.tinyws/events/ as Parquet files:

~/.tinyws/
└── events/
    ├── users.parquet
    ├── orders.parquet
    └── products.parquet
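
Given this layout of one Parquet file per event, the stored event names can be recovered from the file names alone. The following is a standard-library sketch under that assumption; `list_event_names` is a hypothetical helper and is not how the `list_stored_events` command is necessarily implemented.

```python
import tempfile
from pathlib import Path


def list_event_names(events_folder: Path) -> list[str]:
    """Return the sorted event names, one per *.parquet file in the folder."""
    return sorted(path.stem for path in events_folder.glob("*.parquet"))


# Demonstrate on a throwaway folder so nothing in ~/.tinyws is touched.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for file_name in ("users.parquet", "orders.parquet", "notes.txt"):
        (folder / file_name).touch()
    found = list_event_names(folder)
    print(found)
```

For the default location, the folder would be `Path.home() / ".tinyws" / "events"`.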

🤝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

# Clone the repository
git clone https://github.com/your-username/tiny-data-warehouse.git
cd tiny-data-warehouse

# Install dependencies
poetry install

# Run tests
pytest

📄 License

This project is licensed under the MIT License - see the LICENSE.md file for details.
