Tiny Data Warehouse
A lightweight, file-based data warehouse for Python projects. Store and query your data locally using Parquet files with zero configuration.
✨ Features
- Zero Configuration — Works out of the box with sensible defaults
- Parquet Storage — Efficient columnar storage format for fast reads
- Pandas Integration — Seamless integration with pandas DataFrames
- Schema Validation — Optional schema-based tables with type checking
- CLI Support — Command-line interface for quick data inspection
- Backup & Restore — Built-in backup functionality for data safety
- Duplicate Prevention — Optional duplicate detection on specified columns
📦 Installation
pip install tiny-data-warehouse
🚀 Quick Start
Writing Events
from tiny_data_warehouse import DataWarehouse
tdw = DataWarehouse()
tdw.write_event('users', {'name': 'Alice', 'age': 30, 'city': 'Berlin'})
tdw.write_event('users', {'name': 'Bob', 'age': 25, 'city': 'London'})
Reading Events
from tiny_data_warehouse import DataWarehouse
tdw = DataWarehouse()
df = tdw.event('users')
print(df)
#     name  age    city               tdw_timestamp                              tdw_uuid
# 0  Alice   30  Berlin  2024-01-15 10:30:00.123456  550e8400-e29b-41d4-a716-446655440000
# 1    Bob   25  London  2024-01-15 10:30:01.234567  6fa459ea-ee8a-3ca4-894e-db77e160355e
Every event automatically includes:
- tdw_timestamp: When the event was recorded
- tdw_uuid: Unique identifier for each event
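The two automatic fields can be pictured with a small standard-library sketch. This is not the library's implementation: the helper name `enrich_event` is hypothetical, and using `datetime.now()` and `uuid.uuid4()` is an assumption about how such values might be produced.

```python
import uuid
from datetime import datetime


def enrich_event(event: dict) -> dict:
    """Return a copy of `event` with tdw_timestamp and tdw_uuid added.

    Hypothetical helper for illustration only; the real library may
    generate these fields differently.
    """
    enriched = dict(event)  # copy so the caller's dict is left untouched
    enriched["tdw_timestamp"] = datetime.now()
    enriched["tdw_uuid"] = str(uuid.uuid4())
    return enriched


row = enrich_event({"name": "Alice", "age": 30, "city": "Berlin"})
print(sorted(row))
```

Because the identifier is generated per event, two events with identical payloads would still be distinguishable by tdw_uuid.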
📋 Schema-Based Tables
For more structured data, use BaseTable to define schemas with validation:
from tiny_data_warehouse.base_table import BaseTable
class UserTable(BaseTable):
    table_name = "users"
    schema = {
        'name': {'type': 'string'},
        'email': {'type': 'string'},
        'active': {'type': 'boolean'},
    }
# Create and use the table
users = UserTable()
users.add(name="Alice", email="alice@example.com", active=True)
# Read all records
df = users.read()
# Query specific records
active_users = users.load_with_value('active', True)
# Get the most recent entry
latest = users.last()
# Update existing records
users.update(by_key='email', by_value='alice@example.com', new_values={'active': False})
# Delete records
users.delete_by(column='email', value='alice@example.com')
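To make the idea of schema-based type checking concrete, here is a minimal sketch of how a record could be checked against a schema dictionary shaped like the one above. It is an illustration under assumptions, not BaseTable's actual validation: `validate_record`, `TYPE_MAP`, and the exception types raised are all hypothetical, and only the two type names that appear in the example are covered.

```python
# Assumed mapping from schema type names to Python types. Only the two
# names used in the example schema are included.
TYPE_MAP = {"string": str, "boolean": bool}


def validate_record(schema: dict, record: dict) -> None:
    """Raise if `record` has unknown columns or values of the wrong type."""
    unknown = set(record) - set(schema)
    if unknown:
        raise KeyError(f"columns not in schema: {sorted(unknown)}")
    for column, value in record.items():
        expected = TYPE_MAP[schema[column]["type"]]
        if not isinstance(value, expected):
            raise TypeError(
                f"{column!r} expects {expected.__name__}, "
                f"got {type(value).__name__}"
            )


schema = {
    "name": {"type": "string"},
    "email": {"type": "string"},
    "active": {"type": "boolean"},
}

# A well-typed record passes silently.
validate_record(
    schema, {"name": "Alice", "email": "alice@example.com", "active": True}
)
```

Passing `active="yes"` to this sketch would raise `TypeError`, and an unexpected column would raise `KeyError`. How the library itself reports such errors is not stated here.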
BaseTable API
| Method | Description |
|---|---|
| add(**kwargs) | Add a new record (returns tdw_uuid) |
| read(recent_first=False) | Read all records as DataFrame |
| load_with_value(column, value) | Filter records by column value |
| last() | Get the most recent record |
| len() / length() | Count total records |
| is_empty() | Check if table has no records |
| update(by_key, by_value, new_values) | Update existing record |
| update_or_create(by_key, by_value, new_values) | Update or insert record |
| delete_by(column, value) | Delete records matching criteria |
| reset(dry_run=True) | Clear all records |
| replace(df, dry_run=True) | Replace entire table with DataFrame |
| add_column(column_name, default_value) | Add new column to existing data |
🔧 Configuration
Custom Storage Location
from tiny_data_warehouse import DataWarehouse
# Default: ~/.tinyws/events/
tdw = DataWarehouse(events_folder='/path/to/your/data')
Duplicate Prevention
Prevent duplicate entries based on a specific column:
tdw = DataWarehouse(events_config={
    'users': {
        'prevent_duplicates_col': 'email'
    }
})
tdw.write_event('users', {'email': 'alice@example.com', 'name': 'Alice'})
tdw.write_event('users', {'email': 'alice@example.com', 'name': 'Alice 2'}) # Raises ValueError
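The check behind this behaviour can be sketched in plain Python. The function below is a hypothetical stand-in working on an in-memory list, not the library's code. It only illustrates the rule that a second event with the same value in the configured column is rejected with `ValueError`.

```python
def append_unique(rows: list, event: dict, unique_col: str) -> None:
    """Append `event` unless a stored row has the same value in `unique_col`."""
    value = event[unique_col]
    if any(row.get(unique_col) == value for row in rows):
        raise ValueError(f"duplicate value {value!r} in column {unique_col!r}")
    rows.append(event)


rows = []
append_unique(rows, {"email": "alice@example.com", "name": "Alice"}, "email")

try:
    # Same email as the first event, so this one is rejected.
    append_unique(rows, {"email": "alice@example.com", "name": "Alice 2"}, "email")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Callers that expect repeats can wrap `write_event` in the same kind of `try`/`except ValueError` block and decide whether to skip or update the record.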
Environment Variables
| Variable | Description |
|---|---|
| TINYWS_CREATE_EMPTY_WHEN_NOT_FOUND | Return empty DataFrame instead of raising error for missing events |
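A flag like this is typically read from the process environment. The sketch below shows one way that could look. Which values the library actually treats as "enabled" is not documented here, so the rule that any non-empty value switches the flag on is purely an assumption, as is the helper name `create_empty_when_not_found`.

```python
import os


def create_empty_when_not_found(environ=os.environ) -> bool:
    """Report whether the flag is set.

    Assumption: any non-empty value enables it. The real library may
    accept only specific values.
    """
    return bool(environ.get("TINYWS_CREATE_EMPTY_WHEN_NOT_FOUND", ""))


# Plain dicts stand in for the environment so nothing global is modified.
print(create_empty_when_not_found({}))
print(create_empty_when_not_found({"TINYWS_CREATE_EMPTY_WHEN_NOT_FOUND": "1"}))
```

In a shell, the variable would be exported before starting Python so that it is visible when the warehouse is used.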
💻 Command Line Interface
Tiny Data Warehouse includes a CLI for quick data operations:
# List all stored events
tdw list_stored_events
# Print event data
tdw print_event users
# Create a backup
tdw backup_all
# List available backups
tdw backups_list
# Restore from backup
tdw backup_restore 2024-01-15_10-30-00
🔄 Backup & Restore
tdw = DataWarehouse()
# Create a timestamped backup
tdw.backup_all()
# List available backups
tdw.backups_list()
# Restore from a specific backup
tdw.backup_restore('2024-01-15_10-30-00', dry_run=False)
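The backup identifier shown above follows a `YYYY-MM-DD_HH-MM-SS` pattern. As a rough sketch of what a timestamped backup amounts to, the following standard-library code copies a folder into a directory named that way. The function `backup_folder` and the `backups` directory are hypothetical. Where and how the library stores its backups is not specified here.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_folder(events_folder, backups_root, now=None):
    """Copy `events_folder` into a new timestamped folder under `backups_root`."""
    # Same naming pattern as the backup identifier shown above.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    target = Path(backups_root) / stamp
    shutil.copytree(events_folder, target)  # fails if target already exists
    return target


# Demo in a throwaway directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    events = Path(tmp) / "events"
    events.mkdir()
    (events / "users.parquet").write_bytes(b"placeholder")  # not a real Parquet file
    backup = backup_folder(
        events, Path(tmp) / "backups", now=datetime(2024, 1, 15, 10, 30, 0)
    )
    backup_name = backup.name
    copied = sorted(p.name for p in backup.iterdir())

print(backup_name, copied)
```

Timestamped names sort chronologically as plain strings, which makes it easy to pick the most recent backup from a listing.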
📁 Data Storage
By default, data is stored in ~/.tinyws/events/ as Parquet files:
~/.tinyws/
└── events/
    ├── users.parquet
    ├── orders.parquet
    └── products.parquet
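The layout implies a simple mapping from an event name to a file. A minimal sketch of that mapping with `pathlib` follows. The helper `event_path` and the constant `DEFAULT_EVENTS_FOLDER` are hypothetical names introduced for illustration, and the one-file-per-event rule is inferred from the tree above.

```python
from pathlib import Path

# Default location as documented above.
DEFAULT_EVENTS_FOLDER = Path.home() / ".tinyws" / "events"


def event_path(event_name, events_folder=DEFAULT_EVENTS_FOLDER):
    """Return the Parquet file path that would hold `event_name`."""
    return Path(events_folder) / f"{event_name}.parquet"


print(event_path("users"))
print(event_path("orders", "/path/to/your/data"))
```

Since each event is a single Parquet file, it can also be inspected with any Parquet-aware tool outside the library.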
🤝 Contributing
Contributions are welcome! Feel free to open issues or submit pull requests.
# Clone the repository
git clone https://github.com/your-username/tiny-data-warehouse.git
cd tiny-data-warehouse
# Install dependencies
poetry install
# Run tests
pytest
📄 License
This project is licensed under the MIT License - see the LICENSE.md file for details.