Persistent Pandas DataFrame storage and retrieval using a SQL database, HDF5, CSV files, or pickle files.

TrashPandas: Persistent Pandas DataFrame Storage and Retrieval

What is it?

TrashPandas is a modern Python package that provides persistent Pandas DataFrame storage and retrieval using SQL databases, CSV files, HDF5, or pickle files. Version 1.0.0 brings significant improvements including SQLAlchemy 2.x support, comprehensive type hints, modern Python features, and enhanced error handling.

✨ Main Features

Multiple Storage Backends: SQL databases, CSV files, HDF5, and pickle files
Preserve Data Integrity: Maintains indexes and data types during storage/retrieval
Format Conversion: Transfer DataFrames between different storage formats
Modern Python Support: Full type hints, context managers, and iterator protocol
Bulk Operations: Efficient batch processing with store_many(), load_many(), delete_many()
Compression Support: Optional compression for CSV and pickle storage
Comprehensive Error Handling: Custom exception hierarchy with detailed error messages
SQLAlchemy 2.x: Full support for the latest SQLAlchemy with async capabilities

🚀 Quick Start

Installation

```bash
# Basic installation
pip install trashpandas

# With HDF5 support (quotes stop shells such as zsh from expanding the brackets)
pip install "trashpandas[hdf5]"

# Development dependencies
pip install "trashpandas[dev]"
```

Basic Usage

```python
import pandas as pd
import sqlalchemy as sa
import trashpandas as tp

# Create sample data
df = pd.DataFrame({'name': ['Joe', 'Bob', 'John'], 'age': [23, 34, 44]})

# SQL Storage
with tp.SqlStorage('sqlite:///test.db') as storage:
    storage['people'] = df
    loaded_df = storage['people']
    print(f"Stored {len(storage)} tables")

# CSV Storage with compression
csv_storage = tp.CsvStorage('./data', compression='gzip')
csv_storage.store(df, 'people')

# Pickle Storage
pickle_storage = tp.PickleStorage('./pickles', compression='bz2')
pickle_storage.store(df, 'people')
```

📖 Example Notebooks

Check out these interactive Jupyter notebooks demonstrating TrashPandas features:

Basic Usage - Introduction to CSV, SQL, and Pickle storage
Advanced Features - Compression, bulk operations, and data type preservation
Format Conversion - Converting DataFrames between different storage formats
Query Capabilities - Advanced SQL querying with WHERE clauses and filtering

All notebooks are fully executed with outputs included, so they can be read directly on GitHub or opened in Jupyter Notebook/Lab.

📚 API Reference

Storage Classes

SqlStorage

```python
import pandas as pd
import sqlalchemy as sa
import trashpandas as tp

df = pd.DataFrame({'name': ['Joe', 'Bob', 'John'], 'age': [23, 34, 44]})
df1, df2 = df.head(2), df.tail(1)  # sample frames for the bulk operations below

# Create SQL storage from a database URL
storage = tp.SqlStorage('sqlite:///test.db')
# or from an existing SQLAlchemy engine
engine = sa.create_engine('sqlite:///test.db')
storage = tp.SqlStorage(engine)

# Basic operations
storage.store(df, 'table_name')
df = storage.load('table_name')
storage.delete('table_name')

# Dictionary-like interface
storage['table_name'] = df
df = storage['table_name']
del storage['table_name']

# Bulk operations
storage.store_many({'table1': df1, 'table2': df2})
results = storage.load_many(['table1', 'table2'])
storage.delete_many(['table1', 'table2'])

# Context manager
with storage:
    storage['data'] = df
```
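A SQL backend has to answer "which tables exist?" to support `len(storage)`, membership tests, and iteration. The sketch below shows one way to do that with the standard-library `sqlite3` module by querying SQLite's `sqlite_master` catalog. It is an illustration only: TrashPandas works through SQLAlchemy, and the helper name `list_tables` is hypothetical. A real store that keeps its own metadata tables would also need to filter those out.

```python
import sqlite3
from typing import List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return user table names in a SQLite database (hypothetical helper)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "  # skip SQLite internals
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
print(list_tables(conn))  # ['orders', 'people']
conn.close()
```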

CsvStorage

```python
# Basic CSV storage
storage = tp.CsvStorage('./data')

# With compression
storage = tp.CsvStorage('./data', compression='gzip')

# Operations
storage.store(df, 'table_name')
df = storage.load('table_name')
```
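CSV files hold only text, so a CSV round trip on its own loses column types. One way to restore them, offered here as an assumption rather than a description of TrashPandas's on-disk layout, is a small JSON "sidecar" file that records each column's type. The sketch uses plain Python rows instead of a DataFrame so it runs with the standard library alone; the file names `<table>.csv` / `<table>.meta.json` and both functions are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# Converters used to restore column types on load (only int, float, str supported)
CASTS = {"int": int, "float": float, "str": str}


def store_csv(rows: List[Dict[str, object]], folder: Path, name: str) -> None:
    """Write rows to <name>.csv and their column types to <name>.meta.json."""
    columns = list(rows[0])
    types = {col: type(rows[0][col]).__name__ for col in columns}
    with open(folder / f"{name}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    (folder / f"{name}.meta.json").write_text(json.dumps(types))


def load_csv(folder: Path, name: str) -> List[Dict[str, object]]:
    """Read <name>.csv and cast each column back using the sidecar metadata."""
    types = json.loads((folder / f"{name}.meta.json").read_text())
    with open(folder / f"{name}.csv", newline="") as f:
        return [
            {col: CASTS[types[col]](value) for col, value in row.items()}
            for row in csv.DictReader(f)
        ]


with tempfile.TemporaryDirectory() as tmp:
    people = [{"name": "Joe", "age": 23}, {"name": "Bob", "age": 34}]
    store_csv(people, Path(tmp), "people")
    print(load_csv(Path(tmp), "people") == people)  # True
```

Without the sidecar, `age` would come back as the strings `'23'` and `'34'`.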

PickleStorage

```python
# Basic pickle storage
storage = tp.PickleStorage('./pickles')

# With custom extension and compression
storage = tp.PickleStorage('./pickles', file_extension='.pkl', compression='bz2')

# Operations
storage.store(df, 'table_name')
df = storage.load('table_name')
```
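The `file_extension` and `compression` options suggest a file-per-table layout. Below is a minimal sketch of such a layout, assuming each table lives at `<folder>/<name><extension>` and that `'bz2'` means the standard-library `bz2` module. The `PickleFolder` class and its `'.pickle'` default extension are hypothetical, not TrashPandas's code.

```python
import bz2
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


class PickleFolder:
    """Hypothetical file-per-table pickle store with optional bz2 compression."""

    def __init__(self, folder: Path, file_extension: str = ".pickle",
                 compression: Optional[str] = None) -> None:
        if compression not in (None, "bz2"):
            raise ValueError(f"unsupported compression: {compression!r}")
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)
        self.file_extension = file_extension
        self.compression = compression

    def _path(self, name: str) -> Path:
        return self.folder / f"{name}{self.file_extension}"

    def _open(self, name: str, mode: str):
        # bz2.open and the built-in open both accept 'wb'/'rb' binary modes
        opener = bz2.open if self.compression == "bz2" else open
        return opener(self._path(name), mode)

    def store(self, obj: Any, name: str) -> None:
        with self._open(name, "wb") as f:
            pickle.dump(obj, f)

    def load(self, name: str) -> Any:
        with self._open(name, "rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    store = PickleFolder(Path(tmp), file_extension=".pkl", compression="bz2")
    store.store({"name": ["Joe", "Bob"], "age": [23, 34]}, "people")
    print(store.load("people"))  # {'name': ['Joe', 'Bob'], 'age': [23, 34]}
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['people.pkl']
```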

HdfStorage (Optional)

```python
# Requires: pip install "trashpandas[hdf5]"
storage = tp.HdfStorage('data.h5')
storage.store(df, 'table_name')
df = storage.load('table_name')
```
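Because HdfStorage depends on an optional extra, a common pattern is to check for the dependency up front and fail with an install hint rather than a bare `ModuleNotFoundError`. This sketch assumes the extra provides `h5py`, as listed under Requirements; `require_optional` is a hypothetical helper, not part of TrashPandas.

```python
import importlib.util


def require_optional(module: str, extra: str) -> None:
    """Raise ImportError with an install hint if `module` cannot be imported."""
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"{module!r} is required for this backend; "
            f'install it with: pip install "trashpandas[{extra}]"'
        )


try:
    require_optional("h5py", "hdf5")
    print("h5py available")
except ImportError as exc:
    print(exc)
```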

Modern Features

Iterator Protocol

```python
storage = tp.SqlStorage('sqlite:///test.db')

# Iterate over table names
for table_name in storage:
    print(f"Table: {table_name}")

# Check if table exists
if 'my_table' in storage:
    df = storage['my_table']

# Get number of tables
print(f"Total tables: {len(storage)}")
```
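The dictionary-style access and iteration described above are what Python's `collections.abc.MutableMapping` supplies once five methods are defined. The sketch below uses a hypothetical in-memory `DictStorage` to show how `store`/`load`/`delete` can back `[]`, `del`, `in`, `len`, and `for`; whether TrashPandas itself subclasses `MutableMapping` is not stated in this README.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator


class DictStorage(MutableMapping):
    """Hypothetical storage whose mapping protocol delegates to store/load/delete."""

    def __init__(self) -> None:
        self._tables: Dict[str, Any] = {}

    # Explicit method API
    def store(self, df: Any, table_name: str) -> None:
        self._tables[table_name] = df

    def load(self, table_name: str) -> Any:
        return self._tables[table_name]  # raises KeyError if missing

    def delete(self, table_name: str) -> None:
        del self._tables[table_name]

    # The five methods MutableMapping requires; it derives `in`, get(), etc.
    def __setitem__(self, key: str, value: Any) -> None:
        self.store(value, key)

    def __getitem__(self, key: str) -> Any:
        return self.load(key)

    def __delitem__(self, key: str) -> None:
        self.delete(key)

    def __iter__(self) -> Iterator[str]:
        return iter(list(self._tables))  # snapshot, so deleting while iterating is safe

    def __len__(self) -> int:
        return len(self._tables)


storage = DictStorage()
storage["people"] = [("Joe", 23)]
storage.store([("Bob", 34)], "more_people")
print(sorted(storage), len(storage), "people" in storage)  # ['more_people', 'people'] 2 True
del storage["people"]
print(list(storage))  # ['more_people']
```

The inherited `__contains__` works by calling `__getitem__`, which here means loading the table; a database- or file-backed store would likely override `__contains__` with a cheaper existence check.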

Context Managers

```python
# Automatic resource cleanup
with tp.SqlStorage('sqlite:///test.db') as storage:
    storage['data'] = df
# The connection is closed once the with block exits
```
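A storage object used in a `with` statement needs `__enter__` and `__exit__`. The sketch shows the idea with the standard-library `sqlite3` module. One detail worth knowing: a `sqlite3.Connection` used directly as a context manager only commits or rolls back the current transaction and does not close the connection, so this wrapper calls `close()` itself. `ClosingSqliteStorage` is hypothetical; TrashPandas's SqlStorage goes through SQLAlchemy.

```python
import sqlite3
from types import TracebackType
from typing import Optional, Type


class ClosingSqliteStorage:
    """Hypothetical SQLite-backed storage that closes its connection on exit."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)

    def __enter__(self) -> "ClosingSqliteStorage":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Commit on success, roll back on error, then always close;
        # returning None lets any exception propagate to the caller
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        self.conn.close()


with ClosingSqliteStorage(":memory:") as storage:
    storage.conn.execute("CREATE TABLE data (x INTEGER)")
    storage.conn.execute("INSERT INTO data VALUES (1)")

try:
    storage.conn.execute("SELECT 1")
except sqlite3.ProgrammingError:
    print("connection closed after the with block")
```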

Bulk Operations

```python
# Store multiple DataFrames efficiently
dataframes = {
    'users': users_df,
    'orders': orders_df,
    'products': products_df
}
storage.store_many(dataframes)

# Load multiple tables
tables = ['users', 'orders', 'products']
results = storage.load_many(tables)

# Delete multiple tables
storage.delete_many(tables)
```
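One plausible way batch methods can beat a loop of `store()` calls is for a SQL backend to write every table inside a single transaction: the commit cost is paid once, and a failure part-way leaves the database unchanged. The sketch assumes a single key/value table of pickled objects in SQLite; that schema and the function names are hypothetical and not how TrashPandas lays out its tables.

```python
import pickle
import sqlite3
from typing import Any, Dict, Iterable


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS frames (name TEXT PRIMARY KEY, payload BLOB)"
    )
    return conn


def store_many(conn: sqlite3.Connection, frames: Dict[str, Any]) -> None:
    # `with conn` wraps every insert in one transaction: one commit, or rollback on error
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO frames (name, payload) VALUES (?, ?)",
            [(name, pickle.dumps(obj)) for name, obj in frames.items()],
        )


def load_many(conn: sqlite3.Connection, names: Iterable[str]) -> Dict[str, Any]:
    result = {}
    for name in names:
        row = conn.execute(
            "SELECT payload FROM frames WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        result[name] = pickle.loads(row[0])
    return result


def delete_many(conn: sqlite3.Connection, names: Iterable[str]) -> None:
    with conn:
        conn.executemany("DELETE FROM frames WHERE name = ?", [(n,) for n in names])


conn = open_store(":memory:")
store_many(conn, {"users": [1, 2], "orders": [3], "products": []})
print(load_many(conn, ["users", "orders"]))  # {'users': [1, 2], 'orders': [3]}
delete_many(conn, ["users", "orders", "products"])
print(conn.execute("SELECT COUNT(*) FROM frames").fetchone()[0])  # 0
conn.close()
```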

Compression Support

```python
# CSV with compression
csv_storage = tp.CsvStorage('./data', compression='gzip')

# Pickle with compression
pickle_storage = tp.PickleStorage('./pickles', compression='bz2')

# Supported compression types: 'gzip', 'bz2', 'xz', 'zstd'
```
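Most of these compression names map onto Python's standard library: `gzip`, `bz2`, and `lzma` (for `'xz'`) each expose an `open()` function with the same call shape as the built-in `open`. The sketch below dispatches on the name and appends a conventional suffix. `'zstd'` is left out because it typically needs an extra package. `open_compressed` and the suffix table are hypothetical, not TrashPandas's code.

```python
import bz2
import gzip
import lzma
import tempfile
from pathlib import Path
from typing import IO, Callable, Dict, Optional

# Map compression names to stdlib openers ('zstd' omitted in this sketch)
OPENERS: Dict[Optional[str], Callable[..., IO]] = {
    None: open,
    "gzip": gzip.open,
    "bz2": bz2.open,
    "xz": lzma.open,
}

# Conventional file suffixes for each compression format
SUFFIXES = {None: "", "gzip": ".gz", "bz2": ".bz2", "xz": ".xz"}


def open_compressed(path: Path, mode: str, compression: Optional[str] = None) -> IO:
    """Open `path` plus the matching suffix using the chosen compression."""
    if compression not in OPENERS:
        raise ValueError(f"unsupported compression: {compression!r}")
    full_path = path.with_name(path.name + SUFFIXES[compression])
    return OPENERS[compression](full_path, mode)


with tempfile.TemporaryDirectory() as tmp:
    for comp in (None, "gzip", "bz2", "xz"):
        with open_compressed(Path(tmp) / "people.csv", "wt", comp) as f:
            f.write("name,age\nJoe,23\n")
        with open_compressed(Path(tmp) / "people.csv", "rt", comp) as f:
            assert f.read() == "name,age\nJoe,23\n"
    print(sorted(p.name for p in Path(tmp).iterdir()))
    # ['people.csv', 'people.csv.bz2', 'people.csv.gz', 'people.csv.xz']
```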

Error Handling

```python
from trashpandas.exceptions import TableNotFoundError, MetadataCorruptedError

try:
    df = storage.load('nonexistent_table')
except TableNotFoundError as e:
    print(f"Table not found: {e.table_name}")
except MetadataCorruptedError as e:
    print(f"Metadata corrupted: {e.details}")
```
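A custom exception hierarchy like the one above can be sketched in a few lines. The attribute names `table_name` and `details` come from the example above; the base class name `TrashPandasError`, the constructor signatures, and the choice to subclass `KeyError` are assumptions for illustration. Making the not-found error a `KeyError` lets it satisfy the mapping protocol, so `in` and `Mapping.get()` keep working, while callers can still catch it by its specific name.

```python
class TrashPandasError(Exception):
    """Hypothetical base class for all storage errors."""


class TableNotFoundError(TrashPandasError, KeyError):
    """Raised when a requested table does not exist."""

    def __init__(self, table_name: str) -> None:
        super().__init__(f"table {table_name!r} not found")
        self.table_name = table_name

    def __str__(self) -> str:
        # KeyError.__str__ would wrap the message in quotes; return it as plain text
        return str(self.args[0])


class MetadataCorruptedError(TrashPandasError):
    """Raised when stored metadata cannot be parsed."""

    def __init__(self, table_name: str, details: str) -> None:
        super().__init__(f"metadata for {table_name!r} is corrupted: {details}")
        self.table_name = table_name
        self.details = details


def load(tables: dict, name: str):
    """Hypothetical loader that raises the custom error for missing tables."""
    if name not in tables:
        raise TableNotFoundError(name)
    return tables[name]


try:
    load({}, "nonexistent_table")
except KeyError as e:  # also catchable as TableNotFoundError or TrashPandasError
    print(type(e).__name__, e.table_name, str(e))
    # TableNotFoundError nonexistent_table table 'nonexistent_table' not found
```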

🔄 Migration from 0.x to 1.0

Breaking Changes

SQLAlchemy 2.x Required: Update your SQLAlchemy version
```
pip install "SQLAlchemy>=2.0.0"
```
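To confirm that an environment meets this requirement before upgrading, the installed version can be read with the standard-library `importlib.metadata` module (Python 3.8+). A minimal sketch; the helper names are hypothetical, and the parsing only looks at the leading major number.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading major number of a version string such as '2.0.36'."""
    return int(version_string.split(".")[0])


def installed_major(dist_name: str) -> Optional[int]:
    """Major version of an installed distribution, or None if it is not installed."""
    try:
        return major_version(version(dist_name))
    except PackageNotFoundError:
        return None


major = installed_major("SQLAlchemy")
if major is None:
    print("SQLAlchemy is not installed")
elif major < 2:
    print("SQLAlchemy 1.x found; TrashPandas 1.0 needs SQLAlchemy>=2.0.0")
else:
    print("SQLAlchemy 2.x or newer found")
```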

Path Parameters: Storage classes now accept pathlib.Path objects

```python
# Old
storage = tp.CsvStorage('/path/to/data')

# New (still works)
storage = tp.CsvStorage('/path/to/data')

# New (recommended)
from pathlib import Path
storage = tp.CsvStorage(Path('/path/to/data'))
```
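Accepting both forms is straightforward because `pathlib.Path()` takes either a string or any `os.PathLike`. The sketch below shows a constructor normalizing its argument so old string-based calls keep working; `FolderStorage` and `path_for` are hypothetical names, not TrashPandas's classes.

```python
import os
from pathlib import Path
from typing import Union


class FolderStorage:
    """Hypothetical file-based storage accepting str or pathlib.Path folders."""

    def __init__(self, folder: Union[str, "os.PathLike[str]"]) -> None:
        # Path() accepts both str and os.PathLike, so older string calls still work
        self.folder = Path(folder).expanduser()

    def path_for(self, table_name: str, suffix: str = ".csv") -> Path:
        return self.folder / f"{table_name}{suffix}"


old_style = FolderStorage("/path/to/data")
new_style = FolderStorage(Path("/path/to/data"))
print(old_style.folder == new_style.folder)  # True
print(new_style.path_for("people").name)  # people.csv
```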

Method Signatures: Some internal methods have updated signatures; the public store() call shown below remains backward compatible and gains an optional parameter

```python
# Old
storage.store(df, 'table')

# New (backward compatible)
storage.store(df, 'table')
storage.store(df, 'table', schema='my_schema')  # New optional parameter
```
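Adding a parameter with a default of `None` is what keeps existing calls valid. In the sketch below, the hypothetical `store()` only computes the qualified name it would write to; making `schema` keyword-only is an assumed design choice (the README's example already passes it by keyword), which stops a positional value from silently landing in the wrong slot. For comparison, pandas' `DataFrame.to_sql` also takes an optional `schema` keyword.

```python
from typing import Any, Optional


def store(df: Any, table_name: str, *, schema: Optional[str] = None) -> str:
    """Hypothetical store(); returns the table name it would write to.

    `schema` is keyword-only and defaults to None, so existing
    store(df, 'table') calls keep working unchanged.
    """
    return f"{schema}.{table_name}" if schema else table_name


print(store([1, 2], "table"))                      # table
print(store([1, 2], "table", schema="my_schema"))  # my_schema.table
```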

New Features

Context Managers: Use with statements for automatic cleanup
Iterator Protocol: Iterate over storage objects
Bulk Operations: Efficient batch processing
Compression: Optional compression for file-based storage
Better Error Handling: Comprehensive exception hierarchy

🛠️ Development

Setup Development Environment

```bash
git clone https://github.com/eddiethedean/trashpandas.git
cd trashpandas
pip install -e ".[dev]"
```

Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=trashpandas

# Run specific test file
pytest tests/test_sql.py
```

Code Quality

```bash
# Linting with ruff
ruff check src tests

# Type checking with mypy
mypy src

# Format code
ruff format src tests
```

📋 Requirements

Python 3.8+
pandas >= 1.3.0
SQLAlchemy >= 2.0.0
h5py >= 3.0.0 (optional, for HDF5 support)

🤝 Contributing

Contributions are welcome! Please see our Contributing Guide for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

pandas for the excellent DataFrame library
SQLAlchemy for robust database connectivity
h5py for HDF5 support
The Python community for inspiration and feedback

TrashPandas - Making DataFrame persistence simple and reliable! 🐼

Project details

Release history

1.0.2: Oct 28, 2025
1.0.1: Oct 28, 2025
1.0.0: Oct 28, 2025 (this version)
0.0.10: Jan 20, 2023
0.0.9: Jan 20, 2023
0.0.8: Dec 27, 2022
0.0.7: Dec 5, 2022
0.0.6: Dec 5, 2022
0.0.5: Feb 6, 2022
0.0.4: Feb 4, 2022
0.0.3: Feb 2, 2022
0.0.2: Feb 2, 2022
0.0.1: Jul 22, 2021

Download files

Source Distribution

trashpandas-1.0.0.tar.gz (40.4 kB), uploaded Oct 28, 2025 (Source)

Built Distribution

trashpandas-1.0.0-py3-none-any.whl (33.8 kB), uploaded Oct 28, 2025 (Python 3)

File details

Details for the file trashpandas-1.0.0.tar.gz.

File metadata

Download URL: trashpandas-1.0.0.tar.gz
Upload date: Oct 28, 2025
Size: 40.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for trashpandas-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a49ec3190b35444e329b8348f31b4df273fd3bc96ae13e886df8320663b8b2d9`
MD5	`63ff262f33bf292b3c981e3b539c74da`
BLAKE2b-256	`fc922c6a66ed93f989570cafd5156a2e9b5c14552a35534752353c0d4ae20f24`

File details

Details for the file trashpandas-1.0.0-py3-none-any.whl.

File metadata

Download URL: trashpandas-1.0.0-py3-none-any.whl
Upload date: Oct 28, 2025
Size: 33.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for trashpandas-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e70cee5079d0e6706d119b9847955563722bad84a805da7738c31ef7cedfc4a4`
MD5	`4695771f233e22a4907ef38c176f3c1f`
BLAKE2b-256	`74ffae703fb0bf4559a172190fac82f58f7ea1492446870be5a3d747debe8477`
