Persistent Pandas DataFrame storage and retrieval using a SQL database, HDF5, CSV files, or pickle files.

These details have not been verified by PyPI

Project links

Project description

TrashPandas Logo

TrashPandas: Persistent Pandas DataFrame Storage and Retrieval

What is it?

TrashPandas is a modern Python package that provides persistent Pandas DataFrame storage and retrieval using SQL databases, CSV files, HDF5, or pickle files. Version 1.0.2 brings significant improvements including SQLAlchemy 2.x support, comprehensive type hints, modern Python features, enhanced error handling, and improved CI/CD reliability.

✨ Main Features

Multiple Storage Backends: SQL databases, CSV files, HDF5, and pickle files
Preserve Data Integrity: Maintains indexes and data types during storage/retrieval
Format Conversion: Transfer DataFrames between different storage formats
Modern Python Support: Full type hints, context managers, and iterator protocol
Bulk Operations: Efficient batch processing with store_many(), load_many(), delete_many()
Compression Support: Optional compression for CSV and pickle storage
Comprehensive Error Handling: Custom exception hierarchy with detailed error messages
Schema Validation: Robust validation for SQL schema names and metadata
Cross-Platform Compatibility: Tested on multiple Python versions (3.8-3.12) and operating systems
Comprehensive Testing: 252+ tests with 76% code coverage

🚀 Quick Start

Installation

# Basic installation
pip install trashpandas

# With HDF5 support
pip install trashpandas[hdf5]

# Development dependencies
pip install trashpandas[dev]

Basic Usage

import pandas as pd
import sqlalchemy as sa
import trashpandas as tp

# Create sample data
df = pd.DataFrame({'name': ['Joe', 'Bob', 'John'], 'age': [23, 34, 44]})

# SQL Storage
with tp.SqlStorage('sqlite:///test.db') as storage:
    storage['people'] = df
    loaded_df = storage['people']
    print(f"Stored {len(storage)} tables")

# CSV Storage with compression
csv_storage = tp.CsvStorage('./data', compression='gzip')
csv_storage.store(df, 'people')

# Pickle Storage
pickle_storage = tp.PickleStorage('./pickles', compression='bz2')
pickle_storage.store(df, 'people')

📖 Example Notebooks

Check out these interactive Jupyter notebooks demonstrating TrashPandas features:

Basic Usage - Introduction to CSV, SQL, and Pickle storage
Advanced Features - Compression, bulk operations, and data type preservation
Format Conversion - Converting DataFrames between different storage formats
Query Capabilities - Advanced SQL querying with WHERE clauses and filtering

All notebooks are fully executed with outputs included. Click the links above to view them on GitHub or open them in Jupyter Notebook/Lab.

📚 API Reference

Storage Classes

SqlStorage

# Create SQL storage
storage = tp.SqlStorage('sqlite:///test.db')
# or with existing engine
engine = sa.create_engine('sqlite:///test.db')
storage = tp.SqlStorage(engine)

# Basic operations
storage.store(df, 'table_name')
df = storage.load('table_name')
storage.delete('table_name')

# Dictionary-like interface
storage['table_name'] = df
df = storage['table_name']
del storage['table_name']

# Bulk operations
storage.store_many({'table1': df1, 'table2': df2})
results = storage.load_many(['table1', 'table2'])
storage.delete_many(['table1', 'table2'])

# Context manager
with storage:
    storage['data'] = df

CsvStorage

# Basic CSV storage
storage = tp.CsvStorage('./data')

# With compression
storage = tp.CsvStorage('./data', compression='gzip')

# Operations
storage.store(df, 'table_name')
df = storage.load('table_name')

PickleStorage

# Basic pickle storage
storage = tp.PickleStorage('./pickles')

# With custom extension and compression
storage = tp.PickleStorage('./pickles', file_extension='.pkl', compression='bz2')

# Operations
storage.store(df, 'table_name')
df = storage.load('table_name')

HdfStorage (Optional)

# Requires: pip install trashpandas[hdf5]
storage = tp.HdfStorage('data.h5')
storage.store(df, 'table_name')
df = storage.load('table_name')

Modern Features

Iterator Protocol

storage = tp.SqlStorage('sqlite:///test.db')

# Iterate over table names
for table_name in storage:
    print(f"Table: {table_name}")

# Check if table exists
if 'my_table' in storage:
    df = storage['my_table']

# Get number of tables
print(f"Total tables: {len(storage)}")

Context Managers

# Automatic resource cleanup
with tp.SqlStorage('sqlite:///test.db') as storage:
    storage['data'] = df
    # Connection automatically closed

Bulk Operations

# Store multiple DataFrames efficiently
dataframes = {
    'users': users_df,
    'orders': orders_df,
    'products': products_df
}
storage.store_many(dataframes)

# Load multiple tables
tables = ['users', 'orders', 'products']
results = storage.load_many(tables)

# Delete multiple tables
storage.delete_many(tables)

Compression Support

# CSV with compression
csv_storage = tp.CsvStorage('./data', compression='gzip')

# Pickle with compression
pickle_storage = tp.PickleStorage('./pickles', compression='bz2')

# Supported compression types: 'gzip', 'bz2', 'xz', 'zstd'

Error Handling

from trashpandas.exceptions import TableNotFoundError, MetadataCorruptedError

try:
    df = storage.load('nonexistent_table')
except TableNotFoundError as e:
    print(f"Table not found: {e.table_name}")
except MetadataCorruptedError as e:
    print(f"Metadata corrupted: {e.details}")

🔄 Migration from 0.x to 1.0.2

Breaking Changes

SQLAlchemy 2.x Required: Update your SQLAlchemy version
```
pip install "SQLAlchemy>=2.0.0"
```

Path Parameters: Storage classes now accept pathlib.Path objects

# Old
storage = tp.CsvStorage('/path/to/data')

# New (still works)
storage = tp.CsvStorage('/path/to/data')

# New (recommended)
from pathlib import Path
storage = tp.CsvStorage(Path('/path/to/data'))

Method Signatures: Some internal methods have updated signatures

# Old
storage.store(df, 'table')

# New (backward compatible)
storage.store(df, 'table')
storage.store(df, 'table', schema='my_schema')  # New optional parameter

New Features in 1.0.2

Enhanced Schema Validation: Improved validation for SQL schema names and metadata
Better Error Handling: More specific exception types and detailed error messages
Improved Compatibility: Fixed numpy/PyTables compatibility issues across Python versions
Robust CI/CD: Comprehensive testing across Python 3.8-3.12 with reliable builds
Context Managers: Use with statements for automatic cleanup
Iterator Protocol: Iterate over storage objects
Bulk Operations: Efficient batch processing
Compression: Optional compression for file-based storage

🛠️ Development

Setup Development Environment

git clone https://github.com/eddiethedean/trashpandas.git
cd trashpandas
pip install -e ".[dev]"

Running Tests

# Run all tests with tox (recommended)
tox

# Run specific Python version
tox -e py311
tox -e py312

# Run linting
tox -e lint

# Run type checking
tox -e mypy

# Run with coverage
tox -e coverage

# Or run pytest directly
pytest
pytest --cov=trashpandas
pytest tests/test_sql.py

Code Quality

# Linting with ruff
ruff check src tests

# Type checking with mypy
mypy src

# Format code
ruff format src tests

📋 Requirements

Python 3.8+
pandas >= 1.3.0
SQLAlchemy >= 2.0.0
h5py >= 3.10.0 (optional, for HDF5 support)
tables >= 3.8.0 (optional, for HDF5 support)
numpy >= 1.21.0, < 2.0.0 (for PyTables compatibility)

📝 Recent Changes (v1.0.2)

🐛 Bug Fixes

Fixed schema validation in load_metadata_sql function
Resolved numpy/PyTables compatibility issues across Python versions
Improved error handling for edge cases in metadata operations

🔧 Improvements

Enhanced CI/CD pipeline with comprehensive testing
Added proper dependency version constraints for stability
Improved cross-platform compatibility testing
Better error messages and exception handling

🧪 Testing

All 252 tests passing with 76% code coverage
Comprehensive testing across Python 3.8-3.12
Robust CI/CD with reliable builds on GitHub Actions

🤝 Contributing

Contributions are welcome! Please see our Contributing Guide for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

pandas for the excellent DataFrame library
SQLAlchemy for robust database connectivity
h5py for HDF5 support
The Python community for inspiration and feedback

TrashPandas - Making DataFrame persistence simple and reliable! 🐼

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.0.2

Oct 28, 2025

1.0.1

Oct 28, 2025

1.0.0

Oct 28, 2025

0.0.10

Jan 20, 2023

0.0.9

Jan 20, 2023

0.0.8

Dec 27, 2022

0.0.7

Dec 5, 2022

0.0.6

Dec 5, 2022

0.0.5

Feb 6, 2022

0.0.4

Feb 4, 2022

0.0.3

Feb 2, 2022

0.0.2

Feb 2, 2022

0.0.1

Jul 22, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trashpandas-1.0.2.tar.gz (47.7 kB view details)

Uploaded Oct 28, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

trashpandas-1.0.2-py3-none-any.whl (34.5 kB view details)

Uploaded Oct 28, 2025 Python 3

File details

Details for the file trashpandas-1.0.2.tar.gz.

File metadata

Download URL: trashpandas-1.0.2.tar.gz
Upload date: Oct 28, 2025
Size: 47.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for trashpandas-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`9370abe61e9d1e2a8a54563abe1ca4376616b6754e35e3dfe439abf611a2f442`
MD5	`348dfafeda4268482e3d049b15ab9e48`
BLAKE2b-256	`652b401cd550be72ce04df28864cc8f4270c8ff78e353a539c6b02cb3b0311f9`

See more details on using hashes here.

File details

Details for the file trashpandas-1.0.2-py3-none-any.whl.

File metadata

Download URL: trashpandas-1.0.2-py3-none-any.whl
Upload date: Oct 28, 2025
Size: 34.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for trashpandas-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`60f3647343a45aa974b6e6620ce7ccf27a99d9d830ffee46e876706630613f5a`
MD5	`8c978deaba6aa3c7883c1e5c811ea738`
BLAKE2b-256	`1995995bf9633fc0f69a17f523fb647470bdf04ecbd5ecd2bec5f97f908b6fbc`

See more details on using hashes here.

trashpandas 1.0.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

TrashPandas: Persistent Pandas DataFrame Storage and Retrieval

What is it?

✨ Main Features

🚀 Quick Start

Installation

Basic Usage

📖 Example Notebooks

📚 API Reference

Storage Classes

SqlStorage

CsvStorage

PickleStorage

HdfStorage (Optional)

Modern Features

Iterator Protocol

Context Managers

Bulk Operations

Compression Support

Error Handling

🔄 Migration from 0.x to 1.0.2

Breaking Changes

New Features in 1.0.2

🛠️ Development

Setup Development Environment

Running Tests

Code Quality

📋 Requirements

📝 Recent Changes (v1.0.2)

🐛 Bug Fixes

🔧 Improvements

🧪 Testing

🤝 Contributing

📄 License

🙏 Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes