TempDataset

A lightweight Python library for generating realistic temporary datasets for testing and development. The core needs no third-party packages and runs on the Python standard library alone.

Features

  • Lightweight: Zero dependencies for core functionality
  • Multiple Formats: Generate CSV, JSON, or in-memory datasets
  • Realistic Data: Built-in datasets with realistic patterns
  • Extensible: Easy to add custom dataset types
  • Memory Efficient: Optimized for large dataset generation
  • Python 3.7+: Compatible with modern Python versions

Quick Start

Installation

pip install tempdataset

For additional features with Faker support:

pip install "tempdataset[faker]"

Basic Usage

import tempdataset

# Generate 1000 rows of sales data
data = tempdataset.create_dataset('sales', 1000)
print(data.head())

# Save directly to CSV
tempdataset.create_dataset('sales.csv', 500)

# Save directly to JSON
tempdataset.create_dataset('sales.json', 500)

# Read data back
csv_data = tempdataset.read_csv('sales.csv')
json_data = tempdataset.read_json('sales.json')

Available Datasets

Sales Dataset

Generates realistic sales transaction data with:

  • Transaction IDs
  • Customer information
  • Product details
  • Sales amounts and quantities
  • Timestamps
  • Geographic data

import tempdataset

# Generate sales data
sales_data = tempdataset.create_dataset('sales', 1000)

# Access data
print(f"Generated {len(sales_data)} rows")
print(f"Columns: {sales_data.columns}")
print(f"Memory usage: {sales_data.memory_usage()}")

Advanced Usage

Working with TempDataFrame

data = tempdataset.create_dataset('sales', 1000)

# Basic operations
print(data.head(10))          # First 10 rows
print(data.tail(5))           # Last 5 rows
print(data.describe())        # Statistical summary
print(data.info())            # Data info

# Filtering and selection
filtered = data.filter(lambda row: row['amount'] > 100)
selected = data.select(['customer_name', 'amount', 'date'])

# Export options
data.to_csv('output.csv')
data.to_json('output.json')
data.to_dict()                # Convert to dictionary
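
To make the semantics of row filtering and column selection concrete, here is a minimal plain-Python sketch on a list of dictionaries. It is not TempDataFrame's implementation: `filter_rows` and `select_columns` are hypothetical helpers, the sample rows are made up, and the `region` column is an assumption added only to show a column being dropped.

```python
# Plain-Python stand-ins for row filtering and column selection.
# Hypothetical helpers for illustration; not part of tempdataset.
rows = [
    {"customer_name": "Ada", "amount": 250.0, "date": "2024-01-05", "region": "EU"},
    {"customer_name": "Bob", "amount": 40.0, "date": "2024-01-06", "region": "US"},
    {"customer_name": "Cy", "amount": 120.5, "date": "2024-01-07", "region": "US"},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def select_columns(rows, columns):
    """Return new rows containing only the named columns, in the given order."""
    return [{name: row[name] for name in columns} for row in rows]


# Same shape as the filter/select calls above: a row-level predicate,
# then a list of column names to keep.
filtered = filter_rows(rows, lambda row: row["amount"] > 100)
selected = select_columns(filtered, ["customer_name", "amount", "date"])
print(selected)
```

The predicate receives one row at a time as a mapping from column name to value, which is why the lambda indexes `row['amount']` directly.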

Performance Monitoring

import tempdataset

# Generate data
data = tempdataset.create_dataset('sales', 10000)

# Check performance stats
stats = tempdataset.get_performance_stats()
print(f"Generation time: {stats['generation_time']:.2f}s")
print(f"Memory usage: {stats['memory_usage']:.2f}MB")

# Reset stats for next operation
tempdataset.reset_performance_stats()
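
If you want to cross-check these numbers independently, the standard library can take a similar measurement. The sketch below is an assumption-laden stand-in rather than the library's own instrumentation: `measure` and `build_rows` are hypothetical names, and `build_rows` is a placeholder workload you would replace with the call you actually want to time.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds, peak_megabytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak_bytes / (1024 * 1024)


def build_rows(n):
    # Placeholder workload (hypothetical); swap in the call to measure.
    return [{"id": i, "amount": i * 1.5} for i in range(n)]


rows, seconds, peak_mb = measure(build_rows, 10_000)
print(f"Generation time: {seconds:.2f}s")
print(f"Peak memory: {peak_mb:.2f}MB")
```

Note that `tracemalloc` adds overhead of its own, so timings taken with tracing enabled will be somewhat slower than an untraced run.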

Development

Setting up Development Environment

# Clone the repository
git clone https://github.com/dot-css/TempDataset.git
cd TempDataset

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=tempdataset

# Run performance benchmarks
pytest .benchmarks/

Running Tests

# Run all tests
pytest

# Run specific test categories
pytest -m "not slow"          # Skip slow tests
pytest -m integration         # Only integration tests
pytest -m performance         # Only performance tests

# Run with coverage report
pytest --cov=tempdataset --cov-report=html

Code Quality

# Format code
black tempdataset tests

# Lint code
flake8 tempdataset tests

# Type checking
mypy tempdataset

API Reference

Core Functions

create_dataset(dataset_type, rows=500)

Generate a temporary dataset in memory and, if a filename is given, also save it to that file.

Parameters:

  • dataset_type (str): Dataset type ('sales') or filename ('sales.csv', 'sales.json')
  • rows (int): Number of rows to generate (default: 500)

Returns:

  • TempDataFrame containing the generated data (the data is also saved to a file if a filename is provided)
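
Because dataset_type doubles as either a dataset name or an output filename, it can help to see one way such an argument might be interpreted. The sketch below is only an illustration of the documented behavior: `split_target` is a hypothetical helper, and treating the file extension as the deciding factor is an assumption, not the library's actual logic.

```python
import os


def split_target(dataset_type):
    """Return (dataset_name, output_format) for a create_dataset-style argument.

    output_format is None when the argument names a dataset only,
    meaning nothing would be written to disk.
    Hypothetical helper; not part of tempdataset.
    """
    name, ext = os.path.splitext(dataset_type)
    ext = ext.lower()
    if ext in (".csv", ".json"):
        return name, ext[1:]
    return dataset_type, None


print(split_target("sales"))
print(split_target("sales.csv"))
print(split_target("sales.json"))
```

Under this reading, 'sales.csv' means "generate the sales dataset and write it as CSV", while a bare 'sales' keeps the data in memory only.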

read_csv(filename)

Read a CSV file into a TempDataFrame.

read_json(filename)

Read a JSON file into a TempDataFrame.
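
One thing worth checking when reading files back is column types. This page does not say whether read_csv converts values, so the sketch below uses only the standard library `csv` and `json` modules to show the underlying difference between the two formats. The rows and column names are made up for illustration.

```python
import csv
import json
import os
import tempfile

# Made-up rows; column names are assumptions for illustration.
rows = [
    {"transaction_id": 1, "amount": 19.99},
    {"transaction_id": 2, "amount": 5.0},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sales.csv")
    json_path = os.path.join(tmp, "sales.json")

    # Write the same rows in both formats.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["transaction_id", "amount"])
        writer.writeheader()
        writer.writerows(rows)

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f)

    # Read them back with the plain stdlib readers.
    with open(csv_path, newline="", encoding="utf-8") as f:
        csv_rows = list(csv.DictReader(f))

    with open(json_path, encoding="utf-8") as f:
        json_rows = json.load(f)

print(csv_rows[0])   # every value comes back as a string
print(json_rows[0])  # numbers keep their numeric types
```

With the raw `csv` module, every value is read back as a string, while JSON preserves numbers. If your tests compare numeric columns after a CSV round trip, confirm how the reader you use handles this.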

TempDataFrame Methods

  • head(n=5): Get first n rows
  • tail(n=5): Get last n rows
  • describe(): Statistical summary
  • info(): Dataset information
  • filter(func): Filter rows by function
  • select(columns): Select specific columns
  • to_csv(filename): Export to CSV
  • to_json(filename): Export to JSON
  • to_dict(): Convert to dictionary

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a detailed history of changes.

Acknowledgments

  • Built with love for the Python testing community
  • Inspired by the need for lightweight, dependency-free test data generation
  • Thanks to all contributors who help make this project better!
