TempDataset

A lightweight Python library for generating realistic temporary datasets for testing and development. It generates 40 dataset types spanning business, financial, IoT, healthcare, social media, and technology data.

✨ Features

  • 40 Dataset Types: Business, financial, IoT sensors, healthcare, social media, and technology datasets
  • Zero Dependencies: Works with just the Python standard library
  • Multiple Formats: Generate CSV, JSON, or in-memory datasets
  • Realistic Data: Built-in faker integration with realistic patterns
  • Technology Focus: New datasets for DevOps, monitoring, web analytics, and system performance
  • Memory Efficient: Optimized for large dataset generation
  • Python 3.7+: Compatible with modern Python versions

🚀 Quick Start

Installation

pip install tempdataset

Basic Usage

import tempdataset

# Generate datasets
data = tempdataset.create_dataset('sales', 1000)
tech_data = tempdataset.create_dataset('web_analytics', 500)
server_metrics = tempdataset.create_dataset('server_metrics', 200)

# Save directly to files
tempdataset.create_dataset('sales_data.csv', 1000)
tempdataset.create_dataset('api_logs.json', 500)

# Get help and see all available datasets
tempdataset.list_datasets() # Show all 40 datasets
tempdataset.help()          # Quick help guide

## 📊 Available Datasets (40 Total)

TempDataset provides **40 comprehensive datasets** across 6 major categories:

### 🏢 Core Business Datasets (10)
- **`sales`** - Sales transactions and orders (27 columns)
- **`customers`** - Customer profiles and demographics (31 columns)  
- **`ecommerce`** - E-commerce transactions and reviews (35+ columns)
- **`employees`** - Employee records and HR data (30+ columns)
- **`marketing`** - Marketing campaigns and performance (32+ columns)
- **`retail`** - Retail store operations (28+ columns)
- **`suppliers`** - Supplier management data (22+ columns)
- **`crm`** - Customer relationship management (30+ columns)
- **`inventory`** - Inventory and warehouse data (25+ columns)
- **`reviews`** - Product and service reviews (15+ columns)

### 💰 Financial Datasets (8)
- **`stocks`** - Stock market trading data (20+ columns)
- **`banking`** - Banking transactions (20+ columns)
- **`cryptocurrency`** - Cryptocurrency trading (20+ columns)
- **`insurance`** - Insurance policies and claims (20+ columns)
- **`loans`** - Loan applications and management (20+ columns)
- **`investments`** - Investment portfolios (20+ columns)
- **`accounting`** - General ledger and accounting (20+ columns)
- **`payments`** - Digital payment processing (25+ columns)

### 🌐 Technology Datasets (8) ⭐ *NEW!*
- **`web_analytics`** - Website analytics and traffic (17 columns)
- **`app_usage`** - Mobile app usage analytics (15 columns)
- **`system_logs`** - System and application logs (11 columns)
- **`api_calls`** - API calls and performance (12 columns)
- **`server_metrics`** - Server performance monitoring (22 columns)
- **`user_sessions`** - User session tracking (20 columns)
- **`error_logs`** - Application error logs (16 columns)
- **`performance`** - Application performance monitoring (21 columns)

### 🏥 Healthcare Datasets (6)
- **`patients`** - Patient medical records (22 columns)
- **`appointments`** - Medical appointments (14 columns)
- **`lab_results`** - Laboratory test results (13 columns)
- **`prescriptions`** - Medication prescriptions (16 columns)
- **`medical_history`** - Patient medical history (11 columns)
- **`clinical_trials`** - Clinical trial data (14 columns)

### IoT Sensor Datasets (6)
- **`weather`** - Weather sensor monitoring (18 columns)
- **`energy`** - Smart meter energy data (14 columns)
- **`traffic`** - Traffic sensor monitoring (15 columns)
- **`environmental`** - Environmental monitoring (17 columns)
- **`industrial`** - Industrial sensor data (16 columns)
- **`smarthome`** - Smart home IoT devices (16 columns)

### 📱 Social Media Datasets (2)
- **`social_media`** - Social media posts and engagement (16 columns)
- **`user_profiles`** - Social media user profiles (17 columns)

### 🚀 Quick Examples

```python
import tempdataset

# Generate different types of datasets
sales = tempdataset.create_dataset('sales', 1000)
tech_logs = tempdataset.create_dataset('system_logs', 500)
health_data = tempdataset.create_dataset('patients', 200)
crypto = tempdataset.create_dataset('cryptocurrency', 300)

# Get help and list all datasets
tempdataset.list_datasets()  # Show all 40 datasets
tempdataset.help()           # Quick reference guide
```

Advanced Usage

Working with TempDataFrame

data = tempdataset.create_dataset('sales', 1000)

# Basic operations
data.head(10)          # First 10 rows
data.tail(5)           # Last 5 rows
data.describe()        # Statistical summary
data.info()            # Data info

# Filtering and selection
filtered = data.filter(lambda row: row['amount'] > 100)
selected = data.select(['customer_name', 'amount', 'date'])

# Export options
data.to_csv('output.csv')
data.to_json('output.json')
data.to_dict()                # Convert to dictionary
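
As a rough mental model of `filter` and `select`, the sketch below applies the same two operations to a plain list of dicts. It assumes each row behaves like a dict keyed by column name, which the `row['amount']` lambda above suggests but which is not confirmed here. The helpers `filter_rows` and `select_columns` and the sample rows are hypothetical stand-ins, not part of TempDataset.

```python
# Stand-in rows; real TempDataset output will have different columns and values.
rows = [
    {"customer_name": "Ada", "amount": 250.0, "date": "2024-01-05"},
    {"customer_name": "Bob", "amount": 40.0, "date": "2024-01-06"},
    {"customer_name": "Cy", "amount": 130.0, "date": "2024-01-07"},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is truthy."""
    return [row for row in rows if predicate(row)]


def select_columns(rows, columns):
    """Project each row onto the requested columns, in the order given."""
    return [{col: row[col] for col in columns} for row in rows]


# Same shape as data.filter(...) followed by data.select(...).
filtered = filter_rows(rows, lambda row: row["amount"] > 100)
selected = select_columns(filtered, ["customer_name", "amount"])
print(selected)
```

The predicate sees one whole row at a time, so conditions across several columns (for example amount and date together) fit in a single lambda.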

Performance Monitoring

import tempdataset

# Generate data
data = tempdataset.create_dataset('sales', 10000)

# Check performance stats
stats = tempdataset.get_performance_stats()
print(f"Generation time: {stats['generation_time']:.2f}s")
print(f"Memory usage: {stats['memory_usage']:.2f}MB")

# Reset stats for next operation
tempdataset.reset_performance_stats()
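
If you want numbers that do not depend on the library's own counters, a standard-library measurement can wrap any callable. This is a minimal sketch: `measure` and `make_rows` are hypothetical names, and `make_rows` only stands in for a call such as `tempdataset.create_dataset('sales', 10000)`.

```python
import time
import tracemalloc


def measure(func):
    """Run func() and return (result, elapsed_seconds, peak_megabytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current_bytes, peak_bytes).
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak / (1024 * 1024)


def make_rows(n):
    """Hypothetical stand-in for a dataset generator."""
    return [{"id": i, "amount": i * 1.5} for i in range(n)]


rows, seconds, peak_mb = measure(lambda: make_rows(10_000))
print(f"{len(rows)} rows in {seconds:.4f}s, peak {peak_mb:.2f} MB")
```

Note that `tracemalloc` itself slows allocation-heavy code, so timings taken this way are best compared with each other rather than with untraced runs.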

Development

Setting up Development Environment

# Clone the repository
git clone https://github.com/dot-css/TempDataset.git
cd TempDataset

# Install development dependencies
pip install -e ".[dev]"   # quotes keep shells such as zsh from expanding the brackets

# Run tests
pytest

# Run tests with coverage
pytest --cov=tempdataset

# Run performance benchmarks
pytest .benchmarks/

Running Tests

# Run all tests
pytest

# Run specific test categories
pytest -m "not slow"          # Skip slow tests
pytest -m integration         # Only integration tests
pytest -m performance         # Only performance tests

# Run with coverage report
pytest --cov=tempdataset --cov-report=html

Code Quality

# Format code
black tempdataset tests

# Lint code
flake8 tempdataset tests

# Type checking
mypy tempdataset

API Reference

Core Functions

create_dataset(dataset_type, rows=500)

Generate temporary datasets or save to files.

Parameters:

  • dataset_type (str): Dataset type or filename
    • Available types: any of the 40 dataset names listed under Available Datasets, for example 'sales', 'customers', 'web_analytics', 'patients'
    • File formats: 'sales.csv', 'customers.json', etc.
  • rows (int): Number of rows to generate (default: 500)

Returns:

  • TempDataFrame containing the generated data (also saves to file if filename provided)
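
How `create_dataset` tells a dataset type from a filename is not documented above. One plausible reading, sketched here purely as an assumption, is that a recognized extension selects the output format and anything else is treated as a bare type name. `parse_target` and `SUPPORTED_FORMATS` are hypothetical and are not TempDataset functions.

```python
from pathlib import Path

# Hypothetical mapping from file extension to output format.
SUPPORTED_FORMATS = {".csv": "csv", ".json": "json"}


def parse_target(target):
    """Return (name, fmt); fmt is 'csv' or 'json' for filenames, else None."""
    suffix = Path(target).suffix.lower()
    fmt = SUPPORTED_FORMATS.get(suffix)
    # Strip the extension only when it is one we recognize.
    name = Path(target).stem if fmt else target
    return name, fmt


for target in ("sales", "sales.csv", "customers.json"):
    print(target, "->", parse_target(target))
```

The sketch deliberately does not guess how a stem such as `sales_data` would map to a dataset type, since that behavior is not described here.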

help()

Display comprehensive help information about all available datasets, including column descriptions, usage examples, and feature details.

list_datasets()

Get a quick overview of all available datasets with their key features and column counts.

read_csv(filename)

Read CSV file into TempDataFrame.

read_json(filename)

Read JSON file into TempDataFrame.
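
To inspect an exported file without TempDataset, the standard-library readers work on ordinary CSV and JSON. The sketch below assumes a CSV with a header row; whether `to_csv` writes exactly this layout is not confirmed here, so the file content is a hand-written stand-in.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.csv"
    # Stand-in for a file produced by data.to_csv('output.csv').
    path.write_text("customer_name,amount\nAda,250.0\nBob,40.0\n", encoding="utf-8")

    with path.open(newline="", encoding="utf-8") as handle:
        records = list(csv.DictReader(handle))

# csv.DictReader returns every value as a string, so convert as needed.
total = sum(float(record["amount"]) for record in records)
print(records[0]["customer_name"], total)
```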

TempDataFrame Methods

  • head(n=5): Get first n rows
  • tail(n=5): Get last n rows
  • describe(): Statistical summary
  • info(): Dataset information
  • filter(func): Filter rows by function
  • select(columns): Select specific columns
  • to_csv(filename): Export to CSV
  • to_json(filename): Export to JSON
  • to_dict(): Convert to dictionary

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a detailed history of changes.

Acknowledgments

  • Built with love for the Python testing community
  • Inspired by the need for lightweight, dependency-free test data generation
  • Thanks to all contributors who help make this project better!
