TempDataset
A lightweight Python library for generating realistic temporary datasets for testing and development. No heavy dependencies required - works with just the Python standard library!
Features
- Lightweight: Zero dependencies for core functionality
- Multiple Formats: Generate CSV, JSON, or in-memory datasets
- Realistic Data: Built-in datasets with realistic patterns
- Extensible: Easy to add custom dataset types
- Memory Efficient: Optimized for large dataset generation
- Python 3.7+: Compatible with modern Python versions
Quick Start
Installation
pip install tempdataset
For additional features with Faker support:
pip install "tempdataset[faker]"
Basic Usage
import tempdataset
# Generate 1000 rows of sales data
data = tempdataset.create_dataset('sales', 1000)
print(data.head())
# Save directly to CSV
tempdataset.create_dataset('sales.csv', 500)
# Save directly to JSON
tempdataset.create_dataset('sales.json', 500)
# Read data back
csv_data = tempdataset.read_csv('sales.csv')
json_data = tempdataset.read_json('sales.json')
Available Datasets
Sales Dataset
Generates realistic sales transaction data with:
- Transaction IDs
- Customer information
- Product details
- Sales amounts and quantities
- Timestamps
- Geographic data
# Generate sales data
sales_data = tempdataset.create_dataset('sales', 1000)
# Access data
print(f"Generated {len(sales_data)} rows")
print(f"Columns: {sales_data.columns}")
print(f"Memory usage: {sales_data.memory_usage()}")
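To illustrate the kind of records listed above, here is a minimal standard-library sketch of a sales-like row generator. It is not TempDataset's implementation: the function name `make_sales_rows`, the field names, and the value ranges are all assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta

# Hypothetical value pools; the real library's columns and values may differ.
PRODUCTS = ["Laptop", "Monitor", "Keyboard", "Mouse"]
REGIONS = ["North", "South", "East", "West"]


def make_sales_rows(n, seed=0):
    """Return n sales-like dicts using only the standard library."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    start = datetime(2024, 1, 1)
    rows = []
    for i in range(n):
        quantity = rng.randint(1, 5)
        unit_price = round(rng.uniform(5.0, 500.0), 2)
        day_offset = rng.randint(0, 364)
        rows.append({
            "transaction_id": f"TXN-{i + 1:06d}",
            "customer_name": f"Customer {rng.randint(1, 200)}",
            "product": rng.choice(PRODUCTS),
            "quantity": quantity,
            "amount": round(quantity * unit_price, 2),
            "date": (start + timedelta(days=day_offset)).date().isoformat(),
            "region": rng.choice(REGIONS),
        })
    return rows


rows = make_sales_rows(3)
for row in rows:
    print(row)
```

Seeding the generator keeps test fixtures reproducible, which is usually what you want from temporary data in a test suite.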
Advanced Usage
Working with TempDataFrame
data = tempdataset.create_dataset('sales', 1000)
# Basic operations
print(data.head(10)) # First 10 rows
print(data.tail(5)) # Last 5 rows
print(data.describe()) # Statistical summary
print(data.info()) # Data info
# Filtering and selection
filtered = data.filter(lambda row: row['amount'] > 100)
selected = data.select(['customer_name', 'amount', 'date'])
# Export options
data.to_csv('output.csv')
data.to_json('output.json')
data.to_dict() # Convert to dictionary
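As a rough mental model for `filter` and `select`, the sketch below applies the same two operations to a plain list of dictionaries. The helper names `filter_rows` and `select_columns` are hypothetical stand-ins, and TempDataFrame's actual return types and internals may differ.

```python
def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def select_columns(rows, columns):
    """Keep only the named columns, in the order given."""
    return [{name: row[name] for name in columns} for row in rows]


# Hypothetical sample rows reusing the column names from the example above.
sample = [
    {"customer_name": "Ada", "amount": 250.0, "date": "2024-01-05", "region": "North"},
    {"customer_name": "Linus", "amount": 40.0, "date": "2024-01-06", "region": "South"},
]

large = filter_rows(sample, lambda row: row["amount"] > 100)
slim = select_columns(sample, ["customer_name", "amount", "date"])
print(large)
print(slim)
```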
Performance Monitoring
import tempdataset
# Generate data
data = tempdataset.create_dataset('sales', 10000)
# Check performance stats
stats = tempdataset.get_performance_stats()
print(f"Generation time: {stats['generation_time']:.2f}s")
print(f"Memory usage: {stats['memory_usage']:.2f}MB")
# Reset stats for next operation
tempdataset.reset_performance_stats()
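For a cross-check that does not rely on the library's own counters, the standard library can time a call and track peak allocations. The sketch below is an illustration under assumptions: `measure` is a hypothetical helper, and `build_rows` is a stand-in workload rather than a TempDataset call.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and return (result, seconds_elapsed, peak_megabytes)."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # always stop tracing, even if func raises
    return result, elapsed, peak_bytes / (1024 * 1024)


def build_rows(n):
    """Stand-in workload: build n small dicts."""
    return [{"id": i, "amount": i * 1.5} for i in range(n)]


rows, seconds, peak_mb = measure(build_rows, 10_000)
print(f"Generation time: {seconds:.2f}s")
print(f"Peak traced memory: {peak_mb:.2f}MB")
```

To time a real generation call, pass `tempdataset.create_dataset` and its arguments to `measure` in place of `build_rows`. Note that tracing allocations slows the call down, so the timing is best read as an upper bound.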
Development
Setting up Development Environment
# Clone the repository
git clone https://github.com/dot-css/TempDataset.git
cd TempDataset
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=tempdataset
# Run performance benchmarks
pytest .benchmarks/
Running Tests
# Run all tests
pytest
# Run specific test categories
pytest -m "not slow" # Skip slow tests
pytest -m integration # Only integration tests
pytest -m performance # Only performance tests
# Run with coverage report
pytest --cov=tempdataset --cov-report=html
Code Quality
# Format code
black tempdataset tests
# Lint code
flake8 tempdataset tests
# Type checking
mypy tempdataset
API Reference
Core Functions
create_dataset(dataset_type, rows=500)
Generate temporary datasets or save to files.
Parameters:
- dataset_type (str): Dataset type ('sales') or filename ('sales.csv', 'sales.json')
- rows (int): Number of rows to generate (default: 500)
Returns:
TempDataFrame containing the generated data (also saves to a file if a filename is provided)
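The dual role of `dataset_type` (a bare type name or a filename) suggests a dispatch on the file extension. The sketch below shows one way such parsing could work; `parse_dataset_spec` is a hypothetical helper, not part of TempDataset's API, and the library may resolve names differently.

```python
import os


def parse_dataset_spec(dataset_type):
    """Return (dataset_name, file_format, filename) for a create_dataset-style argument.

    'sales'      -> ('sales', None, None)          in-memory only
    'sales.csv'  -> ('sales', 'csv', 'sales.csv')  generate and save
    """
    stem, ext = os.path.splitext(dataset_type)
    ext = ext.lower().lstrip(".")
    if ext in ("csv", "json"):
        # Filename form: the stem names the dataset, the extension picks the format.
        return stem, ext, dataset_type
    # Bare form: no recognized extension, so nothing is written to disk.
    return dataset_type, None, None


print(parse_dataset_spec("sales"))
print(parse_dataset_spec("sales.csv"))
print(parse_dataset_spec("sales.json"))
```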
read_csv(filename)
Read a CSV file into a TempDataFrame.
read_json(filename)
Read a JSON file into a TempDataFrame.
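Because the saved files are ordinary CSV and JSON, they can also be inspected with the standard library alone. The sketch below writes and re-reads a tiny hypothetical file to show the round trip; the column names and the list-of-objects JSON layout are assumptions, since the exact layout TempDataset writes is not specified here.

```python
import csv
import json
import os
import tempfile

# Hypothetical rows; values are strings because CSV carries no type information.
rows = [
    {"transaction_id": "TXN-000001", "amount": "19.99"},
    {"transaction_id": "TXN-000002", "amount": "250.00"},
]

with tempfile.TemporaryDirectory() as workdir:
    csv_path = os.path.join(workdir, "sales.csv")
    json_path = os.path.join(workdir, "sales.json")

    # newline="" is what the csv module documentation recommends for file objects.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["transaction_id", "amount"])
        writer.writeheader()
        writer.writerows(rows)

    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)

    with open(csv_path, newline="", encoding="utf-8") as handle:
        csv_rows = list(csv.DictReader(handle))  # every value comes back as a string

    with open(json_path, encoding="utf-8") as handle:
        json_rows = json.load(handle)

print(csv_rows)
print(json_rows)
```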
TempDataFrame Methods
- head(n=5): Get first n rows
- tail(n=5): Get last n rows
- describe(): Statistical summary
- info(): Dataset information
- filter(func): Filter rows by function
- select(columns): Select specific columns
- to_csv(filename): Export to CSV
- to_json(filename): Export to JSON
- to_dict(): Convert to dictionary
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Workflow
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for a detailed history of changes.
Support
- Documentation: https://tempdataset.readthedocs.io/
- Issue Tracker: https://github.com/dot-css/TempDataset/issues
- Discussions: https://github.com/dot-css/TempDataset/discussions
Acknowledgments
- Built with love for the Python testing community
- Inspired by the need for lightweight, dependency-free test data generation
- Thanks to all contributors who help make this project better!