Generate synthetic data with clusters.
repliclust
Description
repliclust is a Python package for generating synthetic datasets with clusters. It lets you generate many different datasets that are geometrically similar without ever touching low-level parameters such as cluster centroids or covariance matrices.
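For contrast, here is a minimal sketch of the low-level approach, in which every cluster needs a hand-picked centroid and covariance matrix. It uses only the Python standard library and does not show how repliclust works internally. The helper `sample_gaussian_cluster` and the `low_level_spec` values are hypothetical and chosen purely for illustration.

```python
import math
import random


def sample_gaussian_cluster(centroid, cov, n, rng):
    """Draw n points from a 2D Gaussian given its centroid and 2x2 covariance."""
    (a, b), (_unused, d) = cov
    # Cholesky factor of the covariance, written out by hand for the 2x2 case.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        points.append((centroid[0] + l11 * z1,
                       centroid[1] + l21 * z1 + l22 * z2))
    return points


rng = random.Random(0)

# Hypothetical low-level specification: one (centroid, covariance) pair per cluster.
low_level_spec = [
    ((0.0, 0.0), ((9.0, 0.0), (0.0, 1.0))),  # elongated along x
    ((8.0, 5.0), ((1.0, 0.0), (0.0, 9.0))),  # elongated along y
]

X, y = [], []
for label, (centroid, cov) in enumerate(low_level_spec):
    pts = sample_gaussian_cluster(centroid, cov, 100, rng)
    X.extend(pts)
    y.extend([label] * len(pts))
```

Changing the number of clusters, their overlap, or their shapes in this style means editing each centroid and covariance entry by hand, which is the kind of bookkeeping the package description says you can avoid.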
Features
- Reproducibly generate clusters with defined geometric characteristics
- Manage cluster overlaps, shapes, and probability distributions through intuitive, high-level controls
- Define custom dataset archetypes to power reproducible and informative benchmarks
Installation
pip install repliclust
Quickstart
from repliclust import Archetype, DataGenerator

# Create archetype for 5 oblong clusters with typical "aspect ratio" of 3
oblong_clusters = Archetype(n_clusters=5, dim=2, n_samples=500,
                            aspect_ref=3, aspect_maxmin=1.5,
                            name="oblong")

# Define the data generator
data_generator = DataGenerator(archetype=oblong_clusters)

# Sample data points X and class labels y
X, y, _ = data_generator.synthesize()
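To make the two aspect parameters above more concrete, the sketch below computes an aspect ratio per cluster and the spread of those ratios across clusters. It rests on two assumptions that the project documentation should confirm: that a cluster's aspect ratio is its longest axis divided by its shortest, and that `aspect_maxmin` is the largest aspect ratio divided by the smallest. The axis lengths are made up, and none of this code calls repliclust.

```python
def aspect_ratio(axis_lengths):
    """Ratio of a cluster's longest axis to its shortest axis (assumed definition)."""
    return max(axis_lengths) / min(axis_lengths)


# Hypothetical axis lengths (for example, standard deviations) of three 2D clusters.
clusters = {
    "a": (2.5, 1.0),
    "b": (3.0, 1.0),
    "c": (7.5, 2.0),
}

ratios = {name: aspect_ratio(axes) for name, axes in clusters.items()}

# Spread of aspect ratios across clusters: largest divided by smallest.
maxmin = max(ratios.values()) / min(ratios.values())

print(ratios)
print(maxmin)
```

Under these assumptions, the made-up clusters have aspect ratios between 2.5 and 3.75 with a max-min ratio of 1.5, which is the kind of dataset the Quickstart settings `aspect_ref=3` and `aspect_maxmin=1.5` appear to describe.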
User Guide / Documentation
For a full user guide and documentation, visit the project website: https://repliclust.org.
Citation
To reference repliclust in your work, please cite:
@article{Zellinger:2023,
title = {repliclust: Synthetic Data for Cluster Analysis},
author = {Zellinger, Michael J and B{\"u}hlmann, Peter},
journal = {arXiv preprint arXiv:2303.14301},
doi = {10.48550/arXiv.2303.14301},
year = {2023}
}