Generate synthetic data with clusters.
Project description
repliclust
Description
repliclust is a Python package for generating synthetic datasets with clusters. It lets you generate many different datasets that are geometrically similar, without ever touching low-level parameters such as cluster centroids or covariance matrices.
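For contrast, here is a minimal sketch of the low-level approach the description refers to: choosing each cluster's centroid and spread by hand. It uses only the Python standard library and is not how repliclust works internally. The function name `sample_manual_clusters` and all numeric values are hypothetical, chosen purely for illustration.

```python
import random

def sample_manual_clusters(centroids, spreads, n_per_cluster, seed=0):
    """Draw 2D points from axis-aligned Gaussian clusters specified by hand.

    centroids: list of (x, y) cluster centers
    spreads:   list of (sigma_x, sigma_y) standard deviations per cluster
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    points, labels = [], []
    for label, ((cx, cy), (sx, sy)) in enumerate(zip(centroids, spreads)):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, sx), rng.gauss(cy, sy)))
            labels.append(label)
    return points, labels

# Hypothetical values: every number here has to be tuned by hand
# to control how much the clusters overlap and how elongated they are.
points, labels = sample_manual_clusters(
    centroids=[(0.0, 0.0), (6.0, 1.0), (2.0, 7.0)],
    spreads=[(1.5, 0.5), (0.5, 1.5), (1.0, 1.0)],
    n_per_cluster=100,
)
print(len(points), sorted(set(labels)))
```

Changing the overlap or shape of these clusters means editing individual centers and standard deviations, which is the kind of manual tuning the high-level controls are meant to replace.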
Features
- Reproducibly generate clusters with defined geometric characteristics
- Manage cluster overlaps, shapes, and probability distributions through intuitive, high-level controls
- Define custom dataset archetypes to power reproducible and informative benchmarks
Installation
pip install repliclust
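To confirm that the install succeeded, one option is to query the package metadata with the standard library. This is a small sketch, assuming Python 3.8 or later; `installed_version` is a hypothetical helper name, not part of repliclust.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("repliclust")
if version is None:
    print("repliclust is not installed in this environment")
else:
    print(f"repliclust {version} is installed")
```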
Quickstart
from repliclust import Archetype, DataGenerator
# Create archetype for 5 oblong clusters with typical "aspect ratio" of 3
oblong_clusters = Archetype(n_clusters=5, dim=2, n_samples=500,
                            aspect_ref=3, aspect_maxmin=1.5,
                            name="oblong")
# Define the data generator
data_generator = DataGenerator(archetype=oblong_clusters)
# Sample data points X and class labels y
X, y, _ = data_generator.synthesize()
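A quick sanity check on the sampled data is to count the points in each cluster and compute each cluster's centroid. The sketch below assumes that X is a sequence of rows (one per data point) and y a sequence of integer labels of the same length. `summarize_clusters` is a hypothetical helper, and the demo values are stand-ins so the snippet runs without repliclust installed.

```python
from collections import defaultdict

def summarize_clusters(X, y):
    """Return {label: (n_points, centroid)} for data points X with labels y."""
    groups = defaultdict(list)
    for point, label in zip(X, y):
        groups[int(label)].append([float(v) for v in point])
    summary = {}
    for label, members in sorted(groups.items()):
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        summary[label] = (len(members), centroid)
    return summary

# Stand-in data (hypothetical values); replace with the X, y sampled above.
X_demo = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0], [10.0, 14.0]]
y_demo = [0, 0, 1, 1, 1]

for label, (n_points, centroid) in summarize_clusters(X_demo, y_demo).items():
    print(label, n_points, centroid)
```

With the quickstart archetype, the summary should list 5 labels whose counts add up to 500.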
User Guide / Documentation
For a full user guide and documentation, visit the project website: https://repliclust.org.
Citation
To reference repliclust in your work, please cite:
@article{Zellinger:2023,
title = {repliclust: Synthetic Data for Cluster Analysis},
author = {Zellinger, Michael J and B{\"u}hlmann, Peter},
journal = {arXiv preprint arXiv:2303.14301},
doi = {10.48550/arXiv.2303.14301},
year = {2023}
}
Download files
Source Distribution
repliclust-0.0.4.tar.gz (1.8 MB)
Built Distribution
repliclust-0.0.4-py3-none-any.whl (33.9 kB)
Hashes for repliclust-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c7e3a613211ce83e95997bce0680617acec71880ad89a7d2ddd69be9fd92555a
MD5 | 2077832104f3aa1976ace3db62a8129f
BLAKE2b-256 | 6387ad3e3c53726533d74264b77a4336b2f379830d0e7c3565005d09c2ceee7b
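To check a downloaded wheel against the SHA256 digest listed above, a sketch along these lines should work using only `hashlib` from the standard library. The local path `wheel_path` is an assumption about where the file was saved, and the helper names are hypothetical.

```python
import hashlib
import os

# SHA256 digest listed for repliclust-0.0.4-py3-none-any.whl
EXPECTED_SHA256 = "c7e3a613211ce83e95997bce0680617acec71880ad89a7d2ddd69be9fd92555a"

def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Assumed download location; adjust to wherever the wheel was saved.
wheel_path = "repliclust-0.0.4-py3-none-any.whl"
if os.path.exists(wheel_path):
    print("SHA256 matches:", matches_digest(wheel_path, EXPECTED_SHA256))
else:
    print("Wheel not found; download it first to verify its hash.")
```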