Generating hierarchical data
bj
Generating hierarchical data for machine learning and data science applications
Installation
pip install bj
Description
bj is a Python package that provides tools for generating synthetic hierarchical cluster data. It extends scikit-learn's make_blobs function to create nested, hierarchical blob clusters useful for:
- Testing hierarchical clustering algorithms
- Evaluating dimensionality reduction techniques
- Creating visualization examples for nested data structures
- Benchmarking clustering algorithms on hierarchical data
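To make the idea of nested blobs concrete, here is a minimal sketch of one way such data could be generated using only the standard library. It is not bj's implementation: the function name nested_blobs_sketch, the seed argument, and the choice to scatter each level's centers around their parent using the parent level's spread are all assumptions made for illustration.

```python
import random


def nested_blobs_sketch(n_samples, centers, cluster_std,
                        center_box=(-10.0, 10.0), n_features=2, seed=None):
    """Hypothetical illustration of nested blobs (NOT bj's actual code).

    centers     -- branching factor per level, e.g. [2, 3]
    cluster_std -- spread per level, same length as centers
    """
    rng = random.Random(seed)
    low, high = center_box

    # Top level: draw the main cluster centers uniformly inside center_box.
    level_centers = [
        [rng.uniform(low, high) for _ in range(n_features)]
        for _ in range(centers[0])
    ]

    # Deeper levels: each parent spawns `branching` child centers, placed
    # with Gaussian noise whose scale is the parent level's spread.
    for branching, std in zip(centers[1:], cluster_std[:-1]):
        level_centers = [
            [coord + rng.gauss(0.0, std) for coord in parent]
            for parent in level_centers
            for _ in range(branching)
        ]

    # Leaves: assign points round-robin and add noise at the last level's spread.
    X, y = [], []
    for i in range(n_samples):
        label = i % len(level_centers)
        X.append([c + rng.gauss(0.0, cluster_std[-1])
                  for c in level_centers[label]])
        y.append(label)
    return X, y


X, y = nested_blobs_sketch(60, centers=[2, 3], cluster_std=[1.0, 0.3], seed=0)
print(len(X), len(set(y)))  # 60 points spread over 6 leaf clusters
```

Because children of the same parent are generated consecutively, sibling leaf clusters receive adjacent labels in this sketch. The real package may order or label clusters differently.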
Usage
The main function is make_hblobs, which generates hierarchical blob clusters with configurable depth and branching.
from bj import make_hblobs
import matplotlib.pyplot as plt
# Generate a simple two-level hierarchy:
# 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=300, centers=[2, 3])
# Plot the result
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', alpha=0.8)
plt.title('Hierarchical Blob Clusters (2x3)')
plt.show()
# Create a more complex three-level hierarchy with varying cluster spreads:
# 2 main clusters, each with 3 sub-clusters, each with 4 sub-sub-clusters
X_complex, y_complex = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],
    cluster_std=[1.0, 0.5, 0.2]  # Decreasing spread at each level
)
Parameters
The make_hblobs function supports:
- n_samples: Total number of points to generate
- n_features: Number of features for each sample
- centers: Integer or list of integers representing the hierarchy structure
- cluster_std: Float or list of floats controlling the spread at each level
- center_box: Bounding box for cluster centers
- shuffle: Whether to shuffle the samples
- random_state: For reproducibility
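The centers list determines how many leaf clusters the hierarchy contains: each entry multiplies the count from the level above. The short sketch below works through that arithmetic. The helper names leaf_count and split_evenly are hypothetical, and the even split with the remainder going to the first leaves is an assumption about how n_samples might be distributed, not documented bj behavior. Treating an integer centers as a single flat level is likewise an assumption.

```python
from math import prod


def leaf_count(centers):
    """Number of leaf clusters for a branching spec such as [2, 3, 4]."""
    if isinstance(centers, int):
        # Assumption: a bare integer means one flat level of clusters.
        return centers
    return prod(centers)


def split_evenly(n_samples, n_leaves):
    """Spread n_samples over n_leaves; the first leaves absorb the remainder."""
    base, extra = divmod(n_samples, n_leaves)
    return [base + 1 if i < extra else base for i in range(n_leaves)]


print(leaf_count([2, 3]))      # 6
print(leaf_count([2, 3, 4]))   # 24

sizes = split_evenly(500, leaf_count([2, 3, 4]))
print(sizes[:3], sizes[-3:], sum(sizes))  # [21, 21, 21] [20, 20, 20] 500
```

With centers=[2, 3, 4] and n_samples=500, each leaf cluster would hold only about 20 points under this assumption, which is worth keeping in mind when choosing n_samples for deep hierarchies.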
Examples
Simple Two-Level Hierarchy
from bj import make_hblobs
# Create a two-level hierarchy with 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=200, centers=[2, 3])
Complex Multi-Level Hierarchy
# Create a three-level hierarchy with varying cluster spread
X, y = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],  # 2 main clusters -> 3 sub-clusters -> 4 sub-sub-clusters
    cluster_std=[1.0, 0.5, 0.3]  # Decreasing spread at each level
)
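When evaluating a hierarchical clustering algorithm, it is often useful to know which cluster a point belongs to at every level, not only at the deepest one. The sketch below decodes a flat leaf label into one index per level. It assumes that the returned y holds flat leaf indices numbered so that siblings are adjacent (row-major order). This README does not state the label format, so check it against the actual output before relying on it. The helper leaf_to_path is hypothetical and not part of bj.

```python
def leaf_to_path(label, centers):
    """Decode a flat leaf label into per-level indices (mixed-radix digits).

    Assumes leaves are numbered in row-major order, e.g. for centers=[2, 3]
    labels 0..2 belong to main cluster 0 and labels 3..5 to main cluster 1.
    """
    path = []
    for branching in reversed(centers):
        label, index = divmod(label, branching)
        path.append(index)
    return tuple(reversed(path))


print(leaf_to_path(4, [2, 3]))      # (1, 1)
print(leaf_to_path(23, [2, 3, 4]))  # (1, 2, 3)

# Collapse leaf labels to their top-level cluster, e.g. for coloring a plot.
leaf_labels = [0, 1, 2, 3, 4, 5]
top_level = [leaf_to_path(leaf, [2, 3])[0] for leaf in leaf_labels]
print(top_level)                    # [0, 0, 0, 1, 1, 1]
```

If the assumption holds, passing top_level as the c argument of plt.scatter would color points by main cluster rather than by leaf cluster.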
License
MIT