Generating hierarchical data
bj
Generating hierarchical data for machine learning and data science applications
Installation
pip install bj
Description
bj is a Python package that provides tools for generating synthetic hierarchical cluster data. It extends scikit-learn's make_blobs function to create nested, hierarchical blob clusters useful for:
- Testing hierarchical clustering algorithms
- Evaluating dimensionality reduction techniques
- Creating visualization examples for nested data structures
- Benchmarking clustering algorithms on hierarchical data
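To make the idea of nested blobs concrete, here is a minimal sketch of one way such data could be generated with NumPy alone. It is not bj's implementation. The function name nested_blobs_sketch, its arguments, and the choice to scatter child centers around each parent using the parent level's spread are all assumptions made for illustration.

```python
import numpy as np


def nested_blobs_sketch(n_samples, branching, stds, n_features=2,
                        center_box=(-10.0, 10.0), seed=None):
    """Hypothetical illustration of nested blobs; not bj's implementation."""
    if len(branching) != len(stds):
        raise ValueError("branching and stds must have the same length")
    rng = np.random.default_rng(seed)

    # Level 1: top-level centers drawn uniformly inside the bounding box.
    centers = rng.uniform(center_box[0], center_box[1],
                          size=(branching[0], n_features))

    # Deeper levels: every center spawns k children scattered around it,
    # using the spread of the level above (an assumption of this sketch).
    for k, std in zip(branching[1:], stds[:-1]):
        parents = np.repeat(centers, k, axis=0)
        centers = parents + rng.normal(0.0, std, size=parents.shape)

    # Points: cycle through the leaf centers so the split is as even as
    # possible, then add Gaussian noise with the last spread.
    labels = np.arange(n_samples) % len(centers)
    noise = rng.normal(0.0, stds[-1], size=(n_samples, n_features))
    return centers[labels] + noise, labels


# 2 main clusters, each with 3 sub-clusters: 6 leaf clusters in total.
X_demo, y_demo = nested_blobs_sketch(300, branching=[2, 3],
                                     stds=[1.0, 0.5], seed=0)
```

In this sketch the labels identify leaf clusters only. Whether make_hblobs returns leaf labels or per-level labels is not stated here, so check the package itself before relying on either.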
Usage
The main function is make_hblobs, which generates hierarchical blob clusters with configurable depth and branching.
from bj import make_hblobs
import matplotlib.pyplot as plt
# Generate a simple two-level hierarchy:
# 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=300, centers=[2, 3])
# Plot the result
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', alpha=0.8)
plt.title('Hierarchical Blob Clusters (2x3)')
plt.show()
# Create a more complex three-level hierarchy with varying cluster spreads:
# 2 main clusters, each with 3 sub-clusters, each with 4 sub-sub-clusters
X_complex, y_complex = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],
    cluster_std=[1.0, 0.5, 0.2],  # Decreasing spread at each level
)
Parameters
The make_hblobs function supports:
- n_samples: Total number of points to generate
- n_features: Number of features for each sample
- centers: Integer or list of integers representing the hierarchy structure
- cluster_std: Float or list of floats controlling the spread at each level
- center_box: Bounding box for cluster centers
- shuffle: Whether to shuffle the samples
- random_state: For reproducibility
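Because centers and cluster_std each accept either a scalar or a list, it can help to see how such arguments might be expanded into one value per level. The helper below, normalize_levels, is hypothetical and not part of bj. It assumes that a bare integer means a single-level hierarchy and that a bare float is reused at every level.

```python
def normalize_levels(centers, cluster_std):
    """Hypothetical helper: expand scalar arguments into per-level lists."""
    # A bare integer is treated here as a single-level hierarchy.
    levels = [centers] if isinstance(centers, int) else list(centers)

    # A bare number is reused for every level; a list must match the depth.
    if isinstance(cluster_std, (int, float)):
        stds = [float(cluster_std)] * len(levels)
    else:
        stds = [float(s) for s in cluster_std]
        if len(stds) != len(levels):
            raise ValueError("cluster_std needs one value per level")
    return levels, stds


print(normalize_levels(3, 1.0))          # ([3], [1.0])
print(normalize_levels([2, 3, 4], 0.5))  # ([2, 3, 4], [0.5, 0.5, 0.5])
```

Raising an error on a length mismatch is a design choice of this sketch. The package may handle that case differently.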
Examples
Simple Two-Level Hierarchy
from bj import make_hblobs
# Create a two-level hierarchy with 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=200, centers=[2, 3])
Complex Multi-Level Hierarchy
# Create a three-level hierarchy with varying cluster spread
X, y = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],  # 2 main clusters -> 3 sub-clusters -> 4 sub-sub-clusters
    cluster_std=[1.0, 0.5, 0.3],  # Decreasing spread at each level
)
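The number of clusters grows multiplicatively with depth, which is worth checking before choosing n_samples. The short calculation below uses only the standard library. The even split of samples across leaf clusters is an assumption, since how make_hblobs distributes points is not stated here.

```python
from math import prod

centers = [2, 3, 4]

# Clusters at each depth: cumulative product of the branching factors.
clusters_per_level = [prod(centers[:i + 1]) for i in range(len(centers))]
print(clusters_per_level)  # [2, 6, 24]

n_samples = 500
leaves = clusters_per_level[-1]

# Under an even split, each leaf gets `base` points and the first
# `extra` leaves get one more.
base, extra = divmod(n_samples, leaves)
print(base, extra)  # 20 20
```

With 500 samples and 24 leaf clusters, each leaf holds only about 20 points, so a deep hierarchy may need a larger n_samples to keep the finest clusters visible.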
License
MIT
File details
Details for the file bj-0.0.6.tar.gz.
File metadata
- Download URL: bj-0.0.6.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 46691f0dddd996be7bf4132f5b10e9c1a3733d0db710a300ed3e34a3d031bb7f |
| MD5 | 5cf2e87a618263d098bba852ec29e4df |
| BLAKE2b-256 | 570822c8df87ca7c82b023d9f1ffb9c901b5cef6a5e27afd8842ffa63965ca67 |
File details
Details for the file bj-0.0.6-py3-none-any.whl.
File metadata
- Download URL: bj-0.0.6-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a5b13ecb4b6d1f2c9174df4d662ba88b0c47c774f26ddf4d5b960ee75bd35c5e |
| MD5 | 13fe9497a4e1bcb2e19b8d099ea93cab |
| BLAKE2b-256 | 3819c0b63537d59a3cc280862e0f2506411b3df0d9bb24ebb3c3f82a17776ca3 |