Generating hierarchical data
bj
Generating hierarchical data for machine learning and data science applications
Installation
pip install bj
Description
bj is a Python package that provides tools for generating synthetic hierarchical cluster data. It extends scikit-learn's make_blobs function to create nested, hierarchical blob clusters useful for:
- Testing hierarchical clustering algorithms
- Evaluating dimensionality reduction techniques
- Creating visualization examples for nested data structures
- Benchmarking clustering algorithms on hierarchical data
Usage
The main function is make_hblobs, which generates hierarchical blob clusters with configurable depth and branching.
from bj import make_hblobs
import matplotlib.pyplot as plt
# Generate a simple two-level hierarchy:
# 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=300, centers=[2, 3])
# Plot the result
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', alpha=0.8)
plt.title('Hierarchical Blob Clusters (2x3)')
plt.show()
# Create a more complex three-level hierarchy with varying cluster spreads:
# 2 main clusters, each with 3 sub-clusters, each with 4 sub-sub-clusters
X_complex, y_complex = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],
    cluster_std=[1.0, 0.5, 0.2]  # Decreasing spread at each level
)
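Going by the comments in the examples above, each entry of centers is the number of children per cluster at that level, so the number of leaf clusters is the product of the entries. The sketch below only does that arithmetic; leaf_cluster_count is a hypothetical helper written for illustration, not part of bj, and the "points each" figure assumes samples are split evenly across leaves.

```python
import math


def leaf_cluster_count(centers):
    """Number of leaf clusters implied by a hierarchy specification.

    Hypothetical helper, not part of bj. Assumes each entry of `centers`
    is the number of children per cluster at that level.
    """
    if isinstance(centers, int):
        centers = [centers]
    return math.prod(centers)


for centers, n_samples in [([2, 3], 300), ([2, 3, 4], 500)]:
    leaves = leaf_cluster_count(centers)
    # Integer division gives a rough per-leaf sample count.
    print(centers, "->", leaves, "leaf clusters, about",
          n_samples // leaves, "points each")
```

With a deep hierarchy the per-leaf count drops quickly, so n_samples may need to grow with the product of the branching factors for every leaf to stay well populated.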
Parameters
The make_hblobs function supports:
- n_samples: Total number of points to generate
- n_features: Number of features for each sample
- centers: Integer or list of integers representing the hierarchy structure
- cluster_std: Float or list of floats controlling the spread at each level
- center_box: Bounding box for cluster centers
- shuffle: Whether to shuffle the samples
- random_state: For reproducibility
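To make the roles of these parameters concrete, here is a minimal standard-library sketch of one way such a generator could work. It is not bj's implementation: sketch_hblobs is a hypothetical stand-in, and the sampling scheme is an assumption (top-level centers uniform inside center_box, child centers scattered around their parent with the parent level's spread, points scattered around each leaf with the last level's spread, labels numbered per leaf).

```python
import random


def sketch_hblobs(n_samples=300, n_features=2, centers=(2, 3),
                  cluster_std=1.0, center_box=(-10.0, 10.0),
                  shuffle=True, random_state=None):
    """Illustrative stand-in for a hierarchical blob generator (not bj's code)."""
    rng = random.Random(random_state)
    levels = [centers] if isinstance(centers, int) else list(centers)
    if isinstance(cluster_std, (int, float)):
        stds = [float(cluster_std)] * len(levels)
    else:
        stds = list(cluster_std)
    if len(stds) != len(levels):
        raise ValueError("cluster_std needs one value per hierarchy level")

    # Level 0: top-level centers drawn uniformly inside center_box.
    low, high = center_box
    nodes = [[rng.uniform(low, high) for _ in range(n_features)]
             for _ in range(levels[0])]

    # Deeper levels: every node spawns `branching` children scattered
    # around it, using the spread of the level above (an assumption).
    for branching, std in zip(levels[1:], stds[:-1]):
        nodes = [[c + rng.gauss(0.0, std) for c in parent]
                 for parent in nodes
                 for _ in range(branching)]

    # Points: split samples as evenly as possible over the leaves and
    # scatter them around each leaf center with the last-level spread.
    X, y = [], []
    base, extra = divmod(n_samples, len(nodes))
    for label, leaf in enumerate(nodes):
        for _ in range(base + (1 if label < extra else 0)):
            X.append([c + rng.gauss(0.0, stds[-1]) for c in leaf])
            y.append(label)

    if shuffle:
        order = list(range(n_samples))
        rng.shuffle(order)
        X = [X[i] for i in order]
        y = [y[i] for i in order]
    return X, y


X, y = sketch_hblobs(n_samples=300, centers=[2, 3],
                     cluster_std=[1.0, 0.5], random_state=0)
print(len(X), len(X[0]), sorted(set(y)))
```

The sketch returns plain lists rather than arrays to stay dependency-free. Passing the same random_state twice yields identical output, which is the behavior the random_state parameter is described as providing.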
Examples
Simple Two-Level Hierarchy
from bj import make_hblobs
# Create a two-level hierarchy with 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=200, centers=[2, 3])
Complex Multi-Level Hierarchy
# Create a three-level hierarchy with varying cluster spread
X, y = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],  # 2 main clusters -> 3 sub-clusters -> 4 sub-sub-clusters
    cluster_std=[1.0, 0.5, 0.3]  # Decreasing spread at each level
)
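For a multi-level hierarchy it is often useful to know which main cluster and sub-cluster a point belongs to. The description above does not say how y is encoded, so the sketch below rests on an assumption: labels are flat leaf indices numbered in hierarchy order. Under that assumption a label can be split into one index per level with repeated divmod. label_to_path is a hypothetical helper, not part of bj.

```python
def label_to_path(label, centers):
    """Split a flat leaf label into one index per hierarchy level.

    Hypothetical helper, not part of bj. Assumes leaves are numbered in
    row-major order, so with centers=[2, 3, 4] label 0 is the first leaf
    of the first sub-cluster of the first main cluster.
    """
    path = []
    # Peel off the deepest level first, then work up to the root.
    for branching in reversed(centers):
        label, index = divmod(label, branching)
        path.append(index)
    if label != 0:
        raise ValueError("label is out of range for this hierarchy")
    return tuple(reversed(path))


centers = [2, 3, 4]
for label in (0, 5, 12, 23):
    print(label, "->", label_to_path(label, centers))
```

The first element of the returned tuple can then be used, for example, to color a scatter plot by main cluster instead of by leaf.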
License
MIT