Generating hierarchical data

bj

Generating hierarchical data for machine learning and data science applications

Installation

pip install bj

Description

bj is a Python package that provides tools for generating synthetic hierarchical cluster data. It extends scikit-learn's make_blobs function to create nested, hierarchical blob clusters useful for:

  • Testing hierarchical clustering algorithms
  • Evaluating dimensionality reduction techniques
  • Creating visualization examples for nested data structures
  • Benchmarking clustering algorithms on hierarchical data

Usage

The main function is make_hblobs, which generates hierarchical blob clusters with configurable depth and branching.

from bj import make_hblobs
import matplotlib.pyplot as plt

# Generate a simple two-level hierarchy:
# 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=300, centers=[2, 3])

# Plot the result
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', alpha=0.8)
plt.title('Hierarchical Blob Clusters (2x3)')
plt.show()

# Create a more complex three-level hierarchy with varying cluster spreads:
# 2 main clusters, each with 3 sub-clusters, each with 4 sub-sub-clusters
X_complex, y_complex = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],
    cluster_std=[1.0, 0.5, 0.2]  # Decreasing spread at each level
)
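
When scoring a hierarchical clustering algorithm, one label per level is often more useful than a single flat label. This description does not say how y encodes the hierarchy, so the sketch below rests on an assumption: that y is a flat index over the deepest clusters, numbered in nested order. The helper split_leaf_label is hypothetical and is not part of bj.

```python
def split_leaf_label(leaf, centers):
    """Decompose a flat leaf index into one index per hierarchy level.

    Assumes leaves are numbered in nested order (all children of the first
    parent first, then the second parent, and so on). Hypothetical helper,
    not part of bj.
    """
    path = []
    # Peel off the deepest level first, like reading digits of a mixed-radix number.
    for branching in reversed(centers):
        leaf, index = divmod(leaf, branching)
        path.append(index)
    return tuple(reversed(path))


print(split_leaf_label(0, [2, 3]))      # (0, 0)
print(split_leaf_label(4, [2, 3]))      # (1, 1)
print(split_leaf_label(23, [2, 3, 4]))  # (1, 2, 3)
```

If the labels returned by make_hblobs turn out to follow a different scheme, the decomposition needs to be adapted accordingly.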

Parameters

make_hblobs accepts the following parameters:

  • n_samples: Total number of points to generate
  • n_features: Number of features (dimensions) per sample
  • centers: Integer or list of integers describing the hierarchy, one entry per level; for example, [2, 3] gives 2 main clusters, each with 3 sub-clusters
  • cluster_std: Float or list of floats controlling the cluster spread at each level
  • center_box: Bounding box for cluster centers
  • shuffle: Whether to shuffle the samples
  • random_state: Seed for reproducible output
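
Because each entry in centers multiplies the number of clusters at the next level, the number of clusters at the deepest level is the product of the entries. The small helper below makes that arithmetic explicit. leaf_cluster_count is a hypothetical name used only for illustration, and treating a bare integer as a one-level hierarchy is an assumption.

```python
from math import prod


def leaf_cluster_count(centers):
    """Number of clusters at the deepest level of a hierarchy spec.

    Hypothetical helper, not part of bj.
    """
    # Assumption: a bare integer means a single-level hierarchy.
    levels = [centers] if isinstance(centers, int) else list(centers)
    return prod(levels)


print(leaf_cluster_count([2, 3]))     # 6
print(leaf_cluster_count([2, 3, 4]))  # 24
```

This is handy when choosing n_samples: with centers=[2, 3, 4] and n_samples=500, each of the 24 deepest clusters gets roughly 20 points if the points are spread evenly.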

Examples

Simple Two-Level Hierarchy

from bj import make_hblobs

# Create a two-level hierarchy with 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=200, centers=[2, 3])

Complex Multi-Level Hierarchy

from bj import make_hblobs

# Create a three-level hierarchy with varying cluster spread
X, y = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],  # 2 main clusters -> 3 sub-clusters -> 4 sub-sub-clusters
    cluster_std=[1.0, 0.5, 0.3]  # Decreasing spread at each level
)
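
To build intuition for how nested blobs with a decreasing spread can arise, here is a minimal standard-library sketch of one possible scheme. It is not the bj implementation: the function sketch_hblobs, its defaults, and the choice to place each child center as a Gaussian offset from its parent are all assumptions made for illustration.

```python
import random


def sketch_hblobs(n_samples, centers, cluster_std, n_features=2,
                  center_box=(-10.0, 10.0), seed=0):
    """Toy nested-blob generator; NOT the bj implementation."""
    rng = random.Random(seed)
    low, high = center_box

    # Top level: centers drawn uniformly inside the bounding box.
    nodes = [[rng.uniform(low, high) for _ in range(n_features)]
             for _ in range(centers[0])]

    # Deeper levels: each child center is a Gaussian offset from its parent,
    # using the spread of the parent's level (assumed scheme).
    for branching, std in zip(centers[1:], cluster_std):
        nodes = [[c + rng.gauss(0.0, std) for c in parent]
                 for parent in nodes
                 for _ in range(branching)]

    # Points: Gaussian noise around each leaf center, using the last spread.
    X, y = [], []
    for i in range(n_samples):
        leaf = i % len(nodes)  # spread points evenly over the leaves
        X.append([c + rng.gauss(0.0, cluster_std[-1]) for c in nodes[leaf]])
        y.append(leaf)
    return X, y


X, y = sketch_hblobs(500, centers=[2, 3, 4], cluster_std=[1.0, 0.5, 0.3])
print(len(X), len(set(y)))  # 500 24
```

Shrinking the spread at each level keeps sub-clusters tighter than the groups that contain them, which is what makes the hierarchy visible in a scatter plot.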

License

MIT
