Generating hierarchical data

bj

Generating hierarchical data for machine learning and data science applications

Installation

pip install bj

Description

bj is a Python package that provides tools for generating synthetic hierarchical cluster data. It extends scikit-learn's make_blobs function to create nested, hierarchical blob clusters useful for:

  • Testing hierarchical clustering algorithms
  • Evaluating dimensionality reduction techniques
  • Creating visualization examples for nested data structures
  • Benchmarking clustering algorithms on hierarchical data

Usage

The main function is make_hblobs, which generates hierarchical blob clusters with configurable depth and branching.

from bj import make_hblobs
import matplotlib.pyplot as plt

# Generate a simple two-level hierarchy:
# 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=300, centers=[2, 3])

# Plot the result
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', alpha=0.8)
plt.title('Hierarchical Blob Clusters (2x3)')
plt.show()

# Create a more complex three-level hierarchy with varying cluster spreads:
# 2 main clusters, each with 3 sub-clusters, each with 4 sub-sub-clusters
X_complex, y_complex = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],
    cluster_std=[1.0, 0.5, 0.2]  # Decreasing spread at each level
)
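
The description above does not say how the nesting is constructed. One plausible reading, sketched here with only the standard library, is a recursive scheme: draw the top-level centers uniformly inside a box, draw each deeper level's centers as Gaussian offsets around their parent, then scatter points around the leaf centers. The name sketch_hblobs, its signature, the list-only centers argument, and the way each level's spread is applied are all assumptions made for illustration; this is not bj's actual implementation.

```python
import random


def sketch_hblobs(n_samples, centers, cluster_std, n_features=2,
                  center_box=(-10.0, 10.0), seed=None):
    """Illustrative nested-blob generator (hypothetical, not bj's code).

    centers     -- branching factor per level, e.g. [2, 3]
    cluster_std -- one spread per level, same length as centers
    Returns (X, y): a list of points and a list of leaf-cluster labels.
    """
    if len(centers) != len(cluster_std):
        raise ValueError("need one cluster_std per hierarchy level")
    rng = random.Random(seed)

    # Level 0: top-level centers drawn uniformly inside the bounding box.
    level = [[rng.uniform(*center_box) for _ in range(n_features)]
             for _ in range(centers[0])]

    # Deeper levels: each parent spawns `branching` children, offset from
    # the parent by Gaussian noise with the parent level's spread.
    for branching, std in zip(centers[1:], cluster_std[:-1]):
        level = [[rng.gauss(c, std) for c in parent]
                 for parent in level
                 for _ in range(branching)]

    # Leaves: scatter points around each leaf center with the last spread,
    # splitting n_samples as evenly as possible across the leaves.
    base, extra = divmod(n_samples, len(level))
    X, y = [], []
    for label, leaf in enumerate(level):
        for _ in range(base + (1 if label < extra else 0)):
            X.append([rng.gauss(c, cluster_std[-1]) for c in leaf])
            y.append(label)
    return X, y


X_demo, y_demo = sketch_hblobs(300, centers=[2, 3],
                               cluster_std=[1.0, 0.5], seed=0)
```

In this sketch the children of one parent receive consecutive labels, so sub-clusters that share a main cluster stay adjacent in the label space. Whether make_hblobs numbers its labels the same way is not stated here.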

Parameters

The make_hblobs function accepts the following parameters:

  • n_samples: Total number of points to generate
  • n_features: Number of features for each sample
  • centers: Integer or list of integers giving the number of clusters at each level of the hierarchy, for example [2, 3] for 2 main clusters with 3 sub-clusters each
  • cluster_std: Float or list of floats controlling the spread at each level
  • center_box: Bounding box for cluster centers
  • shuffle: Whether to shuffle the samples
  • random_state: Seed or random state for reproducible output
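
Because centers gives a branching factor per level, the number of leaf clusters is the product of its entries, which is worth comparing against n_samples before generating data. The helper below is a hypothetical convenience and not part of bj; treating a bare integer as a single-level hierarchy is an assumption based on the parameter description.

```python
import math


def leaf_cluster_count(centers):
    """Number of leaf clusters implied by a per-level branching list."""
    # Assumption: a bare integer means a single-level hierarchy.
    if isinstance(centers, int):
        return centers
    return math.prod(centers)


n_samples = 500
n_leaves = leaf_cluster_count([2, 3, 4])  # 2 * 3 * 4 = 24
per_leaf, remainder = divmod(n_samples, n_leaves)
print(n_leaves, per_leaf, remainder)  # prints: 24 20 20
```

With 500 samples and 24 leaf clusters the split is uneven (20 points per leaf with 20 left over). How make_hblobs distributes the remainder is not documented here, so choosing n_samples as a multiple of the leaf count avoids relying on that detail.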

Examples

Simple Two-Level Hierarchy

from bj import make_hblobs

# Create a two-level hierarchy with 2 main clusters, each with 3 sub-clusters
X, y = make_hblobs(n_samples=200, centers=[2, 3])

Complex Multi-Level Hierarchy

from bj import make_hblobs

# Create a three-level hierarchy with varying cluster spread
X, y = make_hblobs(
    n_samples=500,
    centers=[2, 3, 4],  # 2 main clusters -> 3 sub-clusters -> 4 sub-sub-clusters
    cluster_std=[1.0, 0.5, 0.3]  # Decreasing spread at each level
)
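
For evaluating a hierarchical method it can help to know which cluster a point belongs to at every level, not only at the leaves. The sketch below shows one way to recover that, assuming y holds leaf labels numbered depth-first so that siblings are contiguous. That label encoding is an assumption, since the description does not state it, and ancestor_labels is a hypothetical helper rather than part of bj.

```python
import math


def ancestor_labels(leaf_label, centers):
    """Map a leaf label to its cluster index at every level, top level first.

    Assumes leaves are numbered depth-first, so siblings are contiguous.
    """
    labels = []
    for depth in range(1, len(centers) + 1):
        # Number of leaves beneath a single cluster at this depth.
        leaves_below = math.prod(centers[depth:])
        labels.append(leaf_label // leaves_below)
    return labels


print(ancestor_labels(17, [2, 3, 4]))  # prints: [1, 4, 17]
```

Here leaf 17 sits in main cluster 1, in the fifth of the six second-level clusters (index 4), and keeps its own index at the leaf level. Before relying on this mapping, check it against the labels actually returned by make_hblobs.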

License

MIT

Download files

Source Distribution: bj-0.0.6.tar.gz (7.3 kB)

Built Distribution: bj-0.0.6-py3-none-any.whl (7.7 kB, Python 3)
