An online hierarchical binning algorithm

Bucket Tree

An online hierarchical binning algorithm:

  • Binning: A bucket tree sorts observations into bins, or buckets, according to their value. Each bin has an upper and lower bound, and contains all the observed values that fall between them.

  • Hierarchical: Each bucket can have two child buckets. Together the two child buckets will cover the same range of values that their parent does. Child buckets can have children of their own, making a binary tree structure.

  • Online: A bucket tree handles values one at a time, rather than in a large batch. It starts as a single root bucket and grows to accommodate observed values as they accumulate. When a bucket has collected enough observations, and if a good dividing threshold can be found, two child buckets are created for it.
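
To make the three ideas above concrete, here is a minimal toy sketch. It is not the bucket_tree implementation: the class name ToyBucket, the choice of the median as the dividing threshold, and the rule that a threshold is "good" only if some stored values fall below it are all assumptions made for illustration.

```python
import statistics


class ToyBucket:
    """Illustrative bucket covering the half-open range [lo, hi)."""

    def __init__(self, lo=float("-inf"), hi=float("inf"), bucket_size=4, level=0):
        self.lo, self.hi = lo, hi
        self.bucket_size = bucket_size
        self.level = level
        self.values = []          # observations seen while this is a leaf
        self.split_value = None   # stays None until children exist
        self.lo_child = None
        self.hi_child = None

    def add(self, value):
        """Record one value; return the (lo, hi) range of every bucket it lands in."""
        path = [(self.lo, self.hi)]
        if self.split_value is None:
            # Leaf: store the value, then check whether it is time to split.
            self.values.append(value)
            self._maybe_split()
        else:
            # Interior bucket: pass the value down to the matching child.
            child = self.lo_child if value < self.split_value else self.hi_child
            path += child.add(value)
        return path

    def _maybe_split(self):
        if len(self.values) < self.bucket_size:
            return
        threshold = statistics.median(self.values)
        # Assumed test for a "good" threshold: the lower side must not be empty.
        if not any(v < threshold for v in self.values):
            return
        self.split_value = threshold
        self.lo_child = ToyBucket(self.lo, threshold, self.bucket_size, self.level + 1)
        self.hi_child = ToyBucket(threshold, self.hi, self.bucket_size, self.level + 1)


root = ToyBucket(bucket_size=4)
for v in [1.0, 4.0, 2.0, 3.0]:
    root.add(v)           # the fourth value triggers a split at the median, 2.5

path = root.add(1.5)      # lands in the root and in its lower child
```

Each call to add handles a single value (online), reports every bucket on the path from the root down (hierarchical), and each bucket on that path is a range that contains the value (binning).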

How to use it

Install at the command line.

uv add bucket_tree

or

python3 -m pip install bucket_tree

Put it in a Python script.

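import numpy as np
from buckettree.bucket_tree import BucketTree

bt = BucketTree()

# Feed 10,000 uniform random values in [0, 1) to the tree, one at a time.
for _ in range(10_000):
    # bin() returns an array marking every bucket the value belongs to.
    bin_memberships = bt.bin(np.random.sample())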

Run some tests

uv run pytest

API Reference

class BucketTree(max_buckets=100)

Attributes

  • buckets, List[Bucket]: All of the buckets in the tree, in List form.

  • highs, numpy.ndarray(dtype=float): The hi bound of each bucket in the tree.

  • full, bool: True if the maximum number of buckets have been created.

  • leaves, numpy.ndarray(dtype=bool): The leaf status of each bucket in the tree.

  • levels, numpy.ndarray(dtype=int): The level of each bucket in the tree.

  • lows, numpy.ndarray(dtype=float): The lo bound of each bucket in the tree.

  • max_buckets, int: The maximum number of buckets that can be created.

  • n_buckets, int: The total number of buckets that have been created so far.

  • root, Bucket: The bucket at the root of the tree.
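
The array attributes line up by bucket index, so they can be combined with ordinary NumPy masks. The sketch below uses hand-built arrays as stand-ins for lows, highs, leaves, and levels rather than a real BucketTree. The three-bucket layout and the helper names containing_buckets and leaf_bucket are hypothetical, and the root range is shortened to [0, 1) for readability. Whether the real arrays have length n_buckets or max_buckets is not stated here, so slice them accordingly.

```python
import numpy as np

# Hypothetical snapshot of a three-bucket tree: a root and its two children.
lows = np.array([0.0, 0.0, 0.5])
highs = np.array([1.0, 0.5, 1.0])
leaves = np.array([False, True, True])
levels = np.array([0, 1, 1])


def containing_buckets(value):
    """Indices of every bucket whose half-open range [lo, hi) holds value."""
    return np.flatnonzero((lows <= value) & (value < highs))


def leaf_bucket(value):
    """Index of the leaf bucket holding value, or None if it is out of range."""
    hits = np.flatnonzero((lows <= value) & (value < highs) & leaves)
    return int(hits[0]) if hits.size else None


in_buckets = containing_buckets(0.25)   # root and the lower child
deepest = leaf_bucket(0.5)              # 0.5 sits on the boundary, so it goes to the upper child
```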

Methods

bin(value)

  • value, float: The floating-point value to bin and learn from.

  • Returns numpy.ndarray: Array of floats of length max_buckets. Each element is either 0.0 or 1.0, with 1.0 elements showing buckets in which value belongs. (If a value belongs in a child bucket, it also belongs in the parent bucket, so most values will belong to several buckets.)
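
As a rough illustration of reading that return value, the sketch below builds a made-up membership array by hand instead of calling bin. The limit of 8 buckets and the member indices 0, 2, and 5 are invented for the example.

```python
import numpy as np

max_buckets = 8  # small hypothetical limit, chosen for readability

# Made-up stand-in for a bin() result: the value fell in buckets 0, 2, and 5.
memberships = np.zeros(max_buckets)
memberships[[0, 2, 5]] = 1.0

# Indices of the buckets the value belongs to.
member_indices = np.flatnonzero(memberships)

# Summing successive results gives a running count of observations per bucket.
counts = np.zeros(max_buckets)
for result in (memberships, memberships):
    counts += result
```

Because the array always has length max_buckets, results from different calls can be added or stacked directly, even while the tree is still growing.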


class Bucket(bucket_size=100, lo=MIN_VAL, hi=MAX_VAL, leaf=True, level=0)

Attributes

  • bucket_size, int: After a bucket collects this many observations, it starts trying to create child buckets.

  • hi, float: The upper bound of the bucket's range. The range excludes the hi value.

  • i_bucket, int: The index associated with this Bucket.

  • level, int: The number of generations between this Bucket and the root of the tree.

  • leaf, bool: True only if this Bucket is a leaf on the tree.

  • lo, float: The lower bound of the bucket's range. The range includes the lo value.

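  • lo_child: The child Bucket covering the lower part of this Bucket's range, once children have been created.

  • hi_child: The child Bucket covering the upper part of this Bucket's range, once children have been created.

  • split_value: The threshold that divides this Bucket's range between lo_child and hi_child.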

Methods


Constants

MAX_VAL

The maximum value of a float allowed by the system.

MIN_VAL

The minimum value of a float allowed by the system.
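
For orientation, these are the float limits that Python and NumPy expose. How MAX_VAL and MIN_VAL are actually defined is not shown here, so the mapping below is an assumption. In particular, "minimum" could mean either the most negative finite float or the smallest positive one. Since MIN_VAL is the default lo of a Bucket, the most negative reading seems the likelier of the two.

```python
import sys

import numpy as np

largest = sys.float_info.max             # largest finite float
most_negative = -sys.float_info.max      # most negative finite float
smallest_positive = sys.float_info.min   # smallest positive normalized float

# NumPy reports the same limits for its default float type.
same_as_numpy = largest == np.finfo(np.float64).max
```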

Download files

Source Distribution

bucket_tree-0.1.4.tar.gz (20.4 kB)

Uploaded Source

Built Distribution

bucket_tree-0.1.4-py3-none-any.whl (11.6 kB)

Uploaded Python 3
