An online hierarchical binning algorithm
Bucket Tree
An online hierarchical binning algorithm:
- Binning: A bucket tree sorts observations into bins, or buckets, according to their value. Each bin has an upper and a lower bound and contains all the observed values that fall between them.
- Hierarchical: Each bucket can have two child buckets. Together, the two child buckets cover the same range of values as their parent. Child buckets can have children of their own, forming a binary tree.
- Online: A bucket tree handles values one at a time, rather than in a large batch. It starts as a single root bucket and grows to accommodate observed values as they accumulate. When a bucket has collected enough observations, and a good dividing threshold can be found, two child buckets are created for it.
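To make the three properties above concrete, here is a minimal toy sketch of an online bucket that splits itself. It is not the package's implementation: `ToyBucket` and its `observe` method are hypothetical names, and splitting at the median is an assumption standing in for whatever "good dividing threshold" test the library actually applies.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List, Optional


@dataclass
class ToyBucket:
    """Illustrative bucket covering the half-open range [lo, hi)."""

    lo: float
    hi: float
    bucket_size: int = 100
    values: List[float] = field(default_factory=list)
    lo_child: Optional["ToyBucket"] = None
    hi_child: Optional["ToyBucket"] = None
    split_value: Optional[float] = None

    def observe(self, value: float) -> List["ToyBucket"]:
        """Record one value; return every bucket it falls in, root first."""
        path = [self]
        if self.lo_child is None:
            # Leaf: keep the value, then try to split once enough have arrived.
            self.values.append(value)
            if len(self.values) >= self.bucket_size:
                self._try_split()
        else:
            # Internal bucket: pass the value down to the child that covers it.
            child = self.lo_child if value < self.split_value else self.hi_child
            path += child.observe(value)
        return path

    def _try_split(self) -> None:
        # Assumption: the median is an acceptable threshold as long as
        # at least one value falls strictly below it.
        candidate = median(self.values)
        below = [v for v in self.values if v < candidate]
        above = [v for v in self.values if v >= candidate]
        if not below:
            return  # no useful threshold yet; keep collecting
        self.split_value = candidate
        self.lo_child = ToyBucket(self.lo, candidate, self.bucket_size, below)
        self.hi_child = ToyBucket(candidate, self.hi, self.bucket_size, above)
        self.values = []


# Four observations fill the root, which then splits at their median.
root = ToyBucket(lo=float("-inf"), hi=float("inf"), bucket_size=4)
for v in [1.0, 9.0, 4.0, 6.0]:
    root.observe(v)
path = root.observe(2.0)  # now lands in the root and in its lower child
```

Because each value is routed from the root downward, it belongs to every bucket on the path, which is why a single observation has several memberships.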
How to use it
Install at the command line.
uv add bucket_tree
or
python3 -m pip install bucket_tree
Put it in a Python script.
import numpy as np
from buckettree.bucket_tree import BucketTree
bt = BucketTree()
for _ in range(10_000):
    bin_memberships = bt.bin(np.random.sample())
Run some tests
uv run pytest
API Reference
class BucketTree(max_buckets=100)
Attributes
- buckets, List[Bucket]: All of the buckets in the tree, in list form.
- highs, numpy.ndarray(dtype=float): The hi bound of each bucket in the tree.
- full, bool: True if the maximum number of buckets has been created.
- leaves, numpy.ndarray(dtype=bool): The leaf status of each bucket in the tree.
- levels, numpy.ndarray(dtype=int): The level of each bucket in the tree.
- lows, numpy.ndarray(dtype=float): The lo bound of each bucket in the tree.
- max_buckets, int: The maximum number of buckets that can be created.
- n_buckets, int: The total number of buckets that have been created so far.
- root, Bucket: The bucket at the root of the tree.
Methods
bin(value)
- value, float: The floating-point value to bin and learn from.
Returns
numpy.ndarray: Array of floats of length max_buckets. Each element is either 0.0 or 1.0, with 1.0 elements showing the buckets to which value belongs. (If a value belongs in a child bucket, it also belongs in the parent bucket, so most values will belong to several buckets.)
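As a rough illustration of reading the returned array, the sketch below uses hand-built arrays in place of a real tree. The numbers are invented, and it assumes the membership array lines up index-for-index with lows, highs, and levels, which the attribute descriptions suggest but do not state outright.

```python
import numpy as np

# Hypothetical stand-ins for bt.lows, bt.highs, and bt.levels on a
# five-bucket tree, and for the array bt.bin(0.3) might return.
lows = np.array([0.0, 0.0, 0.5, 0.0, 0.25])
highs = np.array([1.0, 0.5, 1.0, 0.25, 0.5])
levels = np.array([0, 1, 1, 2, 2])
memberships = np.array([1.0, 1.0, 0.0, 0.0, 1.0])

# Indices of every bucket the value belongs to (root, child, grandchild).
active = np.flatnonzero(memberships)

# The most specific bucket is the active one with the greatest level.
deepest = active[np.argmax(levels[active])]
deepest_range = (lows[deepest], highs[deepest])
```

Here the active indices trace a path from the root to the narrowest bucket containing the value, and `deepest_range` gives that bucket's bounds.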
class Bucket(bucket_size=100, lo=MIN_VAL, hi=MAX_VAL, leaf=True, level=0)
Attributes
- bucket_size, int: After a bucket collects this many observations, it starts trying to create child buckets.
- hi, float: The upper bound of the bucket's range. The range excludes the hi value.
- i_bucket, int: The index associated with this Bucket.
- level, int: The number of generations between this Bucket and the root of the tree.
- leaf, bool: True only if this Bucket is a leaf on the tree.
- lo, float: The lower bound of the bucket's range. The range includes the lo value.
- lo_child
- hi_child
- split_value
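The lo and hi descriptions above imply a half-open range, [lo, hi). The short sketch below shows what that means at the boundaries. The helper names `in_range` and `pick_child` are hypothetical, and sending a value equal to split_value to hi_child is an assumption inferred from the half-open convention, not something the reference states.

```python
def in_range(value: float, lo: float, hi: float) -> bool:
    """True if value lies in the half-open range [lo, hi)."""
    return lo <= value < hi


def pick_child(value: float, split_value: float) -> str:
    """Name the child that would cover value.

    Assumes lo_child covers [lo, split_value) and hi_child covers
    [split_value, hi).
    """
    return "lo_child" if value < split_value else "hi_child"


on_lower_bound = in_range(0.0, lo=0.0, hi=1.0)  # lo itself is included
on_upper_bound = in_range(1.0, lo=0.0, hi=1.0)  # hi itself is excluded
at_split = pick_child(0.5, split_value=0.5)
```

With this convention, sibling buckets never overlap and never leave a gap, so every value inside the parent's range falls in exactly one child.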
Methods
Constants
MAX_VAL
The maximum value of a float allowed by the system.
MIN_VAL
The minimum value of a float allowed by the system.