An unsupervised feature learner

Ziptie: An unsupervised feature learning algorithm

Here's the full story of how Ziptie works and why it was created.

Installation

uv add ziptie

or

pip install ziptie

Usage

Import

from ziptie.algo import Ziptie

Initialize

zt = Ziptie(n_inputs)

Update agglomeration energies and create bundles on each iteration

zt.create_new_bundles()
zt.grow_bundles()

Calculate outputs on each iteration

bundle_activities = zt.update_bundles(inputs)

Take care of all of it at once

There is a convenience function that takes care of create_new_bundles, grow_bundles, and update_bundles, if you prefer the shorthand.

bundle_activities = zt.step(inputs)
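
As a rough sketch of what the shorthand saves you, the three calls can be wrapped in a small helper. The name step_by_hand is made up for illustration. The call order simply follows the order in which the methods are listed above; the order used inside Ziptie's own step() is an assumption here, so check the source if it matters for your use.

```python
def step_by_hand(zt, inputs):
    """Run one iteration using the three separate calls.

    zt is any object that provides the three Ziptie methods named in this README.
    The call order is an assumption, taken from the order of the sections above.
    """
    zt.create_new_bundles()
    zt.grow_bundles()
    return zt.update_bundles(inputs)
```

Calling step_by_hand(zt, inputs) in a loop should then play the same role as zt.step(inputs), under the assumption stated above.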

Choose the number of bundles

zt = Ziptie(n_inputs, n_bundles_max)

Example

Putting it all together in a bare-bones example (also in example.py).

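import numpy as np
from ziptie.algo import Ziptie

n_inputs = 10
bundle_limit = 10
zt = Ziptie(n_inputs, bundle_limit)

# Feed random inputs until the Ziptie has created all the bundles it is allowed.
done = False
while not done:
    # One array of random activities in [0, 1), one element per input cable.
    inputs = np.random.sample(n_inputs)
    bundle_activities = zt.step(inputs)

    if zt.n_bundles >= bundle_limit:
        done = True

print("Done!")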

Feature explanation

One trick Ziptie is good at is interpreting and explaining the features it creates. Any collection of bundle activities can be projected back down to the set of inputs that created it.

inputs = zt.project_bundle_activities(bundle_activities)

To get a picture of a single feature, you can construct a sparse bundle_activities array, with only a single non-zero element for the feature you want to investigate.
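
One way to build such an array is sketched below. The helper name single_feature is hypothetical and not part of Ziptie. It takes its length from an existing bundle_activities array, so it does not have to guess how long Ziptie expects the array to be.

```python
def single_feature(bundle_activities, i_bundle, activity=1.0):
    """Return a list as long as bundle_activities, zero except at i_bundle."""
    sparse = [0.0] * len(bundle_activities)
    sparse[i_bundle] = activity
    return sparse
```

With a trained Ziptie, something like zt.project_bundle_activities(np.array(single_feature(bundle_activities, 3))) would then show which inputs contribute to bundle 3. The conversion to a NumPy array is there because this README does not say whether a plain list is accepted.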

Testing

There are a few basic tests that you can run with

uv run pytest

Benchmark

It's informative to run Ziptie on your own system with different numbers of input cables to see how long it takes to run, and to see how those per-iteration run times grow as the number of bundles increases.

uv run src/ziptie/benchmark.py
uv run src/ziptie/benchmark_plot.py

Don't be afraid to make changes to benchmark.py and see how it affects the run times.
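
If you would rather time a loop of your own, a minimal sketch using only the standard library might look like the following. It is not what benchmark.py does; time_iterations is a made-up helper that times any zero-argument callable.

```python
import time


def time_iterations(step_fn, n_iterations):
    """Call step_fn() n_iterations times; return each call's duration in seconds."""
    durations = []
    for _ in range(n_iterations):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, time_iterations(lambda: zt.step(np.random.sample(n_inputs)), 1000) would collect 1000 per-iteration run times, which you can compare early and late in the run to see the effect of a growing number of bundles.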

Tweaking Ziptie's behavior through initialization arguments

There are a handful of constants and hyperparameters that allow you to trade speed for accuracy and to adjust Ziptie's behaviors.

From the code:

def __init__(
        self,
        n_cables=16,
        n_bundles_max=64,
        name='ziptie',
        activity_deadzone=.01,
        threshold=1e3,
        growth_threshold=None,
        growth_check_frequency=None,
        nucleation_check_frequency=None,
):

n_cables (int) is the number of cable inputs the Ziptie expects. It is the first positional argument, called n_inputs in the examples above.

n_bundles_max (int) has already been introduced. It is the number of bundles the Ziptie is allowed to create. Anecdotally, when n_cables x n_bundles_max = 10^9, it requires 16GB of memory and one iteration takes 21 ms.
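
That anecdote works out to roughly 16 bytes per cable-bundle pair (16 GB / 10^9). As a back-of-the-envelope sketch, the single data point can be scaled to other sizes. The helper below is hypothetical and assumes memory grows linearly with n_cables x n_bundles_max, which this README does not claim; treat the result as a rough guess and run the benchmark for real numbers.

```python
# Anecdotal reference point from this README:
# n_cables * n_bundles_max = 10^9 needed about 16 GB of memory.
REFERENCE_ELEMENTS = 1e9
REFERENCE_MEMORY_GB = 16.0


def estimate_memory_gb(n_cables, n_bundles_max):
    """Scale the reference point linearly (an assumption, not a measurement)."""
    return n_cables * n_bundles_max / REFERENCE_ELEMENTS * REFERENCE_MEMORY_GB
```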

activity_deadzone (float, default of .01) is the threshold below which any cable or bundle activity will be snapped down to zero. This helps maintain sparsity without otherwise changing the behavior much.
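
The effect of the deadzone can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not Ziptie's code, and treating a value exactly equal to the deadzone as "not below" is an assumption.

```python
def apply_deadzone(activities, activity_deadzone=0.01):
    """Set every activity below activity_deadzone to zero; leave the rest unchanged."""
    return [a if a >= activity_deadzone else 0.0 for a in activities]
```

For instance, apply_deadzone([0.005, 0.5]) zeroes the first value and keeps the second.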

threshold (float, default of 1e3) is the agglomeration energy threshold for creating a new bundle from two cables. Increasing this means that bundles will form more slowly, but they are more likely to capture the underlying relationships between cables.

growth_threshold (float, default of None) is the optional argument for setting the cable-bundle agglomeration threshold separately. If None it will use whatever value was supplied as the threshold. Having different thresholds for cable-cable and cable-bundle agglomeration can change whether the Ziptie tends to create more small bundles or fewer larger ones.

nucleation_check_frequency (float, default of None) is roughly how many time steps will pass between checking whether there is a pair of cables that has accumulated enough agglomeration energy to become a new bundle. This check is expensive so performing it less often is a good way to speed up the Ziptie. If None is supplied, this defaults to the agglomeration threshold / 10.

growth_check_frequency (float, default of None) is similar to nucleation_check_frequency, but for the check of whether a cable should be added to an existing bundle (cable-bundle agglomeration).
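
The None defaults described above can be summarized in a short sketch. resolve_defaults is a hypothetical helper, not a Ziptie function, and it reads "the agglomeration threshold" as the threshold argument. It leaves out growth_check_frequency because its default is not spelled out here.

```python
def resolve_defaults(threshold=1e3, growth_threshold=None,
                     nucleation_check_frequency=None):
    """Fill in None values the way the parameter descriptions say they are filled."""
    if growth_threshold is None:
        # Cable-bundle agglomeration reuses the cable-cable threshold.
        growth_threshold = threshold
    if nucleation_check_frequency is None:
        # Assumes "the agglomeration threshold" means the threshold argument.
        nucleation_check_frequency = threshold / 10
    return growth_threshold, nucleation_check_frequency
```

With everything left at its default, this gives a growth threshold of 1000 and a nucleation check roughly every 100 time steps.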
