A Python package for optimal univariate microaggregation

microagg1d

A Python library that implements different techniques for optimal univariate microaggregation. The two main parameters that determine the runtime are the length n of the input array and the minimal class size k.

Currently the package implements the following methods:

  • "simple" [O(nk), faster for small k]
  • "wilber" [O(n), faster for larger k]

By default, the package switches between the two methods depending on the size of k.
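
To give a feel for where the O(nk) bound of a "simple" style method can come from, here is a minimal pure-Python sketch of a textbook dynamic program. It is an illustration under stated assumptions, not the package's actual implementation: the function name simple_microaggregation is hypothetical, the cost is assumed to be the sum of squared deviations from each cluster mean, and the sketch sorts the input itself, which adds O(n log n) on top of the O(nk) recurrence.

```python
from itertools import accumulate


def simple_microaggregation(values, k):
    """Return one cluster label per value; every cluster has >= k members.

    Hypothetical sketch of an O(nk) dynamic program, not microagg1d's code.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    # Optimal clusters are contiguous in sorted order, so sort first.
    order = sorted(range(n), key=values.__getitem__)
    xs = [values[i] for i in order]
    # Prefix sums of x and x^2 give any cluster's squared error in O(1).
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(x * x for x in xs))

    def cost(i, j):
        # Sum of squared deviations from the mean of xs[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [float("inf")] * (n + 1)  # best[j]: minimal cost of clustering xs[:j]
    start = [0] * (n + 1)            # start[j]: where the last cluster of xs[:j] begins
    best[0] = 0.0
    for j in range(k, n + 1):
        # The last cluster needs at least k members and never more than 2k - 1,
        # so only k candidate split points are examined per j.
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            cand = best[i] + cost(i, j)
            if cand < best[j]:
                best[j], start[j] = cand, i
    # Walk the recorded boundaries back from the end.
    bounds = []
    j = n
    while j > 0:
        bounds.append((start[j], j))
        j = start[j]
    # Number clusters from the smallest values upward, in the original order.
    labels = [0] * n
    for label, (i, j) in enumerate(reversed(bounds)):
        for pos in range(i, j):
            labels[order[pos]] = label
    return labels


print(simple_microaggregation([5, 1, 1, 1.1, 5, 1, 5.1], 3))  # [1, 0, 0, 0, 1, 0, 1]
```

The inner loop inspects at most k split points for each of the n positions, which is why this approach tends to be attractive only while k stays small.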

Both methods rely on prefix sums to compute the cluster cost. Because prefix sums can grow very large, a slightly slower but numerically more robust variant is used by default. If your data is small, or you do not need the extra numerical stability, you can opt out by passing stable=False.
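
The following sketch illustrates why prefix sums can become a numerical problem. It assumes the cluster cost is the sum of squared deviations from the cluster mean; the helper names prefix_sums and cluster_cost are hypothetical, and shifting the data is just one simple remedy shown for illustration, not necessarily what the package's stable mode does.

```python
from itertools import accumulate


def prefix_sums(xs):
    """Return running sums of x and x**2, each with a leading 0."""
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(x * x for x in xs))
    return s1, s2


def cluster_cost(s1, s2, i, j):
    """Sum of squared deviations from the mean of xs[i:j], in O(1)."""
    total = s1[j] - s1[i]
    return (s2[j] - s2[i]) - total * total / (j - i)


small = [0.0, 0.1, 0.2, 0.3]
large = [1e9 + x for x in small]  # same spread, but a large offset

# On small values the prefix-sum formula is accurate (true cost is 0.05).
exact = cluster_cost(*prefix_sums(small), 0, 4)
# On large values the squares are around 1e18, so rounding error swamps the result.
naive = cluster_cost(*prefix_sums(large), 0, 4)
# Shifting by the first value keeps the prefix sums small again.
shifted = cluster_cost(*prefix_sums([x - large[0] for x in large]), 0, 4)

print(exact, naive, shifted)
```

The cost depends only on the spread of the values, so the first and third results should agree closely, while the second one is unreliable because float64 cannot resolve differences of 0.05 between numbers near 1e18.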

The code is written in Python and relies on the numba compiler for speed.

Requirements

microagg1d relies on numpy and numba, which currently support Python 3.8-3.10.

Installation

microagg1d is available on PyPI, the Python Package Index.

$ pip3 install microagg1d

Example Usage

import microagg1d

x = [5, 1, 1, 1.1, 5, 1, 5.1]
k = 3

clusters = microagg1d.optimal_univariate_microaggregation_1d(x, k) # automatically choose method

print(clusters)   # [1 0 0 0 1 0 1]

clusters2 = microagg1d.optimal_univariate_microaggregation_1d(x, k=k, method="wilber") # explicitly choose method

print(clusters2)   # [1 0 0 0 1 0 1]

# optionally trade numerical stability for speed; this is usually safe on small datasets like this one
# stable works with both the wilber and the simple method
clusters3 = microagg1d.optimal_univariate_microaggregation_1d(x, k=k, stable=False)

print(clusters3)   # [1 0 0 0 1 0 1]
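
The returned array only holds cluster labels. As a possible follow-up step, the sketch below turns the labels into microaggregated values by replacing each entry with the mean of its cluster. It hardcodes the labels printed in the example above, so it runs without microagg1d installed; the variable names are illustrative only.

```python
from collections import defaultdict

x = [5, 1, 1, 1.1, 5, 1, 5.1]
clusters = [1, 0, 0, 0, 1, 0, 1]  # labels as printed in the example above

# Collect the values that belong to each cluster label.
groups = defaultdict(list)
for value, label in zip(x, clusters):
    groups[label].append(value)

# Each cluster is represented by its mean.
means = {label: sum(vals) / len(vals) for label, vals in groups.items()}
sizes = {label: len(vals) for label, vals in groups.items()}

# Replace every original value with the mean of its cluster.
aggregated = [means[label] for label in clusters]

print(sizes)
print(aggregated)
```

With k = 3, every cluster size reported here should be at least 3, which is a quick sanity check on any labelling.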

Important notice: On first use, the code is compiled once, which may take about 30s. On subsequent uses, compilation is no longer necessary and execution is much faster.

Tests

Tests are in tests/.

# Run tests
$ python3 -m pytest .

License

The code in this repository is licensed under the BSD 2-Clause "Simplified" License.

See LICENSE.
