Skip to main content

A package to perform optimal univariate microaggregation for various cost functions.

Project description

build Coverage Status arXiv:2401.02381

microagg1d

A Python library which implements different techniques for optimal univariate microaggregation. The two main parameters that determine the runtime are the length n of the input array and minimal class size k. This package offers both O(n) (fast for large k) and O(kn) (fast for small k) algorithms.

The code is written in Python and relies on the numba compiler for speed.

Requirements

microagg1d relies on numpy and numba which currently support python 3.8-3.11.

Installation

microagg1d is available on PyPI, the Python Package Index.

$ pip3 install microagg1d

Example Usage

from microagg1d import univariate_microaggregation

x = [5, 1, 1, 1.1, 5, 1, 5.1]

clusters = univariate_microaggregation(x, k=3)

print(clusters)   # [1 0 0 0 1 0 1]

# explicitly choose method / algorithm
clusters2 = univariate_microaggregation(x, k=3, method="wilber")

print(clusters2)   # [1 0 0 0 1 0 1]

# choose a different cost (sae / sse / roundup / rounddown / maxdist)
clusters3 = univariate_microaggregation(x, k=3, cost="sae")

print(clusters3)   # [1 0 0 0 1 0 1]

Important notice: On first import the the code is compiled once which may take about 30s. On subsequent imports this is no longer necessary and imports are almost instant.

Tests

Tests are in tests/.

# Run tests
$ python3 -m pytest .

Method Details

Currently the package implements the following methods:

  • "simple" [O(nk), faster for small k]
  • "wilber" [O(n), faster for larger k]
  • "galil_park" [O(n), fewer calls to SMAWK]
  • "staggered" [fastest O(n)]

By default, the package switches between the simple and wilber method depending on the size of k.

Both methods rely on a prefix sum approach to compute the cluster cost. As the prefix sums tend to become very large quite quickly, a slightly slower but numerically more robust method is chosen by default. If your data is small, or you don't need the numeric stability then you may choose to also opt out of stable.

License

The code in this repository has an BSD 2-Clause "Simplified" License.

See LICENSE.

Citation

This code was created as part of the research for the following publication. If you use this package please cite:

@article{stamm2024faster,
	title = {Faster optimal univariate microaggregation},
	url = {https://openreview.net/forum?id=s5lEUtyVly},
	journal = {Transactions on Machine Learning Research},
	author = {Stamm, Felix I. and Schaub, Michael T},
	year = {2024},
}

References

  • Hansen, S.L. and Mukherjee, S., 2003. A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering
  • Wilber, R., 1988. The concave least-weight subsequence problem revisited. Journal of Algorithms, 9(3), pp.418-425.
  • Galil, Z. and Park, K., 1989. A linear-time algorithm for concave one-dimensional dynamic programming.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

microagg1d-0.4.0.tar.gz (586.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

microagg1d-0.4.0-py3-none-any.whl (30.6 kB view details)

Uploaded Python 3

File details

Details for the file microagg1d-0.4.0.tar.gz.

File metadata

  • Download URL: microagg1d-0.4.0.tar.gz
  • Upload date:
  • Size: 586.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for microagg1d-0.4.0.tar.gz
Algorithm Hash digest
SHA256 c0257360bf7d4733209f3de5d2373689095e58540c47041ac2a66c9bf14df43e
MD5 18acc0e6171abaf3545721846e86a73e
BLAKE2b-256 443d863c69321e4a9966e2f808dd5d9d60ea2cd714384973ea18bb19593e6016

See more details on using hashes here.

File details

Details for the file microagg1d-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: microagg1d-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 30.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for microagg1d-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 521a206ecf539401e7e892dc679dfbcf16bb2931aa15a96804c8d1f64a7e7215
MD5 acc7c59c95f273304e11877158498c2d
BLAKE2b-256 1a949a497d8f7367a1bf6a571c12758bb95e58e865f6e7a32dfbf20e2c990ffd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page