A Python package for optimal univariate microaggregation in 1d
Project description
microagg1d
A Python library that implements different techniques for optimal univariate microaggregation. The two main parameters that determine the runtime are the length n of the input array and the minimal class size k.
Currently the package implements the following methods:
- "simple": O(nk), faster for small k
- "wilber": O(n), faster for larger k

By default, the package switches between the two methods depending on the size of k.
Both methods rely on prefix sums to compute the cluster cost. Because these prefix sums grow large quickly, a slightly slower but numerically more robust variant is used by default. If your data is small, or you do not need the extra numerical stability, you can opt out by passing stable=False.
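With prefix sums S1 and S2 of the sorted values and of their squares, the cost of a cluster values[i:j] (its sum of squared deviations from the mean) can be computed in O(1) as (S2[j] - S2[i]) - (S1[j] - S1[i]) squared divided by (j - i). The minimal sketch below shows that formula. The shifting trick in `shifted_prefix_sums` is one common way to limit the cancellation error described above; it is an assumption for illustration and not necessarily what the package's stable mode does. All function names here are hypothetical.

```python
from itertools import accumulate


def prefix_sums(values):
    """Prefix sums of values and of their squares, each with a leading 0."""
    s1 = [0.0, *accumulate(values)]
    s2 = [0.0, *accumulate(v * v for v in values)]
    return s1, s2


def cluster_cost(s1, s2, i, j):
    """Sum of squared deviations from the mean of values[i:j], in O(1)."""
    m = j - i
    total = s1[j] - s1[i]
    return (s2[j] - s2[i]) - total * total / m


def shifted_prefix_sums(values):
    """Like prefix_sums, but on data shifted by a reference value.

    Shifting leaves every cluster cost unchanged but keeps the sums small,
    which limits cancellation when large squared sums are subtracted.
    """
    ref = values[len(values) // 2]
    return prefix_sums([v - ref for v in values])


xs = sorted([5, 1, 1, 1.1, 5, 1, 5.1])
s1, s2 = prefix_sums(xs)
print(cluster_cost(s1, s2, 0, 4))  # cost of the cluster [1, 1, 1, 1.1], about 0.0075
```

The cost function only needs two subtractions and one division per query, which is what makes scanning many candidate clusters cheap.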
The code is written in Python and relies on the numba compiler for speed.
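An O(nk) running time like that of the "simple" method is what the classic dynamic program for univariate microaggregation achieves: after sorting, optimal clusters are runs of consecutive values, and a cluster with 2k or more members can always be split into two clusters of at least k members without increasing the cost, so only the sizes k to 2k - 1 need to be tried for the last cluster. The pure-Python sketch below illustrates that idea; whether the package's "simple" method works exactly this way is an assumption, and `simple_microaggregation` is a hypothetical helper, not part of microagg1d. It uses plain (non-stabilized) prefix sums.

```python
import math
from itertools import accumulate


def simple_microaggregation(x, k):
    """Optimal univariate microaggregation via an O(n*k) dynamic program.

    Hypothetical helper for illustration, not part of microagg1d.
    Returns one cluster label per value of x, numbered in sorted order.
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    s1 = [0.0, *accumulate(xs)]
    s2 = [0.0, *accumulate(v * v for v in xs)]

    def cost(i, j):
        # Sum of squared deviations of xs[i:j] from their mean.
        m = j - i
        t = s1[j] - s1[i]
        return (s2[j] - s2[i]) - t * t / m

    best = [math.inf] * (n + 1)  # best[j]: optimal cost for xs[:j]
    start = [0] * (n + 1)        # start[j]: first index of the last cluster
    best[0] = 0.0
    for j in range(k, n + 1):
        # Only cluster sizes between k and 2k - 1 need to be considered.
        for m in range(k, min(2 * k - 1, j) + 1):
            i = j - m
            c = best[i] + cost(i, j)
            if c < best[j]:
                best[j], start[j] = c, i

    # Walk back through the chosen cluster boundaries.
    bounds = []
    j = n
    while j > 0:
        bounds.append((start[j], j))
        j = start[j]
    labels = [0] * n
    for label, (i, j) in enumerate(reversed(bounds)):
        for pos in range(i, j):
            labels[order[pos]] = label
    return labels


print(simple_microaggregation([5, 1, 1, 1.1, 5, 1, 5.1], 3))  # [1, 0, 0, 0, 1, 0, 1]
```

For the input used in the example usage with k = 3, this sketch yields the same grouping as the package's documented output.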
Requirements
microagg1d relies on numpy and numba, which currently support Python 3.8-3.10.
Installation
microagg1d is available on PyPI, the Python Package Index.
$ pip3 install microagg1d
Example Usage
import microagg1d
x = [5, 1, 1, 1.1, 5, 1, 5.1]
k = 3
clusters = microagg1d.optimal_univariate_microaggregation_1d(x, k) # automatically choose method
print(clusters) # [1 0 0 0 1 0 1]
clusters2 = microagg1d.optimal_univariate_microaggregation_1d(x, k=k, method="wilber") # explicitly choose method
print(clusters2) # [1 0 0 0 1 0 1]
# you may trade numerical stability for speed; this is usually not a problem on small datasets like this one
# stable=False works with both the "wilber" and the "simple" method
clusters3 = microagg1d.optimal_univariate_microaggregation_1d(x, k=k, stable=False)
print(clusters3) # [1 0 0 0 1 0 1]
Important notice: on first use, the code is compiled once, which may take about 30 s. On subsequent uses this is no longer necessary and execution is much faster.
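The result is one integer label per input value. Microaggregation then typically replaces each value by the mean of its cluster. The sketch below shows one way to do that with NumPy and to check that every cluster has at least k members, assuming the labels are non-negative integers as in the example output; `microaggregate` is a hypothetical helper, not part of microagg1d.

```python
import numpy as np


def microaggregate(x, labels):
    """Replace every value by the mean of its cluster (hypothetical helper).

    Assumes labels are non-negative integers, one per value in x.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=np.intp)
    sums = np.bincount(labels, weights=x)  # per-cluster sum of values
    counts = np.bincount(labels)           # per-cluster size
    return (sums / counts)[labels]


x = [5, 1, 1, 1.1, 5, 1, 5.1]
labels = [1, 0, 0, 0, 1, 0, 1]  # labels printed in the example above for k = 3
k = 3

# Every cluster should contain at least k values.
assert np.bincount(labels).min() >= k
print(microaggregate(x, labels))  # cluster means: about 1.025 and 5.033
```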
Tests
Tests are in tests/.
# Run tests
$ python3 -m pytest .
License
The code in this repository is licensed under the BSD 2-Clause "Simplified" License.
See LICENSE.
Download files
- Source distribution: microagg1d-0.2.0.tar.gz
- Built distribution: microagg1d-0.2.0-py3-none-any.whl
File details
Details for the file microagg1d-0.2.0.tar.gz.
File metadata
- Download URL: microagg1d-0.2.0.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8d73db9240ce85daf02c7512b9d15e6f841383ed12182a8615595bc8d92e815 |
| MD5 | dfcf172227f930c206e29104649ba30b |
| BLAKE2b-256 | 511836a756b91cdf341a6623bf55dbd69876f995dace48478e21e38342fae2c9 |
File details
Details for the file microagg1d-0.2.0-py3-none-any.whl.
File metadata
- Download URL: microagg1d-0.2.0-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2124b2141fe0bc9b7458480fb506169f32a36a21042dcb4d8daab2933d8609ba |
| MD5 | 274ed6e2f4a9192c98a315b47bbafbcc |
| BLAKE2b-256 | 77b81d5a2e4a295b8f6f95ec421ceb1439e5cfa519885c1ede2a1912be6606a4 |