A collection of anonymization algorithms in Python
crowds
crowds is a Python module that provides a suite of anonymization algorithms for transforming Pandas dataframes so that they satisfy k-anonymity or differential privacy. This is a work in progress: so far, one algorithm (OLA) has been implemented. Get in touch if you would like to contribute.
Optimal Lattice Anonymization
This is an implementation of the algorithm described by El Emam, Khaled, et al. (2009) [1]. Given a dataframe, an information loss function, and a set of generalization strategies, it returns a k-anonymous version [2], obtained using the single-dimensional global recoding model: each value is mapped consistently to the same generalization throughout the new dataset, and the generalizations for each dimension do not overlap.
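To make the k-anonymity property concrete, here is a minimal sketch of a check for it. It is not part of crowds: the function name `is_k_anonymous` and the toy records are illustrative assumptions, and it uses plain Python dictionaries rather than a Pandas dataframe to stay dependency-free.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k rows."""
    counts = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in counts.values())


# Hypothetical, already-generalized records (not from any real dataset).
rows = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "diabetes"},
]

# Each (age, zip) combination occurs twice, so the data is 2-anonymous
# with respect to those two attributes, but not 3-anonymous.
two_anonymous = is_k_anonymous(rows, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(rows, ["age", "zip"], 3)
```

An anonymization algorithm such as OLA searches for the set of generalization levels that makes this kind of check pass while losing as little information as possible.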
Usage
To define a set of generalization rules:
from crowds.kanonymity.generalizations import GenRule

def first_gen(value):
    return 'value'

def second_gen(value):
    return 'value'

new_rule = GenRule([first_gen, second_gen])
ruleset = {
    'attr_name': new_rule,
}
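As an illustration of what the placeholder functions above might look like in practice, the following sketch defines two increasingly coarse generalizations for an age attribute. The function names, the `generalize` helper, and the assumption that each function receives the original (ungeneralized) value are all hypothetical; check the crowds source for how GenRule actually applies its functions.

```python
def age_to_decade(value):
    """Level 1: map an exact age to a ten-year band, e.g. 37 -> '30-39'."""
    low = (int(value) // 10) * 10
    return f"{low}-{low + 9}"


def age_suppress(value):
    """Level 2: suppress the value entirely."""
    return "*"


# Ordered from least to most general.
age_levels = [age_to_decade, age_suppress]


def generalize(value, level, functions):
    """Hypothetical helper: apply the generalization for a given level.

    Level 0 returns the value unchanged; level n applies the n-th
    function in the list to the original value (an assumption).
    """
    if level == 0:
        return value
    return functions[level - 1](value)
```

Because each function is deterministic, the same input always maps to the same output, which is what the global recoding model requires. Functions like these could then be wrapped in the same way as in the snippet above, for example `GenRule([age_to_decade, age_suppress])`.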
For the algorithm to work correctly, the loss function must be monotonic, i.e. non-decreasing as the generalization level increases. Some information loss functions are provided in information_loss.py. It is also possible to define a custom loss function, which must have the same signature as the following example:
def loss_fn(node):
    return 0.0
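The monotonicity requirement can be illustrated with a small sketch. The structure of the `node` argument is not described here, so this example assumes, purely for illustration, that a node can be represented as a tuple of generalization levels (one per attribute); `MAX_LEVELS`, `precision_loss`, and `is_monotonic` are hypothetical names, not part of crowds.

```python
from itertools import product

# Hypothetical: the highest generalization level for each of two attributes.
MAX_LEVELS = (2, 3)


def precision_loss(levels, max_levels=MAX_LEVELS):
    """Average fraction of each generalization hierarchy that is used.

    Returns 0.0 when nothing is generalized and 1.0 when every
    attribute is at its highest level.
    """
    return sum(l / m for l, m in zip(levels, max_levels)) / len(max_levels)


def is_monotonic(loss, max_levels):
    """Check that raising any single level by one never lowers the loss."""
    for node in product(*(range(m + 1) for m in max_levels)):
        for i, m in enumerate(max_levels):
            if node[i] < m:
                # Same node with attribute i generalized one step further.
                parent = node[:i] + (node[i] + 1,) + node[i + 1:]
                if loss(parent) < loss(node):
                    return False
    return True


monotonic = is_monotonic(precision_loss, MAX_LEVELS)
```

A loss that decreases as data becomes more general, such as `lambda node: -sum(node)`, fails this check and would not be a suitable loss for a lattice search that relies on monotonicity.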
Then, to anonymize:
from crowds.kanonymity import ola
anonymous_df = ola.anonymize(df, k=10, loss=loss_fn, generalizations=ruleset)
For more, check out this example, using the "Adult" dataset from the UCI Machine Learning Repository [3].
References
[1] El Emam, Khaled, et al. "A globally optimal k-anonymity method for the de-identification of health data." Journal of the American Medical Informatics Association 16.5 (2009): 670-682.
[2] Sweeney, Latanya. "k-anonymity: A model for protecting privacy." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10.05 (2002): 557-570.
[3] Dua, D. and Graff, C. "UCI Machine Learning Repository." Irvine, CA: University of California, School of Information and Computer Science (2019).
File details
Details for the file crowds-0.0.1.tar.gz.
File metadata
- Download URL: crowds-0.0.1.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42176217fcd549d9615fbd8de972b4f1fe2a0d785bbf0b2c8a9eb32a83882bbc |
| MD5 | fced5246a58977be4d4b565a9c0fb55c |
| BLAKE2b-256 | 785686d05cecc1eddeb3249cfa6e81b4e29b7ed5afe7aa66ac3415af37e74cb6 |
File details
Details for the file crowds-0.0.1-py3-none-any.whl.
File metadata
- Download URL: crowds-0.0.1-py3-none-any.whl
- Upload date:
- Size: 19.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a45d23f3468e6037eb3e36afa93c5b01391f0966ea62d3a7112c8c2a8a0ee5fb |
| MD5 | a11fa2c9b4a8729424fe51765e607f43 |
| BLAKE2b-256 | 50a8441dd5d03058280a3289900b147fb2e92f1a0dbad826557cf351a9b3593d |