Skip to main content

A collection of anonymization algorithms in Python

Project description

crowds

crowds is a Python module that provides a suite of anonymization algorithms, allowing to transform Pandas dataframes so that they satisfy k-anonymity or differential privacy. This is a work in progress. So far, one algorithm has been implemented (OLA). Get in touch if you would like to contribute.

Optimal Lattice Anonymization

This is an implementation of the algorithm described by El Emam, Khalet, et al. (2009) [1]. Given a dataframe, an information loss function, and a set of generalization strategies, it returns a k-anonymous version [2], obtained using the single-dimensional global recording model, i.e.: the same values will be mapped consistently to the same generalizations in the new dataset, and the generalization for each dimension will not overlap.

Usage

To define a set of generalization rules:

from crowds.kanonymity.generalizations import GenRule

def first_gen(value):
    return 'value'

def second_gen(value):
    return 'value'

new_rule = GenRule([first_gen, second_gen])
ruleset = {
    'attr_name': new_rule,
}

In order for the algorithm to work correctly, the loss function needs to be monotonic, i.e. non-decreasing for increasing generalization levels. Some information loss functions are provided in information_loss.py. It is also possible to define a custom generalization function (which must have the same signature as the following example):

def loss_fn(node):
    return 0.0

Then, to anonymize:

from crowds.kanonymity import ola
anonymous_df = ola.anonymize(df, k=10, loss=loss_fn, generalizations=gen_rules)

For more, check out this example, using the "Adult" dataset from the UCI Machine Learning Repository [3].

References

[1] El Emam, Khaled, et al. "A globally optimal k-anonymity method for the de-identification of health data." Journal of the American Medical Informatics Association 16.5 (2009): 670-682.

[2] Sweeney, Latanya. "k-anonymity: A model for protecting privacy." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10.05 (2002): 557-570.

[3] Dua, D. and Graff, C. "UCI Machine Learning Repository." Irvine, CA: University of California, School of Information and Computer Science (2019).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crowds-0.0.1.tar.gz (6.4 kB view hashes)

Uploaded Source

Built Distribution

crowds-0.0.1-py3-none-any.whl (19.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page