A collection of anonymization algorithms in Python

crowds

crowds is a Python module that provides a suite of anonymization algorithms for transforming Pandas dataframes so that they satisfy k-anonymity or differential privacy. This is a work in progress: so far, one algorithm (OLA) has been implemented. Get in touch if you would like to contribute.

Optimal Lattice Anonymization

This is an implementation of the algorithm described by El Emam, Khaled, et al. (2009) [1]. Given a dataframe, an information loss function, and a set of generalization strategies, it returns a k-anonymous version of the dataframe [2], obtained using the single-dimensional global recoding model: each value is mapped consistently to the same generalization throughout the new dataset, and the generalizations for each dimension do not overlap.
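
To make the k-anonymity property concrete, here is a minimal sketch of a checker. It is not part of crowds: the function name `is_k_anonymous`, the sample rows, and the column names are all illustrative assumptions, and it uses plain dictionaries rather than a dataframe so that it runs without dependencies.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k rows (hypothetical helper, not crowds API)."""
    counts = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in counts.values())


# Illustrative, already-generalized records (made-up data).
rows = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
]

# The ("40-49", "021**") group contains a single row, so k=2 fails.
two_anonymous = is_k_anonymous(rows, ["age", "zip"], k=2)
```

With a Pandas dataframe, roughly the same check can be written as `df.groupby(cols).size().min() >= k`.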

Usage

To define a set of generalization rules:

from crowds.kanonymity.generalizations import GenRule

def first_gen(value):
    return 'value'

def second_gen(value):
    return 'value'

new_rule = GenRule([first_gen, second_gen])
ruleset = {
    'attr_name': new_rule,
}
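
The placeholder functions above can be replaced with real generalization steps. The following is a sketch only: the function names and the 'age' attribute are hypothetical, and the exact way GenRule applies each level (to the original value or to the previous level's output) is not specified here, so the functions are written to behave sensibly in either case.

```python
def age_to_decade(value):
    """First level: map an exact age to a ten-year band."""
    low = (int(value) // 10) * 10
    return f"{low}-{low + 9}"


def age_suppress(value):
    """Last level: suppress the value entirely, whatever the input."""
    return "*"


# The bands never overlap and the same age always lands in the same band,
# which matches the single-dimensional global recoding model.
examples = [age_to_decade(a) for a in (30, 37, 39, 40)]

# Assumed usage with the library, following the snippet above:
#   ruleset = {'age': GenRule([age_to_decade, age_suppress])}
```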

For the algorithm to work correctly, the loss function must be monotonic, i.e. non-decreasing as the generalization level increases. Some information loss functions are provided in information_loss.py. It is also possible to define a custom loss function, which must have the same signature as the following example:

def loss_fn(node):
    return 0.0
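
As an illustration of the monotonicity requirement, the sketch below models a lattice node as a plain tuple of generalization levels, one per attribute. This representation, the `MAX_LEVELS` values, and both function names are assumptions made for the example; the node object that crowds passes to the loss function may look different.

```python
from itertools import product

# Hypothetical: highest generalization level available for each attribute.
MAX_LEVELS = (2, 3)


def level_loss(levels, max_levels=MAX_LEVELS):
    """Average fraction of each attribute's hierarchy that has been climbed."""
    return sum(l / m for l, m in zip(levels, max_levels)) / len(max_levels)


def is_monotonic(loss, max_levels=MAX_LEVELS):
    """Check that raising any single level by one never lowers the loss."""
    for node in product(*(range(m + 1) for m in max_levels)):
        for i, m in enumerate(max_levels):
            if node[i] < m:
                more_general = node[:i] + (node[i] + 1,) + node[i + 1:]
                if loss(more_general) < loss(node):
                    return False
    return True


monotonic = is_monotonic(level_loss)
```

A loss that decreases as levels rise, such as `lambda node: -sum(node)`, fails this check and would not be suitable for the search.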

Then, to anonymize:

from crowds.kanonymity import ola
anonymous_df = ola.anonymize(df, k=10, loss=loss_fn, generalizations=ruleset)

For more, check out this example, using the "Adult" dataset from the UCI Machine Learning Repository [3].

References

[1] El Emam, Khaled, et al. "A globally optimal k-anonymity method for the de-identification of health data." Journal of the American Medical Informatics Association 16.5 (2009): 670-682.

[2] Sweeney, Latanya. "k-anonymity: A model for protecting privacy." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10.05 (2002): 557-570.

[3] Dua, D. and Graff, C. "UCI Machine Learning Repository." Irvine, CA: University of California, School of Information and Computer Science (2019).
