Anonymization library for Python, a fork of AnonyPy

AnonyPyx

This is a fork of the Python library AnonyPy, which provides data anonymization techniques. AnonyPyx adds further algorithms (see below) and introduces a declarative interface. If you are considering migrating from AnonyPy, keep in mind that AnonyPyx is not compatible with the original AnonyPy API.

Features

  • partition-based anonymization algorithm Mondrian [1] supporting
    • k-anonymity
    • l-diversity
    • t-closeness
  • microaggregation-based anonymization algorithm MDAV-generic [2] supporting
    • k-anonymity
  • interoperability with pandas data frames
  • support for both continuous and categorical attributes
  • image anonymization via the k-Same family of algorithms [3]

Install

pip install anonypyx

Usage

Disclaimer: AnonyPyx currently does not shuffle the input data. Without shuffling, records can in some applications be re-identified from the order in which they appear in the anonymized data set.
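
One possible workaround is to shuffle the rows yourself with pandas, either before passing the data frame to the anonymizer or on the anonymized result. The following is a minimal sketch that uses only pandas; the helper name shuffle_records and the sample data are made up for illustration and are not part of AnonyPyx.

```python
import pandas as pd


def shuffle_records(df: pd.DataFrame, seed=None) -> pd.DataFrame:
    """Return a copy of df with its rows in random order and a fresh index."""
    # sample(frac=1) draws every row exactly once, in random order.
    # reset_index(drop=True) discards the old index, which would otherwise
    # still reveal the original row order.
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)


# Hypothetical data, only to demonstrate the helper.
example = pd.DataFrame({
    "age": ["19-33", "19-33", "50-92"],
    "diagnosis": ["flu", "diabetes", "stroke"],
})
shuffled = shuffle_records(example)
print(shuffled)
```

Passing a fixed seed makes the order reproducible, which is convenient in tests but should be avoided for released data.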

Mondrian:

import anonypyx
import pandas as pd

# Step 1: Prepare data as pandas data frame:

columns = ["age", "sex", "zip code", "diagnosis"]
data = [
    [50, "male", "02139", "stroke"],
    [33, "female", "10023", "flu"],
    [66, "intersex", "20001", "flu"],
    [28, "female", "33139", "diarrhea"],
    [92, "male", "94130", "cancer"],
    [19, "female", "96850", "diabetes"],
]

df = pd.DataFrame(data=data, columns=columns)

for column in ("sex", "zip code", "diagnosis"):
    df[column] = df[column].astype("category")

# Step 2: Prepare anonymizer

anonymizer = anonypyx.Anonymizer(df, k=3, l=2, algorithm="Mondrian", feature_columns=["age", "sex", "zip code"], sensitive_column="diagnosis")

# Step 3: Anonymize data (this might take a while for large data sets)

anonymized_records = anonymizer.anonymize()

# Print results:

anonymized_df = pd.DataFrame(anonymized_records)
print(anonymized_df)

Output:

     age            sex           zip code diagnosis  count
0  19-33         female  10023,33139,96850  diabetes      1
1  19-33         female  10023,33139,96850  diarrhea      1
2  19-33         female  10023,33139,96850       flu      1
3  50-92  male,intersex  02139,20001,94130    cancer      1
4  50-92  male,intersex  02139,20001,94130       flu      1
5  50-92  male,intersex  02139,20001,94130    stroke      1
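
To see why this output satisfies k=3 and l=2, you can group the result by the feature columns and inspect each group. The sketch below re-enters the table above by hand and uses only pandas; it checks the "distinct values" reading of l-diversity, which is an assumption about how l is interpreted and not a statement about the library's internals.

```python
import pandas as pd

# The anonymized output shown above, re-entered by hand for illustration.
anonymized_df = pd.DataFrame({
    "age": ["19-33"] * 3 + ["50-92"] * 3,
    "sex": ["female"] * 3 + ["male,intersex"] * 3,
    "zip code": ["10023,33139,96850"] * 3 + ["02139,20001,94130"] * 3,
    "diagnosis": ["diabetes", "diarrhea", "flu", "cancer", "flu", "stroke"],
    "count": [1] * 6,
})

feature_columns = ["age", "sex", "zip code"]
groups = anonymized_df.groupby(feature_columns)

# k-anonymity: every combination of feature values covers at least k records.
group_sizes = groups["count"].sum()

# Distinct l-diversity: every group contains at least l different sensitive values.
distinct_sensitive = groups["diagnosis"].nunique()

print("k-anonymous for k=3:", bool(group_sizes.min() >= 3))
print("l-diverse for l=2:", bool(distinct_sensitive.min() >= 2))
```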

MDAV-generic:

# Steps 1 and 3 are the same as in the Mondrian example; only Step 2 changes.

# Step 2: Prepare anonymizer
anonymizer = anonypyx.Anonymizer(df, k=3, algorithm="MDAV-generic", feature_columns=["age", "sex", "zip code"], sensitive_column="diagnosis")
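
For intuition about what microaggregation does, here is a simplified sketch of the MDAV idea described in [2], restricted to numeric attributes with Euclidean distance. It is not the AnonyPyx implementation (which also handles categorical attributes); the function names mdav_clusters and microaggregate are made up for this example.

```python
import numpy as np


def mdav_clusters(points: np.ndarray, k: int) -> list:
    """Partition the row indices of points into clusters of at least k rows."""
    remaining = np.arange(len(points))
    clusters = []

    def farthest_from(target, candidates):
        # Index (into points) of the candidate farthest from target.
        distances = np.linalg.norm(points[candidates] - target, axis=1)
        return candidates[np.argmax(distances)]

    def take_nearest(center_index, candidates):
        # The k candidates closest to points[center_index].
        distances = np.linalg.norm(points[candidates] - points[center_index], axis=1)
        return candidates[np.argsort(distances)[:k]]

    # While enough records are left, build two clusters per round:
    # one around the record farthest from the centroid, one around the
    # record farthest from that first record.
    while len(remaining) >= 3 * k:
        centroid = points[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)
        cluster_r = take_nearest(r, remaining)
        clusters.append(cluster_r)
        remaining = np.setdiff1d(remaining, cluster_r)

        s = farthest_from(points[r], remaining)
        cluster_s = take_nearest(s, remaining)
        clusters.append(cluster_s)
        remaining = np.setdiff1d(remaining, cluster_s)

    # Between 2k and 3k-1 records left: split off one more cluster.
    if len(remaining) >= 2 * k:
        centroid = points[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)
        cluster_r = take_nearest(r, remaining)
        clusters.append(cluster_r)
        remaining = np.setdiff1d(remaining, cluster_r)

    # Whatever is left forms the final cluster.
    if len(remaining) > 0:
        clusters.append(remaining)
    return clusters


def microaggregate(points: np.ndarray, k: int) -> np.ndarray:
    """Replace every row by the mean of its cluster."""
    result = np.empty_like(points, dtype=float)
    for cluster in mdav_clusters(points, k):
        result[cluster] = points[cluster].mean(axis=0)
    return result


# The ages from the Mondrian example, as a single numeric attribute.
ages = np.array([[50.0], [33.0], [66.0], [28.0], [92.0], [19.0]])
clusters = mdav_clusters(ages, k=3)
aggregated = microaggregate(ages, k=3)
print(aggregated.ravel())
```

Each input row ends up sharing its published value with at least k-1 other rows, which is what gives k-anonymity on the aggregated attributes.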

k-Same-Eigen:

import anonypyx
import numpy as np
import cv2

from os import listdir
from os.path import isfile, join

# Step 1: Load images into single numpy array

# images are loaded in grayscale
# every image must have the same height and width

path_to_dir = 'directory/containing/images/'
height = 120
width = 128
files = [f for f in listdir(path_to_dir) if isfile(join(path_to_dir, f))]
images = [cv2.imread(join(path_to_dir, f), flags=cv2.IMREAD_GRAYSCALE) for f in files]
images = np.array(images)

# Step 2: Prepare anonymizer

anonymizer = anonypyx.kSame(images, width, height, k=5, variant='eigen')

# Step 3: Anonymization

anonymized, mapping = anonymizer.anonymize()

# Display the first image and its anonymized version

sample_image = np.concatenate((images[0], anonymized[mapping[0]]), axis=1).astype('uint8')
sample_image = cv2.cvtColor(sample_image, cv2.COLOR_GRAY2BGR)
cv2.imshow("k-same-eigen", sample_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
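
Because cv2.imread returns None for files it cannot decode, and np.array only yields a regular 3-D array when all images share one shape, it can help to validate the list before Step 2. The helper below is a sketch of such a check; stack_grayscale is a hypothetical name, not part of AnonyPyx, and the demo uses synthetic arrays in place of loaded images.

```python
import numpy as np


def stack_grayscale(images, height, width):
    """Stack equally sized 2-D arrays into one (n, height, width) array."""
    valid = []
    for index, image in enumerate(images):
        if image is None:
            # cv2.imread returns None instead of raising when a file
            # cannot be read or decoded, so such entries are skipped.
            continue
        if image.shape != (height, width):
            raise ValueError(
                f"image {index} has shape {image.shape}, expected {(height, width)}"
            )
        valid.append(image)
    if not valid:
        raise ValueError("no readable images")
    return np.stack(valid)


# Synthetic stand-ins for images loaded with cv2.IMREAD_GRAYSCALE.
loaded = [
    np.zeros((120, 128), dtype=np.uint8),
    None,  # simulates an unreadable file
    np.full((120, 128), 255, dtype=np.uint8),
]
stacked = stack_grayscale(loaded, height=120, width=128)
print(stacked.shape)
```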

Contributing

Clone the repository:

git clone https://github.com/questforwisdom/anonypyx.git

Set up a Python virtual environment and install the dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Run tests:

pytest

Changelog

0.2.0

  • added the microaggregation algorithm MDAV-generic [2]
  • added the Anonymizer class as the new API
  • removed Preserver class which was superseded by Anonymizer

0.2.1 - 0.2.3

  • minor bugfixes

0.2.4

  • added k-Same family of algorithms for image anonymization [3]
  • added the microaggregation algorithm used by k-Same

References

  • [1]: LeFevre, K., DeWitt, D. J., & Ramakrishnan, R. (2006). Mondrian multidimensional K-anonymity. 22nd International Conference on Data Engineering (ICDE’06), 25–25. https://doi.org/10.1109/ICDE.2006.101
  • [2]: Domingo-Ferrer, J., & Torra, V. (2005). Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Mining and Knowledge Discovery, 11, 195–212.
  • [3]: Newton, E. M., Sweeney, L., & Malin, B. (2005). Preserving privacy by de-identifying face images. IEEE Transactions on Knowledge and Data Engineering, 17(2), 232–243. https://doi.org/10.1109/TKDE.2005.32
