Anonymization library for Python, fork of AnonyPy
AnonyPyx
This is a fork of the Python library AnonyPy, which provides data anonymization techniques. AnonyPyx adds further algorithms (see below) and introduces a declarative interface. If you are considering migrating from AnonyPy, keep in mind that AnonyPyx is not compatible with the original AnonyPy API.
Features
- partition-based anonymization algorithm Mondrian [1] supporting
  - k-anonymity
  - l-diversity
  - t-closeness
- microclustering-based anonymization algorithm MDAV-generic [2] supporting
  - k-anonymity
- interoperability with pandas data frames
- supports both continuous and categorical attributes
- image anonymization via the k-Same family of algorithms
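To illustrate what the partition-based Mondrian algorithm [1] does, here is a minimal sketch of its strict multidimensional variant, restricted to numeric feature columns and k-anonymity. It is not AnonyPyx's implementation: the helper names `mondrian_partitions` and `generalize` are hypothetical, the split column is chosen by raw rather than normalized range, and categorical attributes, l-diversity and t-closeness are left out.

```python
import pandas as pd


def mondrian_partitions(df, feature_columns, k):
    """Split df recursively into partitions of at least k rows (strict Mondrian, numeric columns only)."""

    def split(part):
        # Try the column with the widest value range first
        # (the paper normalizes ranges; this sketch uses raw ranges).
        spans = {c: part[c].max() - part[c].min() for c in feature_columns}
        for column in sorted(spans, key=spans.get, reverse=True):
            if spans[column] == 0:
                continue  # all values equal: nothing to cut
            median = part[column].median()
            left = part[part[column] <= median]
            right = part[part[column] > median]
            # Accept the cut only if both halves still contain at least k rows.
            if len(left) >= k and len(right) >= k:
                return split(left) + split(right)
        # No allowed cut: this partition becomes one equivalence class.
        return [part]

    return split(df)


def generalize(partitions, feature_columns):
    """Replace every feature value by its partition's range, e.g. 19-33."""
    generalized = []
    for part in partitions:
        part = part.copy()
        for c in feature_columns:
            lo, hi = part[c].min(), part[c].max()
            part[c] = str(lo) if lo == hi else f"{lo}-{hi}"
        generalized.append(part)
    return pd.concat(generalized)


df = pd.DataFrame({
    "age": [50, 33, 66, 28, 92, 19],
    "diagnosis": ["stroke", "flu", "flu", "diarrhea", "cancer", "diabetes"],
})
parts = mondrian_partitions(df, ["age"], k=3)
print(generalize(parts, ["age"]))
```

With k=3 the six ages are cut once at their median, which yields the age ranges 19-33 and 50-92 that also appear in the Mondrian example's output.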
Install
pip install anonypyx
Usage
Disclaimer: AnonyPyx currently does not shuffle the input data. When the output is not shuffled, records can in some applications be re-identified from the order in which they appear in the anonymized data set.
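Until shuffling is built in, one workaround is to shuffle the anonymized data frame yourself with pandas. This sketch assumes the output layout of the Mondrian example (one row per generalized record and sensitive value) and assumes that the `count` column holds the number of original records each row stands for; the expansion step is optional.

```python
import pandas as pd

# A small table in the layout of the Mondrian example's output (values made up for illustration).
anonymized_df = pd.DataFrame({
    "age": ["19-33", "19-33", "50-92"],
    "diagnosis": ["flu", "diabetes", "stroke"],
    "count": [2, 1, 1],
})

# Optional: repeat each row `count` times so that every record gets its own row.
expanded = anonymized_df.loc[anonymized_df.index.repeat(anonymized_df["count"])]
expanded = expanded.drop(columns="count")

# frac=1 returns all rows in random order; reset_index(drop=True) discards
# the old index, which would otherwise still reveal the original order.
shuffled = expanded.sample(frac=1).reset_index(drop=True)
print(shuffled)
```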
Mondrian:
import anonypyx
import pandas as pd
# Step 1: Prepare data as pandas data frame:
columns = ["age", "sex", "zip code", "diagnosis"]
data = [
[50, "male", "02139", "stroke"],
[33, "female", "10023", "flu"],
[66, "intersex", "20001", "flu"],
[28, "female", "33139", "diarrhea"],
[92, "male", "94130", "cancer"],
[19, "female", "96850", "diabetes"],
]
df = pd.DataFrame(data=data, columns=columns)
for column in ("sex", "zip code", "diagnosis"):
    df[column] = df[column].astype("category")
# Step 2: Prepare anonymizer
anonymizer = anonypyx.Anonymizer(df, k=3, l=2, algorithm="Mondrian", feature_columns=["age", "sex", "zip code"], sensitive_column="diagnosis")
# Step 3: Anonymize data (this might take a while for large data sets)
anonymized_records = anonymizer.anonymize()
# Print results:
anonymized_df = pd.DataFrame(anonymized_records)
print(anonymized_df)
Output:
age sex zip code diagnosis count
0 19-33 female 10023,33139,96850 diabetes 1
1 19-33 female 10023,33139,96850 diarrhea 1
2 19-33 female 10023,33139,96850 flu 1
3 50-92 male,intersex 02139,20001,94130 cancer 1
4 50-92 male,intersex 02139,20001,94130 flu 1
5 50-92 male,intersex 02139,20001,94130 stroke 1
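The output above can be checked independently of the library. The following sketch (plain pandas; `check_k_and_l` and `max_variational_distance` are hypothetical helper names) verifies k-anonymity and distinct l-diversity by grouping on the generalized feature columns, and computes the largest distance between a group's diagnosis distribution and the overall one. For that last measure it assumes all diagnoses are equally distant from one another, in which case the Earth Mover's Distance used by t-closeness reduces to half the L1 distance. It also assumes `count` is the number of records behind each row.

```python
import pandas as pd


def check_k_and_l(anonymized_df, feature_columns, sensitive_column, k, l, count_column="count"):
    """Return (is_k_anonymous, is_distinct_l_diverse) for an output table with a count column."""
    groups = anonymized_df.groupby(feature_columns)
    sizes = groups[count_column].sum()  # records per equivalence class
    distinct = groups[sensitive_column].nunique()  # distinct sensitive values per class
    return bool((sizes >= k).all()), bool((distinct >= l).all())


def max_variational_distance(anonymized_df, feature_columns, sensitive_column, count_column="count"):
    """Largest distance between a class's sensitive-value distribution and the overall one.

    Assumes equal ground distance between all sensitive values, so the
    distance is 0.5 * sum(|p_i - q_i|).
    """
    overall = anonymized_df.groupby(sensitive_column)[count_column].sum()
    overall = overall / overall.sum()
    worst = 0.0
    for _, group in anonymized_df.groupby(feature_columns):
        dist = group.groupby(sensitive_column)[count_column].sum()
        # Align on all sensitive values; values absent from this class get probability 0.
        dist = (dist / dist.sum()).reindex(overall.index, fill_value=0)
        worst = max(worst, 0.5 * (dist - overall).abs().sum())
    return float(worst)


# The output of the Mondrian example, re-entered by hand.
columns = ["age", "sex", "zip code", "diagnosis", "count"]
rows = [
    ["19-33", "female", "10023,33139,96850", "diabetes", 1],
    ["19-33", "female", "10023,33139,96850", "diarrhea", 1],
    ["19-33", "female", "10023,33139,96850", "flu", 1],
    ["50-92", "male,intersex", "02139,20001,94130", "cancer", 1],
    ["50-92", "male,intersex", "02139,20001,94130", "flu", 1],
    ["50-92", "male,intersex", "02139,20001,94130", "stroke", 1],
]
anonymized_df = pd.DataFrame(rows, columns=columns)
features = ["age", "sex", "zip code"]

is_k, is_l = check_k_and_l(anonymized_df, features, "diagnosis", k=3, l=2)
print(is_k, is_l)  # True True
print(max_variational_distance(anonymized_df, features, "diagnosis"))
```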
MDAV-generic:
# Step 2: Prepare anonymizer (steps 1 and 3 are the same as in the Mondrian example)
anonymizer = anonypyx.Anonymizer(df, k=3, algorithm="MDAV-generic", feature_columns=["age", "sex", "zip code"], sensitive_column="diagnosis")
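MDAV-generic builds on the MDAV microaggregation heuristic [2]. The sketch below shows the MDAV clustering loop for purely numeric data with Euclidean distance, after which each record is replaced by its cluster mean. It illustrates the idea under those assumptions and is not AnonyPyx's implementation; `mdav` and `microaggregate` are hypothetical names, and the categorical handling that MDAV-generic adds is not covered.

```python
import numpy as np


def mdav(X, k):
    """Cluster the rows of X into groups of k to 2k-1 rows with MDAV (assumes len(X) >= k)."""
    X = np.asarray(X, dtype=float)
    remaining = np.arange(len(X))
    clusters = []

    def farthest(point, idx):
        # Index (from idx) of the record farthest from point.
        return idx[np.argmax(np.linalg.norm(X[idx] - point, axis=1))]

    def take_cluster(center, idx):
        # The record `center` plus its k-1 nearest neighbours among idx.
        d = np.linalg.norm(X[idx] - X[center], axis=1)
        d[idx == center] = -1.0  # make sure the center itself is always selected
        return idx[np.argsort(d)[:k]]

    while len(remaining) >= 3 * k:
        centroid = X[remaining].mean(axis=0)
        r = farthest(centroid, remaining)  # record farthest from the centroid
        s = farthest(X[r], remaining)  # record farthest from r
        cluster_r = take_cluster(r, remaining)
        remaining = np.setdiff1d(remaining, cluster_r)
        cluster_s = take_cluster(s, remaining)
        remaining = np.setdiff1d(remaining, cluster_s)
        clusters += [cluster_r, cluster_s]

    if len(remaining) >= 2 * k:
        # Between 2k and 3k-1 records left: one more cluster around the farthest record.
        r = farthest(X[remaining].mean(axis=0), remaining)
        cluster_r = take_cluster(r, remaining)
        remaining = np.setdiff1d(remaining, cluster_r)
        clusters.append(cluster_r)
    if len(remaining) > 0:
        # Whatever is left (k to 2k-1 records) forms the last cluster.
        clusters.append(remaining)
    return clusters


def microaggregate(X, k):
    """Replace every record by the mean of its MDAV cluster."""
    X = np.asarray(X, dtype=float)
    result = X.copy()
    for cluster in mdav(X, k):
        result[cluster] = X[cluster].mean(axis=0)
    return result


ages = np.array([[50], [33], [66], [28], [92], [19]])
print(microaggregate(ages, k=3).ravel())
```

Given at least k records, every cluster holds between k and 2k-1 records, so the aggregated columns are k-anonymous.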
k-Same-Eigen:
import anonypyx
import numpy as np
import cv2
from os import listdir
from os.path import isfile, join
# Step 1: Load images into single numpy array
# images are loaded in grayscale
# every image must have the same height and width
path_to_dir = 'directory/containing/images/'
height = 120
width = 128
files = sorted(f for f in listdir(path_to_dir) if isfile(join(path_to_dir, f)))
images = [cv2.imread(join(path_to_dir, f), flags=cv2.IMREAD_GRAYSCALE) for f in files]
images = np.array(images)
# Step 2: Prepare anonymizer
anonymizer = anonypyx.kSame(images, width, height, k=5, variant='eigen')
# Step 3: Anonymization
anonymized, mapping = anonymizer.anonymize()
# Display the first image and its anonymized version
sample_image = np.concatenate((images[0], anonymized[mapping[0]]), axis=1).astype('uint8')
sample_image = cv2.cvtColor(sample_image, cv2.COLOR_GRAY2BGR)
cv2.imshow("k-same-eigen", sample_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
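The k-Same algorithms [3] group each face with similar faces, average every group of at least k images and release only the averages, so each released face corresponds to at least k originals. The following sketch shows the simpler k-Same-Pixel idea, which measures similarity directly on pixel values; k-Same-Eigen, used above, works in an eigenface (PCA) space instead. The function `k_same_pixel` is hypothetical, folding the final fewer-than-2k images into one group is an assumption of this sketch rather than the paper's exact rule, and the `(averages, mapping)` return value is only modeled on how the example above indexes `anonymized[mapping[0]]`.

```python
import numpy as np


def k_same_pixel(images, k):
    """Return (averages, mapping) where mapping[i] indexes the average that replaces image i."""
    images = np.asarray(images, dtype=float)
    n = len(images)
    if n < k:
        raise ValueError("need at least k images")
    flat = images.reshape(n, -1)  # one row of pixel intensities per image
    remaining = list(range(n))
    averages, mapping = [], np.empty(n, dtype=int)
    while remaining:
        if len(remaining) < 2 * k:
            # Too few left for two groups: put them all into the last group (size k to 2k-1).
            group = remaining
        else:
            # Group the first remaining image with its k-1 nearest neighbours (pixel distance).
            probe = remaining[0]
            d = np.linalg.norm(flat[remaining] - flat[probe], axis=1)
            order = np.argsort(d, kind="stable")
            group = [remaining[j] for j in order[:k]]
        mapping[group] = len(averages)
        averages.append(images[group].mean(axis=0))
        remaining = [i for i in remaining if i not in group]
    return np.stack(averages), mapping


# Six tiny 2x2 "images": three dark ones and three bright ones.
faces = np.stack([np.full((2, 2), v) for v in [0, 200, 1, 201, 2, 202]])
averages, mapping = k_same_pixel(faces, k=3)
print(mapping)
```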
Contributing
Clone the repository:
git clone https://github.com/questforwisdom/anonypyx.git
Set up a Python virtual environment and install the dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Run tests:
pytest
Changelog
0.2.0
- added the microaggregation algorithm MDAV-generic [2]
- added the Anonymizer class as the new API
- removed the Preserver class, which was superseded by Anonymizer
0.2.1 - 0.2.3
- minor bugfixes
0.2.4
- added k-Same family of algorithms for image anonymization [3]
- added the microaggregation algorithm used by k-Same
References
- [1]: LeFevre, K., DeWitt, D. J., & Ramakrishnan, R. (2006). Mondrian multidimensional K-anonymity. 22nd International Conference on Data Engineering (ICDE’06), 25–25. https://doi.org/10.1109/ICDE.2006.101
- [2]: Domingo-Ferrer, J., & Torra, V. (2005). Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Mining and Knowledge Discovery, 11, 195–212.
- [3]: Newton, E. M., Sweeney, L., & Malin, B. (2005). Preserving privacy by de-identifying face images. IEEE Transactions on Knowledge and Data Engineering, 17(2), 232–243. https://doi.org/10.1109/TKDE.2005.32