Anonymisation library for python, fork of anonypy
Project description
AnonyPyx
This is a fork of the python library AnonyPy providing data anonymisation techniques. AnonyPyx adds further algorithms (see below), introduces a declarative interface and adds attacks on the anonymised data. If you consider migrating from AnonyPy, keep in mind that AnonyPyx is not compatible with its original API.
Features
- partion-based anonymisation algorithm Mondrian [1] supporting
- k-anonymity
- l-diversity
- t-closeness
- microclustering based anonymisation algorithm MDAV-Generic [2] supporting
- k-anonymity
- interoperability with pandas data frames
- supports both continuous and categorical attributes
- image anonymisation via the k-Same family of algorithms [3]
- attacks on anonymised data sets:
- intersection attack [4]
- privacy metrics:
- percentage of vulnerable population [4]
- utility metrics:
- error of aggregate queries [5]
- discernibility penalty [6]
Install
pip install anonypyx
Usage
Disclaimer: AnonyPyX does not shuffle the input data currently. In some applications, records can be re-identified based on the order in which they appear in the anonymised data set when shuffling is not used.
Mondrian:
import anonypyx
import pandas as pd
# Step 1: Prepare data as pandas data frame:
columns = ["age", "sex", "zip code", "diagnosis"]
data = [
[50, "male", "02139", "stroke"],
[33, "female", "10023", "flu"],
[66, "intersex", "20001", "flu"],
[28, "female", "33139", "diarrhea"],
[92, "male", "94130", "cancer"],
[19, "female", "96850", "diabetes"],
]
df = pd.DataFrame(data=data, columns=columns)
for column in ("sex", "zip code", "diagnosis"):
df[column] = df[column].astype("category")
# Step 2: Prepare anonymiser
anonymiser = anonypyx.Anonymiser(
df, k=3, l=2, algorithm="Mondrian",
feature_columns=["age", "sex", "zip code"],
sensitive_column="diagnosis",
generalisation_strategy="human-readable"
)
# Step 3: Anonymise data (this might take a while for large data sets)
anonymised_records = anonymiser.anonymise()
# Print results:
anonymised_df = anonymised_records
print(anonymised_df)
Output:
age sex zip code diagnosis count
0 19-33 female 10023,33139,96850 diabetes 1
1 19-33 female 10023,33139,96850 diarrhea 1
2 19-33 female 10023,33139,96850 flu 1
3 50-92 male,intersex 02139,20001,94130 cancer 1
4 50-92 male,intersex 02139,20001,94130 flu 1
5 50-92 male,intersex 02139,20001,94130 stroke 1
Contributing
Clone the repository:
git clone https://github.com/questforwisdom/anonypyx.git
Set a virtual python environment up and install dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Run tests:
pytest
Changelog
0.2.0
- added the microaggregation algorithm MDAV-generic [2]
- added the Anonymizer class as the new API
- removed Preserver class which was superseded by Anonymizer
0.2.1 - 0.2.3
- minor bugfixes
0.2.4
- added k-Same family of algorithms for image anonymisation [3]
- added the microaggregation algorithm used by k-Same
0.2.5
- added the
generalisation_strategyparameter to Anonymiser which determines how generalised data is represented (old behaviour is"human-readable") - added the generalisation strategies
"machine-readable"and"microaggregation" - added some attacks on anonymised data which can be found in the submodule
attackers - renamed
Anonymizerand itsanonymizemethod to BE spelling
0.2.6 - 0.2.8
- bugfixes
0.2.9
- added privacy metric percentage of vulnerable population [4]
- added counting query error as a utility metric [5]
- added the discernibility penalty [6] as a utility metric
0.2.10
- bugfixes
0.2.11
- added (de-)serialisation of generalisation schemas
- added
finalise()method to all attackers for consistency - removed dependency to
exact_multiset_cover, anonypyx is platform-independent again - renamed
prune_multiset_exact_cover()method inTrajectoryAttackertofinalise() - improved time efficiency of TrajectoryAttacker
References
- [1]: LeFevre, K., DeWitt, D. J., & Ramakrishnan, R. (2006). Mondrian multidimensional K-anonymity. 22nd International Conference on Data Engineering (ICDE’06), 25–25. https://doi.org/10.1109/ICDE.2006.101
- [2]: Domingo-Ferrer, J., & Torra, V. (2005). Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Mining and Knowledge Discovery, 11, 195–212.
- [3]: E. M. Newton, L. Sweeney, and B. Malin, ‘Preserving privacy by de-identifying face images’, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, pp. 232–243, Feb. 2005, doi: 10.1109/TKDE.2005.32.
- [4]: Ganta, S. R., Kasiviswanathan, S. P., & Smith, A. (2008). Composition attacks and auxiliary information in data privacy. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 265–273.
- [5]: Xiao, X., & Tao, Y. (2007). M-invariance: Towards privacy preserving re-publication of dynamic datasets. Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, 689–700. https://doi.org/10.1145/1247480.1247556
- [6]: Bayardo, R. J., & Agrawal, R. (2005). Data privacy through optimal k-anonymization. 21st International Conference on Data Engineering (ICDE’05), 217–228. https://doi.org/10.1109/ICDE.2005.42
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file anonypyx-0.2.11.tar.gz.
File metadata
- Download URL: anonypyx-0.2.11.tar.gz
- Upload date:
- Size: 31.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9cd0b713cd371f6d33b3f5254bf7560b878bc4a42f261505e8c3bb500a31108b
|
|
| MD5 |
93f37ee59d1362ee92a23791fa8a4386
|
|
| BLAKE2b-256 |
4a73ccf310ef0e84b736544677a06d2fc0591ee43e68a091f2aab3d96448510e
|
File details
Details for the file anonypyx-0.2.11-py3-none-any.whl.
File metadata
- Download URL: anonypyx-0.2.11-py3-none-any.whl
- Upload date:
- Size: 37.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8ef841be42d557b8945e6acf56d1a141ae0db82d75f00a8ddd78a0de9a841ffa
|
|
| MD5 |
8a4d69760b72c24c0291674d1f8dc4db
|
|
| BLAKE2b-256 |
e2e9b195f4f808ffec3328390c14150355f42b556857e69d8a7cce0aff4c8b02
|