A Data Anonymization package for tabular, image and sound data
anonympy 🕶️
Overview
A general Python library for anonymizing tabular, text, image and sound data. See ArtLabs/projects for more projects like this one.
Main Features
Tabular
- Ease of use
- Efficient anonymization (based on pandas DataFrame)
- Numerous anonymization techniques
  - Numeric
    - Binning
    - Perturbation
    - PCA Masking
    - Rounding
  - Categorical
    - Synthetic Data
    - Resampling
    - Tokenization
    - Email Masking
  - DateTime
    - Synthetic Date
    - Perturbation
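The numeric and date techniques listed above are standard transformations. The sketch below illustrates them in plain Python (standard library only) on single values; it is not anonympy's implementation, which works on pandas DataFrame columns, and the function names `round_to`, `bin_value`, `perturb_number` and `perturb_date` are hypothetical. The bin edges and noise magnitudes are arbitrary illustrative choices. Rounding `age` to the nearest 10 and `salary` to the nearest 10,000 happens to reproduce the values in the generic-anonymization output of the usage example, but that is an observation about this sample, not a documented default.

```python
import random
from datetime import date, timedelta
from typing import Sequence


def round_to(value: float, base: float) -> float:
    """Rounding: snap a value to the nearest multiple of `base`.

    Python's round() breaks exact .5 ties toward the even neighbour.
    """
    return round(value / base) * base


def bin_value(value: float, edges: Sequence[float]) -> str:
    """Binning: replace a value with the label of the interval [lo, hi) it falls in."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"[{lo}, {hi})"
    raise ValueError(f"{value} is outside the bin edges {list(edges)}")


def perturb_number(value: float, magnitude: float, rng: random.Random) -> float:
    """Perturbation: add uniform noise drawn from [-magnitude, magnitude]."""
    return value + rng.uniform(-magnitude, magnitude)


def perturb_date(day: date, max_days: int, rng: random.Random) -> date:
    """Date perturbation: shift a date by a random number of days in [-max_days, max_days]."""
    return day + timedelta(days=rng.randint(-max_days, max_days))


rng = random.Random(0)  # seeded so the noise is reproducible between runs
ages = [33, 48]
salaries = [59234.32, 49324.53]

print([round_to(a, 10) for a in ages])          # [30, 50]
print([round_to(s, 10_000) for s in salaries])  # [60000, 50000]
print([bin_value(a, [0, 30, 45, 60, 120]) for a in ages])  # ['[30, 45)', '[45, 60)']
print(perturb_number(59234.32, 1000, rng))      # random, within +/- 1000 of the input
print(perturb_date(date(1970, 5, 29), 365, rng))  # random, within a year of the input
```

Rounding and binning are deterministic and only coarsen the data, while perturbation is random, so a seeded `random.Random` is passed in explicitly to keep results reproducible. PCA masking is left out because it needs a linear-algebra library.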
Text, Image, Sound
- In development
Installation
Dependencies
- Python (>= 3.7)
- cape-privacy
- faker
- scikit-learn
- pandas
- numpy
- . . .
Install with pip
The easiest way to install anonympy is with pip. Because cape-privacy pins pandas/numpy versions that conflict with anonympy's, it's recommended to install cape-privacy separately, without its dependencies, first:
pip install cape-privacy==0.3.0 --no-deps
pip install anonympy
Install from source
You can also install the library from source:
git clone https://github.com/ArtLabss/open-data-anonimizer.git
cd open-data-anonimizer
pip install -r requirements.txt
make bootstrap
pip install cape-privacy==0.3.0 --no-deps
Usage Example
More examples are available in the official source code repository (see Important Links).
from anonympy.pandas import dfAnonymizer
from anonympy.pandas.utils import load_dataset
df = load_dataset()
print(df)
| | name | age | birthdate | salary | web | email | ssn |
---|---|---|---|---|---|---|---|
0 | Bruce | 33 | 1915-04-17 | 59234.32 | http://www.alandrosenburgcpapc.co.uk | josefrazier@owen.com | 343554334 |
1 | Tony | 48 | 1970-05-29 | 49324.53 | http://www.capgeminiamerica.co.uk | eryan@lewis.com | 656564664 |
# Calling the generic function
anonym = dfAnonymizer(df)
anonym.anonymize(inplace=False)  # returns the anonymized DataFrame instead of applying changes in place
| | name | age | birthdate | salary | web | email | ssn |
---|---|---|---|---|---|---|---|
0 | Stephanie Patel | 30 | 1915-04-17 | 60000.0 | 5968b7880f | pjordan@example.com | 391-77-9210 |
1 | Daniel Matthews | 50 | 1971-01-21 | 50000.0 | 2ae31d40d4 | tparks@example.org | 872-80-9114 |
# Or applying a specific anonymization technique to a column
from anonympy.pandas.utils import available_methods
anonym.categorical_columns
... ['name', 'web', 'email', 'ssn']
available_methods('categorical')
... categorical_fake categorical_fake_auto categorical_resampling categorical_tokenization categorical_email_masking
anonym.anonymize({'name': 'categorical_fake',
'web': 'categorical_tokenization',
'email':'categorical_email_masking',
'ssn': 'categorical_fake'})
print(anonym.to_df())
| | name | age | birthdate | salary | web | email | ssn |
---|---|---|---|---|---|---|---|
0 | Paul Lang | 33 | 1915-04-17 | 59234.32 | 8ee92fb1bd | j*****r@owen.com | 792-82-0468 |
1 | Michael Gillespie | 48 | 1970-05-29 | 49324.53 | 51b615c92e | e*****n@lewis.com | 762-13-6119 |
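In the output above, `email` values keep the first and last characters of the local part plus the domain, and `web` values become short hexadecimal tokens. The sketch below reproduces that kind of transformation with the standard library only. The masking rule (first character, five asterisks, last character) is inferred from the two sample rows, and the tokenizer (a keyed HMAC-SHA256 digest truncated to 10 hex characters) is an assumption chosen to match the token length shown; neither is taken from anonympy's source. `mask_email`, `tokenize` and `resample` are hypothetical helper names, and the key is a placeholder.

```python
import hashlib
import hmac
import random
from typing import List


def mask_email(email: str, n_stars: int = 5) -> str:
    """Keep the first and last character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep:
        raise ValueError(f"not an email address: {email!r}")
    if len(local) <= 2:
        # Too short to keep both ends without revealing everything; hide it all
        return "*" * n_stars + "@" + domain
    return f"{local[0]}{'*' * n_stars}{local[-1]}@{domain}"


def tokenize(value: str, key: bytes, length: int = 10) -> str:
    """Replace a value with a deterministic keyed hash (same input and key -> same token)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def resample(values: List[str], rng: random.Random) -> List[str]:
    """Resampling: draw replacements, with replacement, from the column's own values."""
    return rng.choices(values, k=len(values))


emails = ["josefrazier@owen.com", "eryan@lewis.com"]
print([mask_email(e) for e in emails])  # ['j*****r@owen.com', 'e*****n@lewis.com']

key = b"replace-with-a-secret-key"  # hypothetical placeholder; keep the real key out of source control
print(tokenize("http://www.capgeminiamerica.co.uk", key))  # a 10-character hex token
print(resample(["Bruce", "Tony"], random.Random(0)))
```

Because keyed tokenization is deterministic, identical inputs map to identical tokens, so joins across tables still work; the trade-off is that anyone holding the key can recompute the token for a guessed input.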
Development
Contributions
The Contributing Guide has detailed information about contributing code and documentation.
Important Links
- Official source code repo: https://github.com/ArtLabss/open-data-anonimizer
- Download releases: https://pypi.org/project/anonympy/
- Issue tracker: https://github.com/ArtLabss/open-data-anonimizer/issues
License
Code of Conduct
Please see the Code of Conduct. All community members are expected to follow it.
Source Distribution
anonympy-0.2.0.tar.gz (19.6 kB)