
A Data Anonymization package for tabular, image and sound data


anonympy 🕶️



With 💗 by ArtLabs

Overview

A general Python library for data anonymization of tabular, text, image and sound data. See ArtLabs/projects for more or similar projects.


Main Features

Tabular

  • Ease of use
  • Efficient anonymization (based on pandas DataFrame)
  • Numerous anonymization techniques
    • Numeric
      • Binning
      • Perturbation
      • PCA Masking
      • Rounding
    • Categorical
      • Synthetic Data
      • Resampling
      • Tokenization
      • Email Masking
    • DateTime
      • Synthetic Date
      • Perturbation
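As a rough illustration of three of the numeric techniques listed above (binning, rounding and perturbation), here is a minimal standard-library sketch. The function names `bin_value`, `round_value` and `perturb_value` are hypothetical and are not part of anonympy's API; the library's own implementations may differ.

```python
import random


def bin_value(value, width):
    """Binning: replace a value with the lower edge of its fixed-width bin."""
    return (value // width) * width


def round_value(value, base):
    """Rounding: snap a value to the nearest multiple of `base`."""
    return base * round(value / base)


def perturb_value(value, max_noise, rng):
    """Perturbation: add uniform noise drawn from [-max_noise, +max_noise]."""
    return value + rng.uniform(-max_noise, max_noise)


# Sample values (assumed for illustration only)
ages = [33, 48]
salaries = [59234.32, 49324.53]
rng = random.Random(0)  # seeded so repeated runs give the same noise

binned_ages = [bin_value(a, 10) for a in ages]
rounded_salaries = [round_value(s, 10000) for s in salaries]
perturbed_ages = [perturb_value(a, 2, rng) for a in ages]

print(binned_ages)
print(rounded_salaries)
print(perturbed_ages)
```

Each technique trades precision for privacy in a different way: binning and rounding coarsen values deterministically, while perturbation keeps the original scale but adds bounded random noise.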

Text, Image, Sound

  • In development

Installation

Dependencies

  1. Python (>= 3.7)
  2. cape-privacy
  3. faker
  4. scikit-learn
  5. pandas
  6. numpy
  7. . . .

Install with pip

The easiest way to install anonympy is with pip. Because cape-privacy has conflicting pandas/numpy version requirements, it is recommended to install it separately, without its dependencies:

pip install cape-privacy==0.3.0 --no-deps 

Then install anonympy itself:

pip install anonympy

Install from source

Installing the library from source code is also possible

git clone https://github.com/ArtLabss/open-data-anonimizer.git
cd open-data-anonimizer
pip install -r requirements.txt
make bootstrap
pip install cape-privacy==0.3.0 --no-deps 

Usage Example

You can find more examples here

from anonympy.pandas import dfAnonymizer
from anonympy.pandas.utils import load_dataset

df = load_dataset() 
print(df)
   name   age  birthdate   salary    web                                   email                 ssn
0  Bruce  33   1915-04-17  59234.32  http://www.alandrosenburgcpapc.co.uk  josefrazier@owen.com  343554334
1  Tony   48   1970-05-29  49324.53  http://www.capgeminiamerica.co.uk     eryan@lewis.com       656564664
# Calling the generic function
anonym = dfAnonymizer(df)
anonym.anonymize(inplace = False) # changes will be returned, not applied
   name             age  birthdate   salary   web         email                ssn
0  Stephanie Patel  30   1915-04-17  60000.0  5968b7880f  pjordan@example.com  391-77-9210
1  Daniel Matthews  50   1971-01-21  50000.0  2ae31d40d4  tparks@example.org   872-80-9114
# Or applying a specific anonymization technique to a column
from anonympy.pandas.utils import available_methods

anonym.categorical_columns
... ['name', 'web', 'email', 'ssn']
available_methods('categorical') 
... categorical_fake	categorical_fake_auto	categorical_resampling	categorical_tokenization	categorical_email_masking
  
anonym.anonymize({'name': 'categorical_fake', 
                  'web': 'categorical_tokenization', 
                  'email':'categorical_email_masking', 
                  'ssn': 'categorical_fake'})
print(anonym.to_df())
   name               age  birthdate   salary    web         email              ssn
0  Paul Lang          33   1915-04-17  59234.32  8ee92fb1bd  j*****r@owen.com   792-82-0468
1  Michael Gillespie  48   1970-05-29  49324.53  51b615c92e  e*****n@lewis.com  762-13-6119
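
To make the output above easier to interpret, the following sketch shows one way email masking and tokenization could be implemented. It is not anonympy's implementation: `mask_email` and `tokenize` are hypothetical helpers, the fixed run of five asterisks is inferred from the sample output, and the keyed-hash (HMAC-SHA256) tokenizer and its key are assumptions chosen for illustration.

```python
import hashlib
import hmac


def mask_email(email, stars=5):
    """Keep the first and last character of the local part; hide the rest."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a recognizable address; return it unchanged
        return email
    return f"{local[0]}{'*' * stars}{local[-1]}@{domain}"


def tokenize(value, key, length=10):
    """Replace a value with a truncated keyed hash (same input, same token)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


masked = [mask_email(e) for e in ["josefrazier@owen.com", "eryan@lewis.com"]]
# The key is a placeholder; a real key would have to be kept secret
token = tokenize("http://www.capgeminiamerica.co.uk", key=b"demo-secret")

print(masked)  # ['j*****r@owen.com', 'e*****n@lewis.com']
print(token)
```

Masking with a fixed number of asterisks hides the length of the local part, while a keyed hash keeps equal values mapped to equal tokens so that grouping and joining on the column remain possible.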

Development

Contributions

The Contributing Guide has detailed information about contributing code and documentation.

Important Links

License

BSD 3-Clause

Code of Conduct

Please see Code of Conduct. All community members are expected to follow it.
