A Data Anonymization package for tabular, image and sound data

anonympy 🕶️



With ❤️ by ArtLabs

Overview

A general Python library for data anonymization of tabular, text, image and sound data. See ArtLabs/projects for more or similar projects.


Main Features

Tabular

  • Ease of use
  • Efficient anonymization (based on pandas DataFrame)
  • Numerous anonymization techniques
    • Numeric
      • Generalization - Binning
      • Perturbation
      • PCA Masking
      • Generalization - Rounding
    • Categorical
      • Synthetic Data
      • Resampling
      • Tokenization
      • Partial Email Masking
    • DateTime
      • Synthetic Date
      • Perturbation
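
To make the technique names above concrete, here is a minimal standard-library sketch of binning and perturbation. It is not anonympy's implementation: the helper names (`bin_value`, `perturb_number`, `perturb_date`), the bin width and the noise ranges are all assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta


def bin_value(value, width):
    """Generalization by binning: replace a number with its [lower, upper) interval."""
    lower = (value // width) * width
    return (lower, lower + width)


def perturb_number(value, max_noise, rng):
    """Perturbation: add uniform noise in [-max_noise, max_noise]."""
    return value + rng.uniform(-max_noise, max_noise)


def perturb_date(day, max_days, rng):
    """Datetime perturbation: shift a date by up to max_days in either direction."""
    return day + timedelta(days=rng.randint(-max_days, max_days))


# Hypothetical sample values; a seeded generator keeps the run repeatable.
rng = random.Random(0)
ages = [33, 48]
binned_ages = [bin_value(age, 10) for age in ages]
noisy_ages = [perturb_number(age, 5, rng) for age in ages]
noisy_birthdate = perturb_date(date(1915, 4, 17), 30, rng)

print(binned_ages)  # [(30, 40), (40, 50)]
```

Binning trades precision for privacy in a predictable way, while perturbation keeps values numeric at the cost of exactness. Which one fits depends on how the column is used downstream.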

Images

  • Anonymization Techniques
    • Personal Images (faces)
      • Blurring
      • Pixelated Face Blurring
      • Salt and Pepper Noise
    • General Images
      • Blurring
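
As a rough illustration of two of the image techniques listed above, the sketch below applies pixelation and salt-and-pepper noise to a small grayscale image stored as nested lists. It only demonstrates the general idea and makes no claim about how anonympy or OpenCV implement these operations. The function names, block size and noise amount are assumptions.

```python
import random


def pixelate(image, block):
    """Replace every block x block tile with its average intensity."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            ys = range(top, min(top + block, height))
            xs = range(left, min(left + block, width))
            tile = [image[y][x] for y in ys for x in xs]
            mean = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def salt_and_pepper(image, amount, rng):
    """Set each pixel to 0 (pepper) or 255 (salt) with probability `amount`."""
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))
    return noisy


# Hypothetical 8x8 grayscale gradient standing in for a face region.
image = [[(x + y) * 16 for x in range(8)] for y in range(8)]
pixelated = pixelate(image, 4)
noisy = salt_and_pepper(image, 0.1, random.Random(0))
```

Both helpers return a new image and leave the input untouched, which makes it easy to compare the original and the anonymized result side by side.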

PDF, Text, Sound

  • In Development

Installation

Dependencies

  1. Python (>= 3.7)
  2. cape-privacy
  3. faker
  4. pandas
  5. OpenCV
  6. . . .

Install with pip

The easiest way to install anonympy is with pip:

pip install anonympy

Because cape-privacy conflicts with the required pandas/numpy versions, it is recommended to install it separately, without its dependencies:

pip install cape-privacy==0.3.0 --no-deps 
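
After installing, it can help to confirm which versions actually ended up in the environment. The following is a small sketch using the standard library's `importlib.metadata` (available from Python 3.8). The helper name `installed_versions` and the list of packages checked are assumptions for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the dependency list above.
report = installed_versions(["anonympy", "cape-privacy", "pandas", "numpy"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```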

Install from source

Installing the library from source code is also possible:

git clone https://github.com/ArtLabss/open-data-anonimizer.git
cd open-data-anonimizer
pip install -r requirements.txt
make bootstrap
pip install cape-privacy==0.3.0 --no-deps 

Downloading Repository

Alternatively, download this repository from PyPI and run the following:

cd open-data-anonimizer
python setup.py install

Usage Example

Google Colab

You can find more examples here

Tabular

from anonympy.pandas import dfAnonymizer
from anonympy.pandas.utils import load_dataset

df = load_dataset() 
print(df)
   name   age  birthdate   salary    web                                   email                 ssn
0  Bruce  33   1915-04-17  59234.32  http://www.alandrosenburgcpapc.co.uk  josefrazier@owen.com  343554334
1  Tony   48   1970-05-29  49324.53  http://www.capgeminiamerica.co.uk     eryan@lewis.com       656564664
# Calling the generic function
anonym = dfAnonymizer(df)
anonym.anonymize(inplace = False) # changes will be returned, not applied
   name             age  birthdate   salary   web         email                ssn
0  Stephanie Patel  30   1915-05-10  60000.0  5968b7880f  pjordan@example.com  391-77-9210
1  Daniel Matthews  50   1971-01-21  50000.0  2ae31d40d4  tparks@example.org   872-80-9114
# Or applying a specific anonymization technique to a column
from anonympy.pandas.utils import available_methods

anonym.categorical_columns
... ['name', 'web', 'email', 'ssn']
available_methods('categorical') 
... categorical_fake	categorical_fake_auto	categorical_resampling	categorical_tokenization	categorical_email_masking
  
anonym.anonymize({'name': 'categorical_fake', 
                  'age': 'numeric_noise',
                  'birthdate': 'datetime_noise',
                  'salary': 'numeric_rounding',
                  'web': 'categorical_tokenization', 
                  'email':'categorical_email_masking', 
                  'ssn': 'column_suppression'})
print(anonym.to_df())
   name               age  birthdate   salary   web         email
0  Paul Lang          31   1915-04-17  60000.0  8ee92fb1bd  j*****r@owen.com
1  Michael Gillespie  42   1970-05-29  50000.0  51b615c92e  e*****n@lewis.com
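
The sample output gives a feel for what the email masking and rounding steps do. Here is a standalone sketch that reproduces the same shape of result for those two columns, plus a simple tokenizer. It is an approximation inferred from the printed values, not anonympy's code. The function names are hypothetical. The fixed run of five asterisks and the rounding to the nearest 10,000 are assumptions read off the sample. The tokenizer is a truncated, unsalted SHA-256 digest used only for brevity, so its tokens will not match the ones shown.

```python
import hashlib


def mask_email(email):
    """Keep the first and last character of the local part and hide the rest.

    Assumes a non-empty local part before the '@'.
    """
    local, _, domain = email.partition("@")
    return f"{local[0]}*****{local[-1]}@{domain}"


def round_salary(salary):
    """Generalization by rounding to the nearest 10,000."""
    return round(salary, -4)


def tokenize(value, length=10):
    """Replace a value with a short, deterministic hex token (illustrative only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


print(mask_email("josefrazier@owen.com"))  # j*****r@owen.com
print(round_salary(59234.32))              # 60000.0
print(len(tokenize("http://www.capgeminiamerica.co.uk")))  # 10
```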

Images

# Passing an Image
import cv2
from anonympy.images import imAnonymizer

img = cv2.imread('sulking_boy.jpg')
anonym = imAnonymizer(img)

blurred = anonym.face_blur((31, 31), shape='r', box = 'r')  # blurring shape and bounding box ('r' / 'c')
cv2.imshow('Blurred', blurred)
cv2.waitKey(0)  # keep the window open until a key is pressed
[Figure: example outputs of anonym.face_blur(), anonym.face_pixel() and anonym.face_SaP()]
# Passing a Folder 
path = 'C:/Users/shakhansho.sabzaliev/Downloads/Data' # images are inside `Data` folder
dst = 'D:/' # destination folder
anonym = imAnonymizer(path, dst)

anonym.blur(method = 'median', kernel = 11) 

This creates a folder named Output in the dst directory.

The Data folder had the following structure:

|   1.jpg
|   2.jpg
|   3.jpeg
|   
\---test
    |   4.png
    |   5.jpeg
    |   
    \---test2
            6.png

The Output folder will have the same structure and file names, but with blurred images.
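
One way to picture the folder mirroring is as a mapping from each source image to a target path under Output. The sketch below builds that mapping with `pathlib` on a temporary copy of the layout shown above. It illustrates the idea only and is not taken from anonympy. The helper `plan_outputs` and the set of accepted suffixes are assumptions.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def plan_outputs(src, dst):
    """Pair every image under src with a mirrored path under dst/Output."""
    src, dst = Path(src), Path(dst)
    out_root = dst / "Output"
    pairs = []
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((path, out_root / path.relative_to(src)))
    return pairs


# Recreate the example Data folder in a temporary directory.
names = ["1.jpg", "2.jpg", "3.jpeg", "test/4.png", "test/5.jpeg", "test/test2/6.png"]
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Data"
    dst = Path(tmp) / "dst"
    for name in names:
        file = src / name
        file.parent.mkdir(parents=True, exist_ok=True)
        file.touch()
    pairs = plan_outputs(src, dst)
    mirrored = [target.relative_to(dst).as_posix() for _, target in pairs]

for line in mirrored:
    print(line)
```

Each target keeps its path relative to the source root, so nested folders such as test/test2 are preserved under Output.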


Development

Contributions

The Contributing Guide has detailed information about contributing code and documentation.

Important Links

License

BSD-3

Code of Conduct

Please see Code of Conduct. All community members are expected to follow it.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anonympy-0.3.0.tar.gz (5.7 MB)

Uploaded Source

File details

Details for the file anonympy-0.3.0.tar.gz.

File metadata

  • Download URL: anonympy-0.3.0.tar.gz
  • Upload date:
  • Size: 5.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.2

File hashes

Hashes for anonympy-0.3.0.tar.gz

Algorithm    Hash digest
SHA256       39248efa16ebf491e71fd2347d20d3a0d9e54e95ba5da804ef948ebfa34acd1c
MD5          55bbe59da7105e7809f99f2dfba628d2
BLAKE2b-256  95d7d27a56a127eb0a1469fbc5190a40d190d95434438cbcd95b041539b58e16
