A lightweight library for anonymizing and transforming data in pandas DataFrames, including masking, suppression, perturbation, permutation, generalization, and pseudonymization.

df-anonymizer

A lightweight Python library for applying privacy-preserving transformations to datasets held as pandas.DataFrame objects.
It is intended for preparing data for research, analysis, reporting, or machine learning while protecting sensitive personal information.

✨ Key Features

  • Masking: Mask email addresses and identification numbers
  • Pseudonymization: Generate unique pseudonyms with a key mapping table
  • Data perturbation: Add random noise to numeric fields such as age, weight, and height
  • Data generalization: Bucket values or reduce granularity for numeric and date columns
  • Suppression: Remove sensitive columns or filter out specific records
  • Shuffling: Randomly reorder rows
  • Evaluation: Compute the k-anonymity score for your dataset

All functions are designed to work with pandas.DataFrame objects.
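
To make the generalization and evaluation ideas above concrete, here is a minimal sketch written in plain pandas. It does not use df-anonymizer and does not claim to show how the library computes its score; the helper name `k_anonymity`, the column names, and the sample values are all assumptions made for illustration. It treats k as the size of the smallest group of rows that share the same quasi-identifier values.

```python
import pandas as pd


def k_anonymity(df, quasi_identifiers):
    """Hypothetical helper: size of the smallest group of rows that
    share identical values in the quasi-identifier columns."""
    return int(df.groupby(quasi_identifiers).size().min())


# Invented sample data, for illustration only.
records = pd.DataFrame({
    "age": [23, 27, 31, 38, 45, 49],
    "postcode": ["100", "100", "100", "100", "200", "200"],
})

# Every exact age is distinct, so each row forms its own group.
k_raw = k_anonymity(records, ["age", "postcode"])

# Generalization: replace exact ages with 20-year bands.
generalized = records.copy()
generalized["age"] = pd.cut(
    records["age"], bins=[20, 40, 60], labels=["21-40", "41-60"]
).astype(str)
k_generalized = k_anonymity(generalized, ["age", "postcode"])

print(k_raw, k_generalized)
```

Coarsening the age column merges rows into larger groups, which is why the score rises after generalization. The trade-off is that banded ages carry less analytical detail than exact ones.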

📦 Installation

pip install df-anonymizer

👉 Example

📌 Pseudonymization

import pandas as pd
from df_anonymizer import pseudonymization

df = pd.DataFrame({'NRIC': ['S1234567A', 'S2345678B', 'S3456789C']})
anon_df = pseudonymization(df, 'NRIC')
print(anon_df)

# Example output:
#        NRIC
# 0  abcd123456
# 1  efgh234567
# 2  ijkl345678
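
The idea of pseudonyms backed by a key mapping table can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not the library's implementation: the function name `pseudonymize_column`, the use of `secrets.token_hex`, and the layout of the key table are all hypothetical.

```python
import secrets

import pandas as pd


def pseudonymize_column(df, column, n_bytes=5):
    """Hypothetical helper: replace each distinct value in `column`
    with a random token and return the key table needed to reverse it."""
    mapping = {}
    used = set()
    for value in df[column].unique():
        token = secrets.token_hex(n_bytes)  # 2 * n_bytes hex characters
        while token in used:  # retry on the unlikely event of a collision
            token = secrets.token_hex(n_bytes)
        used.add(token)
        mapping[value] = token

    out = df.copy()  # leave the caller's DataFrame untouched
    out[column] = df[column].map(mapping)
    key_table = pd.DataFrame({
        "original": list(mapping.keys()),
        "pseudonym": list(mapping.values()),
    })
    return out, key_table


sample = pd.DataFrame({"NRIC": ["S1234567A", "S2345678B", "S1234567A"]})
anon_sample, key_table = pseudonymize_column(sample, "NRIC")
print(anon_sample)
print(key_table)
```

Repeated values receive the same pseudonym, so joins and group counts still work on the anonymized column. The key table re-identifies every record, so it should be stored separately from the anonymized data and under stricter access control.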

📌 Masking

import pandas as pd
from df_anonymizer import maskID, maskEmail

df_mask = pd.DataFrame({
    'ID': ['123456789', '987654321'],
    'Email': ['alice@example.com', 'bob@example.com']
})
df_mask = maskID(df_mask, 'ID')
df_mask = maskEmail(df_mask, 'Email')
print(df_mask)

# Output:
#          ID              Email
# 0  ******789  a****@example.com
# 1  ******321    b**@example.com
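
The masking pattern visible in the output above (keep the last three characters of an ID, keep the first character of the email's local part) can be reproduced with two small string helpers. This is a sketch inferred from that output only; `mask_id`, `mask_email`, and the `visible` parameter are hypothetical names, and the library's own functions may behave differently on edge cases.

```python
def mask_id(value, visible=3):
    """Hypothetical helper: star out everything except the last
    `visible` characters. Short values are masked completely."""
    if len(value) <= visible:
        return "*" * len(value)
    hidden = len(value) - visible
    return "*" * hidden + value[hidden:]


def mask_email(address):
    """Hypothetical helper: keep the first character of the local part
    and the full domain, and star out the rest of the local part."""
    local, at, domain = address.rpartition("@")
    if not at or not local:
        # Not a recognizable address: mask it entirely.
        return "*" * len(address)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


print(mask_id("123456789"))
print(mask_email("alice@example.com"))
print(mask_email("bob@example.com"))
```

Either helper can be applied to a DataFrame column with `Series.map`, for example `df["ID"].map(mask_id)`. Note that the number of stars reveals the length of the original value, which may matter for short or unusual identifiers.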

Reference

  1. Guide To Basic Anonymization
