Anonymize df: a convenient way to anonymize your data for analytics

What is it?

Anonymize df is a package that helps you quickly and easily generate realistic fake data from a Pandas DataFrame.

What are the expected use cases / why was this made?

  • You're hiring consultants to work on your data but need to anonymize it first
  • You're a consultant and created something great that you want to make into a template

Installation

You can install anonymizedf using pip:

pip install anonymizedf

pip will also try to download the Tableau Hyper API and pandas packages if you don't have them already.

If you don't want to use pip you can also download this repository and execute:

python setup.py install
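
After either installation route, a quick check can confirm that the packages are importable. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of anonymizedf.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report whether each package this README relies on can be found.
for name in ("pandas", "anonymizedf"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports "missing", rerunning the pip command above in the same Python environment is the usual first step.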

Example usage

import pandas as pd
from anonymizedf.anonymizedf import anonymize

# Import the data
df = pd.read_csv("https://query.data.world/s/shcktxndtu3ojonm46tb5udlz7sp3e")

# Prepare the data to be anonymized
an = anonymize(df)

# Select what data you want to anonymize and your preferred style

# Example 1 - just updates df
an.fake_names("Customer Name")
an.fake_ids("Customer ID")
an.fake_whole_numbers("Loyalty Reward Points")
an.fake_categories("Segment")
an.fake_dates("Date")
an.fake_decimal_numbers("Fraction")

# Example 2 - method chaining
fake_df = (
    an
    .fake_names("Customer Name", chaining=True)
    .fake_ids("Customer ID", chaining=True)
    .fake_whole_numbers("Loyalty Reward Points", chaining=True)
    .fake_categories("Segment", chaining=True)
    .fake_dates("Date", chaining=True)
    .fake_decimal_numbers("Fraction", chaining=True)
    .show_data_frame()
)

# Example 3 - multiple assignments
fake_df = an.fake_names("Customer Name")
fake_df = an.fake_ids("Customer ID")
fake_df = an.fake_whole_numbers("Loyalty Reward Points")
fake_df = an.fake_categories("Segment")
fake_df = an.fake_dates("Date")
fake_df = an.fake_decimal_numbers("Fraction")

fake_df.to_csv("fake_customers.csv", index=False)

# Note: each fake_* method takes a single column name, not a list of columns.
# To apply the same method to several columns, loop over the column names.

# Example 4 - for multiple columns
column_list = ["Segment"]  # add the other columns that should be treated the same way

for column in column_list:
    an.fake_categories(column)

Example output

  | Customer ID | Customer Name | Loyalty Reward Points | Segment  | Date       | Fraction | Fake_Customer Name      | Fake_Customer ID   | Fake_Loyalty Reward Points | Fake_Segment | Fake_Date  | Fake_Fraction
0 | AA-10315    | Alex Avila    | 76                    | Consumer | 01/01/2000 | 7.6      | Christian Metcalfe-Reid | YEJP71011502726136 | 558                        | Segment 1    | 1978-11-09 | 29.96
1 | AA-10375    | Allen Armold  | 369                   | Consumer | 02/01/2000 | 36.9     | Helen Taylor            | XWOB83170110594048 | 286                        | Segment 1    | 1989-12-29 | 72.50
2 | AA-10480    | Andrew Allen  | 162                   | Consumer | 03/01/2000 | 16.2     | Joanne Price            | VVCJ28547588747677 | 742                        | Segment 1    | 1982-09-23 | 79.77
3 | AA-10645    | Anna Andreadi | 803                   | Consumer | 04/01/2000 | 80.3     | Rhys Jones              | OXCI12190813836802 | 206                        | Segment 1    | 2000-10-14 | 7.15
4 | AB-10015    | Aaron Bergman | 935                   | Consumer | 05/01/2000 | 93.5     | Nigel Baldwin-Cook      | JOXS05799252235987 | 914                        | Segment 1    | 2018-01-30 | 40.66
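
In the example output, the generated columns sit next to the original ones, so the original values are still in the frame. Below is a minimal sketch of picking out only the generated columns before sharing the data. It assumes the "Fake_" prefix seen in the example output is applied consistently; `split_columns` is a hypothetical helper, not part of anonymizedf.

```python
FAKE_PREFIX = "Fake_"  # assumption: inferred from the column names in the example output


def split_columns(columns, prefix=FAKE_PREFIX):
    """Split column names into (original, fake) lists, preserving their order."""
    fake = [c for c in columns if c.startswith(prefix)]
    original = [c for c in columns if not c.startswith(prefix)]
    return original, fake


columns = [
    "Customer ID",
    "Customer Name",
    "Fake_Customer Name",
    "Fake_Customer ID",
]
original, fake = split_columns(columns)
print(fake)
```

With a pandas DataFrame, `fake_df[fake]` would then select only the generated columns, where `fake` comes from `split_columns(list(fake_df.columns))`.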

Dependencies
