Anonymize df: a convenient way to anonymize your data for analytics

What is it?

Anonymize df is a package that helps you quickly and easily generate realistic fake data from a Pandas DataFrame.

What are the expected use cases / why was this made?

You're hiring consultants to work on your data but need to anonymize it first
You're a consultant and created something great that you want to make into a template

Installation

You can install anonymizedf using pip:

pip install anonymizedf

This will also try to install the pandas and Faker packages (listed under Dependencies) if you don't have them already.

If you don't want to use pip, you can also download this repository and execute:

python setup.py install

Example usage

import pandas as pd
from anonymizedf.anonymizedf import anonymize

# Import the data
df = pd.read_csv("https://query.data.world/s/shcktxndtu3ojonm46tb5udlz7sp3e")

# Prepare the data to be anonymized
an = anonymize(df)

# Select what data you want to anonymize and your preferred style

# Example 1 - just updates df
an.fake_names("Customer Name")
an.fake_ids("Customer ID")
an.fake_whole_numbers("Loyalty Reward Points")
an.fake_categories("Segment")
an.fake_dates("Date")
an.fake_decimal_numbers("Fraction")

# Example 2 - method chaining
fake_df = (
    an
    .fake_names("Customer Name", chaining=True)
    .fake_ids("Customer ID", chaining=True)
    .fake_whole_numbers("Loyalty Reward Points", chaining=True)
    .fake_categories("Segment", chaining=True)
    .fake_dates("Date", chaining=True)
    .fake_decimal_numbers("Fraction", chaining=True)
    .show_data_frame()
)

# Example 3 - multiple assignments
fake_df = an.fake_names("Customer Name")
fake_df = an.fake_ids("Customer ID")
fake_df = an.fake_whole_numbers("Loyalty Reward Points")
fake_df = an.fake_categories("Segment")
fake_df = an.fake_dates("Date")
fake_df = an.fake_decimal_numbers("Fraction")

fake_df.to_csv("fake_customers.csv", index=False)

# One thing to note is that you can't directly pass in a list of columns.
# If you want to apply the same function to multiple columns there are many ways to do that.

# Example 4 - for multiple columns
column_list = ["Segment"]  # list every categorical column you want to fake

for column in column_list:
    an.fake_categories(column)
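
The loop in Example 4 applies one method to several columns. When different columns need different styles, one possible approach is to drive the calls from a mapping. This is only a sketch: `apply_fakers` and `plan` are hypothetical names that are not part of anonymizedf, and the helper assumes the methods accept a column name as their only required argument, as in Example 1.

```python
def apply_fakers(an, plan):
    """Call one faker method per column.

    `an` is assumed to be the object returned by anonymize(df).
    `plan` maps a column name to the name of the method to call on `an`,
    for example {"Segment": "fake_categories"}.
    """
    for column, method_name in plan.items():
        # Look the method up by name so a single loop covers several styles.
        getattr(an, method_name)(column)
```

With the objects from the examples above, a call might look like `apply_fakers(an, {"Customer Name": "fake_names", "Segment": "fake_categories"})`.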

Example output

	Customer ID	Customer Name	Loyalty Reward Points	Segment	Date	Fraction	Fake_Customer Name	Fake_Customer ID	Fake_Loyalty Reward Points	Fake_Segment	Fake_Date	Fake_Fraction
0	AA-10315	Alex Avila	76	Consumer	01/01/2000	7.6	Christian Metcalfe-Reid	YEJP71011502726136	558	Segment 1	1978-11-09	29.96
1	AA-10375	Allen Armold	369	Consumer	02/01/2000	36.9	Helen Taylor	XWOB83170110594048	286	Segment 1	1989-12-29	72.50
2	AA-10480	Andrew Allen	162	Consumer	03/01/2000	16.2	Joanne Price	VVCJ28547588747677	742	Segment 1	1982-09-23	79.77
3	AA-10645	Anna Andreadi	803	Consumer	04/01/2000	80.3	Rhys Jones	OXCI12190813836802	206	Segment 1	2000-10-14	7.15
4	AB-10015	Aaron Bergman	935	Consumer	05/01/2000	93.5	Nigel Baldwin-Cook	JOXS05799252235987	914	Segment 1	2018-01-30	40.66
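
The output above keeps each original column next to its `Fake_` counterpart, so the real values are still present in the DataFrame. Before sharing the data you will probably want to drop the originals. The sketch below uses plain pandas; `keep_only_fake_columns` is a hypothetical helper, not part of anonymizedf, and it assumes the `Fake_` prefix shown in the example output and string column names.

```python
import pandas as pd


def keep_only_fake_columns(df, prefix="Fake_"):
    """Return a copy of df without the original columns that have a fake twin."""
    # "Fake_Segment" -> "Segment": the original column each fake column replaces.
    originals = [c[len(prefix):] for c in df.columns if c.startswith(prefix)]
    # Drop only originals that are actually present; untouched columns stay.
    return df.drop(columns=[c for c in originals if c in df.columns])


# Tiny hand-made frame that mirrors the layout of the example output.
sample = pd.DataFrame({
    "Customer Name": ["Alex Avila"],
    "Segment": ["Consumer"],
    "Fake_Customer Name": ["Christian Metcalfe-Reid"],
    "Fake_Segment": ["Segment 1"],
})

shareable = keep_only_fake_columns(sample)
print(list(shareable.columns))  # ['Fake_Customer Name', 'Fake_Segment']
```

Columns that were never anonymized have no `Fake_` twin and are left in place, so check the result before exporting it.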

Dependencies

Pandas
Faker

Release history

1.0.1 (this version), Jun 11, 2020
1.0.0, Jun 11, 2020

Download files

Source Distribution

anonymizedf-1.0.1.tar.gz (21.9 kB)

Uploaded Jun 11, 2020 Source

Built Distribution

anonymizedf-1.0.1-py3-none-any.whl (7.0 kB)

Uploaded Jun 11, 2020 Python 3

File details

Details for the file anonymizedf-1.0.1.tar.gz.

File metadata

Download URL: anonymizedf-1.0.1.tar.gz
Upload date: Jun 11, 2020
Size: 21.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.3

File hashes

Hashes for anonymizedf-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`f953156ec0ad680cdec8e10439be78e5e10126dcf13ad234710c35ac19c51be1`
MD5	`c30c3d054e8a203da0ec8902953a3887`
BLAKE2b-256	`95adf726d4248836fb182bc361a6b653bfc6c07ec7bee58268db107ce51f058c`
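
If you download the archive by hand, you can compare its digest with the SHA256 value listed above. The following is a minimal sketch using only the standard library; `sha256_of_file` is a hypothetical helper name, and the file path in the commented usage line assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for anonymizedf-1.0.1.tar.gz
EXPECTED = "f953156ec0ad680cdec8e10439be78e5e10126dcf13ad234710c35ac19c51be1"

# Usage (assumes the file has been downloaded to the current directory):
# print(sha256_of_file("anonymizedf-1.0.1.tar.gz") == EXPECTED)
```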

File details

Details for the file anonymizedf-1.0.1-py3-none-any.whl.

File metadata

Download URL: anonymizedf-1.0.1-py3-none-any.whl
Upload date: Jun 11, 2020
Size: 7.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.3

File hashes

Hashes for anonymizedf-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`971587990c599411351bf45e783fa30bcc81740b54613b218c23eb91155de7ea`
MD5	`34ace1603cc6772289997f03c41b87f9`
BLAKE2b-256	`d11f54b8001c141b23e4fdb28769c9bf131ad7e94a2455e178bcc99a68ec695e`
