Anonymize CSV datasets
vendetta
Anonymize CSV file(s) by replacing sensitive values with fakes.
Installation
pip install vendetta
Example
Suppose you have an orders.csv dataset with real customer names and order IDs:
CustomerName,CustomerLastName,OrderID
Darth,Vader,1254
Darth,Vader,1255
,Yoda,1256
Luke,Skywalker,1257
Leia,Skywalker,1258
,Yoda,1259
This list contains 4 unique customers. Let's create a configuration file, say, orders.yaml:
columns:
  CustomerName: first_name
  CustomerLastName: last_name
and run:
vendetta anonymize orders.yaml < orders.csv > anon.csv
which gives something like this in anon.csv:
CustomerName,CustomerLastName,OrderID
Elizabeth,Oliver,1254
Elizabeth,Oliver,1255
Karen,Rodriguez,1256
Jonathan,Joseph,1257
Katelyn,Joseph,1258
Karen,Rodriguez,1259
- The OrderID column was not mentioned in the config, so it was left as is.
- Using faker, the program replaced the first and last names with random first and last names, which keeps the data believable.
- If two cells in the same column of the source file have the same value (e.g. Vader), the corresponding cells in the output file will also be identical.
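The three observations above can be illustrated with a minimal, stdlib-only sketch. It is not vendetta's implementation, only an assumption-laden approximation: the config is passed as a Python dict instead of being read from YAML, `anonymize` and `FAKE_VALUES` are hypothetical names, and small fixed name lists stand in for faker so the code runs without extra packages. Keeping one lookup table per column is what makes a repeated real value (such as Vader) map to the same fake every time.

```python
import csv
import io
import random

# Hypothetical stand-ins for faker's first_name / last_name providers.
# None of these values occur in the sample input below.
FAKE_VALUES = {
    "first_name": ["Elizabeth", "Karen", "Jonathan", "Katelyn", "Oliver"],
    "last_name": ["Oliver", "Rodriguez", "Joseph", "Smith", "Nguyen"],
}


def anonymize(src, dst, columns, seed=None):
    """Copy CSV rows from src to dst, faking the values of the listed columns.

    `columns` maps a column name to a fake-value kind, like the `columns:`
    section of orders.yaml. Columns that are not listed are copied unchanged.
    """
    rng = random.Random(seed)
    # One lookup table per column: a repeated real value reuses its fake.
    mapping = {name: {} for name in columns}
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for name, kind in columns.items():
            real = row[name]
            if real not in mapping[name]:
                # Pick a fake not yet used in this column so that different
                # real values stay distinct; raises IndexError if the pool runs out.
                used = set(mapping[name].values())
                unused = [v for v in FAKE_VALUES[kind] if v not in used]
                mapping[name][real] = rng.choice(unused)
            row[name] = mapping[name][real]
        writer.writerow(row)


SAMPLE = """\
CustomerName,CustomerLastName,OrderID
Darth,Vader,1254
Darth,Vader,1255
,Yoda,1256
Luke,Skywalker,1257
Leia,Skywalker,1258
,Yoda,1259
"""

config = {"CustomerName": "first_name", "CustomerLastName": "last_name"}
out = io.StringIO()
anonymize(io.StringIO(SAMPLE), out, config, seed=0)
print(out.getvalue(), end="")
anon_rows = list(csv.DictReader(io.StringIO(out.getvalue())))
```

To mimic the command line, pass `sys.stdin` and `sys.stdout` instead of the `StringIO` objects. Drawing only from unused fakes is an extra design choice of this sketch: it keeps different real values (Luke and Leia) apart, which the README does not explicitly promise.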
Enjoy!
License
Credits
This project was generated with wemake-python-package. Current template version is: b80221aaae4ac702bea7e66b77b9389d527c1e3c. See what is updated since then.
Download files
Source distribution: vendetta-0.0.2.tar.gz (5.0 kB)