Package to mask pd.DataFrame data

Zorro DF

Zorro DF is a Python package for masking pandas DataFrame objects in order to anonymise data. It strips away identifiable column names and string values, replacing them with a generic naming convention. The package is built on the scikit-learn transformer framework (fit and transform methods), so it can be plugged into any scikit-learn Pipeline.
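To make the idea concrete, here is one way such a masker could be written in plain pandas. It is not Zorro DF's implementation: the class name SimpleMasker and the column_N / column_N_level_M naming scheme are assumptions chosen for illustration. As with a scikit-learn transformer, fit() learns the mappings and transform() applies them.

```python
import pandas as pd


class SimpleMasker:
    """Illustrative masker written in plain pandas; NOT Zorro DF's implementation."""

    def fit(self, X: pd.DataFrame, y=None):
        # Give every column a generic name: column_0, column_1, ...
        self.column_map_ = {col: f"column_{i}" for i, col in enumerate(X.columns)}
        # For string columns, give every distinct value a generic label
        self.value_maps_ = {}
        for col in X.columns:
            if pd.api.types.is_string_dtype(X[col]):
                prefix = self.column_map_[col]
                uniques = pd.unique(X[col].dropna())
                self.value_maps_[col] = {
                    value: f"{prefix}_level_{j}" for j, value in enumerate(uniques)
                }
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        masked = X.copy()
        # Replace string values while the original column names still exist;
        # values not seen during fit become NaN
        for col, mapping in self.value_maps_.items():
            masked[col] = masked[col].map(mapping)
        # Then swap the identifiable column names for the generic ones
        return masked.rename(columns=self.column_map_)
```

Because Series.map turns values that were not seen during fit into NaN, a masker like this should be fitted on data that covers every category you expect to transform.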

The package source code can be found at http://github.com/epw505/zorro_df

Getting Started

Requirements

pandas>=0.25.3
scikit-learn>=0.22.1

Installation

Zorro DF can be installed using pip with the following command:

pip install zorro_df

Examples

Once the package is installed, you can load Zorro DF into your Python session and use the Masker object to mask your data: fit the masker on a DataFrame, then call transform to get the masked data.

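import pandas as pd
from zorro_df import mask_dataframe as mf

# Illustrative input; replace with your own DataFrame
data = pd.DataFrame({"name": ["Alice", "Bob"], "city": ["Leeds", "York"]})

example_masker = mf.Masker()
example_masker.fit(data)
masked_data = example_masker.transform(data)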
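Because the goal is anonymisation, it can be worth checking the output before sharing it. The helper below is a hypothetical sanity check (find_leaks is not part of Zorro DF): it collects the original column names and string values and reports any that still appear verbatim in the masked frame. If masking behaves as described, find_leaks(data, masked_data) should return an empty set.

```python
import pandas as pd


def find_leaks(original: pd.DataFrame, masked: pd.DataFrame) -> set:
    """Return original column names or string values still visible in `masked`.

    Hypothetical helper for sanity-checking masked output; not part of Zorro DF.
    """
    # Identifiable tokens from the original frame: column names plus string cells
    identifiers = {str(col) for col in original.columns}
    for col in original.columns:
        identifiers.update(v for v in original[col].dropna() if isinstance(v, str))

    # Everything visible in the masked frame, rendered as strings
    visible = {str(col) for col in masked.columns}
    for col in masked.columns:
        visible.update(str(v) for v in masked[col].dropna())

    # Any overlap means something identifiable survived masking
    return identifiers & visible
```

The check is exact-match only: a masked value that merely contains an original string as a substring is not reported, and numeric original values are not treated as identifiers.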

Tests

The test suite for Zorro DF is built using pytest with the pytest-mock plugin. Install both as follows.

pip install pytest
pip install pytest-mock

Once they are installed, you can run the test suite from the root directory of Zorro DF.

pytest tests/

Future Development

  • Reverse masking to allow retrieval of original data
  • Additional numerical scaling techniques
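Reverse masking is listed as future work. One way it could work, assuming the column and value mappings learned at fit time are kept, is to invert those dictionaries. The unmask function and its argument layout are hypothetical and do not reflect any existing or planned Zorro DF API.

```python
import pandas as pd


def unmask(masked: pd.DataFrame, column_map: dict, value_maps: dict) -> pd.DataFrame:
    """Hypothetical reverse-masking helper; Zorro DF does not provide this.

    column_map: original column name -> masked column name
    value_maps: original column name -> {original value -> masked value}
    """
    # Invert the column mapping (masked name -> original name) and restore names
    inverse_columns = {new: old for old, new in column_map.items()}
    restored = masked.rename(columns=inverse_columns)
    # Invert each value mapping and apply it to the restored column
    for col, mapping in value_maps.items():
        inverse_values = {new: old for old, new in mapping.items()}
        restored[col] = restored[col].map(inverse_values)
    return restored
```

This only works if each mapping is one-to-one, so that every masked label corresponds to exactly one original value. It also means the stored mappings are themselves sensitive and need the same protection as the original data.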

Download files

Source Distribution

zorro_df-1.1.1.tar.gz (7.3 kB)

Built Distribution

zorro_df-1.1.1-py3-none-any.whl (8.9 kB, Python 3)