Cape manages secure access to all of your data.

Cape Privacy

Cape Privacy offers data scientists and data engineers a policy-based interface for applying privacy-enhancing techniques across several popular libraries and frameworks to protect sensitive data throughout the data science life cycle.

Cape Python brings Cape's policy language to Pandas and Apache Spark, enabling you to collaborate on privacy-preserving policy at a non-technical level. The supported techniques include tokenization with linkability as well as perturbation and rounding. You can experiment with these techniques programmatically, in Python or in human-readable policy files. Stay tuned for more privacy-enhancing techniques in the future!
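To build intuition for "tokenization with linkability", the sketch below uses a keyed hash (HMAC-SHA256 from Python's standard library) truncated to a fixed length. This is an assumption made for illustration only, not necessarily how Cape's Tokenizer derives its tokens, and the helper name `toy_tokenize` is hypothetical. The property it demonstrates is that the same input and key always yield the same token, so tokenized columns in different datasets can still be joined, while a different key yields unrelated tokens.

```python
import hashlib
import hmac


def toy_tokenize(value: str, key: bytes, max_token_len: int = 10) -> str:
    """Hypothetical keyed tokenizer: HMAC-SHA256 of the value, hex-encoded and truncated.

    Illustration only; Cape's Tokenizer may derive tokens differently.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:max_token_len]


key = b"my secret"
customers = {"alice": 34, "bob": 55}
orders = [("alice", "book"), ("bob", "lamp"), ("alice", "pen")]

# Tokenize the identifier in both datasets with the same key.
tok_customers = {toy_tokenize(name, key): age for name, age in customers.items()}
tok_orders = [(toy_tokenize(name, key), item) for name, item in orders]

# Linkability: tokens still match across datasets, so they can be joined
# without exposing the original names.
joined = [(tok, item, tok_customers[tok]) for tok, item in tok_orders]
print(joined)
```

Keeping the key secret is what prevents someone from recomputing tokens for guessed names.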

See below for instructions on how to get started or visit the documentation.

Getting Started

Cape Python is available via PyPI.

pip install cape-privacy

Support for Apache Spark is optional. If you plan on using the library together with Apache Spark, we suggest the following instead:

pip install cape-privacy[spark]

We recommend running it in a virtual environment, such as venv.
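After installing (with or without the Spark extra), you can check which packages are importable in the active environment. The sketch below uses the standard-library `importlib.util.find_spec`; it assumes the import name of the package is `cape_privacy` and that the `[spark]` extra pulls in `pyspark`. Both names are assumptions here, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# "cape_privacy" is the assumed import name of the package;
# "pyspark" is only expected if the [spark] extra was installed.
for name in ("cape_privacy", "pandas", "pyspark"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Run it from inside the virtual environment you installed into; a package installed elsewhere will be reported as missing.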

Installing from source

It is also possible to install the library from source.

git clone
cd cape-python
make bootstrap

This also installs all dependencies, including Apache Spark. Make sure make is installed before running the commands above.


Example

(This example is an abridged version of the tutorial found here.)

To discover what the different transformations do and how you might use them, it is best to explore them via the transformations API:

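import pandas as pd
from cape_privacy.pandas import dtypes
from cape_privacy.pandas.transformations import NumericPerturbation, Tokenizer

df = pd.DataFrame({
    "name": ["alice", "bob"],
    "age": [34, 55],
    "birthdate": [pd.Timestamp(1985, 2, 23), pd.Timestamp(1963, 5, 10)],
})

# Replace names with keyed tokens and perturb ages by up to 10 in either direction
tokenize = Tokenizer(max_token_len=10, key=b"my secret")
perturb_numeric = NumericPerturbation(dtype=dtypes.Integer, min=-10, max=10)

df["name"] = tokenize(df["name"])
df["age"] = perturb_numeric(df["age"])

print(df)

# Example output (the perturbation is random, so your ages may differ):
#          name  age  birthdate
# 0  f42c2f1964   34 1985-02-23
# 1  2e586494b2   63 1963-05-10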
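Conceptually, a numeric perturbation like the one configured above (min=-10, max=10) adds bounded random noise to each value. The sketch below is a simplified stand-in built on Python's `random` module with uniformly drawn integer offsets; the function name `toy_perturb` and the uniform distribution are assumptions, not a description of NumericPerturbation's internals. Passing a seed makes the noise reproducible.

```python
import random


def toy_perturb(values, low, high, seed=None):
    """Add a uniformly drawn integer offset in [low, high] to each value.

    Hypothetical illustration; NumericPerturbation may use a different scheme.
    """
    rng = random.Random(seed)  # independent generator, optionally seeded
    return [v + rng.randint(low, high) for v in values]


ages = [34, 55]
noisy = toy_perturb(ages, -10, 10, seed=4984)
print(noisy)

# Every perturbed value stays within the configured bounds of its original.
assert all(-10 <= n - a <= 10 for n, a in zip(noisy, ages))
# The same seed reproduces the same noise.
assert toy_perturb(ages, -10, 10, seed=4984) == noisy
```

Note that an offset of 0 is one of the possible draws, so an individual value can come through unchanged.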

These steps can be saved in policy files so you can share them and collaborate with your team:

# my-policy.yaml
label: my-policy
version: 1
rules:
  - match:
      name: age
    actions:
      - transform:
          type: numeric-perturbation
          dtype: Integer
          min: -10
          max: 10
          seed: 4984
  - match:
      name: name
    actions:
      - transform:
          type: tokenizer
          max_token_len: 10
          key: my secret
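To make the structure of a policy concrete, here is a toy, dependency-free interpreter for a policy held in a Python dictionary (for example, after loading the YAML with a YAML parser). Each rule selects a column by name and applies its transforms in order; columns that no rule matches pass through unchanged. The dictionary layout (rules, match, actions, transform) is an assumption modeled on the policy file, and `apply_toy_policy`, `TRANSFORMS` and the toy transforms are hypothetical; in practice you would use cape.parse_policy and cape.apply_policy as shown next.

```python
import hashlib
import hmac
import random

# Assumed in-memory form of the policy, modeled on the YAML keys.
policy = {
    "label": "my-policy",
    "version": 1,
    "rules": [
        {"match": {"name": "age"},
         "actions": [{"transform": {"type": "numeric-perturbation",
                                    "min": -10, "max": 10, "seed": 4984}}]},
        {"match": {"name": "name"},
         "actions": [{"transform": {"type": "tokenizer",
                                    "max_token_len": 10, "key": "my secret"}}]},
    ],
}


def _perturb(column, cfg):
    """Toy perturbation: uniform integer offset in [min, max], seeded if a seed is given."""
    rng = random.Random(cfg.get("seed"))
    return [v + rng.randint(cfg["min"], cfg["max"]) for v in column]


def _tokenize(column, cfg):
    """Toy tokenizer: truncated HMAC-SHA256 hex digest of each value."""
    key = cfg["key"].encode("utf-8")
    n = cfg["max_token_len"]
    return [hmac.new(key, v.encode("utf-8"), hashlib.sha256).hexdigest()[:n] for v in column]


# Hypothetical registry mapping transform types to toy implementations.
TRANSFORMS = {"numeric-perturbation": _perturb, "tokenizer": _tokenize}


def apply_toy_policy(policy, table):
    """Return a copy of `table` (column name -> list of values) with matching rules applied."""
    result = {name: list(col) for name, col in table.items()}
    for rule in policy["rules"]:
        column = rule["match"]["name"]
        if column not in result:
            continue  # this rule does not apply to the table
        for action in rule["actions"]:
            cfg = action["transform"]
            result[column] = TRANSFORMS[cfg["type"]](result[column], cfg)
    return result


table = {"name": ["alice", "bob"], "age": [34, 55], "birthdate": ["1985-02-23", "1963-05-10"]}
print(apply_toy_policy(policy, table))
```

Because the perturbation rule carries a fixed seed, applying the same policy twice to the same table gives the same result.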

You can then load this policy and apply it to your data frame:

import cape_privacy as cape

# df can be a Pandas or Spark data frame
policy = cape.parse_policy("my-policy.yaml")
df = cape.apply_policy(policy, df)

print(df)

# Example output:
#          name  age  birthdate
# 0  f42c2f1964   34 1985-02-23
# 1  2e586494b2   63 1963-05-10

You can see more examples and usage here or by visiting our documentation.

Contributing and Bug Reports

Please file any feature requests or bug reports as GitHub issues.


Licensed under the Apache License, Version 2.0 (see LICENSE). Copyright as specified in NOTICE.

About Cape

Cape Privacy helps teams share data and make decisions for safer and more powerful data science. Learn more at

Download files

Source Distribution

cape-privacy-0.1.1.tar.gz (24.6 kB)

Built Distribution

cape_privacy-0.1.1-py3-none-any.whl (39.4 kB)
