CLI for Fidescls

Fidescls: PII Detection and Classification

A part of the greater Fides ecosystem.

:zap: Overview

Fidescls (/fee-dhez classify/, from Fidēs, Latin for trust and reliability) is an open-source, extensible machine learning classification engine. Fidescls uses the Fides toolset (Fidesctl, Fidesops, and Fideslang) to help detect and label potential sources of personally identifiable information, or PII, in your records and databases.

:rocket: Quick Start

Requirements

  • Docker 12+
  • Python 3.8+
  • Make

Getting Started

  1. Ensure that the required tools are installed and Docker is running, and clone this repository.

  2. From the project's root directory, run the following command:

make api

This will start an instance of the API server, and allow you to begin making requests.

  3. Make a POST request to the classify endpoint:

localhost:8765/text/classify

Sample Payload - Content Classification
{
    "content": {
        "data": [
            "sample@aol.com",
            "(555) 555-5555",
            "4242-4242-4242-4242"
        ],
        "method_params": {
            "decision_method": "pass-through"
        }
    }
}

Field descriptions:

  • data: A string, or list of strings, representing the data to be processed.
  • decision_method: A value of pass-through returns the higher-level PII classifications to which your data belongs.
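
As a rough illustration, the payload above could be sent from Python using only the standard library. This is a sketch under assumptions, not part of Fidescls: the http:// scheme, the JSON content type header, and the helper names build_content_payload, build_request, and classify are all hypothetical choices made for this example.

```python
import json
import urllib.request

# Assumption: the server started by `make api` speaks plain HTTP on localhost.
CLASSIFY_URL = "http://localhost:8765/text/classify"


def build_content_payload(data, decision_method="pass-through"):
    """Build a content-classification payload shaped like the sample above."""
    return {
        "content": {
            "data": data,
            "method_params": {"decision_method": decision_method},
        }
    }


def build_request(payload, url=CLASSIFY_URL):
    """Wrap a payload in a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(payload, url=CLASSIFY_URL, timeout=30):
    """Send the request and decode the JSON reply (needs a running API server)."""
    request = build_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API server running, calling classify(build_content_payload(["sample@aol.com", "(555) 555-5555"])) should return a dictionary shaped like the response shown next.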

Successful Response:

{
    "content": [
        {
            "input": "sample@aol.com",
            "labels": [
                {
                    "label": "EMAIL_ADDRESS",
                    "score": 1.0,
                    "position_start": 0,
                    "position_end": 14
                },
                {
                    "label": "DOMAIN_NAME",
                    "score": 1.0,
                    "position_start": 7,
                    "position_end": 14
                }
            ]
        },
        {
            "input": "(555) 555-5555",
            "labels": [
                {
                    "label": "PHONE_NUMBER",
                    "score": 0.4,
                    "position_start": 0,
                    "position_end": 14
                }
            ]
        },
        {
            "input": "4242-4242-4242-4242",
            "labels": [
                {
                    "label": "CREDIT_CARD",
                    "score": 1.0,
                    "position_start": 0,
                    "position_end": 19
                }
            ]
        }
    ]
}
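
One way to read a response like the one above is to slice each input with position_start and position_end and keep only the labels above a chosen score. The sketch below assumes position_end is exclusive, which matches the sample (the 14-character email ends at 14). The matched_spans helper and the 0.5 threshold are hypothetical choices for illustration, not Fidescls recommendations.

```python
# Trimmed copy of the sample response above, used as test data.
SAMPLE_RESPONSE = {
    "content": [
        {
            "input": "sample@aol.com",
            "labels": [
                {"label": "EMAIL_ADDRESS", "score": 1.0,
                 "position_start": 0, "position_end": 14},
                {"label": "DOMAIN_NAME", "score": 1.0,
                 "position_start": 7, "position_end": 14},
            ],
        },
        {
            "input": "(555) 555-5555",
            "labels": [
                {"label": "PHONE_NUMBER", "score": 0.4,
                 "position_start": 0, "position_end": 14},
            ],
        },
    ]
}


def matched_spans(response, min_score=0.5):
    """Return (input, label, matched_text, score) tuples scoring at least min_score."""
    results = []
    for item in response["content"]:
        text = item["input"]
        for label in item["labels"]:
            if label["score"] < min_score:
                continue
            start, end = label["position_start"], label["position_end"]
            # Positions may be null for some classifiers; skip slicing in that case.
            matched = text[start:end] if start is not None and end is not None else None
            results.append((text, label["label"], matched, label["score"]))
    return results


for text, label, matched, score in matched_spans(SAMPLE_RESPONSE):
    print(f"{label}: {matched!r} (score {score}) in {text!r}")
```

With the default threshold, the phone number label (score 0.4) is filtered out, while both labels on the email address are kept.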

Sample Payload - Context Classification
{
    "context": {
        "data": [
            "email_address",
            "phone_num",
            "credit_card"
        ],
        "method": "similarity",
        "method_params": {
            "possible_targets": [
                "user.derived.identifiable.device.ip_address",
                "user.provided.identifiable.financial.account_number",
                "user.provided.identifiable.contact.email",
                "user.provided.identifiable.contact.phone_number",
                "account.contact.street",
                "account.contact.city",
                "account.contact.state",
                "account.contact.country",
                "account.contact.postal_code"
            ],
            "top_n": 2
        }
    }
}

Field descriptions:

  • data: A string, or list of strings, representing the data to be processed.
  • possible_targets: A list of potential Data Categories to classify your data into.
  • top_n: The number of closest results to return.

Successful Response:

{
    "context": [
        {
            "input": "email_address",
            "labels": [
                {
                    "label": "user.provided.identifiable.contact.email",
                    "score": 0.791374585498101,
                    "position_start": null,
                    "position_end": null
                },
                {
                    "label": "account.contact.postal_code",
                    "score": 0.7402522077965934,
                    "position_start": null,
                    "position_end": null
                }
            ]
        },
        {
            "input": "phone_num",
            "labels": [
                {
                    "label": "user.provided.identifiable.contact.phone_number",
                    "score": 0.5770164988785474,
                    "position_start": null,
                    "position_end": null
                },
                {
                    "label": "account.contact.postal_code",
                    "score": 0.44817613132976103,
                    "position_start": null,
                    "position_end": null
                }
            ]
        },
        {
            "input": "credit_card",
            "labels": [
                {
                    "label": "user.provided.identifiable.financial.account_number",
                    "score": 0.5742921242220389,
                    "position_start": null,
                    "position_end": null
                },
                {
                    "label": "account.contact.postal_code",
                    "score": 0.5587338672966902,
                    "position_start": null,
                    "position_end": null
                }
            ]
        }
    ]
}
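
Because a context response returns top_n candidates per input, a caller will often want just the highest-scoring Data Category for each one. The sketch below shows one way to do that, using a trimmed copy of the response above as test data. The best_labels helper is a hypothetical name introduced for this example.

```python
# Trimmed copy of the context response above, used as test data.
SAMPLE_CONTEXT_RESPONSE = {
    "context": [
        {
            "input": "email_address",
            "labels": [
                {"label": "user.provided.identifiable.contact.email",
                 "score": 0.791374585498101},
                {"label": "account.contact.postal_code",
                 "score": 0.7402522077965934},
            ],
        },
        {
            "input": "phone_num",
            "labels": [
                {"label": "user.provided.identifiable.contact.phone_number",
                 "score": 0.5770164988785474},
                {"label": "account.contact.postal_code",
                 "score": 0.44817613132976103},
            ],
        },
    ]
}


def best_labels(response):
    """Map each input to its highest-scoring (label, score) pair."""
    best = {}
    for item in response["context"]:
        if not item["labels"]:
            continue  # nothing was suggested for this input
        top = max(item["labels"], key=lambda label: label["score"])
        best[item["input"]] = (top["label"], top["score"])
    return best


for name, (label, score) in best_labels(SAMPLE_CONTEXT_RESPONSE).items():
    print(f"{name} -> {label} ({score:.3f})")
```

Note that the scores of the top two candidates can be close, as with credit_card in the full response above, so the runner-up may be worth reviewing before accepting the best label.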

To learn more about the difference between Context and Content Classification, see the Classifiers Guide.

You've now successfully begun classifying PII!

:book: Learn More

The Fides core team is committed to providing a variety of documentation to help you get started with Fidescls. All interactions are governed by the Fides Code of Conduct.

Documentation

For more information on getting started with Fidescls and the Fides ecosystem of open source projects, check out our documentation.

Support

Join the conversation on Slack and Twitter!

:balance_scale: License

The Fides ecosystem of tools (Fidescls, Fidesops, and Fidesctl) is licensed under the Apache Software License Version 2.0. Fides tools are built on Fideslang, the Fides language specification, which is licensed under CC BY 4.0.

Fides is created and sponsored by Ethyca: a developer tools company building the trust infrastructure of the internet. If you have questions or need assistance getting started, let us know at fides@ethyca.com!

Source Distribution

fidescls-0.9.1.tar.gz (44.6 kB)