
arkhn_arx is a tool to pseudonymize or anonymize datasets while evaluating reidentification risk metrics

Project description

arkhn_arx

arkhn_arx is a module for dataset pseudonymization or anonymization. It wraps pyarxaas, the Python client for the ARXaaS service.

Install

pip install arkhn_arx

Connection to the ARXaaS service

This module relies on the ARXaaS service (https://github.com/navikt/arxaas).

To run this service locally:

  1. Make sure Docker Desktop is running.
  2. Pull the Docker image:
docker pull navikt/arxaas
  3. Run the Docker image:
docker run -p 8080:8080 navikt/arxaas
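Before calling the module, it can help to confirm that the container is answering on port 8080. The snippet below is a minimal sketch using only the Python standard library; `service_is_reachable` is a hypothetical helper, not part of arkhn_arx or pyarxaas, and it only checks that some HTTP server responds at the URL, not that it is ARXaaS.

```python
import urllib.error
import urllib.request

# Local URL matching the `docker run -p 8080:8080 navikt/arxaas` command above.
ARXAAS_URL = "http://localhost:8080"


def service_is_reachable(url=ARXAAS_URL, timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (for example 404 if the
        # root path has no handler), which still means it is up.
        return True
    except OSError:
        # URLError is a subclass of OSError: connection refused, DNS
        # failure or timeout all mean nothing is listening at `url`.
        return False


if __name__ == "__main__":
    print("ARXaaS reachable:", service_is_reachable())
```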

Anonymization

Principle

This module can be used in three modes: to evaluate the reidentification risk of a dataset, to pseudonymize a dataset, or to anonymize a dataset. Anonymization relies on the k-anonymity and l-diversity privacy models:

  • k-anonymity ensures that the information about each person in the release cannot be distinguished from that of at least k-1 other individuals who also appear in the release. Such a set of at least k indistinguishable records forms a k-anonymity group.
  • l-diversity ensures that sensitive attributes are well represented in each k-anonymity group, i.e. each group contains at least l distinct values of every sensitive attribute.
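To make the two definitions concrete, the sketch below measures k and (distinct) l on a small table in plain Python. It illustrates the definitions only and is not how arkhn_arx or ARXaaS compute their risk metrics; `k_and_l` and the sample records are hypothetical, and records are grouped on the columns assumed to be quasi-identifying.

```python
from collections import defaultdict


def k_and_l(records, quasi_identifiers, sensitive):
    """Return (k, l) for a list of dict records.

    k: size of the smallest group of records sharing the same
       quasi-identifier values (the k-anonymity group).
    l: smallest number of distinct sensitive values found in any group.
    """
    groups = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l


# Hypothetical, already generalized records.
records = [
    {"age": "20-30", "zip": "750**", "diagnosis": "flu"},
    {"age": "20-30", "zip": "750**", "diagnosis": "asthma"},
    {"age": "30-40", "zip": "751**", "diagnosis": "flu"},
    {"age": "30-40", "zip": "751**", "diagnosis": "flu"},
]

print(k_and_l(records, ["age", "zip"], "diagnosis"))  # (2, 1)
```

Both groups contain two records, so the table is 2-anonymous, but the second group only contains "flu": it is not 2-diverse, since anyone known to belong to that group is known to have the flu.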

Arguments

  • input_dataframe: the dataframe to anonymize
  • configuration_file: JSON file containing the anonymization parameters, structured as follows:
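config_dict = {
    "anonymization": {"type": 2, "k": 2, "l": 2},  # mode 2 = anonymize, with k=2 and l=2
    "attributes": [
        {
            "customName": "att_1",           # column name in the dataframe
            "att_type": "quasiidentifying",  # one of the attribute types listed below
            "hierarchy_type": "interval",    # one of the hierarchy types listed below
        },
    ],
}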
  • Anonymization parameters:

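    • type: 0 returns risk metrics for the initial dataset, 1 pseudonymizes the dataset, 2 anonymizes the dataset
    • k: parameter of k-anonymity (minimum size of each k-anonymity group)
    • l: parameter of l-diversity (minimum number of distinct sensitive values in each group)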
  • Attribute parameters, given for each attribute:

    • customName: name of the attribute's column in the dataframe
    • att_type: attribute type for anonymization, one of:
      • "insensitive": kept unmodified
      • "sensitive": kept as-is, but can be protected using privacy models such as t-closeness or l-diversity
      • "quasiidentifying": transformed using hierarchies
      • "identifying": removed from the dataset
    • hierarchy_type: type of hierarchy applied to the attribute for anonymization, one of:
      • "interval": for variables on a ratio scale; intervals are defined from the attribute's quantiles
      • "date": for dates
      • "redaction": for a broad range of attributes; masks parts of the values
      • "order": NOT IMPLEMENTED; intended for variables on an ordinal scale, defining ordered groups of values
  • URL_link: URL of the ARXaaS service; if the service is running locally as described above, use "http://localhost:8080"
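The options above can be checked mechanically before a configuration is sent to the service, which catches typos such as a misspelled attribute type. The sketch below is a hypothetical helper (`validate_config` is not part of arkhn_arx) and it encodes two assumptions this README does not state: `k` and `l` are only required when `type` is 2, and `hierarchy_type` may be omitted.

```python
VALID_TYPES = {0, 1, 2}  # 0 = risk metrics, 1 = pseudonymize, 2 = anonymize
VALID_ATT_TYPES = {"insensitive", "sensitive", "quasiidentifying", "identifying"}
IMPLEMENTED_HIERARCHIES = {"interval", "date", "redaction"}  # "order" is not implemented


def validate_config(config):
    """Return a list of problems found in `config` (empty if none)."""
    problems = []
    params = config.get("anonymization", {})
    if params.get("type") not in VALID_TYPES:
        problems.append(f"anonymization.type must be one of {sorted(VALID_TYPES)}")
    # Assumption: k and l only matter when anonymizing (type 2).
    if params.get("type") == 2:
        for name in ("k", "l"):
            value = params.get(name)
            if not isinstance(value, int) or isinstance(value, bool) or value < 1:
                problems.append(f"anonymization.{name} must be a positive integer")
    for i, attribute in enumerate(config.get("attributes", [])):
        if "customName" not in attribute:
            problems.append(f"attributes[{i}] is missing customName")
        if attribute.get("att_type") not in VALID_ATT_TYPES:
            problems.append(f"attributes[{i}].att_type is invalid: {attribute.get('att_type')!r}")
        hierarchy = attribute.get("hierarchy_type")  # assumption: optional
        if hierarchy is not None and hierarchy not in IMPLEMENTED_HIERARCHIES:
            problems.append(f"attributes[{i}].hierarchy_type is unsupported: {hierarchy!r}")
    return problems


config_dict = {
    "anonymization": {"type": 2, "k": 2, "l": 2},
    "attributes": [
        {"customName": "age", "att_type": "quasiidentifying", "hierarchy_type": "interval"},
        {"customName": "zip", "att_type": "quasiidentifying", "hierarchy_type": "order"},
    ],
}
print(validate_config(config_dict))
# ["attributes[1].hierarchy_type is unsupported: 'order'"]
```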

Example

You can test this module using the example.py script


Download files


Source Distribution

arkhn_arx-0.0.8.tar.gz (4.7 kB)

Built Distribution

arkhn_arx-0.0.8-py3-none-any.whl (5.7 kB)

File details

Details for the file arkhn_arx-0.0.8.tar.gz.

File metadata

  • Download URL: arkhn_arx-0.0.8.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.3

File hashes

Hashes for arkhn_arx-0.0.8.tar.gz
Algorithm     Hash digest
SHA256        93eae1b9111138318631dcdffe065229a0b86b442dfcddb52fdfc43d00d9fe74
MD5           b63635a8328e7d9c9debf7cdada143fb
BLAKE2b-256   8d57ccba4368deb453108dd57fb89327995267e24fc0e6b8e6e19e5e259423f1
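A downloaded file can be checked against these digests. The sketch below uses Python's standard `hashlib` module; the helper names are hypothetical, and `SDIST_SHA256` is the SHA256 digest of the source distribution copied from the table above.

```python
import hashlib

# SHA256 digest of arkhn_arx-0.0.8.tar.gz, as published above.
SDIST_SHA256 = "93eae1b9111138318631dcdffe065229a0b86b442dfcddb52fdfc43d00d9fe74"


def sha256_of_chunks(chunks):
    """Hex SHA256 digest of an iterable of bytes chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=65536):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    with open(path, "rb") as f:
        return sha256_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

For example, `sha256_of_file("arkhn_arx-0.0.8.tar.gz") == SDIST_SHA256` should be True for an untampered download. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).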


File details

Details for the file arkhn_arx-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: arkhn_arx-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.3

File hashes

Hashes for arkhn_arx-0.0.8-py3-none-any.whl
Algorithm     Hash digest
SHA256        00dd3ccb84dac74edbc4d8dc4e9ad304701abc656327d290bb2e1381ac18cd0b
MD5           94b4b032cee33747ea99dd2e7e7188bf
BLAKE2b-256   b105882d538ddec01890a90183b0dcbb69ee49147f905eebd0d8e64162e96fb7

