arkhn_arx is a tool to pseudonymize or anonymize datasets while evaluating reidentification risk metrics
arkhn_arx
arkhn_arx is a module for dataset pseudonymization or anonymization that wraps pyarxaas, the Python client for the ARXaaS service.
Install
pip install arkhn_arx
Connection to the ARXaaS service
This module relies on the ARXaaS service (https://github.com/navikt/arxaas).
To run this service locally:
- Make sure Docker Desktop is running
- Pull the Docker image
docker pull navikt/arxaas
- Run the Docker image
docker run -p 8080:8080 navikt/arxaas
Anonymization
Principle
This module can be used in three modes: evaluating the reidentification risk of a dataset, pseudonymizing a dataset, or anonymizing a dataset. Anonymization is performed using the k-anonymity and l-diversity privacy models.
- k-anonymity ensures that the information for each person contained in the release cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release (together they form a k-anonymity group).
- l-diversity ensures that sensitive attributes are well represented (at least l distinct values) in each k-anonymity group.
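To make the two definitions concrete, the sketch below measures which k and l a table already satisfies: it groups records by their quasi-identifier values and reports the smallest group size (k) and the smallest number of distinct sensitive values in any group (l, in the "distinct values" form described above). The helper `privacy_levels` and the sample records are hypothetical and not part of arkhn_arx or pyarxaas; the actual anonymization is performed by the ARXaaS service.

```python
from collections import defaultdict


def privacy_levels(records, quasi_identifiers, sensitive):
    """Return the (k, l) achieved by a list of dict records (hypothetical helper).

    k: size of the smallest group of records sharing the same quasi-identifier values.
    l: smallest number of distinct sensitive values found in any such group.
    """
    groups = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    k_anon = min(len(values) for values in groups.values())
    l_div = min(len(set(values)) for values in groups.values())
    return k_anon, l_div


# Made-up, already generalized records used only for illustration.
records = [
    {"age": "30-39", "zip": "750**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "750**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "690**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "690**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "690**", "diagnosis": "flu"},
]
print(privacy_levels(records, ["age", "zip"], "diagnosis"))  # (2, 2)
```

Here the smallest group has two records and every group holds at least two distinct diagnoses, so this table is 2-anonymous and 2-diverse.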
Arguments
- input_dataframe: dataframe to anonymize
- configuration_file: JSON file containing the anonymization parameters, structured as follows:
config_dict = {"anonymization": {"type": 2, "k": 2, "l": 2},
               "attributes": [
                   {"customName": "att_1",
                    "att_type": "att_type",
                    "hierarchy_type": "hierarchy_type"},
               ]
               }
  - Anonymization parameters:
    - type: 0 returns risk metrics for the initial dataset, 1 pseudonymizes the dataset, 2 anonymizes the dataset
    - k: parameter for k-anonymity
    - l: parameter for l-diversity
  - Attribute parameters, given for each attribute:
    - customName: column name of the attribute in the dataframe
    - att_type: attribute type for anonymization, one of:
      - "insensitive": kept unmodified
      - "sensitive": kept as-is, but can be protected using privacy models such as t-closeness or l-diversity
      - "quasiidentifying": transformed using hierarchies
      - "identifying": removed from the dataset
    - hierarchy_type: type of hierarchy applied to the attribute for anonymization, one of:
      - "interval": for variables on a ratio scale; intervals are defined using the attribute's quantiles
      - "date": for dates
      - "redaction": for a broad range of attributes; masks parts of the values
      - "order": NOT IMPLEMENTED; for variables on an ordinal scale, defining ordered groups of values
- URL_link: URL of the ARXaaS service; if the service is running locally, the URL is "http://localhost:8080"
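The "redaction" and "interval" hierarchy types can be pictured with the small sketch below. `redaction_hierarchy` lists the generalization levels of one value, masking one more trailing character at each level; `quantile_intervals` replaces each number by the quantile interval that contains it, which is one generalization level of a quantile-based interval hierarchy. Both helpers are hypothetical illustrations of the ideas only; how arkhn_arx actually builds the hierarchies it sends to ARXaaS may differ. `statistics.quantiles` requires Python 3.8 or later.

```python
import bisect
import math
import statistics


def redaction_hierarchy(value):
    """Generalization levels for one value: each level masks one more trailing character."""
    text = str(value)
    return [text[: len(text) - i] + "*" * i for i in range(len(text) + 1)]


def quantile_intervals(values, n=4):
    """Replace each numeric value by the quantile interval [low, high) that contains it."""
    cuts = statistics.quantiles(values, n=n)  # n - 1 cut points
    bounds = [-math.inf] + cuts + [math.inf]
    labels = []
    for x in values:
        i = bisect.bisect_right(cuts, x)  # number of cut points <= x, i.e. the bucket index
        labels.append(f"[{bounds[i]:g}, {bounds[i + 1]:g})")
    return labels


print(redaction_hierarchy("75001"))
# ['75001', '7500*', '750**', '75***', '7****', '*****']
print(quantile_intervals([1, 2, 3, 4, 5, 6, 7, 8], n=2))
# values 1-4 -> '[-inf, 4.5)', values 5-8 -> '[4.5, inf)'
```

Higher redaction levels hide more information; the anonymization search picks the lowest level per attribute that still satisfies the requested k and l.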
Example
You can test this module with the example.py script.
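Before running such a test, a configuration file following the layout described under Arguments can be built and checked in Python. The sketch below writes it to `config.json` with the standard `json` module. The `validate_config` helper, the column names and the file name are hypothetical; the allowed values mirror the lists above, and treating `hierarchy_type` as mandatory only for quasi-identifying attributes is an assumption, not documented behavior of arkhn_arx.

```python
import json

# Allowed values, as listed in the Arguments section of this README.
ANONYMIZATION_TYPES = {0, 1, 2}  # 0: risk metrics, 1: pseudonymize, 2: anonymize
ATT_TYPES = {"insensitive", "sensitive", "quasiidentifying", "identifying"}
HIERARCHY_TYPES = {"interval", "date", "redaction"}  # "order" is not implemented


def validate_config(config):
    """Raise ValueError if config does not match the documented layout (hypothetical helper)."""
    anon = config["anonymization"]
    if anon["type"] not in ANONYMIZATION_TYPES:
        raise ValueError(f"unknown anonymization type: {anon['type']!r}")
    for key in ("k", "l"):
        if not isinstance(anon[key], int) or anon[key] < 1:
            raise ValueError(f"{key} must be a positive integer")
    for att in config["attributes"]:
        name = att.get("customName", "?")
        if att["att_type"] not in ATT_TYPES:
            raise ValueError(f"{name}: unknown att_type {att['att_type']!r}")
        hierarchy = att.get("hierarchy_type")
        # Assumption: quasi-identifying attributes need a hierarchy to be generalized.
        if att["att_type"] == "quasiidentifying" and hierarchy is None:
            raise ValueError(f"{name}: quasi-identifying attribute needs a hierarchy_type")
        if hierarchy is not None and hierarchy not in HIERARCHY_TYPES:
            raise ValueError(f"{name}: unsupported hierarchy_type {hierarchy!r}")


config_dict = {
    "anonymization": {"type": 2, "k": 2, "l": 2},
    "attributes": [
        # Hypothetical column names, for illustration only.
        {"customName": "age", "att_type": "quasiidentifying", "hierarchy_type": "interval"},
        {"customName": "zip_code", "att_type": "quasiidentifying", "hierarchy_type": "redaction"},
        {"customName": "diagnosis", "att_type": "sensitive", "hierarchy_type": "redaction"},
        {"customName": "name", "att_type": "identifying", "hierarchy_type": "redaction"},
    ],
}

validate_config(config_dict)
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config_dict, f, indent=2)
```

Checking the file before calling the service gives a clear local error for a typo such as "quasi-identifying" instead of a failure reported later by ARXaaS.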
Download files
Source Distribution
Built Distribution
File details
Details for the file arkhn_arx-0.0.8.tar.gz
File metadata
- Download URL: arkhn_arx-0.0.8.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 93eae1b9111138318631dcdffe065229a0b86b442dfcddb52fdfc43d00d9fe74
MD5 | b63635a8328e7d9c9debf7cdada143fb
BLAKE2b-256 | 8d57ccba4368deb453108dd57fb89327995267e24fc0e6b8e6e19e5e259423f1
File details
Details for the file arkhn_arx-0.0.8-py3-none-any.whl
File metadata
- Download URL: arkhn_arx-0.0.8-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 00dd3ccb84dac74edbc4d8dc4e9ad304701abc656327d290bb2e1381ac18cd0b
MD5 | 94b4b032cee33747ea99dd2e7e7188bf
BLAKE2b-256 | b105882d538ddec01890a90183b0dcbb69ee49147f905eebd0d8e64162e96fb7