Microsoft Azure Health Deidentification Client Library for Python

Azure Health Deidentification client library for Python

Azure.Health.Deidentification is a managed service that enables users to tag, redact, or surrogate health data.

Getting started

Install the package

python -m pip install azure-health-deidentification

Prerequisites

  • Python 3.8 or later is required to use this package.
  • You need an Azure subscription to use this package.
  • An existing Azure Health Deidentification instance.

Create with an Azure Active Directory Credential

To use an Azure Active Directory (AAD) token credential, provide an instance of the desired credential type obtained from the azure-identity library.

To authenticate with AAD, first install azure-identity:

python -m pip install azure-identity

After setup, choose which credential type from azure.identity to use. For example, DefaultAzureCredential can authenticate the client.

Set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET

Use the returned token credential to authenticate the client:

>>> from azure.health.deidentification import DeidentificationClient
>>> from azure.identity import DefaultAzureCredential
>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential())
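If one of the three environment variables listed above is missing, the problem may only show up later as an authentication error. A minimal sketch of a pre-flight check, using only the standard library; `missing_credential_vars` is a hypothetical helper and not part of azure-identity:

```python
import os

# Variable names taken from the instructions above.
REQUIRED_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_credential_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of azure-identity.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credential_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All service principal variables are set.")
```

DefaultAzureCredential can also use other sources (for example a managed identity), so an empty result from this check is only needed when you rely on the environment-variable route.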

Key concepts

Operation Modes

  • Tag: Returns the offset, length, and PHI category of each detected text span.
  • Redact: Returns the output text with each detected span replaced by a placeholder, e.g. [name].
  • Surrogate: Returns the output text with each detected span replaced by a synthetic value.
    • Input: My name is John Smith
    • Output: My name is Tom Jones
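To make the relationship between the Tag and Redact modes concrete, here is a local sketch that applies offset/length spans to a string. It is not the service's algorithm and does not use SDK types: `TaggedSpan` and `redact` are hypothetical names, offsets are assumed to be Python string indices, and spans are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedSpan:
    """Hypothetical stand-in for one tagged entity; not an SDK type."""
    offset: int    # start index into the input text
    length: int    # number of characters covered
    category: str  # PHI category, e.g. "name"


def redact(text, spans):
    """Replace each tagged span with a "[category]" placeholder.

    Assumes the spans do not overlap.
    """
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.offset):
        out.append(text[cursor:span.offset])    # untouched text before the span
        out.append("[" + span.category + "]")   # placeholder for the span
        cursor = span.offset + span.length
    out.append(text[cursor:])                   # untouched tail
    return "".join(out)


text = "My name is John Smith"
spans = [TaggedSpan(offset=11, length=10, category="name")]
print(redact(text, spans))  # prints "My name is [name]"
```

Surrogation would follow the same shape, with a synthetic value substituted instead of the placeholder.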

Job Integration with Azure Storage

Instead of sending text, you can send an Azure Storage location to the service. The service asynchronously processes the list of files and writes the deidentified files to a location of your choice.

Limitations:

  • Maximum file count per job: 1000 documents
  • Maximum file size per file: 2 MB
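Because a job that exceeds these limits can only fail after submission, it may be worth checking a batch locally first. The sketch below uses only the standard library; `check_job_limits` and `local_file_sizes` are hypothetical helpers, and reading "2 MB" as 2 * 1024 * 1024 bytes is an assumption about how the service counts.

```python
from pathlib import Path

MAX_FILES_PER_JOB = 1000
# Assumption: "2 MB" is interpreted as 2 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 2 * 1024 * 1024


def check_job_limits(sizes):
    """Return a list of problems for a mapping of file name -> size in bytes.

    An empty list means the batch fits within both limits.
    """
    problems = []
    if len(sizes) > MAX_FILES_PER_JOB:
        problems.append(
            f"{len(sizes)} files exceeds the limit of {MAX_FILES_PER_JOB}"
        )
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(
                f"{name} is {size} bytes, over the {MAX_FILE_BYTES}-byte limit"
            )
    return problems


def local_file_sizes(folder):
    """Collect sizes of all regular files under a local folder."""
    return {str(p): p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()}
```

For files already in Azure Storage, the same check could be fed with sizes obtained from a blob listing instead of `local_file_sizes`.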

Examples

>>> from azure.health.deidentification import DeidentificationClient
>>> from azure.identity import DefaultAzureCredential
>>> from azure.core.exceptions import HttpResponseError

>>> client = DeidentificationClient(endpoint='<endpoint>', credential=DefaultAzureCredential())
>>> try:
...     pass  # write test code here
... except HttpResponseError as e:
...     print('service responded with an error: {}'.format(e.response.json()))

Next steps

  • Found a bug or have feedback? Raise an issue with the "Health Deidentification" label.

Troubleshooting

  • Unable to access source or target storage
    • Ensure that you created your deidentification service with a system-assigned managed identity.
    • Ensure that your storage account grants permissions to that managed identity.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
