A thin client for communicating with the Private Ai de-identification API.

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

privateai_client

A client for communicating with the Private Ai de-identification API. This document provides information about how to best use the client. For more information, see Private Ai's API Documentation.

Installation

pip install privateai_client

Quick Start

from privateai_client import PAIClient
from privateai_client import request_objects

scheme = "http"
host = "localhost"
port = "8080"

# Connect to a Private Ai container listening on localhost:8080
client = PAIClient(scheme, host, port)

# Build the request body and inspect the text it carries
text_request = request_objects.process_text_obj(text=["My sample name is John Smith"])
print(text_request.text)

# Send the request and show the de-identified text
response = client.process_text(text_request)
print(response.processed_text)

Output:

['My sample name is John Smith']
['My sample name is [NAME_1]']

Working with the Client

Initializing the Client

The PAI client requires a scheme, a host and an optional port to initialize. Once created, the connection can be tested with the client's ping function:

from privateai_client import PAIClient
scheme = 'http'
host = 'localhost'
port = '8080'
client = PAIClient(scheme, host, port)
 
client.ping()

Output:

True
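
As a rough illustration of how the three initialization values fit together, the sketch below builds a base URL from a scheme, a host and an optional port using only the standard library. The helper name `build_base_url` is hypothetical and is not part of privateai_client; this document does not show how the client assembles its URL internally.

```python
from typing import Optional


def build_base_url(scheme: str, host: str, port: Optional[str] = None) -> str:
    """Combine scheme, host and an optional port (hypothetical helper)."""
    # Append the port only when one was supplied
    netloc = f"{host}:{port}" if port else host
    return f"{scheme}://{netloc}"


print(build_base_url("http", "localhost", "8080"))
print(build_base_url("https", "example.com"))
```

Leaving the port out simply yields a URL without one, which mirrors the "optional port" wording above.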

Making Requests

Once initialized, the client can be used to make any request listed in the Private Ai documentation.

Available requests:

Client Function	Endpoint
`get_version`	`/`
`get_metrics`	`/metrics`
`process_text`	`/v3/process/text`
`process_files_uri`	`/v3/process/files/uri`
`process_files_base64`	`/v3/process/files/base64`
`bleep`	`/v3/bleep`
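
To make the pairing of functions and endpoints concrete, here is a small standard-library sketch that resolves a function name to a full URL. The `ENDPOINTS` dictionary and the `endpoint_url` helper are hypothetical names used for illustration only, and the base URL is an assumed local deployment. The files-URI function is spelled `process_files_uri` here, matching the sample code later in this document.

```python
from urllib.parse import urljoin

# Function-to-endpoint pairs taken from the list of available requests
ENDPOINTS = {
    "get_version": "/",
    "get_metrics": "/metrics",
    "process_text": "/v3/process/text",
    "process_files_uri": "/v3/process/files/uri",
    "process_files_base64": "/v3/process/files/base64",
    "bleep": "/v3/bleep",
}


def endpoint_url(base_url: str, function_name: str) -> str:
    """Return the full URL a client function would target (hypothetical helper)."""
    return urljoin(base_url, ENDPOINTS[function_name])


for name in ENDPOINTS:
    print(name, "->", endpoint_url("http://localhost:8080", name))
```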

Requests can be made using dictionaries:

sample_text = "This is John Smith's sample dictionary request"
text_dict_request = {"text": [sample_text]}

response = client.process_text(text_dict_request)
response.processed_text

Output:

["This is [NAME_1]'s sample dictionary request"]

or using built-in request objects:

from privateai_client import request_objects

sample_text = "This is John Smith's sample process text object request"
text_request_object =  request_objects.process_text_obj(text=[sample_text])

response = client.process_text(text_request_object)
response.processed_text

Output:

["This is [NAME_1]'s sample process text object request"]

Request Objects

Request objects are a simple way of creating request bodies without the tedium of writing dictionaries by hand. Every POST request (as listed in the Private Ai documentation) has its own request object.

from privateai_client import request_objects

sample_obj = request_objects.file_url_obj(uri='path/to/file.jpg')
sample_obj.uri

Output:

'path/to/file.jpg'

Additionally, there are request objects for each nested dictionary of a request:

from privateai_client import request_objects

sample_text = "This is John Smith's sample process text object request where names won't be removed"

# sub-dictionary of entity_detection
sample_entity_type_selector = request_objects.entity_type_selector_obj(type="DISABLE", value=['NAME', 'NAME_GIVEN', 'NAME_FAMILY'])

# sub-dictionary of a process text request
sample_entity_detection = request_objects.entity_detection_obj(entity_types=[sample_entity_type_selector])

# request object created using the sub-dictionaries
sample_request = request_objects.process_text_obj(text=[sample_text], entity_detection=sample_entity_detection)
response = client.process_text(sample_request)
print(response.processed_text)

Output:

["This is John Smith's sample process text object request where names won't be removed"]

Building Request Objects

Request objects can be initialized either by passing in all the values required for the request as arguments or from a dictionary:

# Passing arguments 
sample_data = "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9UaXRsZSAoc2FtcGxlKQovUHJvZHVj..."
sample_content_type = "application/pdf"

sample_file_obj = request_objects.file_obj(data=sample_data, content_type=sample_content_type)

# Passing a dictionary
sample_dict = {"data": "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9UaXRsZSAoc2FtcGxlKQovUHJvZHVj...",
               "content_type": "application/pdf"}

sample_file_obj2 = request_objects.file_obj.fromdict(sample_dict)

Request objects can also be formatted as dictionaries:

from privateai_client import request_objects

sample_text = "Sample text."
# Create the nested request objects
sample_entity_type_selector = request_objects.entity_type_selector_obj(type="DISABLE", value=['HIPAA'])
sample_entity_detection = request_objects.entity_detection_obj(entity_types=[sample_entity_type_selector])
# Create the request object
sample_request = request_objects.process_text_obj(text=[sample_text], entity_detection=sample_entity_detection)

# All nested objects are also formatted
print(sample_request.to_dict())

Output:

{
 'text': ['Sample text.'], 
 'link_batch': False, 
 'entity_detection': {'accuracy': 'high', 'entity_types': [{'type': 'DISABLE', 'value': ['HIPAA']}], 'filter': [], 'return_entity': True}, 
 'processed_text': {'type': 'MARKER', 'pattern': '[UNIQUE_NUMBERED_ENTITY_TYPE]'}
}
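
The round trip between nested objects and plain dictionaries can be sketched with standard-library dataclasses. This is only an illustration of the pattern, not the library's implementation: the class names `EntityTypeSelector`, `EntityDetection` and `ProcessTextRequest` are hypothetical, and only a subset of the fields shown in the output above is modelled.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class EntityTypeSelector:
    type: str
    value: List[str]


@dataclass
class EntityDetection:
    entity_types: List[EntityTypeSelector] = field(default_factory=list)
    accuracy: str = "high"


@dataclass
class ProcessTextRequest:
    text: List[str]
    entity_detection: EntityDetection = field(default_factory=EntityDetection)

    def to_dict(self) -> dict:
        # asdict() recurses into nested dataclasses and lists of dataclasses
        return asdict(self)

    @classmethod
    def fromdict(cls, values: dict) -> "ProcessTextRequest":
        # Rebuild the nested objects from their dictionary form
        detection = values.get("entity_detection", {})
        selectors = [EntityTypeSelector(**s) for s in detection.get("entity_types", [])]
        return cls(
            text=values["text"],
            entity_detection=EntityDetection(
                entity_types=selectors,
                accuracy=detection.get("accuracy", "high"),
            ),
        )


request = ProcessTextRequest(
    text=["Sample text."],
    entity_detection=EntityDetection([EntityTypeSelector("DISABLE", ["HIPAA"])]),
)
as_dict = request.to_dict()
print(as_dict)
print(ProcessTextRequest.fromdict(as_dict) == request)
```

Because every level is serialized recursively, the dictionary can be rebuilt into an equal object, which is the property that makes dictionaries and request objects interchangeable as request bodies.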

Sample Use

Processing a directory of files

from privateai_client import PAIClient
from privateai_client.objects import request_objects
import os
import logging

file_dir = "/path/to/file/directory"
client = PAIClient("http", "localhost", "8080")
for file_name in os.listdir(file_dir):
    filepath = os.path.join(file_dir, file_name)
    if not os.path.isfile(filepath):
        continue
    req_obj = request_objects.file_url_obj(uri=filepath)
    # NOTE: this method of file processing requires the container to have the input and output directories mounted
    resp = client.process_files_uri(req_obj)
    if not resp.ok:
        logging.error(f"response for file {file_name} returned with {resp.status_code}")

Processing a Base64 file

from privateai_client import PAIClient
from privateai_client.objects import request_objects
import base64
import os
import logging

file_dir = "/path/to/your/file"
file_name = 'sample_file.pdf'
filepath = os.path.join(file_dir, file_name)
file_type = "type/of_file"  # e.g. application/pdf
client = PAIClient("http", "localhost", "8080")

# Read from file
with open(filepath, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")

# Make the request
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)
if not resp.ok:
    logging.error(f"response for file {file_name} returned with {resp.status_code}")
else:
    # Write to file only when the request succeeded
    with open(os.path.join(file_dir, f"redacted-{file_name}"), 'wb') as redacted_file:
        processed_file = resp.processed_file.encode("ascii")
        processed_file = base64.b64decode(processed_file, validate=True)
        redacted_file.write(processed_file)
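
The encoding and decoding steps in this sample can be tried in isolation, without a running container. The sketch below uses only the standard library; the helper names `encode_file_bytes` and `decode_file_text` are hypothetical, and the sample bytes are made up for illustration.

```python
import base64


def encode_file_bytes(data: bytes) -> str:
    """Base64-encode raw bytes and return ASCII text suitable for a JSON body."""
    return base64.b64encode(data).decode("ascii")


def decode_file_text(text: str) -> bytes:
    """Decode Base64 text back to bytes, rejecting characters outside the alphabet."""
    return base64.b64decode(text.encode("ascii"), validate=True)


raw_bytes = b"%PDF-1.4 sample bytes \x00\xff"
encoded_text = encode_file_bytes(raw_bytes)
print(decode_file_text(encoded_text) == raw_bytes)

try:
    decode_file_text("not*valid*base64!")
except ValueError as error:  # binascii.Error is a subclass of ValueError
    print(f"rejected: {error}")
```

Passing `validate=True` makes the decoder raise on malformed input instead of silently skipping stray characters, which helps catch a truncated or corrupted response before it is written to disk.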

Bleep an audio file

from privateai_client import PAIClient
from privateai_client.objects import request_objects
import base64
import os
import logging

file_dir = "/path/to/your/file"
file_name = "test_audio.mp3"
filepath = os.path.join(file_dir, file_name)
file_type = "audio/mp3"  # e.g. audio/mp3 or audio/wav
client = PAIClient("http", "localhost", "8080")

# Read the audio file and Base64-encode it
with open(filepath, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")

file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
timestamp = request_objects.timestamp_obj(start=1.12, end=2.14)
request_obj = request_objects.bleep_obj(file=file_obj, timestamps=[timestamp])

resp = client.bleep(request_object=request_obj)
if not resp.ok:
    logging.error(f"response for file {file_name} returned with {resp.status_code}")
else:
    # Write the bleeped audio only when the request succeeded
    with open(os.path.join(file_dir, f"redacted-{file_name}"), 'wb') as redacted_file:
        processed_file = resp.bleeped_file.encode("ascii")
        processed_file = base64.b64decode(processed_file, validate=True)
        redacted_file.write(processed_file)

Working with structured data

When de-identifying short strings of structured data, more accurate results can be achieved by passing in the whole column as a single string (including the header) together with a delimiter. For example, making a request row by row for a column named SSN can return data identified as PHONE_NUMBER, even when the header is included with each row.

# Working with data frames
import pandas as pd
from privateai_client import PAIClient
from privateai_client.objects import request_objects

client = PAIClient("http", "localhost", "8080")
data_frame = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)
print(data_frame)
text_req = request_objects.process_text_obj(text=[])
for column in data_frame.columns:
    text_req.text.append(f"{column}:{' | '.join([str(row) for row in data_frame[column]])}")

resp = client.process_text(text_req)
redacted_data = dict()
for row in resp.processed_text:
    data = row.split(':',1)
    redacted_data[data[0]] = data[1].split(' | ')
redacted_data_frame = pd.DataFrame(redacted_data)
print(redacted_data_frame)
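
The flattening and parsing steps from the example above can be checked without pandas or a running container. The following is a minimal sketch using plain Python; the helper names `column_to_text` and `text_to_column` are hypothetical, and the `header:value | value` layout simply mirrors the format used in the data frame example.

```python
from typing import List, Sequence, Tuple

DELIMITER = " | "


def column_to_text(header: str, values: Sequence[object]) -> str:
    """Flatten one column into 'header:value | value | ...'."""
    return f"{header}:{DELIMITER.join(str(value) for value in values)}"


def text_to_column(text: str) -> Tuple[str, List[str]]:
    """Split a flattened column back into its header and its values."""
    # Split on the first colon only, so values may themselves contain colons
    header, _, joined = text.partition(":")
    return header, joined.split(DELIMITER)


table = {
    "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"],
    "Age": [22, 35],
}
texts = [column_to_text(header, values) for header, values in table.items()]
print(texts)

rebuilt = dict(text_to_column(text) for text in texts)
print(rebuilt)
```

Two caveats apply to this sketch: every value comes back as a string, so numeric columns need converting afterwards, and a value that itself contains the delimiter would be split into two cells.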


Release history

3.8.1

Apr 16, 2024

3.8.0

Apr 11, 2024

3.7.2

Mar 4, 2024

3.7.1

Feb 1, 2024

3.7.0

Feb 1, 2024

3.6.3

Jan 18, 2024

3.6.2

Jan 15, 2024

3.6.1

Jan 12, 2024

3.6.0

Dec 22, 2023

3.5.0

Nov 14, 2023

1.3.3

Nov 14, 2023

1.3.2

Sep 11, 2023

1.3.1

Aug 8, 2023

1.3.0

Aug 2, 2023

1.3.0rc1 pre-release

Aug 1, 2023

1.2.0

Jun 1, 2023

1.1.0

May 16, 2023

This version

1.0.5

Apr 19, 2023

1.0.4

Apr 19, 2023

1.0.3

Apr 19, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

privateai_client-1.0.5.tar.gz (13.2 kB view hashes)

Uploaded Apr 19, 2023 Source

Built Distribution

privateai_client-1.0.5-py3-none-any.whl (11.7 kB view hashes)

Uploaded Apr 19, 2023 Python 3

Hashes for privateai_client-1.0.5.tar.gz

Algorithm	Hash digest
SHA256	`4c9b6a8ef439fc686ba9b62730369507816b36f3ad2f03dead05e80e4cbe3c72`
MD5	`0689cca0b55905bcbf69fa942f9c6736`
BLAKE2b-256	`4998da69aac3c83bf74e39403989c602c557bd4ce504f03e661526b5585b6df5`

Hashes for privateai_client-1.0.5-py3-none-any.whl

Algorithm	Hash digest
SHA256	`8ba6057b8aa9e1b325713e4ab42b51dd199b59de1d202957d34cffb5875d4fa5`
MD5	`07940b8ac9700fd92649ee87285843b1`
BLAKE2b-256	`2aad9cc369ce0dbe882add5cbca0ef2ab6dbb51186a578a1e42e47bee4c36e73`