A thin client for communicating with the Private AI de-identification API.
paiclient
A client for communicating with the Private AI de-identification API. This document explains how to get the most out of the client. For more information, see Private AI's API documentation.
Installation
pip install pai_thin_client
Quick Start
from pai_thin_client import PAIClient
from pai_thin_client import request_objects
scheme = "http"
host = "localhost"
port = "8080"
client = PAIClient(scheme, host, port)
text_request = request_objects.process_text_obj(text=["My sample name is John Smith"])
text_request.text
response = client.process_text(text_request)
response.processed_text
Output:
["My sample name is John Smith"]
['My sample name is [NAME_1]']
Working with the Client
Initializing the Client
The PAI client requires a scheme, a host and an optional port to initialize.
Once created, the connection can be tested with the client's ping() function:
from pai_thin_client import PAIClient
scheme = 'http'
host = 'localhost'
port = '8080'
client = PAIClient(scheme, host, port)
client.ping()
Output:
True
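The client is built from a scheme, a host and an optional port, but this README does not say how it combines them internally. The sketch below assumes the usual scheme://host[:port] form. It shows how the three values compose into a base URL. build_base_url is a hypothetical helper, not part of pai_thin_client, and api.example.com is a placeholder host.

```python
from typing import Optional


def build_base_url(scheme: str, host: str, port: Optional[str] = None) -> str:
    """Hypothetical helper: join scheme, host and an optional port into a base URL."""
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # Omit the ":port" part entirely when no port is given
    netloc = f"{host}:{port}" if port else host
    return f"{scheme}://{netloc}"


print(build_base_url("http", "localhost", "8080"))  # http://localhost:8080
print(build_base_url("https", "api.example.com"))   # https://api.example.com
```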
Making Requests
Once initialized, the client can make any request listed in the Private AI documentation.
Available requests:
Client Function | Endpoint
---|---
get_version | /
get_metrics | /metrics
process_text | /v3/process/text
process_files_uri | /v3/process/files/uri
process_files_base64 | /v3/process/files/base64
bleep | /v3/bleep
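The table maps each client function to a server path. The sketch below encodes that mapping as a plain dictionary and resolves the full URL a function would target. It uses process_files_uri, the name the Sample Use code calls, for the file-URI endpoint. The dictionary and endpoint_url are illustrative only and are not the library's internals.

```python
# Client function -> endpoint path, following the table above
ENDPOINTS = {
    "get_version": "/",
    "get_metrics": "/metrics",
    "process_text": "/v3/process/text",
    "process_files_uri": "/v3/process/files/uri",
    "process_files_base64": "/v3/process/files/base64",
    "bleep": "/v3/bleep",
}


def endpoint_url(base_url: str, function_name: str) -> str:
    """Return the full URL a given client function would target (sketch)."""
    try:
        path = ENDPOINTS[function_name]
    except KeyError:
        raise ValueError(f"unknown client function: {function_name!r}") from None
    # Strip a trailing slash so "/v3/..." is not doubled up
    return base_url.rstrip("/") + path


print(endpoint_url("http://localhost:8080", "process_text"))
# http://localhost:8080/v3/process/text
```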
Requests can be made using dictionaries:
sample_text = "This is John Smith's sample dictionary request"
text_dict_request = {"text": [sample_text]}
response = client.process_text(text_dict_request)
response.processed_text
Output:
["This is [NAME_1]'s sample dictionary request"]
or using built-in request objects:
from pai_thin_client import request_objects
sample_text = "This is John Smith's sample process text object request"
text_request_object = request_objects.process_text_obj(text=[sample_text])
response = client.process_text(text_request_object)
response.processed_text
Output:
["This is [NAME_1]'s sample process text object request"]
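Both forms above presumably end up as the same JSON body in an HTTP POST to the endpoint listed in the table. The sketch below illustrates that assumption with only the standard library. It builds the request without sending it. build_process_text_request is a hypothetical helper. The JSON content type and the absence of authentication headers are also assumptions about a local container.

```python
import json
import urllib.request


def build_process_text_request(base_url: str, texts: list) -> urllib.request.Request:
    """Build (but do not send) a POST to /v3/process/text with a JSON body."""
    body = {"text": texts}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v3/process/text",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_process_text_request("http://localhost:8080", ["This is John Smith's sample request"])
print(req.get_method(), req.full_url)  # POST http://localhost:8080/v3/process/text
# Sending it would be urllib.request.urlopen(req), which needs a running container
```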
Request Objects
Request objects are a simple way of creating request bodies without the tedium of writing dictionaries by hand. Every POST request (as listed in the Private AI documentation) has its own request object.
from pai_thin_client import request_objects
sample_obj = request_objects.file_url_obj(uri='path/to/file.jpg')
sample_obj.uri
Output:
'path/to/file.jpg'
Additionally, there are request objects for each nested dictionary in a request:
from pai_thin_client import request_objects
sample_text = "This is John Smith's sample process text object request where names won't be removed"
# sub-dictionary of entity_detection
sample_entity_type_selector = request_objects.entity_type_selector_obj(type="DISABLE", value=['NAME', 'NAME_GIVEN', 'NAME_FAMILY'])
# sub-dictionary of a process text request
sample_entity_detection = request_objects.entity_detection_obj(entity_types=[sample_entity_type_selector])
# request object created using the sub-dictionaries
sample_request = request_objects.process_text_obj(text=[sample_text], entity_detection=sample_entity_detection)
response = client.process_text(sample_request)
print(response.processed_text)
Output:
["This is John Smith's sample process text object request where names won't be removed"]
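Any request can also be given as a dictionary, so the same request could presumably be written without request objects. The key names below come from the to_dict() output shown later in this README. The assumption is that the server accepts this nested dictionary directly. The client call is left commented out because it needs a running container.

```python
import json

sample_text = "This is John Smith's sample process text object request where names won't be removed"

# Plain-dictionary form of the nested request objects above
request_body = {
    "text": [sample_text],
    "entity_detection": {
        "entity_types": [
            {"type": "DISABLE", "value": ["NAME", "NAME_GIVEN", "NAME_FAMILY"]},
        ],
    },
}

# The body must be JSON-serializable to be sent over HTTP
print(json.dumps(request_body, indent=2))
# response = client.process_text(request_body)
```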
Building Request Objects
Request objects can be initialized either by passing all the values the request requires as arguments, or from a dictionary:
# Passing arguments
sample_data = "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9UaXRsZSAoc2FtcGxlKQovUHJvZHVj..."
sample_content_type = "application/pdf"
sample_file_obj = request_objects.file_obj(data=sample_data, content_type=sample_content_type)
# Passing a dictionary
sample_dict = {"data": "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9UaXRsZSAoc2FtcGxlKQovUHJvZHVj...",
"content_type": "application/pdf"}
sample_file_obj2 = request_objects.file_obj.fromdict(sample_dict)
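The two construction paths above should produce equivalent objects. The sketch below is a minimal dataclass stand-in showing that pattern. FileObj is hypothetical and is not the library's file_obj. The check that rejects unknown keys is a design choice of this sketch, not documented library behaviour. The base64 value is a shortened placeholder.

```python
from dataclasses import dataclass, fields


@dataclass
class FileObj:
    """Hypothetical stand-in for request_objects.file_obj (not the library's code)."""
    data: str
    content_type: str

    @classmethod
    def fromdict(cls, values: dict) -> "FileObj":
        # Accept only the declared fields so a misspelled key fails loudly;
        # a missing key raises TypeError from the constructor call below
        allowed = {f.name for f in fields(cls)}
        unknown = set(values) - allowed
        if unknown:
            raise ValueError(f"unexpected keys: {sorted(unknown)}")
        return cls(**values)


by_args = FileObj(data="JVBERi0xLjQK", content_type="application/pdf")
by_dict = FileObj.fromdict({"data": "JVBERi0xLjQK", "content_type": "application/pdf"})
print(by_args == by_dict)  # True
```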
Request objects can also be converted to dictionaries with to_dict():
from pai_thin_client import request_objects
sample_text = "Sample text."
# Create the nested request objects
sample_entity_type_selector = request_objects.entity_type_selector_obj(type="DISABLE", value=['HIPAA'])
sample_entity_detection = request_objects.entity_detection_obj(entity_types=[sample_entity_type_selector])
# Create the request object
sample_request = request_objects.process_text_obj(text=[sample_text], entity_detection=sample_entity_detection)
# All nested objects are also formatted
print(sample_request.to_dict())
Output:
{
'text': ['Sample text.'],
'link_batch': False,
'entity_detection': {'accuracy': 'high', 'entity_types': [{'type': 'DISABLE', 'value': ['HIPAA']}], 'filter': [], 'return_entity': True},
'processed_text': {'type': 'MARKER', 'pattern': '[UNIQUE_NUMBERED_ENTITY_TYPE]'}
}
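The output above shows that to_dict() recurses into nested objects and fills in defaults. The sketch below reproduces that behaviour with dataclasses.asdict, which also recurses through nested dataclasses and lists of them. The class names are hypothetical. The default values are copied from the output above, and the library's actual implementation may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class EntityTypeSelector:
    type: str
    value: List[str]


@dataclass
class EntityDetection:
    accuracy: str = "high"
    entity_types: List[EntityTypeSelector] = field(default_factory=list)
    filter: list = field(default_factory=list)
    return_entity: bool = True


@dataclass
class ProcessedText:
    type: str = "MARKER"
    pattern: str = "[UNIQUE_NUMBERED_ENTITY_TYPE]"


@dataclass
class ProcessTextRequest:
    text: List[str]
    link_batch: bool = False
    entity_detection: EntityDetection = field(default_factory=EntityDetection)
    processed_text: ProcessedText = field(default_factory=ProcessedText)

    def to_dict(self) -> dict:
        # asdict converts nested dataclasses (and lists of them) recursively
        return asdict(self)


selector = EntityTypeSelector(type="DISABLE", value=["HIPAA"])
request = ProcessTextRequest(
    text=["Sample text."],
    entity_detection=EntityDetection(entity_types=[selector]),
)
print(request.to_dict())
```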
Sample Use
Processing a directory of files
from pai_thin_client import PAIClient
from pai_thin_client.objects import request_objects
import os
import logging
file_dir = "/path/to/file/directory"
client = PAIClient("http", "localhost", "8080")
for file_name in os.listdir(file_dir):
    filepath = os.path.join(file_dir, file_name)
    # Skip sub-directories and anything else that is not a regular file
    if not os.path.isfile(filepath):
        continue
    req_obj = request_objects.file_url_obj(uri=filepath)
    # NOTE: this method of file processing requires the container to have the input and output directories mounted
    resp = client.process_files_uri(req_obj)
    if not resp.ok:
        logging.error(f"response for file {file_name} returned with {resp.status_code}")
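The loop above mixes directory walking with the network call. Passing the call in as a function makes the walking logic testable without a running container and lets the script report which files failed. The sketch below makes that refactor under those assumptions. process_directory and the send callable are hypothetical names. The commented lines show one way send might wrap the real client.

```python
import logging
import os
from typing import Callable, List, Tuple


def process_directory(file_dir: str, send: Callable[[str], Tuple[bool, int]]) -> List[str]:
    """Send every regular file in file_dir through send; return the names that failed.

    send is a hypothetical callable taking a file path and returning (ok, status_code).
    """
    failed = []
    # sorted() gives a deterministic processing order
    for file_name in sorted(os.listdir(file_dir)):
        filepath = os.path.join(file_dir, file_name)
        if not os.path.isfile(filepath):
            continue  # skip sub-directories and other non-files
        ok, status_code = send(filepath)
        if not ok:
            logging.error("response for file %s returned with %s", file_name, status_code)
            failed.append(file_name)
    return failed


# Possible wiring to the real client (requires pai_thin_client and a running container):
# def send(path):
#     resp = client.process_files_uri(request_objects.file_url_obj(uri=path))
#     return resp.ok, resp.status_code
```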
Processing a Base64 file
from pai_thin_client import PAIClient
from pai_thin_client.objects import request_objects
import base64
import os
import logging
file_dir = "/path/to/your/file"
file_name = "sample_file.pdf"
filepath = os.path.join(file_dir, file_name)
file_type = "type/of_file"  # e.g. application/pdf
client = PAIClient("http", "localhost", "8080")
# Read the file and base64-encode it
with open(filepath, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")
# Make the request
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
request_obj = request_objects.file_base64_obj(file=file_obj)
resp = client.process_files_base64(request_object=request_obj)
if not resp.ok:
    logging.error(f"response for file {file_name} returned with {resp.status_code}")
else:
    # Decode the redacted file and write it next to the original
    with open(os.path.join(file_dir, f"redacted-{file_name}"), "wb") as redacted_file:
        processed_file = resp.processed_file.encode("ascii")
        processed_file = base64.b64decode(processed_file, validate=True)
        redacted_file.write(processed_file)
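The read and write steps above are a base64 round trip. The sketch below factors them into small standard-library helpers. It also guesses the content type from the file extension with mimetypes. The helper names are hypothetical. The guessed MIME type may not match what the API expects: mimetypes may map .mp3 to audio/mpeg, while this README uses audio/mp3. If in doubt, set content_type explicitly.

```python
import base64
import mimetypes


def file_to_base64(path: str) -> str:
    """Read a file and return its contents as an ASCII base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def base64_to_file(data: str, path: str) -> None:
    """Decode a base64 string (e.g. resp.processed_file) and write the bytes to path."""
    with open(path, "wb") as f:
        # validate=True rejects characters outside the base64 alphabet
        f.write(base64.b64decode(data, validate=True))


def guess_content_type(path: str) -> str:
    """Guess a MIME type from the file extension; raise if it can't be determined."""
    content_type, _encoding = mimetypes.guess_type(path)
    if content_type is None:
        raise ValueError(f"cannot guess content type for {path!r}; set it explicitly")
    return content_type
```

With these helpers, the example above reduces to file_data = file_to_base64(filepath) before the request and base64_to_file(resp.processed_file, ...) after it.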
Bleep an audio file
from pai_thin_client import PAIClient
from pai_thin_client.objects import request_objects
import base64
import os
import logging
file_dir = "/path/to/your/file"
file_name = "sample_file.mp3"
filepath = os.path.join(file_dir, file_name)
file_type = "audio/mp3"  # or audio/wav
client = PAIClient("http", "localhost", "8080")
# Read the audio file and base64-encode it
with open(filepath, "rb") as b64_file:
    file_data = base64.b64encode(b64_file.read())
    file_data = file_data.decode("ascii")
# Bleep the audio between the given start and end timestamps
file_obj = request_objects.file_obj(data=file_data, content_type=file_type)
timestamp = request_objects.timestamp_obj(start=1.12, end=2.14)
request_obj = request_objects.bleep_obj(file=file_obj, timestamps=[timestamp])
resp = client.bleep(request_object=request_obj)
if not resp.ok:
    logging.error(f"response for file {file_name} returned with {resp.status_code}")
else:
    # Decode the bleeped audio and write it next to the original
    with open(os.path.join(file_dir, f"redacted-{file_name}"), "wb") as redacted_file:
        processed_file = resp.bleeped_file.encode("ascii")
        processed_file = base64.b64decode(processed_file, validate=True)
        redacted_file.write(processed_file)
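When bleeping several segments, the timestamps may overlap, for example when they come from separate detections. This README does not say how the bleep endpoint handles overlapping timestamps. As a defensive sketch, the helper below validates and merges overlapping (start, end) spans using standard interval merging. merge_timestamps is hypothetical. It assumes start and end are in the same units as timestamp_obj, presumably seconds.

```python
from typing import List, Tuple


def merge_timestamps(spans: List[Tuple[float, float]]) -> List[dict]:
    """Validate and merge overlapping (start, end) spans into {"start", "end"} dicts."""
    merged: List[List[float]] = []
    for start, end in sorted(spans):
        if start < 0 or end <= start:
            raise ValueError(f"invalid span: ({start}, {end})")
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [{"start": s, "end": e} for s, e in merged]


print(merge_timestamps([(1.12, 2.14), (1.9, 3.0), (5.0, 6.5)]))
# [{'start': 1.12, 'end': 3.0}, {'start': 5.0, 'end': 6.5}]
```

Each resulting dict could then be passed as request_objects.timestamp_obj(**span), since that constructor takes start and end keywords as shown above.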
Hashes for privateai_client-1.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 52c2d313e35a78024e85e03bc8c83513543e6c0d742ddf1392bdcda8e9ccbaef
MD5 | 09d29d1ed12c126ff5f344acabd41abb
BLAKE2b-256 | beaee3129c506fdc19346e5c671b38ab413c33f36913c3f299948a1a954c47ec