Microsoft Azure Document Translation Client Library for Python
Azure Cognitive Services Document Translation is a cloud service that translates documents to and from 90 languages and dialects while preserving document structure and data format. Use the client library for Document Translation to:
- Translate numerous, large files from an Azure Blob Storage container to a target container in your language of choice.
- Check the translation status and progress of each document in the translation job.
- Apply a custom translation model or glossaries to tailor translation to your specific case.
Source code | Package (PyPI) | API reference documentation | Product documentation | Samples
Getting started
Prerequisites
- Python 2.7, or 3.6 or later is required to use this package.
- You must have an Azure subscription and a Document Translation resource to use this package.
Install the package
Install the Azure Document Translation client library for Python with pip:
pip install azure-ai-translation-document --pre
Note: This version of the client library defaults to the v1.0-preview.1 version of the service.
Create a Document Translation resource
Document Translation supports single-service access only. To access the service, create a Translator resource.
You can create the resource using either of the following options:
Option 1: Azure Portal
Option 2: Azure CLI. The following example creates a Document Translation resource using the CLI:
# Create a new resource group to hold the document translation resource -
# if using an existing resource group, skip this step
az group create --name my-resource-group --location westus2
# Create document translation
az cognitiveservices account create \
    --name document-translation-resource \
    --custom-domain document-translation-resource \
    --resource-group my-resource-group \
    --kind TextTranslation \
    --sku S1 \
    --location westus2 \
    --yes
Authenticate the client
In order to interact with the Document Translation service, you will need to create an instance of a client. An endpoint and credential are necessary to instantiate the client object.
Looking up the endpoint
You can find the endpoint for your Document Translation resource using the Azure Portal.
Note that the service requires a custom domain endpoint. Follow the instructions in the above link to format your endpoint: https://{NAME-OF-YOUR-RESOURCE}.cognitiveservices.azure.com/
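As a minimal sketch of the endpoint format described above, the following hypothetical helper (it is not part of the client library) builds the custom domain endpoint from a resource name. The name check is a deliberately conservative assumption meant only to catch obvious mistakes, such as pasting a full URL instead of a name; it is not the service's official naming rule.

```python
import re

# Assumed, conservative pattern: letters, digits, and hyphens only.
_RESOURCE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]*$")


def build_endpoint(resource_name):
    """Return the custom domain endpoint for a resource name (hypothetical helper)."""
    if not _RESOURCE_NAME.match(resource_name):
        raise ValueError("Not a plain resource name: {!r}".format(resource_name))
    return "https://{}.cognitiveservices.azure.com/".format(resource_name)


endpoint = build_endpoint("document-translation-resource")
print(endpoint)
```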
Get the API key
The API key can be found in the Azure Portal or by running the following Azure CLI command:
az cognitiveservices account keys list --name "resource-name" --resource-group "resource-group-name"
Create the client with AzureKeyCredential
To use an API key as the credential parameter, pass the key as a string into an instance of AzureKeyCredential.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient
endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
document_translation_client = DocumentTranslationClient(endpoint, credential)
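Rather than hard-coding the endpoint and key, you may prefer to read them from the environment. The sketch below assumes two environment variable names, AZURE_DOCUMENT_TRANSLATION_ENDPOINT and AZURE_DOCUMENT_TRANSLATION_KEY; these names and the load_translation_settings helper are illustrative choices, not something the library requires.

```python
import os


def load_translation_settings(environ=None):
    """Return (endpoint, api_key) from environment variables (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    try:
        # Variable names are assumptions made for this example.
        endpoint = environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
        api_key = environ["AZURE_DOCUMENT_TRANSLATION_KEY"]
    except KeyError as exc:
        raise RuntimeError("Missing environment variable: {}".format(exc.args[0]))
    return endpoint, api_key


# Demonstration with an explicit mapping instead of the real environment.
settings = load_translation_settings({
    "AZURE_DOCUMENT_TRANSLATION_ENDPOINT": "https://<resource-name>.cognitiveservices.azure.com/",
    "AZURE_DOCUMENT_TRANSLATION_KEY": "<api_key>",
})
```

The returned endpoint and key can then be passed to DocumentTranslationClient and AzureKeyCredential exactly as in the snippet above.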
Key concepts
The Document Translation service requires that you upload your files to an Azure Blob Storage source container and provide a target container where the translated documents can be written. SAS tokens to the containers (or files) are used to access the documents and create the translated documents in the target container. Additional information about setting this up can be found in the service documentation:
- Set up Azure Blob Storage containers with your documents
- Optionally apply glossaries or a custom model for translation
- Generate SAS tokens to your containers (or files) with the appropriate permissions
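Before submitting a job, it can help to confirm that each SAS URL carries the permissions you expect. The sketch below reads the sp (signed permissions) query parameter of a SAS URL using only the standard library. The helper name is hypothetical, and the permission sets in the example (read and list for a source container, write and list for a target container) are assumptions based on the service documentation, so check them against your own setup.

```python
from urllib.parse import parse_qs, urlparse


def missing_sas_permissions(sas_url, required):
    """Return the required permission letters absent from the URL's "sp" value."""
    query = parse_qs(urlparse(sas_url).query)
    # "sp" holds one letter per permission, for example "rl" for read + list.
    granted = set(query.get("sp", [""])[0])
    return sorted(set(required) - granted)


# Illustrative URLs only; the signature values are placeholders.
source_url = "https://account.blob.core.windows.net/source?sv=2020-02-10&sp=rl&sig=placeholder"
target_url = "https://account.blob.core.windows.net/target?sv=2020-02-10&sp=r&sig=placeholder"

print(missing_sas_permissions(source_url, "rl"))
print(missing_sas_permissions(target_url, "wl"))
```

An empty list means every required permission letter is present in the token.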
DocumentTranslationClient
Interaction with the Document Translation client library begins with an instance of the DocumentTranslationClient.
The client provides operations for:
- Creating a translation job to translate documents in your source container(s) and write results to your target container(s).
- Checking the status of individual documents in the translation job and monitoring each document's progress.
- Enumerating all past and current translation jobs with the option to wait until the job(s) finish.
- Identifying supported glossary and document formats.
Translation Input
To create a translation job, pass a list of DocumentTranslationInput into the create_translation_job client method.
Constructing a DocumentTranslationInput requires that you pass the SAS URLs to your source and target containers (or files) and the target language(s) for translation.
A single source container with documents can be translated to many different languages:
from azure.ai.translation.document import DocumentTranslationInput, TranslationTarget
my_input = [
    DocumentTranslationInput(
        source_url="<sas_url_to_source>",
        targets=[
            TranslationTarget(target_url="<sas_url_to_target_fr>", language_code="fr"),
            TranslationTarget(target_url="<sas_url_to_target_de>", language_code="de")
        ]
    )
]
Alternatively, you can provide multiple sources, each with its own targets:
from azure.ai.translation.document import DocumentTranslationInput, TranslationTarget
my_input = [
    DocumentTranslationInput(
        source_url="<sas_url_to_source_A>",
        targets=[
            TranslationTarget(target_url="<sas_url_to_target_fr>", language_code="fr"),
            TranslationTarget(target_url="<sas_url_to_target_de>", language_code="de")
        ]
    ),
    DocumentTranslationInput(
        source_url="<sas_url_to_source_B>",
        targets=[
            TranslationTarget(target_url="<sas_url_to_target_fr>", language_code="fr"),
            TranslationTarget(target_url="<sas_url_to_target_de>", language_code="de")
        ]
    ),
    DocumentTranslationInput(
        source_url="<sas_url_to_source_C>",
        targets=[
            TranslationTarget(target_url="<sas_url_to_target_fr>", language_code="fr"),
            TranslationTarget(target_url="<sas_url_to_target_de>", language_code="de")
        ]
    )
]
Note: the target_url for each target language must be unique.
See the service documentation for all supported languages.
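The uniqueness requirement on target_url can be checked before any client objects are built. The following sketch works on plain (target_url, language_code) pairs for a single source; the duplicate_target_urls helper is hypothetical and compares URLs as exact strings, so two different SAS tokens for the same container would not be detected.

```python
from collections import Counter


def duplicate_target_urls(targets):
    """Return target URLs used more than once among one source's targets."""
    counts = Counter(url for url, _language_code in targets)
    return sorted(url for url, count in counts.items() if count > 1)


targets = [
    ("<sas_url_to_target_fr>", "fr"),
    ("<sas_url_to_target_de>", "de"),
]
duplicates = duplicate_target_urls(targets)
if duplicates:
    raise ValueError("Each target language needs its own target_url: {}".format(duplicates))
```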
Return value
There are primarily two types of return values when checking on the result of a translation job: JobStatusResult and DocumentStatusResult.
- A JobStatusResult will contain the details of the entire job, such as its status, ID, any errors, and status summaries of the documents in the job.
- A DocumentStatusResult will contain the details of an individual document, such as its status, translation progress, any errors, and the URLs to the source document and translated document.
Examples
The following section provides several code snippets covering some of the most common Document Translation tasks, including:
- Translate your documents
- Check status on individual documents
- List translation jobs
Translate your documents
Translate the documents in your source container to the target containers.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient, DocumentTranslationInput, TranslationTarget
endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
source_container_sas_url_en = "<sas-url-en>"
target_container_sas_url_es = "<sas-url-es>"
target_container_sas_url_fr = "<sas-url-fr>"
document_translation_client = DocumentTranslationClient(endpoint, credential)
job = document_translation_client.create_translation_job(
    [
        DocumentTranslationInput(
            source_url=source_container_sas_url_en,
            targets=[
                TranslationTarget(target_url=target_container_sas_url_es, language_code="es"),
                TranslationTarget(target_url=target_container_sas_url_fr, language_code="fr"),
            ],
        )
    ]
)  # type: JobStatusResult
job_result = document_translation_client.wait_until_done(job.id)  # type: JobStatusResult
print("Job created on: {}".format(job_result.created_on))
print("Job last updated on: {}".format(job_result.last_updated_on))
print("Total number of translations on documents: {}".format(job_result.documents_total_count))
print("Of total documents...")
print("{} failed".format(job_result.documents_failed_count))
print("{} succeeded".format(job_result.documents_succeeded_count))
if job_result.status == "Succeeded":
    print("Our translation job succeeded")
if job_result.status == "Failed":
    print("All documents failed in the translation job")
# check document statuses... see next sample
Check status on individual documents
Check status and translation progress of each document under a job.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient
endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
job_id = "<job-id>"
document_translation_client = DocumentTranslationClient(endpoint, credential)
documents = document_translation_client.list_all_document_statuses(job_id) # type: ItemPaged[DocumentStatusResult]
for doc in documents:
    if doc.status == "Succeeded":
        print("Document at {} was translated to {} language".format(
            doc.translated_document_url, doc.translate_to
        ))
    if doc.status == "Running":
        print("Document ID: {}, translation progress is {} percent".format(
            doc.id, doc.translation_progress*100
        ))
    if doc.status == "Failed":
        print("Document ID: {}, Error Code: {}, Message: {}".format(
            doc.id, doc.error.code, doc.error.message
        ))
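When a job contains many documents, grouping them by status can be easier to read than printing each one. The sketch below relies only on the status and id attributes used in the sample above. The group_ids_by_status helper is hypothetical, and the SimpleNamespace objects are stand-ins so that the snippet runs without a live service; in practice you would pass the result of list_all_document_statuses.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_ids_by_status(documents):
    """Map each status string to the list of document IDs that have it."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc.status].append(doc.id)
    return dict(grouped)


# Stand-in objects for illustration only.
sample_documents = [
    SimpleNamespace(id="doc-1", status="Succeeded"),
    SimpleNamespace(id="doc-2", status="Failed"),
    SimpleNamespace(id="doc-3", status="Succeeded"),
]
grouped = group_ids_by_status(sample_documents)
for status, ids in grouped.items():
    print("{}: {}".format(status, ", ".join(ids)))
```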
List translation jobs
Enumerate over the translation jobs submitted for the resource.
from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient
endpoint = "https://<resource-name>.cognitiveservices.azure.com/"
credential = AzureKeyCredential("<api_key>")
document_translation_client = DocumentTranslationClient(endpoint, credential)
jobs = document_translation_client.list_submitted_jobs() # type: ItemPaged[JobStatusResult]
for job in jobs:
    if not job.has_completed:
        job = document_translation_client.wait_until_done(job.id)
    print("Job ID: {}".format(job.id))
    print("Job status: {}".format(job.status))
    print("Job created on: {}".format(job.created_on))
    print("Job last updated on: {}".format(job.last_updated_on))
    print("Total number of translations on documents: {}".format(job.documents_total_count))
    print("Total number of characters charged: {}".format(job.total_characters_charged))
    print("Of total documents...")
    print("{} failed".format(job.documents_failed_count))
    print("{} succeeded".format(job.documents_succeeded_count))
    print("{} cancelled".format(job.documents_cancelled_count))
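If you want totals across all jobs rather than a per-job printout, the counts shown above can be summed. This is a sketch under assumptions: summarize_jobs is a hypothetical helper, it uses only attribute names that appear in the sample above, and the "or 0" fallback is a defensive guess in case a count has not been populated yet. The SimpleNamespace objects stand in for the items returned by list_submitted_jobs.

```python
from types import SimpleNamespace


def summarize_jobs(jobs):
    """Add up document counts and charged characters across jobs."""
    totals = {"jobs": 0, "documents": 0, "succeeded": 0, "failed": 0, "characters_charged": 0}
    for job in jobs:
        totals["jobs"] += 1
        # "or 0" guards against a value that is not set yet (assumption).
        totals["documents"] += job.documents_total_count or 0
        totals["succeeded"] += job.documents_succeeded_count or 0
        totals["failed"] += job.documents_failed_count or 0
        totals["characters_charged"] += job.total_characters_charged or 0
    return totals


# Stand-in objects for illustration only.
sample_jobs = [
    SimpleNamespace(documents_total_count=3, documents_succeeded_count=2,
                    documents_failed_count=1, total_characters_charged=1200),
    SimpleNamespace(documents_total_count=5, documents_succeeded_count=5,
                    documents_failed_count=0, total_characters_charged=4300),
]
totals = summarize_jobs(sample_jobs)
print(totals)
```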
To see how to use the Document Translation client library with Azure Storage Blob to upload documents, create SAS tokens for your containers, and download the finished translated documents, see this sample. Note that you will need to install the azure-storage-blob library to run this sample.
Troubleshooting
General
Document Translation client library will raise exceptions defined in Azure Core.
Logging
This library uses the standard logging library for logging.
Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO level.
Detailed DEBUG level logging, including request/response bodies and unredacted headers, can be enabled on the client or per-operation with the logging_enable keyword argument.
See full SDK logging documentation with examples here.
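As a brief sketch of the logging setup described above, the following configures the standard library logger named "azure", which Azure SDK libraries for Python log under, to write DEBUG output to stdout. The format string is an arbitrary choice for this example.

```python
import logging
import sys

# Azure SDK client libraries log under the "azure" logger hierarchy.
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)

# Send log records to stdout with a simple, arbitrary format.
handler = logging.StreamHandler(stream=sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

# Request/response bodies appear only when logging_enable=True is also
# passed, either when constructing the client:
#   DocumentTranslationClient(endpoint, credential, logging_enable=True)
# or on an individual operation call.
```

Because DEBUG output can include unredacted headers, consider enabling it only while troubleshooting.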
Optional Configuration
Optional keyword arguments can be passed in at the client and per-operation level. The azure-core reference documentation describes available configurations for retries, logging, transport protocols, and more.
Next steps
The following section provides several code snippets illustrating common patterns used in the Document Translation Python client library.
More sample code
These code samples show common scenario operations with the Azure Document Translation client library.
- Client authentication: sample_authentication.py
- Create a translation job: sample_create_translation_job.py
- Check the status of documents: sample_check_document_statuses.py
- List all submitted translation jobs: sample_list_all_submitted_jobs.py
- Apply a custom glossary to translation: sample_translation_with_glossaries.py
- Use Azure Blob Storage to set up translation resources: sample_translation_with_azure_blob.py
Async samples
This library also includes a complete async API supported on Python 3.6+. To use it, you must first install an async transport, such as aiohttp. Async clients are found under the azure.ai.translation.document.aio namespace.
- Client authentication: sample_authentication_async.py
- Create a translation job: sample_create_translation_job_async.py
- Check the status of documents: sample_check_document_statuses_async.py
- List all submitted translation jobs: sample_list_all_submitted_jobs_async.py
- Apply a custom glossary to translation: sample_translation_with_glossaries_async.py
- Use Azure Blob Storage to set up translation resources: sample_translation_with_azure_blob_async.py
Additional documentation
For more extensive documentation on Azure Cognitive Services Document Translation, see the Document Translation documentation on docs.microsoft.com.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Release History
1.0.0b1 (2021-04-06)
This is the first beta package of the azure-ai-translation-document client library that targets the Document Translation service version 1.0-preview.1. This package's documentation and samples demonstrate the new API.