
A Python package for translating large texts with advanced features including OCR support.


OctoLingo


OctoLingo is a Python package that simplifies text translation and language processing. Built with developers in mind, it offers a single interface for translating text, detecting languages, and handling large-scale translation tasks efficiently. It supports plain text, documents, and images (through OCR). Whether you're building a multilingual application, analyzing global content, or automating translation workflows, OctoLingo has you covered.

Key Features

🌍 Multi-Format Translation

  • Text Translation: Translate between 100+ languages with confidence scoring
  • Document Processing: Handle TXT, DOCX, and PDF files natively
  • Image OCR: Extract and translate text from images (JPG, PNG, TIFF, BMP)
  • Byte Stream Processing: Translate directly from file bytes (ideal for web apps)

🚀 Efficient Large-Text Handling with unlimited character support

  • Split large texts into manageable chunks to overcome API limitations.
  • Translate large documents or datasets without hassle.
  • Batch translation for large-scale projects.

🔍 Language Intelligence

  • Auto-Detection: Identify source languages with confidence scores
  • Multi-Language OCR: Extract text from documents with mixed languages
  • Language Validation: Verify supported languages before translation

Asynchronous Translation

  • Non-blocking translations for high-performance applications

📚 Custom Glossaries

  • Define custom terms and their translations for domain-specific use cases.
  • Ensure consistent translations for specialized vocabulary.

📜 Translation History

  • Log and retrieve translation history for auditing and analysis.

🛠️ Enterprise Features

  • File Handler: Robust file operations with encoding fallback support
  • Caching: Intelligent caching for repeated translations
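The caching idea can be illustrated with a small wrapper. This is a minimal sketch under stated assumptions, not OctoLingo's internal cache: cached_translator and the translate callable are hypothetical names, and the cache is an unbounded in-memory dict keyed on the text and target language.

```python
from typing import Callable, Dict, Tuple


def cached_translator(
    translate: Callable[[str, str], str],
) -> Callable[[str, str], str]:
    """Wrap a translate(text, target) callable with an in-memory cache.

    Hypothetical helper for illustration; not part of OctoLingo's API.
    """
    cache: Dict[Tuple[str, str], str] = {}

    def wrapper(text: str, target: str) -> str:
        key = (text, target)
        if key not in cache:
            # Call the underlying translator only on a cache miss.
            cache[key] = translate(text, target)
        return cache[key]

    return wrapper
```

A repeated call with the same text and target language returns the stored result without invoking the wrapped function again. A long-running service would probably want a bounded cache instead of a plain dict.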

🛠️ Developer-Friendly

  • Easy-to-use API with comprehensive documentation.
  • Modular design for seamless integration into existing projects.

Installation

Install OctoLingo via pip.

On Windows, install with the windows extra:

pip install OctoLingo[windows]

For Linux/Mac users:

pip install OctoLingo

Usage

Language Detection

from OctoLingo.translator import OctoLingo

octo = OctoLingo()

text = "Ceci est un texte en français"
lang = octo.detect_language(text)

print(f"Detected: {lang}")  # "fr"

Language Validation

from OctoLingo.translator import OctoLingo

translator = OctoLingo()
print(translator.validate_language('es'))  # Should return True
try:
    print(translator.validate_language('xx'))  # Should raise TranslationError
except Exception as e:
    print(e)

Translating Text

from OctoLingo.translator import OctoLingo

octo = OctoLingo()
text = "Hello, how are you today?"
translated, confidence = octo.translate(text, "es")

print(f"Translated: {translated}")  # "Hola, ¿cómo estás hoy?"
print(f"Confidence: {confidence}")  # 1.00

Chunked Large Text Translation

from OctoLingo.translator import OctoLingo

octo = OctoLingo()

large_text = "..."  # text of any length
translated, confidence = octo.translate(large_text, "zh-CN")  # Automatically chunks

print(translated)
print(f"Translated {len(translated)} characters")
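To give a rough idea of what automatic chunking involves, here is a minimal sketch that splits text at sentence boundaries so each piece stays under a size limit. It is not OctoLingo's actual chunking code: split_into_chunks is a hypothetical helper, and the 4500-character default is an assumed limit chosen only for illustration.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 4500) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper; prefers sentence boundaries and falls back to a
    hard split for any single sentence longer than max_chars.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""

    for sentence in sentences:
        # A sentence that cannot fit in one chunk is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]

        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be translated separately and the results joined. Splitting on sentence boundaries keeps more context together than cutting at a fixed offset.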

File Translation (Text File)

from OctoLingo.translator import OctoLingo

octo = OctoLingo()

# test.txt contains "This is a sample text file"
translated, confidence = octo.translate_file("test.txt", "de")

print(translated)  # "Dies ist eine Beispieltextdatei"

File Translation (Word Document)

from OctoLingo.translator import OctoLingo

octo = OctoLingo()

# report.docx contains business report in English
translated, confidence = octo.translate_file("report.docx", "ja")

print(translated)  # Japanese translation of the document

File Translation (PDF)

from OctoLingo.translator import OctoLingo

octo = OctoLingo()

# manual.pdf contains technical documentation
translated, confidence = octo.translate_file("manual.pdf", "fr")

print(translated)  # French translation

Image Translation (OCR)

from OctoLingo.translator import OctoLingo

octo = OctoLingo()

# sign.png contains text "Emergency Exit"
translated, confidence = octo.translate_file("sign.png", "ar")

print(translated)  # Arabic translation

Multi-language OCR Setup (documents with multiple languages)

from OctoLingo.translator import OctoLingo
from OctoLingo.ocr import OctoOCR

octo = OctoLingo()

octo.ocr = OctoOCR(languages=['en', 'fr', 'es'])  # Replace default OCR
translated, confidence = octo.translate_file("multilingual.pdf", "de")

Byte Stream Translation

from OctoLingo.translator import OctoLingo

octo = OctoLingo()

with open("contract.docx", "rb") as f:
    file_bytes = f.read()

translated, confidence = octo.translate_file_from_bytes(file_bytes, "word", "ru")
print(translated)  # Russian translation
# For PDFs and images, pass the keyword "pdf" or "image" instead of "word"
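In a web app the file type usually has to be derived from the uploaded file's name. The sketch below maps a filename to one of the keywords mentioned above ("word", "pdf", "image"). file_type_keyword is a hypothetical helper, not part of OctoLingo, and the extension table covers only the formats listed under Key Features. No keyword for plain TXT files is documented here, so the helper raises an error for them.

```python
from pathlib import Path

# Extension-to-keyword table, assumed from the formats this README lists.
_KEYWORDS = {
    ".docx": "word",
    ".pdf": "pdf",
    ".jpg": "image",
    ".png": "image",
    ".tiff": "image",
    ".bmp": "image",
}


def file_type_keyword(filename: str) -> str:
    """Return the file-type keyword for a filename (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in _KEYWORDS:
        raise ValueError(f"No known file-type keyword for {suffix!r}")
    return _KEYWORDS[suffix]
```

The result could be passed as the second argument of translate_file_from_bytes in the example above.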

Batch Translation

from OctoLingo.translator import OctoLingo

texts = [
    "Good morning",
    "Please send the report",
    "Meeting at 3 PM"
]

octo = OctoLingo()
results = octo.translate_batch(texts, "fr")

for original, (translated, confidence) in zip(texts, results):
    print(f"{original} → {translated}")
    # "Good morning → Bonjour"
    # "Please send the report → Veuillez envoyer le rapport"
    # "Meeting at 3 PM → Réunion à 15 h"
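Since each batch result is a (translated, confidence) pair, the confidence values can be used to pick out items for manual review. This is a minimal sketch: low_confidence_items is a hypothetical helper, and it assumes confidence is a number between 0 and 1 where higher is better. The 0.8 threshold is an arbitrary choice for illustration.

```python
from typing import List, Sequence, Tuple


def low_confidence_items(
    texts: Sequence[str],
    results: Sequence[Tuple[str, float]],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Return (original, translated, confidence) for results below threshold.

    Hypothetical helper; expects results in the same order as texts.
    """
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    return [
        (original, translated, confidence)
        for original, (translated, confidence) in zip(texts, results)
        if confidence < threshold
    ]
```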

Asynchronous Translation

import asyncio
from OctoLingo.translator import OctoLingo

octo = OctoLingo()

async def translate_async():
    result = await octo.translate_async("We need more time", "it")
    print(result)  # ("Abbiamo bisogno di più tempo", 1.00)

asyncio.run(translate_async())
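Non-blocking calls pay off most when several translations run at once. The sketch below runs an async translate function over many texts with a cap on concurrency. translate_many and its limit parameter are hypothetical, and whether octo.translate_async is safe to call concurrently is an assumption that is not confirmed here.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def translate_many(
    translate_async: Callable[[str, str], Awaitable[Any]],
    texts: Sequence[str],
    target: str,
    limit: int = 5,
) -> List[Any]:
    """Translate texts concurrently, at most `limit` at a time.

    Hypothetical helper; results come back in the same order as texts.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    semaphore = asyncio.Semaphore(limit)

    async def worker(text: str) -> Any:
        # The semaphore caps how many translations are in flight at once.
        async with semaphore:
            return await translate_async(text, target)

    return await asyncio.gather(*(worker(text) for text in texts))
```

Under that assumption, it could be called as asyncio.run(translate_many(octo.translate_async, texts, "it")).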

Custom Glossaries

from OctoLingo.glossary import Glossary
from OctoLingo.translator import OctoLingo

glossary = Glossary()
octo = OctoLingo()
glossary.add_term("Hello", "Holla")
glossary_result = glossary.apply_glossary("Hello is a greeting word for the English language.")

translated, confidence = octo.translate(glossary_result, 'es')

print(translated)  # Should print "Holla es una palabra de saludo para el idioma inglés."
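For intuition, applying a glossary amounts to replacing known terms in the text before it is translated. The following is a minimal sketch of that idea, not the Glossary class's actual implementation. apply_terms is a hypothetical function that does case-sensitive, whole-word replacement.

```python
import re
from typing import Dict


def apply_terms(text: str, terms: Dict[str, str]) -> str:
    """Replace whole-word occurrences of each glossary term in text.

    Hypothetical helper; matching is case-sensitive and assumes terms
    begin and end with word characters (letters, digits, underscore).
    """
    if not terms:
        return text
    # Try longer terms first so multi-word entries win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b"
    )
    return pattern.sub(lambda match: terms[match.group(0)], text)
```

Word boundaries keep a term such as "Hello" from matching inside a longer word like "Hellos".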

File Handling

from OctoLingo.translator import OctoLingo
from OctoLingo.file_handler import FileHandler

# Write test content to a file
FileHandler.write_file('input.txt', "Hello, world!")

# Translate the file content
translator = OctoLingo()
text = FileHandler.read_file('input.txt')
translated_text, _ = translator.translate(text, 'es')
FileHandler.write_file('output.txt', translated_text)

# Read and print the translated content
print(FileHandler.read_file('output.txt'))  # Should print the translated text
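The encoding fallback mentioned under Enterprise Features can be pictured as trying several encodings in turn. This is a minimal sketch and not FileHandler's actual code: read_text_with_fallback is a hypothetical function, and the encoding order is an assumption.

```python
from pathlib import Path
from typing import Sequence, Union


def read_text_with_fallback(
    path: Union[str, Path],
    encodings: Sequence[str] = ("utf-8", "cp1252", "latin-1"),
) -> str:
    """Read a text file, trying each encoding until one decodes cleanly.

    Hypothetical helper. Note that latin-1 accepts any byte sequence,
    so with the default list the final fallback always succeeds.
    """
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # Try the next candidate encoding.
    raise ValueError(f"Could not decode {path} with any of: {list(encodings)}")
```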

Translation History

from OctoLingo.history import TranslationHistory

history = TranslationHistory()
history.log_translation("Hello", "Hola", "en", "es")
print(history.get_history())  # Should print the logged translation
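For auditing, it helps to know what a logged entry might contain. The sketch below is a standalone illustration and not the TranslationHistory class: SimpleHistory, HistoryEntry, the field names, and the JSON export are all hypothetical. The four logged values mirror the arguments of log_translation in the example above, and the timestamp is an added assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class HistoryEntry:
    source_text: str
    translated_text: str
    source_lang: str
    target_lang: str
    # Timestamp is an assumed extra field, set when the entry is created.
    timestamp: str = field(default_factory=_utc_now)


class SimpleHistory:
    """In-memory translation log (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._entries: List[HistoryEntry] = []

    def log(self, source_text: str, translated_text: str,
            source_lang: str, target_lang: str) -> None:
        self._entries.append(
            HistoryEntry(source_text, translated_text, source_lang, target_lang)
        )

    def entries(self) -> List[HistoryEntry]:
        # Return a copy so callers cannot mutate the log directly.
        return list(self._entries)

    def to_json(self) -> str:
        return json.dumps(
            [asdict(entry) for entry in self._entries],
            ensure_ascii=False,
            indent=2,
        )
```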

Contributing

  • OctoLingo is an open-source project, and contributions are welcome! If you'd like to contribute, please check out my GitHub repository for guidelines.
