A Python package for translating large texts with advanced features including OCR support.
OctoLingo
OctoLingo is a Python package designed to simplify text translation and language processing tasks. Built with developers in mind, it provides a single interface for translating text, detecting languages, and handling large-scale translation tasks efficiently. It supports plain text, documents, and image translation through OCR in an easy-to-use package. Whether you're building a multilingual application, analyzing global content, or automating translation workflows, OctoLingo has you covered.
Key Features
🌍 Multi-Format Translation
- Text Translation: Translate between 100+ languages with confidence scoring
- Document Processing: Handle TXT, DOCX, and PDF files natively
- Image OCR: Extract and translate text from images (JPG, PNG, TIFF, BMP)
- Byte Stream Processing: Translate directly from file bytes (ideal for web apps)
🚀 Efficient Large-Text Handling with unlimited character support
- Split large texts into manageable chunks to overcome API limitations.
- Translate large documents or datasets without hassle.
- Batch translation for large-scale projects.
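The chunking idea above can be illustrated with a minimal sketch. This is not OctoLingo's internal implementation; `split_into_chunks` and the 4500-character default are hypothetical, chosen only to show how a long text can be cut at sentence boundaries so that each piece stays under a per-request limit.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 4500) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for illustration; the limit is an assumed value.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be translated separately and the results joined. Note that this sketch joins sentences with a single space, so the original whitespace between sentences is not preserved.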
🔍 Language Intelligence
- Auto-Detection: Identify source languages with confidence scores
- Multi-Language OCR: Extract text from documents with mixed languages
- Language Validation: Verify supported languages before translation
⚡ Asynchronous Translation
- Non-blocking translations for high-performance applications
📚 Custom Glossaries
- Define custom terms and their translations for domain-specific use cases.
- Ensure consistent translations for specialized vocabulary.
📜 Translation History
- Log and retrieve translation history for auditing and analysis.
🛠️ Enterprise Features
- File Handler: Robust file operations with encoding fallback support
- Caching: Intelligent caching for repeated translations
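As a rough illustration of caching repeated translations, the sketch below wraps any translate function in an in-memory cache keyed by text and target language. `make_cached_translator` is a hypothetical name, and this is an assumption about the general technique, not a description of how OctoLingo caches internally.

```python
from typing import Callable, Dict, Tuple


def make_cached_translator(
    translate: Callable[[str, str], str],
) -> Callable[[str, str], str]:
    """Return a wrapper that remembers results per (text, target language)."""
    cache: Dict[Tuple[str, str], str] = {}

    def cached(text: str, target: str) -> str:
        key = (text, target)
        if key not in cache:
            # Only call the underlying translator on a cache miss.
            cache[key] = translate(text, target)
        return cache[key]

    return cached
```

The cache here grows without bound; a long-running service would likely want a size limit or expiry.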
🛠️ Developer-Friendly
- Easy-to-use API with comprehensive documentation.
- Modular design for seamless integration into existing projects.
Installation
Install OctoLingo with pip.
On Windows, install with the windows extra:
pip install OctoLingo[windows]
On Linux and macOS:
pip install OctoLingo
Usage
Language Detection
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
text = "Ceci est un texte en français"
lang = octo.detect_language(text)
print(f"Detected: {lang}") # "fr"
Language Validation
from OctoLingo.translator import OctoLingo
translator = OctoLingo()
print(translator.validate_language('es')) # Should return True
try:
    print(translator.validate_language('xx')) # Should raise TranslationError
except Exception as e:
    print(e)
Translating Text
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
text = "Hello, how are you today?"
translated, confidence = octo.translate(text, "es")
print(f"Translated: {translated}") # "Hola, ¿cómo estás hoy?"
print(f"Confidence: {confidence}") # 1.00
Chunked Large Text Translation
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
large_text = "..." # unlimited character text
translated, confidence = octo.translate(large_text, "zh-CN") # Automatically chunks
print(translated)
print(f"Translated {len(translated)} characters")
File Translation (Text File)
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
# test.txt contains "This is a sample text file"
translated, confidence = octo.translate_file("test.txt", "de")
print(translated) # "Dies ist eine Beispieltextdatei"
File Translation (Word Document)
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
# report.docx contains business report in English
translated, confidence = octo.translate_file("report.docx", "ja")
print(translated) # Japanese translation of the document
File Translation (PDF)
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
# manual.pdf contains technical documentation
translated, confidence = octo.translate_file("manual.pdf", "fr")
print(translated) # French translation
Image Translation (OCR)
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
# sign.png contains text "Emergency Exit"
translated, confidence = octo.translate_file("sign.png", "ar")
print(translated) # Arabic translation
Multi-language OCR Setup (documents with multiple languages)
from OctoLingo.translator import OctoLingo
from OctoLingo.ocr import OctoOCR
octo = OctoLingo()
octo.ocr = OctoOCR(languages=['en', 'fr', 'es']) # Replace default OCR
translated, confidence = octo.translate_file("multilingual.pdf", "de")
Byte Stream Translation
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
with open("contract.docx", "rb") as f:
    file_bytes = f.read()
translated, confidence = octo.translate_file_from_bytes(file_bytes, "word", "ru")
print(translated) # Russian translation
# For PDFs and images, pass the keyword "pdf" or "image" instead of "word"
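In a web application the file type usually has to be derived from the uploaded file name. The sketch below maps common extensions to the "word", "pdf", and "image" keywords mentioned above. `file_kind` is a hypothetical helper, the `.jpeg` and `.tif` aliases are additions, and no keyword is assumed for plain-text files because none is documented here.

```python
from pathlib import Path
from typing import Optional

# Extension-to-keyword mapping; the keywords are the ones named in this README.
_KIND_BY_SUFFIX = {
    ".docx": "word",
    ".pdf": "pdf",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".tiff": "image",
    ".tif": "image",
    ".bmp": "image",
}


def file_kind(filename: str) -> Optional[str]:
    """Return the keyword for a file name, or None if the type is not recognized."""
    return _KIND_BY_SUFFIX.get(Path(filename).suffix.lower())
```

A caller could check for `None` and reject the upload before passing the bytes and the keyword to `translate_file_from_bytes`.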
Batch Translation
from OctoLingo.translator import OctoLingo
texts = [
    "Good morning",
    "Please send the report",
    "Meeting at 3 PM"
]
octo = OctoLingo()
results = octo.translate_batch(texts, "fr")
for original, (translated, confidence) in zip(texts, results):
    print(f"{original} → {translated}")
# "Good morning → Bonjour"
# "Please send the report → Veuillez envoyer le rapport"
# "Meeting at 3 PM → Réunion à 15 h"
Asynchronous Translation
import asyncio
from OctoLingo.translator import OctoLingo
octo = OctoLingo()
async def translate_async():
    result = await octo.translate_async("We need more time", "it")
    print(result) # ("Abbiamo bisogno di più tempo", 1.00)
asyncio.run(translate_async())
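Non-blocking calls pay off mainly when several translations run at once. The following sketch runs many calls concurrently with `asyncio.gather` and caps the number in flight with a semaphore. `translate_many` and its `limit` parameter are hypothetical, and it assumes the coroutine passed in (for example `octo.translate_async`) is safe to call concurrently, which this README does not state.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

Result = Tuple[str, float]  # (translated text, confidence)


async def translate_many(
    translate_async: Callable[[str, str], Awaitable[Result]],
    texts: Sequence[str],
    target: str,
    limit: int = 5,
) -> List[Result]:
    """Translate all texts concurrently, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(text: str) -> Result:
        async with semaphore:
            return await translate_async(text, target)

    # gather returns results in the same order as the input texts.
    return await asyncio.gather(*(worker(t) for t in texts))
```

Usage would look like `asyncio.run(translate_many(octo.translate_async, texts, "it"))`.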
Custom Glossaries
from OctoLingo.glossary import Glossary
from OctoLingo.translator import OctoLingo
glossary = Glossary()
octo = OctoLingo()
glossary.add_term("Hello", "Holla")
glossary_result = glossary.apply_glossary("Hello is a greeting word for english language.")
translated, confidence = octo.translate(glossary_result, 'es')
print(translated) # Should print "Holla es una palabra de saludo para el idioma inglés."
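For readers curious how glossary substitution of this kind can work, here is a minimal sketch using whole-word, case-sensitive regex replacement. `apply_terms` is a hypothetical function and is not claimed to match the behavior of `Glossary.apply_glossary`.

```python
import re
from typing import Dict


def apply_terms(text: str, terms: Dict[str, str]) -> str:
    """Replace each glossary term in text with its preferred translation."""
    if not terms:
        return text
    # Longest terms first, so multi-word terms win over their shorter prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b"
    )
    return pattern.sub(lambda match: terms[match.group(0)], text)
```

Matching on word boundaries keeps a term such as "Hello" from being replaced inside a longer word.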
File Handling
from OctoLingo.translator import OctoLingo
from OctoLingo.file_handler import FileHandler
# Write test content to a file
FileHandler.write_file('input.txt', "Hello, world!")
# Translate the file content
translator = OctoLingo()
text = FileHandler.read_file('input.txt')
translated_text, _ = translator.translate(text, 'es')
FileHandler.write_file('output.txt', translated_text)
# Read and print the translated content
print(FileHandler.read_file('output.txt')) # Should print the translated text
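The encoding fallback mentioned under Enterprise Features can be pictured roughly as follows. This is a sketch under assumptions: `read_text_with_fallback` and the encoding order are hypothetical and may differ from what `FileHandler.read_file` actually does.

```python
from pathlib import Path
from typing import Sequence


def read_text_with_fallback(
    path: str,
    encodings: Sequence[str] = ("utf-8-sig", "cp1252", "latin-1"),
) -> str:
    """Read a text file, trying each encoding in order until one decodes cleanly."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # Try the next candidate encoding.
    raise ValueError(f"Could not decode {path!r} with any of {list(encodings)}")
```

Because latin-1 can decode any byte sequence, placing it last makes it a catch-all; the result may still be wrong if the file uses a different encoding.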
Translation History
from OctoLingo.history import TranslationHistory
history = TranslationHistory()
history.log_translation("Hello", "Hola", "en", "es")
print(history.get_history()) # Should print the logged translation
Contributing
- OctoLingo is an open-source project, and contributions are welcome! If you'd like to contribute, please check out my GitHub repository for guidelines.
File details
Details for the file octolingo-0.3.0.tar.gz.
File metadata
- Download URL: octolingo-0.3.0.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | de4b1f5c11708acaa9e4a29458084314ec1ac2aba7b156cea7c6b53e28517e7c |
| MD5 | 8154bec32a08e12e5b67c1a7f1b687f9 |
| BLAKE2b-256 | db459684fe550d38d60ec20d05e8dfb4b650e9010b0836e0e0dee2e514446817 |
File details
Details for the file octolingo-0.3.0-py3-none-any.whl.
File metadata
- Download URL: octolingo-0.3.0-py3-none-any.whl
- Upload date:
- Size: 10.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | badd7d30e7bf953ab7638e891e6618c8e5640cb276ddea6cd8cce9f2525f078a |
| MD5 | c68f365374382fd66f8ffbf5a1d7c450 |
| BLAKE2b-256 | 1bfd026798ccc8ae0e2cd52322f8cd2cb356005f83e569a0acf816a3350a1fcb |