A scalable, domain-agnostic platform for automated translation and standardization of structured text in scientific collaboration

LangLint

Breaking Language Barriers in Research Collaboration 🚀 | As Simple as Ruff, Integrate into Your CI/CD Pipeline

LangLint is an extensible automated translation and standardization platform designed to eliminate language barriers in structured text for research collaboration.

🚀 Quick Start

# Install
pip install langlint

# Scan translatable content
langlint scan src/

# Translate (preserve original files)
langlint translate src/ -t google -l en -o output/

# In-place translation (auto backup)
langlint fix src/ -t google -l en

Core Commands

| Command | Function | Example |
|---|---|---|
| scan | Scan translatable content | langlint scan . |
| translate | Translate to new directory | langlint translate . -t google -l en -o output/ |
| fix | In-place translate + backup | langlint fix . -t google -l en |

Default: Google Translate (Free, no API Key required)

Other Translators (OpenAI, DeepL, Azure)
  • openai - OpenAI GPT (requires OPENAI_API_KEY)
  • deepl - DeepL (requires DEEPL_API_KEY)
  • azure - Azure Translator (requires AZURE_API_KEY)
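
Before switching to a paid backend, it can help to confirm that its key is actually set. The sketch below is not part of LangLint: the `REQUIRED_ENV` mapping only restates the list above, and `missing_api_key` is a hypothetical helper name chosen for illustration.

```python
import os

# Environment variable each translator needs, per the list above.
# None means no key is required (Google is the free default).
REQUIRED_ENV = {
    "google": None,
    "openai": "OPENAI_API_KEY",
    "deepl": "DEEPL_API_KEY",
    "azure": "AZURE_API_KEY",
}


def missing_api_key(translator, env=None):
    """Return the name of the unset variable for `translator`, or None if ready."""
    env = os.environ if env is None else env
    if translator not in REQUIRED_ENV:
        raise ValueError(f"Unknown translator: {translator!r}")
    variable = REQUIRED_ENV[translator]
    if variable is None or env.get(variable):
        return None
    return variable


if __name__ == "__main__":
    # Report which backends are usable in the current environment.
    for name in REQUIRED_ENV:
        missing = missing_api_key(name)
        print(f"{name}: {'ready' if missing is None else 'set ' + missing}")
```

Running a check like this at the start of a CI job fails fast with a clear message instead of partway through a translation run.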

✨ Key Features

🌍 Multilingual Translation Support

  • 100+ Language Pairs: French↔English, German↔Chinese, Spanish↔Japanese, etc.
  • Smart Language Detection: Auto-detect source language or specify manually
  • Syntax Protection: Automatically excludes string literals and f-strings
  • High-Performance Concurrency: Batch translation for multiple files
# Basic usage (Chinese → English)
langlint fix src/ -t google -s zh-CN -l en

# European languages (French → English, must specify -s)
langlint fix french_code.py -t google -s fr -l en

# Cross-language families (German → Chinese)
langlint fix german_code.py -t google -s de -l zh-CN
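
To make the "Syntax Protection" idea concrete, here is a minimal sketch of how comments can be separated from string literals using Python's standard `tokenize` module. This is an illustration of the general technique, not LangLint's actual parser; `extract_comments` is a hypothetical name.

```python
import io
import tokenize


def extract_comments(source):
    """Return (line_number, text) for each comment; string literals are never touched."""
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding whitespace.
            comments.append((tok.start[0], tok.string.lstrip("#").strip()))
    return comments


sample = 'x = "# not a comment"  # real comment\nname = f"{x}"\n'
print(extract_comments(sample))  # [(1, 'real comment')]
```

Because the tokenizer classifies `"# not a comment"` as a string token, only the trailing comment is offered for translation and the code's behavior stays unchanged.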
📋 Supported Languages List

European Languages: English (en), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Dutch (nl), Polish (pl), Swedish (sv)

Asian Languages: Simplified Chinese (zh-CN), Traditional Chinese (zh-TW), Japanese (ja), Korean (ko), Thai (th), Vietnamese (vi), Hindi (hi), Indonesian (id)

Other Languages: Arabic (ar), Hebrew (he), Turkish (tr), Greek (el), Persian (fa)

Note: When the source is a European language (French, German, Spanish, Italian, etc.), specify it with the -s parameter; otherwise it will be misidentified as English.
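
When language codes come from user input or config, a small validation step can catch unsupported values early. The sketch below uses only the codes listed above plus the `zh` to `zh-CN` conversion this README mentions under language code standards; `normalize_lang_code` is a hypothetical helper, not a LangLint function.

```python
# Codes taken from the supported-languages list above.
SUPPORTED = {
    "en", "fr", "de", "es", "it", "pt", "ru", "nl", "pl", "sv",
    "zh-CN", "zh-TW", "ja", "ko", "th", "vi", "hi", "id",
    "ar", "he", "tr", "el", "fa",
}

# Shorthand accepted for convenience; the README notes "zh" maps to "zh-CN".
ALIASES = {"zh": "zh-CN"}

# Case-insensitive lookup table back to the canonical spelling.
_CANONICAL = {code.lower(): code for code in SUPPORTED}


def normalize_lang_code(code):
    """Return the canonical code, or raise ValueError for unsupported input."""
    key = code.strip().lower()
    key = ALIASES.get(key, key).lower()
    if key not in _CANONICAL:
        raise ValueError(f"Unsupported language code: {code!r}")
    return _CANONICAL[key]


print(normalize_lang_code("zh"))     # zh-CN
print(normalize_lang_code("ZH-tw"))  # zh-TW
```

Names such as `"chinese"` raise `ValueError`, which is easier to diagnose than a failed translation request.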

🔌 Supported File Types

Python • Markdown • Jupyter Notebook • JavaScript/TypeScript • Go • Rust • Java • C/C++ • Config files (YAML/TOML/JSON) • 20+ types

⚡ High Performance

Concurrent processing is 10-20x faster than serial 🚀
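
The speedup comes from overlapping network waits rather than from faster translation. The following sketch simulates that effect with `asyncio.sleep` standing in for a request; it is not a benchmark of LangLint, and the actual gain depends on network latency and provider rate limits.

```python
import asyncio
import time


async def fake_translate(text, latency=0.05):
    """Stand-in for a network call; sleeps instead of contacting a service."""
    await asyncio.sleep(latency)
    return text.upper()


async def translate_serial(texts):
    # Each request waits for the previous one to finish.
    return [await fake_translate(t) for t in texts]


async def translate_concurrent(texts):
    # All requests are in flight at once; total time is roughly one latency.
    return await asyncio.gather(*(fake_translate(t) for t in texts))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


texts = [f"text{i}" for i in range(10)]
serial_result, serial_time = timed(translate_serial(texts))
concurrent_result, concurrent_time = timed(translate_concurrent(texts))
print(f"serial: {serial_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```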

📖 Detailed Usage Guide

Basic Commands

# Scan translatable content
langlint scan path/to/files

# Translate to new directory
langlint translate path/to/files -t google -s zh-CN -l en -o output/

# In-place translation (auto backup)
langlint fix path/to/files -t google -s zh-CN -l en

Multilingual Translation Scenarios

# Scenario 1: Translate French project to English
langlint scan french_project/ -o report.json --format json
langlint translate french_project/ -t google -s fr -l en -o english_project/

# Scenario 2: Generate multilingual documentation
langlint translate docs/ -t google -s en -l zh-CN -o docs_zh/
langlint translate docs/ -t google -s en -l ja -o docs_ja/
langlint translate docs/ -t google -s en -l fr -o docs_fr/

# Scenario 3: Internationalize codebase
langlint fix src/ -t google -s zh-CN -l en
pytest tests/  # Verify functionality

Advanced Parameters

# Exclude specific files
langlint translate src/ -t google -s zh-CN -l en -o output/ -e "**/test_*" -e "**/__pycache__/"

# Dry-run preview
langlint translate src/ -t google -s fr -l en --dry-run

# Use other translators
langlint translate src/ -t openai -s zh-CN -l en  # Requires OPENAI_API_KEY
langlint translate src/ -t deepl -s zh -l en-US   # Requires DEEPL_API_KEY

🔧 Low-Level API Usage

LangLint can be used as a Python library in your projects.

Basic API Usage

import asyncio
from langlint.core.client import Dispatcher
from langlint.translators.google_translator import GoogleTranslator, GoogleConfig
from langlint.core.types import TranslatableUnit, UnitType
from pathlib import Path

async def translate_file_example():
    """Example of translating a single file"""
    
    # 1. Create translator
    config = GoogleConfig(
        delay_range=(0.3, 0.6),  # Delay 0.3-0.6s per request to avoid rate limits
        timeout=30,
        retry_count=3
    )
    translator = GoogleTranslator(config)
    
    # 2. Create dispatcher
    dispatcher = Dispatcher()
    
    # 3. Parse file
    file_path = Path("example.py")
    result = await dispatcher.parse_file(str(file_path))
    
    if result.success:
        # 4. Translate extracted units
        source_lang = "fr"  # French
        target_lang = "en"  # English
        
        texts = [unit.content for unit in result.units]
        translation_results = await translator.translate_batch(
            texts, 
            source_lang, 
            target_lang
        )
        
        # 5. Create translated units
        translated_units = []
        for unit, trans_result in zip(result.units, translation_results):
            translated_unit = TranslatableUnit(
                content=trans_result.translated_text,
                unit_type=unit.unit_type,
                line_number=unit.line_number,
                column_number=unit.column_number,
                context=unit.context
            )
            translated_units.append(translated_unit)
        
        # 6. Reconstruct file
        original_content = file_path.read_text(encoding='utf-8')
        reconstructed = result.parser.reconstruct_file(
            original_content, 
            translated_units, 
            str(file_path)
        )
        
        # 7. Write output
        output_path = Path("example_translated.py")
        output_path.write_text(reconstructed, encoding='utf-8')
        
        print(f"Translation completed: {output_path}")

# Run example
asyncio.run(translate_file_example())

Batch Translate Multiple Files

import asyncio
from pathlib import Path
from langlint.core.client import Dispatcher
from langlint.translators.google_translator import GoogleTranslator, GoogleConfig

async def batch_translate_project(
    source_dir: str, 
    output_dir: str, 
    source_lang: str = "zh-CN",
    target_lang: str = "en"
):
    """Batch translate project files"""
    
    translator = GoogleTranslator(GoogleConfig())
    dispatcher = Dispatcher()
    
    source_path = Path(source_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    
    # Get all Python files
    py_files = list(source_path.rglob("*.py"))
    
    print(f"Found {len(py_files)} Python files")
    
    for file_path in py_files:
        try:
            print(f"Translating: {file_path}")
            
            # Parse file
            result = await dispatcher.parse_file(str(file_path))
            
            if not result.success or not result.units:
                print(f"  Skipped (no translatable content)")
                continue
            
            # Translate
            texts = [unit.content for unit in result.units]
            translations = await translator.translate_batch(
                texts, source_lang, target_lang
            )
            
            # Reconstruct
            translated_units = [
                unit._replace(content=trans.translated_text)
                for unit, trans in zip(result.units, translations)
            ]
            
            original = file_path.read_text(encoding='utf-8')
            reconstructed = result.parser.reconstruct_file(
                original, translated_units, str(file_path)
            )
            
            # Save
            relative = file_path.relative_to(source_path)
            out_file = output_path / relative
            out_file.parent.mkdir(parents=True, exist_ok=True)
            out_file.write_text(reconstructed, encoding='utf-8')
            
            print(f"  ✓ Completed")
            
        except Exception as e:
            print(f"  ✗ Error: {e}")

# Usage example
asyncio.run(batch_translate_project(
    "src/",           # Source directory
    "src_en/",        # Output directory
    "fr",             # French
    "en"              # English
))

Custom Translator

from langlint.translators.base import Translator, TranslationResult, TranslationStatus
from typing import List

class CustomTranslator(Translator):
    """Custom translator example"""
    
    def __init__(self, api_key: str):
        super().__init__(name="custom")
        self.api_key = api_key
    
    async def translate(
        self, 
        text: str, 
        source_language: str, 
        target_language: str
    ) -> TranslationResult:
        """Single text translation"""
        # Implement your translation logic
        translated = await self._call_your_api(text, source_language, target_language)
        
        return TranslationResult(
            original_text=text,
            translated_text=translated,
            source_language=source_language,
            target_language=target_language,
            status=TranslationStatus.SUCCESS,
            confidence=0.9,
            metadata={"translator": "custom"}
        )
    
    async def translate_batch(
        self, 
        texts: List[str], 
        source_language: str, 
        target_language: str
    ) -> List[TranslationResult]:
        """Batch translation"""
        # Use concurrency for efficiency
        import asyncio
        tasks = [
            self.translate(text, source_language, target_language) 
            for text in texts
        ]
        return await asyncio.gather(*tasks)
    
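    async def _call_your_api(self, text, source, target):
        """Call your translation API"""
        # Replace with a real API call that returns the translated string
        raise NotImplementedError("Implement the call to your translation API")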

🎯 Best Practices

1. Performance Optimization

# ✅ Recommended: Use batch translation
texts = ["text1", "text2", "text3"]
results = await translator.translate_batch(texts, "zh-CN", "en")

# ❌ Avoid: Translate one by one (slow)
for text in texts:
    result = await translator.translate(text, "zh-CN", "en")

2. Error Handling

try:
    result = await translator.translate(text, source_lang, target_lang)
    if result.status == TranslationStatus.SUCCESS:
        print(f"Translation succeeded: {result.translated_text}")
    else:
        print(f"Translation failed: {result.metadata.get('error')}")
except Exception as e:
    print(f"Exception: {e}")

3. Rate Limit Management

# Google Translate limit: ~5 requests/sec
config = GoogleConfig(
    delay_range=(0.3, 0.6),  # Delay per request to avoid limits
    retry_count=3,            # Retry attempts on failure
    timeout=30                # Timeout duration
)
translator = GoogleTranslator(config)
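
To show what settings like `delay_range`, `retry_count`, and `timeout` typically mean, here is a generic retry wrapper. It is a sketch of the common pattern, not `GoogleTranslator`'s implementation; `call_with_retry` and `flaky_request` are hypothetical names, and it assumes `retry_count` counts retries after the first attempt.

```python
import asyncio
import random


async def call_with_retry(func, *, delay_range=(0.3, 0.6), retry_count=3, timeout=30):
    """Await func() with a random pre-request delay, a timeout, and bounded retries.

    Assumption: retry_count counts retries after the first attempt,
    so at most retry_count + 1 calls are made.
    """
    last_error = None
    for _ in range(retry_count + 1):
        # Spread requests out to stay under the provider's rate limit.
        await asyncio.sleep(random.uniform(*delay_range))
        try:
            return await asyncio.wait_for(func(), timeout=timeout)
        except Exception as exc:  # includes asyncio.TimeoutError
            last_error = exc
    raise last_error


calls = {"count": 0}


async def flaky_request():
    """Simulated request that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = asyncio.run(call_with_retry(flaky_request, delay_range=(0.0, 0.01)))
print(result, calls["count"])  # ok 3
```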

4. Concurrency Control

import asyncio

# Use a Semaphore to cap the number of in-flight requests
sem = asyncio.Semaphore(5)  # Max 5 concurrent requests

async def translate_with_limit(translator, text):
    async with sem:
        return await translator.translate(text, "fr", "en")

async def translate_all(translator, texts):
    # Must be awaited from inside a running event loop
    tasks = [translate_with_limit(translator, t) for t in texts]
    return await asyncio.gather(*tasks)

5. Language Code Standards

# ✅ Recommended: Use standard language codes
"zh-CN"  # Simplified Chinese
"zh-TW"  # Traditional Chinese
"en"     # English
"fr"     # French
"de"     # German
"es"     # Spanish
"ja"     # Japanese
"ko"     # Korean

# ❌ Avoid: Non-standard codes
"zh"     # Will be auto-converted to zh-CN, but better to specify
"chinese" # Not supported

⚙️ Configuration File

Configure in pyproject.toml:

[tool.langlint]
translator = "google"
target_lang = "en"
source_lang = ["zh-CN", "ja", "ko"]
exclude = ["**/test_*", "**/data/"]

# Path-specific settings
[tool.langlint."docs/**/*.md"]
translator = "deepl"

🤖 CI/CD Integration

Integrate into Your Workflow Like Ruff - Automate multilingual code checking and translation!

Supports: GitHub Actions ✅ | GitLab CI ✅ | Azure Pipelines ✅ | Pre-commit Hooks ✅ | Docker ✅

📋 Complete CI/CD Integration Configuration

The configurations below add LangLint to a CI/CD pipeline in much the same way you would add Ruff for code quality checks.

GitHub Actions Integration ⭐ Recommended

1️⃣ Automatic Translation Coverage Check

Add to .github/workflows/langlint-check.yml:

name: LangLint Check

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  langlint-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      
      - name: Install LangLint
        run: |
          pip install langlint
      
      - name: Scan for translatable content
        run: |
          langlint scan . -o report.json --format json
          
      - name: Check translation requirements
        run: |
          # Check for translatable content
          if [ -s report.json ]; then
            echo "⚠️ Found translatable content. Run 'langlint translate' locally."
            cat report.json
          else
            echo "✅ No translatable content found."
          fi

2️⃣ Auto-Translate and Create PR

Automatically translate Chinese code to English and create a Pull Request:

name: Auto Translate

on:
  workflow_dispatch:  # Manual trigger
  schedule:
    - cron: '0 0 * * 0'  # Run every Sunday

jobs:
  translate:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      
      - name: Install LangLint
        run: pip install langlint
      
      - name: Translate code
        run: |
          langlint translate src/ -t google -s zh-CN -l en -o src_en/
      
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v5
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          commit-message: 'chore: auto translate to English'
          title: '🌐 Auto-translated code to English'
          body: |
            This PR contains auto-translated code from Chinese to English.
            
            **Translation Details:**
            - Source Language: Chinese (zh-CN)
            - Target Language: English (en)
            - Translator: Google Translate
            
            Please review carefully before merging.
          branch: auto-translate/en
          delete-branch: true

3️⃣ Pre-commit Integration Check

Fail pull requests that contain untranslated Chinese content:

name: Pre-commit Check

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  check-translation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      
      - name: Install LangLint
        run: pip install langlint
      
      - name: Check for non-English content
        run: |
          # Scan for Chinese content
          langlint scan . -o report.json --format json
          
          # Fail if Chinese content found
          if grep -q '"zh-CN"' report.json; then
            echo "❌ Found Chinese content. Please translate before committing."
            echo "Run: langlint fix . -t google -s zh-CN -l en"
            exit 1
          fi
          
          echo "✅ All content is in English."

4️⃣ Multilingual Documentation Auto-Publish

Automatically translate documentation to multiple languages:

name: Translate Docs

on:
  push:
    branches: [main]
    paths:
      - 'docs/**'

jobs:
  translate-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      
      - name: Install LangLint
        run: pip install langlint
      
      - name: Translate to multiple languages
        run: |
          # Translate to Chinese
          langlint translate docs/ -t google -s en -l zh-CN -o docs_zh/
          
          # Translate to Japanese
          langlint translate docs/ -t google -s en -l ja -o docs_ja/
          
          # Translate to French
          langlint translate docs/ -t google -s en -l fr -o docs_fr/
          
          # Translate to Spanish
          langlint translate docs/ -t google -s en -l es -o docs_es/
      
      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./
          keep_files: true

Pre-commit Hooks Integration

Like Ruff, add LangLint to your pre-commit configuration.

Install pre-commit

pip install pre-commit

Configure .pre-commit-config.yaml

repos:
  # LangLint - Check translatable content
  - repo: local
    hooks:
      - id: langlint-scan
        name: LangLint Scan
        entry: langlint scan
        language: system
        types: [python, markdown]
        pass_filenames: true
        verbose: true
      
      # Optional: Auto-translate (use with caution)
      - id: langlint-fix
        name: LangLint Auto-fix
        entry: langlint fix
        args: [-t, google, -s, zh-CN, -l, en]
        language: system
        types: [python]
        pass_filenames: true
        stages: [manual]  # Manual trigger only
  
  # Ruff - Code checking (for comparison)
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.0
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]

Use pre-commit

# Install hooks
pre-commit install

# Auto-run on each commit
git commit -m "feat: add new feature"

# Manually run all hooks
pre-commit run --all-files

# Manually trigger translation
pre-commit run langlint-fix --all-files

GitLab CI Integration

Add to .gitlab-ci.yml:

stages:
  - lint
  - translate

langlint-check:
  stage: lint
  image: python:3.11
  script:
    - pip install langlint
    - langlint scan . -o report.json --format json
    - |
      if [ -s report.json ]; then
        echo "⚠️ Found translatable content"
        cat report.json
      fi
  artifacts:
    paths:
      - report.json
    expire_in: 1 week

langlint-translate:
  stage: translate
  image: python:3.11
  only:
    - main
  script:
    - pip install langlint
    - langlint translate src/ -t google -s zh-CN -l en -o src_en/
  artifacts:
    paths:
      - src_en/
    expire_in: 1 month

Azure Pipelines Integration

Add to azure-pipelines.yml:

trigger:
  - main
  - develop

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.11'
  displayName: 'Use Python 3.11'

- script: |
    pip install langlint
  displayName: 'Install LangLint'

- script: |
    langlint scan . -o $(Build.ArtifactStagingDirectory)/report.json --format json
  displayName: 'Scan translatable content'

- task: PublishBuildArtifacts@1
  inputs:
    pathToPublish: '$(Build.ArtifactStagingDirectory)'
    artifactName: 'langlint-report'

Docker Integration

Dockerfile Example

FROM python:3.11-slim

WORKDIR /app

# Install LangLint
RUN pip install --no-cache-dir langlint

# Copy source code
COPY . .

# Run translation
CMD ["langlint", "translate", ".", "-t", "google", "-s", "zh-CN", "-l", "en", "-o", "output/"]

Use Docker Compose

version: '3.8'

services:
  langlint:
    image: python:3.11-slim
    volumes:
      - .:/app
    working_dir: /app
    command: >
      sh -c "
        pip install langlint &&
        langlint translate src/ -t google -s zh-CN -l en -o src_en/
      "

VS Code Integration (Coming Soon)

The upcoming VS Code extension will provide:

  • ✅ Real-time translation suggestions
  • ✅ Right-click menu translation
  • ✅ Auto-translate on save
  • ✅ Translation status indicator

Best Practices

1️⃣ Phased Integration

# Phase 1: Scan only, don't block CI
langlint scan . -o report.json --format json

# Phase 2: Generate warnings
if grep -q '"zh-CN"' report.json; then
  echo "⚠️ Warning: Found translatable content"
fi

# Phase 3: Block commits (strict mode)
if grep -q '"zh-CN"' report.json; then
  echo "❌ Error: Must translate before merging"
  exit 1
fi
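
The `grep` checks above match raw text. A slightly more robust variant parses the JSON and looks for the language code as an actual key or value. The sketch below makes no assumption about the report's schema (it walks whatever structure is present); the sample report is hypothetical and only there to exercise the code, and `check_report` is not a LangLint command.

```python
import json


def contains_value(node, needle):
    """Return True if `needle` appears as a key or value anywhere in parsed JSON."""
    if isinstance(node, dict):
        return any(k == needle or contains_value(v, needle) for k, v in node.items())
    if isinstance(node, list):
        return any(contains_value(item, needle) for item in node)
    return node == needle


def check_report(report_text, needle="zh-CN", strict=False):
    """Return a process exit code: 0 to pass, 1 to fail (strict mode only)."""
    if not contains_value(json.loads(report_text), needle):
        print("No matching content found.")
        return 0
    if strict:
        print("Error: must translate before merging")  # Phase 3
        return 1
    print("Warning: found translatable content")  # Phase 2
    return 0


# Hypothetical report shape, used only to demonstrate the function.
sample_report = '{"files": [{"path": "a.py", "language": "zh-CN"}]}'
print(check_report(sample_report))               # warns, returns 0
print(check_report(sample_report, strict=True))  # errors, returns 1
```

In a pipeline, the script would read `report.json` and pass the return value to `sys.exit`, so moving from Phase 2 to Phase 3 is a one-flag change.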

2️⃣ Use with Ruff

# First, check code quality with Ruff
ruff check . --fix

# Then, translate with LangLint
langlint fix . -t google -s zh-CN -l en

# Finally, run Ruff again to ensure translated code meets standards
ruff check .

3️⃣ Translate Only New Content

# Get changed files (skip deleted ones)
git diff --name-only --diff-filter=d origin/main... > changed_files.txt

# Translate only changed files (-r: do nothing if the list is empty)
xargs -r langlint fix -t google -s zh-CN -l en < changed_files.txt

4️⃣ Cache Optimization

# Enable cache in GitHub Actions
- name: Cache LangLint
  uses: actions/cache@v3
  with:
    path: ~/.cache/langlint
    key: ${{ runner.os }}-langlint-${{ hashFiles('**/*.py') }}
    restore-keys: |
      ${{ runner.os }}-langlint-

Enterprise Deployment

Self-hosted Runner

jobs:
  translate:
    runs-on: [self-hosted, linux, x64]
    steps:
      - name: Translate with enterprise translator
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          langlint translate src/ -t openai -s zh-CN -l en -o src_en/

Secrets Management

# Configure in GitHub Secrets
# Settings > Secrets and variables > Actions > New repository secret

env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  DEEPL_API_KEY: ${{ secrets.DEEPL_API_KEY }}

With CI/CD integration, LangLint can run alongside tools like Ruff in your development workflow, automating multilingual code translation and making team collaboration easier.

🤝 Contributing

Contributions welcome! See the Contributing Guide.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

⭐ If you find this useful, please give the project a Star!
