MDx Dict Trans ToolKit

MDTT - MDx Dict Trans ToolKit

A Python 3.13+ tool for packing and unpacking MDict dictionary files (.mdx/.mdd), with TOML-based metadata and a subcommand-style CLI.

Version 2.0 - Complete Rewrite:

  • 🆕 Modern Subcommand Architecture - Clean CLI interface similar to git and docker
  • 🆕 TOML Metadata Management - User-friendly .meta.toml configuration files with auto-detection
  • 🆕 Rich Information Display - Beautiful formatted output with JSON/TOML export options
  • 🆕 Comprehensive Testing - Full test suite including unit, integration, and real-file testing
  • 🆕 Enhanced Query System - Smart output file naming and custom file specification
  • 🆕 Format Conversion Tools - Built-in converters between text, database, and MDict formats

Key Features

  • Full MDict Support: Reads and writes MDict 2.0, reads MDict 3.0, and handles encrypted dictionaries
  • Multiple Output Formats: MDX/MDD files, SQLite databases, plain text, split files
  • Intelligent CLI: Context-aware commands with comprehensive help and error handling
  • Metadata System: Automatic .meta.toml file detection and generation
  • Advanced Extraction: Split by alphabet, custom chunk sizes, metadata export
  • Developer Friendly: Modern Python 3.13+, uv package manager, comprehensive type hints

Installation

From PyPI (Recommended)

pip install mdtt

Development Setup

git clone https://github.com/likai/mdtt.git
cd mdtt
uv sync  # Install dependencies with uv (recommended)
# or: pip install -e ".[dev]"  # Alternative with pip

Requirements

  • Python 3.13+ (required for modern typing features)
  • Optional: uv package manager for faster dependency resolution

Quick Start

View Available Commands

mdtt --help

Extract a Dictionary

# Basic extraction (outputs to current directory with .txt and .meta.toml)
mdtt extract my_dict.mdx

# Extract to specific directory
mdtt extract my_dict.mdx -o ./output

# Extract as database
mdtt extract my_dict.mdx --db

# Extract without metadata file
mdtt extract my_dict.mdx --no-meta

Create a Dictionary

  1. Create your content file (my_dict.txt):
apple
A round fruit that grows on trees.
</>
banana
A long curved yellow fruit.
</>
  2. Create metadata file (my_dict.meta.toml):
[dictionary]
title = "My Custom Dictionary"
description = "A simple English dictionary"
  3. Pack the dictionary:
# Auto-detect output filename
mdtt pack -a my_dict.txt

# Or specify explicit output name  
mdtt pack -a my_dict.txt my_dict.mdx
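
The content file shown in step 1 is a sequence of entries, each a headword line followed by the definition and closed by a `</>` line. As a rough illustration of that layout (not MDTT's actual parser; `parse_entries` is a hypothetical helper), the format can be read like this:

```python
from collections.abc import Iterator


def parse_entries(text: str) -> Iterator[tuple[str, str]]:
    """Yield (headword, definition) pairs from `</>`-separated source text."""
    block: list[str] = []
    for line in text.splitlines():
        if line.strip() == "</>":
            if block:
                # First line is the headword; the rest is the definition body.
                yield block[0].strip(), "\n".join(block[1:]).strip()
            block = []
        elif line.strip() or block:
            # Skip blank lines between entries, keep them inside a definition.
            block.append(line)
    # Tolerate a final entry that is missing its closing "</>".
    if block:
        yield block[0].strip(), "\n".join(block[1:]).strip()


sample = """apple
A round fruit that grows on trees.
</>
banana
A long curved yellow fruit.
</>
"""

entries = list(parse_entries(sample))
for headword, definition in entries:
    print(f"{headword}: {definition}")
```

Running a check like this before `mdtt pack` is a cheap way to spot a missing `</>` separator in a hand-written source file.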

Query and Information

# Query a word (displays result and saves to apple.html)
mdtt query apple my_dict.mdx

# Query with custom output filename
mdtt query apple my_dict.mdx -o definitions/apple_def.html

# Query phrases (automatically creates safe filenames)
mdtt query "can't believe" my_dict.mdx  # Creates can_t_believe.html

# Show dictionary information (rich formatted output)
mdtt info my_dict.mdx

# Export information as JSON or TOML
mdtt info my_dict.mdx --format json
mdtt info my_dict.mdx --format toml

# List dictionary keys with filtering
mdtt keys my_dict.mdx --limit 100
mdtt keys my_dict.mdx --pattern "apple*"

Advanced Usage

Working with TOML Metadata

Create .meta.toml files for automatic metadata detection:

[dictionary]
title = "Oxford Advanced Dictionary"
description = """
Comprehensive English dictionary with detailed definitions.
Perfect for students and professionals.
"""

# Other attributes (encoding, version, etc.) use system defaults
# To customize them, add an [advanced] section

Multiple Input Sources

# Pack multiple files (auto-detect output name)
mdtt pack -a part1.txt -a part2.txt

# Pack with explicit output name
mdtt pack -a part1.txt -a part2.txt combined.mdx

# Use custom metadata
mdtt pack -a source.txt -m custom.meta.toml

# Pack media resources (auto-detects .mdd extension)
mdtt pack -a images_folder/

Format Conversion & Import Tools

# Convert between text and database formats
mdtt convert txt-to-db dict.txt dict.db
mdtt convert db-to-txt dict.db dict.txt

# TBX/TMX to MDict conversion
# Converts TBX (TermBase eXchange) terminology files or TMX translation memory files to MDict format
python tests/script_convert_tbx_to_mdict.py input.tbx output.mdx
# Features:
# - Automatic metadata generation from TBX header
# - CSS styling for professional appearance  
# - Support for multiple languages and terminology
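
The text and database conversions can be thought of as a round trip between `</>`-separated entries and rows in SQLite. The sketch below uses a made-up schema (an `entries` table with `key` and `value` columns); the layout that `mdtt convert` actually writes is not documented here and may differ.

```python
import sqlite3
from collections.abc import Iterable


def entries_to_db(entries: Iterable[tuple[str, str]], conn: sqlite3.Connection) -> None:
    """Store (headword, definition) pairs in a hypothetical `entries` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (key TEXT NOT NULL, value TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO entries (key, value) VALUES (?, ?)", entries)
    # An index on the headword keeps lookups fast in large dictionaries.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_entries_key ON entries (key)")
    conn.commit()


def db_to_text(conn: sqlite3.Connection) -> str:
    """Render the table back into the `</>`-separated source format."""
    rows = conn.execute("SELECT key, value FROM entries ORDER BY rowid")
    return "".join(f"{key}\n{value}\n</>\n" for key, value in rows)


conn = sqlite3.connect(":memory:")
entries_to_db(
    [
        ("apple", "A round fruit that grows on trees."),
        ("banana", "A long curved yellow fruit."),
    ],
    conn,
)
text = db_to_text(conn)
print(text)
```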

Advanced Query Options

# Query with automatic HTML output (the file is named after the query)
mdtt query "hello world" my_dict.mdx

# Query with custom output file
mdtt query apple my_dict.mdx -o definitions/apple.html

# Query encrypted dictionaries
mdtt query word encrypted.mdx --passcode mypassword

# Special characters in queries are handled automatically
# e.g., "can't" becomes "can_t.html"
mdtt query "can't" my_dict.mdx
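
One plausible way to derive such filenames is to replace every character that is not a letter, digit, underscore, or hyphen with an underscore. This reproduces the two examples in this README (`can_t.html`, `can_t_believe.html`), but the exact rule MDTT applies is an assumption, and `safe_filename` is a hypothetical helper.

```python
import re


def safe_filename(query: str, suffix: str = ".html") -> str:
    """Turn a query string into a filesystem-friendly filename."""
    # \w covers Unicode letters and digits plus "_", so non-English
    # headwords survive; everything else becomes an underscore.
    stem = re.sub(r"[^\w-]", "_", query.strip())
    # Fall back to a fixed name if the query was empty.
    return f"{stem or 'query'}{suffix}"


for q in ["apple", "can't", "can't believe"]:
    print(q, "->", safe_filename(q))
```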

Advanced Extraction Options

# Split by alphabet (with metadata)
mdtt extract large_dict.mdx --split-az

# Split into N files
mdtt extract large_dict.mdx --split-n 5

# Handle encrypted dictionaries
mdtt extract encrypted.mdx --passcode mypassword

# Extract to specific directory without metadata
mdtt extract dict.mdx -o ./output --no-meta
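
The two split modes can be illustrated on an in-memory list of entries. This is a sketch of the general idea only: how MDTT buckets non-Latin headwords and how it balances chunk sizes are assumptions here, and both function names are hypothetical.

```python
from collections import defaultdict

Entry = tuple[str, str]


def split_by_letter(entries: list[Entry]) -> dict[str, list[Entry]]:
    """Group entries by the first letter of the headword (a-z, else 'other')."""
    groups: dict[str, list[Entry]] = defaultdict(list)
    for key, value in entries:
        first = key[:1].lower()
        bucket = first if "a" <= first <= "z" else "other"
        groups[bucket].append((key, value))
    return dict(groups)


def split_into_n(entries: list[Entry], n: int) -> list[list[Entry]]:
    """Split entries into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    size, extra = divmod(len(entries), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks each take one additional entry.
        end = start + size + (1 if i < extra else 0)
        chunks.append(entries[start:end])
        start = end
    return chunks


words = ["apple", "avocado", "banana", "cherry", "1st", "date", "élan"]
sample_entries = [(w, f"Definition of {w}") for w in words]

by_letter = split_by_letter(sample_entries)
chunks = split_into_n(sample_entries, 3)
print(sorted(by_letter))
print([len(c) for c in chunks])
```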

Command Reference

| Command | Purpose | Key Features | Options |
|---------|---------|--------------|---------|
| extract | Extract MDX/MDD files with metadata export | Auto-metadata export, split options, database output | -o (output dir), --db, --no-meta, --split-az, --split-n |
| pack | Create MDX/MDD from sources (smart output naming) | Auto-detects output filename, metadata file discovery | -a (add source), -m (metadata file), multiple sources |
| query | Search words with smart HTML file output | Safe filename generation, custom output paths | -o (output file), --passcode, auto HTML creation |
| info | Display rich dictionary information | Beautiful formatting, multiple export formats | --format (text/json/toml), comprehensive metadata |
| keys | List and filter dictionary keys | Pattern matching, pagination, sampling | --limit, --pattern, memory-efficient streaming |
| convert | Convert between formats | Text ↔ Database conversion, preservation of structure | txt-to-db, db-to-txt, maintains indexes |

Special Tools

  • TBX Converter: tests/script_convert_tbx_to_mdict.py - Convert TBX terminology files or TMX translation memories to MDict format

Testing

The project includes comprehensive testing:

# Run all tests
tests/run_tests.sh all

# Run specific test types
tests/run_tests.sh unit          # Fast unit tests
tests/run_tests.sh integration   # Tests with real files
tests/run_tests.sh -c            # With coverage report

# Shell integration test
tests/test_integration.sh

MDX File Format

An .mdx file consists of:

  1. Header: Dictionary metadata (Title, Description, Version, etc.) in UTF-16LE XML
  2. Keyword Section: Compressed blocks of keywords with index for quick lookup
  3. Record Section: Compressed blocks of dictionary entries (HTML content)

This structure allows efficient random access even in large dictionaries with millions of entries.
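
To make the header layout concrete, here is a minimal reader for the first section. It relies on details that are not stated above and come from commonly circulated descriptions of the MDict format, so treat them as assumptions: a 4-byte big-endian length prefix, the UTF-16LE XML header, then a 4-byte little-endian Adler-32 checksum of the header bytes. The example runs against a synthetic header built in memory, not a real dictionary, and `read_mdx_header` is a hypothetical helper.

```python
import io
import re
import struct
import zlib
from typing import BinaryIO


def read_mdx_header(stream: BinaryIO) -> dict[str, str]:
    """Parse the header attributes from the start of an .mdx/.mdd stream."""
    (length,) = struct.unpack(">I", stream.read(4))      # assumed: big-endian size
    raw = stream.read(length)
    (checksum,) = struct.unpack("<I", stream.read(4))    # assumed: little-endian Adler-32
    if zlib.adler32(raw) & 0xFFFFFFFF != checksum:
        raise ValueError("header checksum mismatch")
    xml = raw.decode("utf-16-le").rstrip("\x00")
    # The header is a single XML element; collect its attributes.
    return dict(re.findall(r'(\w+)="(.*?)"', xml, re.DOTALL))


# Build a synthetic header so the sketch runs without a dictionary file.
xml = '<Dictionary GeneratedByEngineVersion="2.0" Title="My Custom Dictionary" Encoding="UTF-8"/>\r\n\x00'
raw = xml.encode("utf-16-le")
blob = struct.pack(">I", len(raw)) + raw + struct.pack("<I", zlib.adler32(raw) & 0xFFFFFFFF)

header = read_mdx_header(io.BytesIO(blob))
print(header["Title"], header["GeneratedByEngineVersion"])
```

For real files, `mdtt info` remains the supported way to inspect this metadata.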

Development

Project Architecture

The project follows a modern, modular architecture:

src/mdict_utils/
├── __main__.py          # CLI entry point with subcommand routing
├── commands/            # Individual command implementations
│   ├── extract.py       # Dictionary extraction with metadata
│   ├── pack.py          # Dictionary packing with auto-detection
│   ├── query.py         # Word lookup with smart file output
│   ├── info.py          # Rich information display
│   ├── keys.py          # Key listing and filtering
│   └── convert.py       # Format conversion utilities
├── base/                # Low-level MDict format implementation
├── metadata.py          # TOML metadata management system
├── reader.py            # High-level reading interface
└── writer.py            # High-level writing interface

Core Statistics:

  • ~4,700 lines of Python code
  • 6 main commands with consistent interface
  • Comprehensive test suite (38+ tests)
  • Full type hints and documentation

Development Setup

git clone https://github.com/likai/mdtt.git
cd mdtt
uv sync  # Install dependencies and create virtual environment

Code Quality & Testing

# Code quality checks
uv run ruff check         # Linting (pycodestyle, pyflakes, security, etc.)
uv run ruff format        # Code formatting
uv run pyright            # Static type checking

# Testing options
tests/run_tests.sh all           # Complete test suite
tests/run_tests.sh unit          # Fast unit tests only  
tests/run_tests.sh integration   # Integration tests with real files
tests/run_tests.sh -c            # Run with coverage report
tests/test_integration.sh        # Shell-based integration testing

# Direct pytest usage
uv run pytest                   # Run all tests
uv run pytest -m "not slow"     # Skip performance tests

Current Status

  • Core Functionality: All major features implemented and tested
  • Modern CLI: Complete subcommand architecture with rich help
  • TOML Metadata: Full implementation with auto-detection
  • Test Coverage: Comprehensive testing including real dictionary files
  • ⚠️ Code Quality: Minor linting issues in legacy base modules (308 warnings)
  • 🔄 Active Development: Recent commits include TBX converter and enhanced query system

Acknowledgments

This project is built upon and significantly evolved from the original mdict-utils by Yugang LIU. While MDTT has been extensively rewritten with modern architecture, new features, and enhanced functionality, we acknowledge the foundational work that made this project possible.

Key differences in MDTT:

  • Complete rewrite with modern Python 3.13+ and subcommand architecture
  • TOML-based metadata management system
  • Enhanced CLI interface with comprehensive help
  • Extensive test suite with real dictionary file testing
  • New features: TBX conversion, smart query system, format conversion tools

Migration from mdict-utils v1.x

If you're upgrading from the original mdict-utils v1.x:

  1. Update command syntax to use subcommands
  2. Replace -t/-d flags with .meta.toml files
  3. Use mdtt info instead of mdict -m
  4. Benefit from improved help, error messages, and output formatting

License

MIT License - see LICENSE file for details.
