MDTT - MDx Dict Trans ToolKit
A modern Python 3.13+ tool for packing and unpacking MDict dictionary files (.mdx/.mdd), with advanced features and an intuitive command-line interface.
Version 2.0 - Complete Rewrite:
- 🆕 Modern Subcommand Architecture - Clean CLI interface similar to git and docker
- 🆕 TOML Metadata Management - User-friendly .meta.toml configuration files with auto-detection
- 🆕 Rich Information Display - Beautiful formatted output with JSON/TOML export options
- 🆕 Comprehensive Testing - Full test suite including unit, integration, and real-file testing
- 🆕 Enhanced Query System - Smart output file naming and custom file specification
- 🆕 Format Conversion Tools - Built-in converters between text, database, and MDict formats
Key Features
- ✅ Full MDict Support: Read/Write MDict 2.0, Read MDict 3.0, supports encrypted dictionaries
- ✅ Multiple Output Formats: MDX/MDD files, SQLite databases, plain text, split files
- ✅ Intelligent CLI: Context-aware commands with comprehensive help and error handling
- ✅ Metadata System: Automatic .meta.toml file detection and generation
- ✅ Advanced Extraction: Split by alphabet, custom chunk sizes, metadata export
- ✅ Developer Friendly: Modern Python 3.13+, uv package manager, comprehensive type hints
Installation
From PyPI (Recommended)
pip install mdtt
Development Setup
git clone https://github.com/likai/mdtt.git
cd mdtt
uv sync # Install dependencies with uv (recommended)
# or: pip install -e ".[dev]" # Alternative with pip
Requirements
- Python 3.13+ (required for modern typing features)
- Optional: uv package manager for faster dependency resolution
Quick Start
View Available Commands
mdtt --help
Extract a Dictionary
# Basic extraction (outputs to current directory with .txt and .meta.toml)
mdtt extract my_dict.mdx
# Extract to specific directory
mdtt extract my_dict.mdx -o ./output
# Extract as database
mdtt extract my_dict.mdx --db
# Extract without metadata file
mdtt extract my_dict.mdx --no-meta
Create a Dictionary
- Create your content file (my_dict.txt):
apple
A round fruit that grows on trees.
</>
banana
A long curved yellow fruit.
</>
- Create metadata file (my_dict.meta.toml):
[dictionary]
title = "My Custom Dictionary"
description = "A simple English dictionary"
- Pack the dictionary:
# Auto-detect output filename
mdtt pack -a my_dict.txt
# Or specify explicit output name
mdtt pack -a my_dict.txt my_dict.mdx
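The content file shown above uses a simple layout: a headword line, one or more definition lines, then a `</>` separator line. As a rough illustration of that layout only (not MDTT's own parser; `parse_entries` is a hypothetical helper name), the text can be read into key/definition pairs like this:

```python
from collections.abc import Iterator


def parse_entries(text: str) -> Iterator[tuple[str, str]]:
    """Yield (headword, definition) pairs from MDict-style source text."""
    block: list[str] = []
    for line in text.splitlines():
        if line.strip() == "</>":
            if block:
                # First line is the headword; the remaining lines are the definition.
                yield block[0].strip(), "\n".join(block[1:]).strip()
            block = []
        else:
            block.append(line)
    # A trailing block without a closing "</>" is ignored in this sketch.


sample = """apple
A round fruit that grows on trees.
</>
banana
A long curved yellow fruit.
</>
"""

entries = list(parse_entries(sample))
for key, definition in entries:
    print(f"{key}: {definition}")
```

A sketch like this can be handy for sanity-checking a source file (for example, counting entries) before running mdtt pack.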
Query and Information
# Query a word (displays result and saves to apple.html)
mdtt query apple my_dict.mdx
# Query with custom output filename
mdtt query apple my_dict.mdx -o definitions/apple_def.html
# Query phrases (automatically creates safe filenames)
mdtt query "can't believe" my_dict.mdx # Creates can_t_believe.html
# Show dictionary information (rich formatted output)
mdtt info my_dict.mdx
# Export information as JSON or TOML
mdtt info my_dict.mdx --format json
mdtt info my_dict.mdx --format toml
# List dictionary keys with filtering
mdtt keys my_dict.mdx --limit 100
mdtt keys my_dict.mdx --pattern "apple*"
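To make the --pattern and --limit options more concrete, here is a small sketch of glob-style key filtering. It assumes shell-style wildcard semantics (as in Python's `fnmatch` module), which may not match what mdtt keys does internally; `filter_keys` is a hypothetical helper, not part of MDTT.

```python
from fnmatch import fnmatchcase
from itertools import islice
from typing import Iterable, Iterator, Optional


def filter_keys(
    keys: Iterable[str], pattern: str = "*", limit: Optional[int] = None
) -> Iterator[str]:
    """Lazily yield keys matching a glob pattern, stopping after `limit` matches."""
    matches = (key for key in keys if fnmatchcase(key, pattern))
    # islice with stop=None yields every match; otherwise it stops early.
    return islice(matches, limit)


all_keys = ["apple", "apple pie", "applesauce", "banana", "pineapple"]
matched = list(filter_keys(all_keys, "apple*"))
first_two = list(filter_keys(all_keys, "apple*", limit=2))
print(matched)
print(first_two)
```

Because the filter is a generator, keys are examined one at a time, which keeps memory use flat even for very large key lists.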
Advanced Usage
Working with TOML Metadata
Create .meta.toml files for automatic metadata detection:
[dictionary]
title = "Oxford Advanced Dictionary"
description = """
Comprehensive English dictionary with detailed definitions.
Perfect for students and professionals.
"""
# Other attributes (encoding, version, etc.) use system defaults
# To customize them, add an [advanced] section
Multiple Input Sources
# Pack multiple files (auto-detect output name)
mdtt pack -a part1.txt -a part2.txt
# Pack with explicit output name
mdtt pack -a part1.txt -a part2.txt combined.mdx
# Use custom metadata
mdtt pack -a source.txt -m custom.meta.toml
# Pack media resources (auto-detects .mdd extension)
mdtt pack -a images_folder/
Format Conversion & Import Tools
# Convert between text and database formats
mdtt convert txt-to-db dict.txt dict.db
mdtt convert db-to-txt dict.db dict.txt
# TBX (TermBase eXchange) to MDict conversion
# Convert TBX terminology or TMX translation memory files to MDict format
python tests/script_convert_tbx_to_mdict.py input.tbx output.mdx
# Features:
# - Automatic metadata generation from TBX header
# - CSS styling for professional appearance
# - Support for multiple languages and terminology
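The txt-to-db and db-to-txt conversions above can be pictured as a round trip between the `</>`-separated text layout and a SQLite table. The sketch below is illustrative only: the table name `entries` and its `key`/`value` columns are assumptions made for this example, not the schema MDTT actually writes.

```python
import sqlite3


def entries_to_db(entries, db_path=":memory:"):
    """Store (key, value) pairs in a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries (key TEXT NOT NULL, value TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_entries_key ON entries (key)")
    conn.executemany("INSERT INTO entries (key, value) VALUES (?, ?)", entries)
    conn.commit()
    return conn


def db_to_text(conn):
    """Render the table back into the '</>'-separated text layout."""
    rows = conn.execute("SELECT key, value FROM entries ORDER BY rowid")
    return "".join(f"{key}\n{value}\n</>\n" for key, value in rows)


conn = entries_to_db(
    [
        ("apple", "A round fruit that grows on trees."),
        ("banana", "A long curved yellow fruit."),
    ]
)
text = db_to_text(conn)
print(text)
```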
Advanced Query Options
# Query with automatic HTML output (creates an HTML file named after the query)
mdtt query "hello world" my_dict.mdx
# Query with custom output file
mdtt query apple my_dict.mdx -o definitions/apple.html
# Query encrypted dictionaries
mdtt query word encrypted.mdx --passcode mypassword
# Special characters in queries are handled automatically
# e.g., "can't" becomes "can_t.html"
mdtt query "can't" my_dict.mdx
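The safe-filename behavior described above ("can't" becomes "can_t.html") can be approximated with a single regular expression. This is a guess at the rule, written to reproduce the two examples in this README; the exact substitution MDTT applies may differ, and `safe_filename` is a hypothetical helper.

```python
import re


def safe_filename(query: str, suffix: str = ".html") -> str:
    """Turn a query string into a filesystem-friendly file name."""
    # Replace each run of non-word characters (spaces, quotes, slashes, ...)
    # with one underscore, then trim underscores left at either end.
    stem = re.sub(r"\W+", "_", query).strip("_")
    # Fall back to a fixed name if nothing usable remains.
    return (stem or "query") + suffix


names = [safe_filename(q) for q in ("apple", "can't", "can't believe")]
print(names)
```

Using `\W` rather than an ASCII-only character class keeps non-English headwords intact, since Python treats Unicode letters as word characters by default.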
Advanced Extraction Options
# Split by alphabet (with metadata)
mdtt extract large_dict.mdx --split-az
# Split into N files
mdtt extract large_dict.mdx --split-n 5
# Handle encrypted dictionaries
mdtt extract encrypted.mdx --passcode mypassword
# Extract to specific directory without metadata
mdtt extract dict.mdx -o ./output --no-meta
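The two split modes above can be sketched as follows. This is one plausible reading of --split-az (group by first letter, with a catch-all bucket) and --split-n (N contiguous chunks of near-equal size); the bucket name `other` and both function names are assumptions for illustration, not MDTT's actual output naming.

```python
from collections import defaultdict


def split_by_letter(entries):
    """Group (key, value) pairs by the lowercase first letter of the key."""
    groups = defaultdict(list)
    for key, value in entries:
        first = key[:1].lower()
        # Keys that do not start with a-z go into a catch-all bucket.
        bucket = first if "a" <= first <= "z" else "other"
        groups[bucket].append((key, value))
    return dict(groups)


def split_into(entries, n):
    """Split a list into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(entries), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(entries[start:end])
        start = end
    return chunks


demo = [("apple", "1"), ("avocado", "2"), ("banana", "3"), ("Cherry", "4"), ("42", "5")]
by_letter = split_by_letter(demo)
chunks = split_into(demo, 2)
print(sorted(by_letter))
print([len(chunk) for chunk in chunks])
```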
Command Reference
| Command | Purpose | Key Features | Options |
|---|---|---|---|
| extract | Extract MDX/MDD files with metadata export | Auto-metadata export, split options, database output | -o (output dir), --db, --no-meta, --split-az, --split-n |
| pack | Create MDX/MDD from sources (smart output naming) | Auto-detects output filename, metadata file discovery | -a (add source), -m (metadata file), multiple sources |
| query | Search words with smart HTML file output | Safe filename generation, custom output paths | -o (output file), --passcode, auto HTML creation |
| info | Display rich dictionary information | Beautiful formatting, multiple export formats | --format (text/json/toml), comprehensive metadata |
| keys | List and filter dictionary keys | Pattern matching, pagination, sampling | --limit, --pattern, memory-efficient streaming |
| convert | Convert between formats | Text ↔ Database conversion, preservation of structure | txt-to-db, db-to-txt, maintains indexes |
Special Tools
- TBX Converter: tests/script_convert_tbx_to_mdict.py - Convert TBX/TMX translation memories to MDict format
Testing
The project includes comprehensive testing:
# Run all tests
tests/run_tests.sh all
# Run specific test types
tests/run_tests.sh unit # Fast unit tests
tests/run_tests.sh integration # Tests with real files
tests/run_tests.sh -c # With coverage report
# Shell integration test
tests/test_integration.sh
MDX File Format
An .mdx file consists of:
- Header: Dictionary metadata (Title, Description, Version, etc.) in UTF-16LE XML
- Keyword Section: Compressed blocks of keywords with index for quick lookup
- Record Section: Compressed blocks of dictionary entries (HTML content)
This structure allows efficient random access even in large dictionaries with millions of entries.
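As an illustration of the header part of this structure, the sketch below decodes a header from raw bytes. It assumes the commonly documented on-disk layout (a 4-byte big-endian length, the UTF-16LE XML header, then a 4-byte little-endian Adler-32 checksum), which is not spelled out in this README, and it uses a deliberately simple attribute regex. `read_mdx_header` is a hypothetical helper, not MDTT's reader API.

```python
import re
import struct
import zlib


def read_mdx_header(blob: bytes) -> dict[str, str]:
    """Decode the header attributes from the start of an MDX byte stream."""
    (length,) = struct.unpack(">I", blob[:4])
    raw = blob[4 : 4 + length]
    (checksum,) = struct.unpack("<I", blob[4 + length : 8 + length])
    if checksum != zlib.adler32(raw) & 0xFFFFFFFF:
        raise ValueError("header checksum mismatch")
    # Drop the trailing line break and NUL terminator before parsing.
    text = raw.decode("utf-16-le").rstrip("\x00\r\n")
    # Naive attribute scan; values in real files may contain escaped HTML.
    return dict(re.findall(r'(\w+)="(.*?)"', text, re.DOTALL))


# Build a synthetic header so the sketch runs without a real dictionary file.
xml = '<Dictionary GeneratedByEngineVersion="2.0" Title="Demo" Encoding="UTF-8"/>\r\n\x00'
raw = xml.encode("utf-16-le")
blob = struct.pack(">I", len(raw)) + raw + struct.pack("<I", zlib.adler32(raw) & 0xFFFFFFFF)

header = read_mdx_header(blob)
print(header)
```

For real files, mdtt info remains the supported way to view this metadata; the sketch is only meant to show why the header can be read without touching the compressed keyword and record sections.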
Development
Project Architecture
The project follows a modern, modular architecture:
src/mdict_utils/
├── __main__.py # CLI entry point with subcommand routing
├── commands/ # Individual command implementations
│ ├── extract.py # Dictionary extraction with metadata
│ ├── pack.py # Dictionary packing with auto-detection
│ ├── query.py # Word lookup with smart file output
│ ├── info.py # Rich information display
│ ├── keys.py # Key listing and filtering
│ └── convert.py # Format conversion utilities
├── base/ # Low-level MDict format implementation
├── metadata.py # TOML metadata management system
├── reader.py # High-level reading interface
└── writer.py # High-level writing interface
Core Statistics:
- ~4,700 lines of Python code
- 6 main commands with consistent interface
- Comprehensive test suite (38+ tests)
- Full type hints and documentation
Development Setup
git clone https://github.com/likai/mdtt.git
cd mdtt
uv sync # Install dependencies and create virtual environment
Code Quality & Testing
# Code quality checks
uv run ruff check # Linting (pycodestyle, pyflakes, security, etc.)
uv run ruff format # Code formatting
uv run pyright # Static type checking
# Testing options
tests/run_tests.sh all # Complete test suite
tests/run_tests.sh unit # Fast unit tests only
tests/run_tests.sh integration # Integration tests with real files
tests/run_tests.sh -c # Run with coverage report
tests/test_integration.sh # Shell-based integration testing
# Direct pytest usage
uv run pytest # Run all tests
uv run pytest -m "not slow" # Skip performance tests
Current Status
- ✅ Core Functionality: All major features implemented and tested
- ✅ Modern CLI: Complete subcommand architecture with rich help
- ✅ TOML Metadata: Full implementation with auto-detection
- ✅ Test Coverage: Comprehensive testing including real dictionary files
- ⚠️ Code Quality: Minor linting issues in legacy base modules (308 warnings)
- 🔄 Active Development: Recent commits include TBX converter and enhanced query system
Acknowledgments
This project is built upon and significantly evolved from the original mdict-utils by Yugang LIU. While MDTT has been extensively rewritten with modern architecture, new features, and enhanced functionality, we acknowledge the foundational work that made this project possible.
Key differences in MDTT:
- Complete rewrite with modern Python 3.13+ and subcommand architecture
- TOML-based metadata management system
- Enhanced CLI interface with comprehensive help
- Extensive test suite with real dictionary file testing
- New features: TBX conversion, smart query system, format conversion tools
Migration from mdict-utils v1.x
If you're upgrading from the original mdict-utils v1.x:
- Update command syntax to use subcommands
- Replace -t/-d flags with .meta.toml files
- Use mdtt info instead of mdtt -m
- Benefit from improved help, error messages, and output formatting
License
MIT License - see LICENSE file for details.