MDx Dict Trans ToolKit


MDTT - MDx Dict Trans ToolKit

A modern Python 3.13+ tool for packing and unpacking MDict dictionary files (.mdx/.mdd), with advanced features and an intuitive command-line interface.

Version 2.0 - Complete Rewrite:

🆕 Modern Subcommand Architecture - Clean command-line interface modeled on git and docker

🆕 TOML Metadata Management - User-friendly .meta.toml configuration files with auto-detection

🆕 Rich Information Display - Formatted terminal output with JSON/TOML export options

🆕 Comprehensive Testing - Full test suite covering unit tests, integration tests, and tests against real dictionary files

🆕 Enhanced Query System - Automatic, filesystem-safe output file naming and custom output paths

🆕 Format Conversion Tools - Built-in converters between text, SQLite database, and MDict formats

Key Features

✅ Full MDict Support: Reads and writes MDict 2.0, reads MDict 3.0, and supports encrypted dictionaries
✅ Multiple Output Formats: MDX/MDD files, SQLite databases, plain text, split files
✅ Intelligent CLI: Context-aware commands with comprehensive help and error handling
✅ Metadata System: Automatic .meta.toml file detection and generation
✅ Advanced Extraction: Split by alphabet or into N files, metadata export
✅ Developer Friendly: Modern Python 3.13+, uv package manager, comprehensive type hints

Installation

From PyPI (Recommended)

pip install mdtta  # distribution name of the 2.0.2 release files (mdtta-2.0.2)

Development Setup

git clone https://github.com/likai/mdtt.git
cd mdtt
uv sync  # Install dependencies with uv (recommended)
# or: pip install -e ".[dev]"  # Alternative with pip

Requirements

Python 3.13+ (required for modern typing features)
Optional: uv package manager for faster dependency resolution

Quick Start

View Available Commands

mdtt --help

Extract a Dictionary

# Basic extraction (outputs to current directory with .txt and .meta.toml)
mdtt extract my_dict.mdx

# Extract to specific directory
mdtt extract my_dict.mdx -o ./output

# Extract as database
mdtt extract my_dict.mdx --db

# Extract without metadata file
mdtt extract my_dict.mdx --no-meta

Create a Dictionary

Create your content file (my_dict.txt):

apple
A round fruit that grows on trees.
</>
banana
A long curved yellow fruit.
</>
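Each record in the source file is a headword line, one or more definition lines, and a `</>` terminator. The following is a minimal Python sketch of reading that format, assuming UTF-8 text and that every record ends with `</>`; parse_mdict_text is a hypothetical helper for illustration, not part of mdtt's API.

```python
from collections.abc import Iterator


def parse_mdict_text(text: str) -> Iterator[tuple[str, str]]:
    """Yield (headword, definition) pairs from MDict source text.

    Assumes each record is a headword line, one or more definition
    lines, and a closing "</>" line.
    """
    headword: str | None = None
    body: list[str] = []
    for line in text.splitlines():
        if line.strip() == "</>":
            # End of record: emit it and reset for the next headword.
            if headword is not None:
                yield headword, "\n".join(body)
            headword, body = None, []
        elif headword is None:
            if line.strip():  # skip blank lines between records
                headword = line.strip()
        else:
            body.append(line)
    # In this sketch, a final record without "</>" is silently dropped.


sample = """apple
A round fruit that grows on trees.
</>
banana
A long curved yellow fruit.
</>
"""

for word, definition in parse_mdict_text(sample):
    print(f"{word}: {definition}")
```

Using a generator keeps memory flat for large source files, since only one record is held at a time.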

Create metadata file (my_dict.meta.toml):

[dictionary]
title = "My Custom Dictionary"
description = "A simple English dictionary"

Pack the dictionary:

# Auto-detect output filename
mdtt pack -a my_dict.txt

# Or specify an explicit output name
mdtt pack -a my_dict.txt my_dict.mdx

Query and Information

# Query a word (displays result and saves to apple.html)
mdtt query apple my_dict.mdx

# Query with custom output filename
mdtt query apple my_dict.mdx -o definitions/apple_def.html

# Query phrases (automatically creates safe filenames)
mdtt query "can't believe" my_dict.mdx  # Creates can_t_believe.html

# Show dictionary information (rich formatted output)
mdtt info my_dict.mdx

# Export information as JSON or TOML
mdtt info my_dict.mdx --format json
mdtt info my_dict.mdx --format toml

# List dictionary keys with filtering
mdtt keys my_dict.mdx --limit 100
mdtt keys my_dict.mdx --pattern "apple*"

Advanced Usage

Working with TOML Metadata

Create .meta.toml files for automatic metadata detection:

[dictionary]
title = "Oxford Advanced Dictionary"
description = """
Comprehensive English dictionary with detailed definitions.
Perfect for students and professionals.
"""

# Other attributes (encoding, version, etc.) use system defaults
# To customize them, add an [advanced] section

Multiple Input Sources

# Pack multiple files (auto-detect output name)
mdtt pack -a part1.txt -a part2.txt

# Pack with explicit output name
mdtt pack -a part1.txt -a part2.txt combined.mdx

# Use custom metadata
mdtt pack -a source.txt -m custom.meta.toml

# Pack media resources (auto-detects .mdd extension)
mdtt pack -a images_folder/

Format Conversion & Import Tools

# Convert between text and database formats
mdtt convert txt-to-db dict.txt dict.db
mdtt convert db-to-txt dict.db dict.txt

# TBX/TMX to MDict conversion
# Convert TBX termbases or TMX translation memories to MDict format
python tests/script_convert_tbx_to_mdict.py input.tbx output.mdx
# Features:
# - Automatic metadata generation from the TBX header
# - CSS styling for a professional appearance
# - Support for multiple languages and terminology
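Conceptually, txt-to-db stores each headword/definition record as a database row, and db-to-txt writes the rows back out as `</>`-terminated records. mdtt's actual SQLite schema is not documented here, so the table and column names below (entries, word, definition) are hypothetical; the sketch uses the standard sqlite3 module with an in-memory database and starts from already-parsed (headword, definition) pairs.

```python
import sqlite3
from collections.abc import Iterable

# Hypothetical schema for illustration; mdtt's real database layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    word       TEXT NOT NULL,
    definition TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_entries_word ON entries (word);
"""


def entries_to_db(conn: sqlite3.Connection, entries: Iterable[tuple[str, str]]) -> int:
    """Store (headword, definition) pairs; returns the number of rows inserted."""
    conn.executescript(SCHEMA)
    rows = list(entries)
    with conn:  # commit all inserts in one transaction
        conn.executemany("INSERT INTO entries (word, definition) VALUES (?, ?)", rows)
    return len(rows)


def db_to_txt(conn: sqlite3.Connection) -> str:
    """Rebuild MDict source text, one "</>"-terminated record per row."""
    rows = conn.execute("SELECT word, definition FROM entries ORDER BY rowid")
    return "".join(f"{word}\n{definition}\n</>\n" for word, definition in rows)


conn = sqlite3.connect(":memory:")
entries_to_db(conn, [("apple", "A round fruit that grows on trees."),
                     ("banana", "A long curved yellow fruit.")])
# Indexed lookup by headword:
print(conn.execute("SELECT definition FROM entries WHERE word = ?", ("banana",)).fetchone()[0])
# A long curved yellow fruit.
print(db_to_txt(conn), end="")
```

Ordering by rowid preserves the original entry order, so a text file converted to the database and back comes out unchanged.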

Advanced Query Options

# Query with automatic HTML output (creates hello_world.html)
mdtt query "hello world" my_dict.mdx

# Query with custom output file
mdtt query apple my_dict.mdx -o definitions/apple.html

# Query encrypted dictionaries
mdtt query word encrypted.mdx --passcode mypassword

# Special characters in queries are handled automatically
# e.g., "can't" becomes "can_t.html"
mdtt query "can't" my_dict.mdx
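The safe-filename behavior described above ("can't" becomes can_t.html, "can't believe" becomes can_t_believe.html) can be reproduced by replacing runs of non-word characters with underscores. This is a minimal sketch of one rule consistent with those examples, not mdtt's actual implementation; safe_html_filename and the "query" fallback name are hypothetical.

```python
import re


def safe_html_filename(query: str) -> str:
    """Map a query to a filename: runs of non-word characters become "_".

    In Python 3, \\W is Unicode-aware, so non-ASCII letters (e.g. CJK)
    are kept. The "query" fallback for empty results is an assumption.
    """
    stem = re.sub(r"\W+", "_", query).strip("_")
    return f"{stem or 'query'}.html"


for q in ["apple", "can't", "can't believe", "hello world"]:
    print(safe_html_filename(q))
# apple.html
# can_t.html
# can_t_believe.html
# hello_world.html
```

Collapsing runs (\W+ rather than \W) keeps names readable when a query contains several punctuation marks in a row.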

Advanced Extraction Options

# Split by alphabet (with metadata)
mdtt extract large_dict.mdx --split-az

# Split into N files
mdtt extract large_dict.mdx --split-n 5

# Handle encrypted dictionaries
mdtt extract encrypted.mdx --passcode mypassword

# Extract to specific directory without metadata
mdtt extract dict.mdx -o ./output --no-meta
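The two split modes can be illustrated on a list of headwords. mdtt's exact bucketing rules are not documented here, so this is a sketch under assumptions: --split-az is modeled as grouping by an ASCII first letter (anything else goes to a hypothetical "_" bucket), and --split-n as N contiguous chunks whose sizes differ by at most one. split_az and split_n are hypothetical names.

```python
from collections import defaultdict


def split_az(words: list[str]) -> dict[str, list[str]]:
    """Group headwords by ASCII first letter; other starts go to '_' (assumed)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word in words:
        first = word[:1]
        key = first.upper() if first.isascii() and first.isalpha() else "_"
        groups[key].append(word)
    return dict(groups)


def split_n(items: list[str], n: int) -> list[list[str]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be >= 1")
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


words = ["apple", "Banana", "avocado", "42nd Street", "cherry"]
print(split_az(words))
# {'A': ['apple', 'avocado'], 'B': ['Banana'], '_': ['42nd Street'], 'C': ['cherry']}
print(split_n(words, 2))
# [['apple', 'Banana', 'avocado'], ['42nd Street', 'cherry']]
```

Keeping chunks contiguous preserves the dictionary's original key order within and across output files.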

Command Reference

Command	Purpose	Key Features	Options
`extract`	Extract MDX/MDD files with metadata export	Automatic metadata export, split options, database output	`-o` (output dir), `--db`, `--no-meta`, `--split-az`, `--split-n`, `--passcode`
`pack`	Create MDX/MDD files from one or more sources	Auto-detected output filename, metadata file discovery, multiple sources	`-a` (add source, repeatable), `-m` (metadata file)
`query`	Look up words and save the result as HTML	Safe filename generation, automatic HTML output, custom output paths	`-o` (output file), `--passcode`
`info`	Display dictionary information	Formatted display, comprehensive metadata, JSON/TOML export	`--format` (text/json/toml)
`keys`	List and filter dictionary keys	Pattern matching, pagination, sampling, memory-efficient streaming	`--limit`, `--pattern`
`convert`	Convert between formats	Text ↔ database conversion that preserves structure and indexes	`txt-to-db`, `db-to-txt` (subcommands)

Special Tools

TBX Converter: tests/script_convert_tbx_to_mdict.py - Convert TBX termbases and TMX translation memories to MDict format

Testing

The project includes comprehensive testing:

# Run all tests
tests/run_tests.sh all

# Run specific test types
tests/run_tests.sh unit          # Fast unit tests
tests/run_tests.sh integration   # Tests with real files
tests/run_tests.sh -c            # With coverage report

# Shell integration test
tests/test_integration.sh

MDX File Format

An .mdx file consists of:

Header: Dictionary metadata (Title, Description, Version, etc.) in UTF-16LE XML
Keyword Section: Compressed blocks of keywords with index for quick lookup
Record Section: Compressed blocks of dictionary entries (HTML content)

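Because the keyword and record blocks are indexed, a lookup only needs to decompress the blocks that contain the target entry, so random access stays efficient even in dictionaries with millions of entries.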
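To make the header description concrete, here is a minimal Python sketch that parses only the header, assuming the layout used by widely used open-source MDict readers: a 4-byte big-endian length, the UTF-16LE XML tag itself (ending in two NUL bytes), then a 4-byte little-endian Adler-32 checksum of the tag bytes. It does not touch the keyword or record sections, and read_mdx_header is a hypothetical helper, not part of mdtt's reader.py API.

```python
import re
import struct
import zlib


def read_mdx_header(data: bytes) -> tuple[dict[str, str], int]:
    """Parse the header at the start of an .mdx/.mdd file (assumed layout).

    Returns the header attributes and the offset where the keyword
    section begins.
    """
    (length,) = struct.unpack_from(">I", data, 0)      # big-endian byte count
    header = data[4 : 4 + length]
    (checksum,) = struct.unpack_from("<I", data, 4 + length)  # little-endian
    if zlib.adler32(header) & 0xFFFFFFFF != checksum:
        raise ValueError("header checksum mismatch")
    # The tag is UTF-16LE and ends with a NUL character.
    text = header.decode("utf-16-le").rstrip("\x00")
    attrs = dict(re.findall(r'(\w+)="(.*?)"', text, re.DOTALL))
    return attrs, 4 + length + 4


# Build a synthetic header so the example runs without a real dictionary.
tag = '<Dictionary GeneratedByEngineVersion="2.0" Title="My Custom Dictionary" Encoding="UTF-8"/>\r\n'
header_bytes = tag.encode("utf-16-le") + b"\x00\x00"
blob = (struct.pack(">I", len(header_bytes)) + header_bytes
        + struct.pack("<I", zlib.adler32(header_bytes)))

attrs, keyword_offset = read_mdx_header(blob)
print(attrs["Title"])               # My Custom Dictionary
print(keyword_offset == len(blob))  # True
```

For a real file, reading the first 4 bytes to learn the header length, then the next length + 4 bytes, avoids loading a large dictionary into memory.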

Development

Project Architecture

The project follows a modern, modular architecture:

src/mdict_utils/
├── __main__.py          # CLI entry point with subcommand routing
├── commands/            # Individual command implementations
│   ├── extract.py       # Dictionary extraction with metadata
│   ├── pack.py          # Dictionary packing with auto-detection
│   ├── query.py         # Word lookup with smart file output
│   ├── info.py          # Rich information display
│   ├── keys.py          # Key listing and filtering
│   └── convert.py       # Format conversion utilities
├── base/                # Low-level MDict format implementation
├── metadata.py          # TOML metadata management system
├── reader.py            # High-level reading interface
└── writer.py            # High-level writing interface

Core Statistics:

~4,700 lines of Python code
6 main commands with consistent interface
Comprehensive test suite (38+ tests)
Full type hints and documentation

Development Setup

git clone https://github.com/likai/mdtt.git
cd mdtt
uv sync  # Install dependencies and create virtual environment

Code Quality & Testing

# Code quality checks
uv run ruff check         # Linting (pycodestyle, pyflakes, security, etc.)
uv run ruff format        # Code formatting
uv run pyright            # Static type checking

# Testing options
tests/run_tests.sh all           # Complete test suite
tests/run_tests.sh unit          # Fast unit tests only  
tests/run_tests.sh integration   # Integration tests with real files
tests/run_tests.sh -c            # Run with coverage report
tests/test_integration.sh        # Shell-based integration testing

# Direct pytest usage
uv run pytest                   # Run all tests
uv run pytest -m "not slow"     # Skip performance tests

Current Status

✅ Core Functionality: All major features implemented and tested
✅ Modern CLI: Complete subcommand architecture with rich help
✅ TOML Metadata: Full implementation with auto-detection
✅ Test Coverage: Comprehensive testing including real dictionary files
⚠️ Code Quality: Minor linting issues in legacy base modules (308 warnings)
🔄 Active Development: Recent commits include TBX converter and enhanced query system

Acknowledgments

This project builds on, and has evolved significantly from, the original mdict-utils by Yugang LIU. Although MDTT has been extensively rewritten with a modern architecture, new features, and enhanced functionality, we acknowledge the foundational work that made this project possible.

Key differences in MDTT:

Complete rewrite with modern Python 3.13+ and subcommand architecture
TOML-based metadata management system
Enhanced CLI interface with comprehensive help
Extensive test suite with real dictionary file testing
New features: TBX conversion, smart query system, format conversion tools

Migration from mdict-utils v1.x

If you're upgrading from the original mdict-utils v1.x:

Update command syntax to use subcommands
Replace -t/-d flags with .meta.toml files
Use mdtt info instead of mdict -m
Benefit from improved help, error messages, and output formatting

Reference

License

MIT License - see LICENSE file for details.

This version: 2.0.2 (released Sep 5, 2025)

Download files

Source Distribution

mdtta-2.0.2.tar.gz (44.4 kB)

Uploaded Sep 5, 2025 Source

Built Distribution


mdtta-2.0.2-py3-none-any.whl (52.9 kB)

Uploaded Sep 5, 2025 Python 3

File details

Details for the file mdtta-2.0.2.tar.gz.

File metadata

Download URL: mdtta-2.0.2.tar.gz
Upload date: Sep 5, 2025
Size: 44.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for mdtta-2.0.2.tar.gz
Algorithm	Hash digest
SHA256	`fe0e5c8bd3e3fe4babff85313d7e6b032d563d20ea8b4de7779944292aff8e87`
MD5	`c2742321f95d303ec59d6ccaa0716582`
BLAKE2b-256	`99fee8e96da7f41022a509991e78fd68fa067fe9987de64fb0eb9d7b3348972a`

File details

Details for the file mdtta-2.0.2-py3-none-any.whl.

File metadata

Download URL: mdtta-2.0.2-py3-none-any.whl
Upload date: Sep 5, 2025
Size: 52.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for mdtta-2.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9168c8d744c54e23b4e284efeddcc212f691a95bf8ac3d86f8da6b1f9704610`
MD5	`617fbe2cb2f2f4f3775b1c6a8352b2f8`
BLAKE2b-256	`844ed060cae8d6e51ba4fc115b7cef5f4cd8fade650104dc4ac04de149625f22`
