Semantic Comparer
Advanced text alignment and semantic containment analysis tool using modern Python practices.
Features
- Semantic Alignment: Uses the Smith-Waterman algorithm with sentence-transformer embeddings to compare texts by meaning
- Modern CLI: Built with Typer and Rich for a readable, user-friendly interface
- Async Processing: Asynchronous operations for non-blocking I/O
- File Support: Direct text input or file-based processing
- Rich Output: Colorized, formatted results with detailed statistics
- Type Safety: Full type annotations and modern Python practices
Usage
Install
pip install semantic-comparer
Basic Comparison
# Compare two texts directly
semantic-comparer compare "This is the first text." "This is the second text."
# Compare with custom parameters
semantic-comparer compare \
"First text content" \
"Second text content" \
--model paraphrase-multilingual-MiniLM-L12-v2 \
--gap-penalty 0.3 \
--similarity-threshold 0.5
File-based Comparison
# Compare text files (prefix with @)
semantic-comparer compare @file1.txt @file2.txt
# Mix direct text and file
semantic-comparer compare "Direct text" @file.txt
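The `@` prefix convention can be illustrated with a minimal sketch. The helper name `resolve_input` and its exact behaviour are assumptions made for illustration, not the package's actual implementation, which may add validation, size limits, or sanitization.

```python
from pathlib import Path


def resolve_input(arg: str, encoding: str = "utf-8") -> str:
    """Return file contents for '@path' arguments, otherwise the argument itself.

    Hypothetical helper: name and behaviour are assumptions for illustration.
    """
    if arg.startswith("@"):
        # Strip the '@' marker and read the referenced file as text.
        return Path(arg[1:]).read_text(encoding=encoding)
    # Anything without the marker is treated as literal text.
    return arg
```

With this sketch, `resolve_input("Direct text")` returns the string unchanged, while `resolve_input("@file.txt")` returns the contents of `file.txt`.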
Output Options
# Quiet mode (summary only)
semantic-comparer compare text1 text2 --quiet
# Save detailed results to JSON
semantic-comparer compare text1 text2 --output results.json
Command Line Options
| Option | Short | Description | Default |
|---|---|---|---|
| --model | -m | Sentence transformer model | paraphrase-multilingual-MiniLM-L12-v2 |
| --gap-penalty | -g | Penalty for gaps (0.0-1.0) | 0.3 |
| --similarity-threshold | -t | Minimum similarity for matches (0.0-1.0) | 0.5 |
| --output | -o | Output file for JSON results | None |
| --quiet | -q | Suppress detailed output | False |
Understanding the Results
Alignment Types
- ✓ Match: Paragraphs that are semantically similar
- ⚠ Only in A/B: Paragraphs present in one text but not the other
- ✗ Unaligned: Paragraphs that couldn't be matched
Containment Score
The semantic containment score measures how much of text A's semantic content is found in text B:
- 0.0-0.4: Low similarity (Red)
- 0.4-0.7: Moderate similarity (Yellow)
- 0.7-1.0: High similarity (Green)
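The bands above share their boundary values (0.4 and 0.7 each appear in two ranges). The sketch below shows one way to map a score to a band, assuming each boundary belongs to the higher band. That choice, and the function name `containment_band`, are assumptions rather than documented behaviour of the tool.

```python
def containment_band(score: float) -> str:
    """Map a containment score in [0.0, 1.0] to 'low', 'moderate', or 'high'.

    Assumption: boundary values 0.4 and 0.7 fall into the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    if score < 0.4:
        return "low"        # shown in red
    if score < 0.7:
        return "moderate"   # shown in yellow
    return "high"           # shown in green
```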
Advanced Usage
Custom Models
# Use a different sentence transformer model
semantic-comparer compare text1 text2 --model all-mpnet-base-v2
Fine-tuning Parameters
# Stricter matching (higher threshold)
semantic-comparer compare text1 text2 --similarity-threshold 0.8
# More lenient gap handling (lower penalty)
semantic-comparer compare text1 text2 --gap-penalty 0.1
Technical Details
Algorithm
The tool adapts the Smith-Waterman local alignment algorithm to semantic similarity, aligning paragraphs by meaning rather than by exact wording:
- Text Segmentation: Split texts into paragraphs
- Embedding Generation: Convert paragraphs to semantic vectors
- Similarity Calculation: Compute cosine similarity between vectors
- Dynamic Programming: Apply Smith-Waterman for optimal alignment
- Score Calculation: Weighted containment score based on matches
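The segmentation, similarity, and dynamic-programming steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the function names are hypothetical, paragraphs are assumed to be separated by blank lines, the substitution score is assumed to be `similarity - threshold`, and small hand-written vectors stand in for sentence-transformer embeddings. The weighted containment score is not shown because its weighting is not described here.

```python
from __future__ import annotations

import numpy as np


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty paragraphs (assumed segmentation rule)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b.

    Rows must be non-zero vectors.
    """
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def smith_waterman(
    sim: np.ndarray, gap_penalty: float = 0.3, threshold: float = 0.5
) -> list[tuple[int, int]]:
    """Return aligned (index_in_a, index_in_b) pairs from a local alignment."""
    n, m = sim.shape
    score = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed scoring: pairs above the threshold add to the score,
            # pairs below it subtract from it.
            match = score[i - 1, j - 1] + (sim[i - 1, j - 1] - threshold)
            score[i, j] = max(
                0.0,
                match,
                score[i - 1, j] - gap_penalty,  # skip a paragraph of A
                score[i, j - 1] - gap_penalty,  # skip a paragraph of B
            )

    # Trace back from the best-scoring cell until the score reaches zero.
    i, j = (int(k) for k in np.unravel_index(np.argmax(score), score.shape))
    pairs: list[tuple[int, int]] = []
    while i > 0 and j > 0 and score[i, j] > 0:
        diag = score[i - 1, j - 1] + (sim[i - 1, j - 1] - threshold)
        if np.isclose(score[i, j], diag):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j] - gap_penalty):
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Toy embeddings: A has three paragraphs, B has two.
emb_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
emb_b = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pairs = smith_waterman(cosine_matrix(emb_a, emb_b))
print(pairs)  # [(0, 0), (2, 1)]
```

In the toy example, the second paragraph of A has no counterpart in B, so the alignment skips it at the cost of one gap penalty. Raising `gap_penalty` makes such skips more expensive, and raising `threshold` requires a higher similarity before a pair counts as a match.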
Performance
- Async Processing: Non-blocking I/O operations
- Memory Efficient: Streaming file processing for large texts
- Progress Tracking: Real-time progress indicators
- Error Handling: Robust error handling with user-friendly messages
Security
- Input Validation: Comprehensive parameter validation
- File Safety: Secure file operations with size limits
- Text Sanitization: Removal of problematic characters
- Error Isolation: Graceful error handling without data exposure
Contributing
- Fork the repository
- Create a feature branch
- Make your changes with proper type annotations
- Add tests for new functionality
- Ensure code passes linting and type checking
- Submit a pull request
License
MIT License. See the LICENSE file for details.
Download files
File details
Details for the file semantic_comparer-0.1.7.tar.gz.
File metadata
- Download URL: semantic_comparer-0.1.7.tar.gz
- Upload date:
- Size: 66.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1baac92ddeb5b3163ecab6ed2b712f7716fc6314394f68f045a18769ca0a7358 |
| MD5 | 637d67ee0e033ac9c776b445aa0cc9b8 |
| BLAKE2b-256 | 7230d6b6071ef797d82affe5f9cd50809974e966dbefd3e6f41309867ab38c10 |
File details
Details for the file semantic_comparer-0.1.7-py3-none-any.whl.
File metadata
- Download URL: semantic_comparer-0.1.7-py3-none-any.whl
- Upload date:
- Size: 11.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 48b830820092c1a1354dcd9d9a2a374076519ecf9616d0c18bf10c302eb232dc |
| MD5 | 3f5592ac996735669367d28761fdb745 |
| BLAKE2b-256 | 0e02934220a844a5230884669126ca07fbc31c3363d895a47236324dd2bc5fb6 |