
A package with a simple 1D-DTW implementation for sentence alignment.


DTW-Sentence-Alignment

A simple, low-dependency package for aligning sentences by maximizing a chosen similarity metric.

Motivation

I needed to match sentences between two lists and could not find a simple package for the task that avoided excessive dependencies and unintuitive interfaces.

Overview

DTW-Sentence-Alignment is a Python package for aligning sentences using Dynamic Time Warping (DTW). It supports custom similarity functions as well as predefined metrics, and it searches for the alignment that maximizes the total similarity score. Unlike traditional DTW implementations, which force the alignment to start at the first elements and end at the last ones, it allows flexible start and end points.
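To make the idea concrete, here is a minimal sketch of a similarity-maximizing, order-preserving alignment with free start and end points. It is not the package's actual implementation: the function name `align_sketch`, its parameters, and the exact recurrence are assumptions chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def align_sketch(
    a: Sequence[str],
    b: Sequence[str],
    similarity: Callable[[str, str], float],
    min_value: float = 0.0,
) -> Tuple[List[Tuple[int, int]], float]:
    """Hypothetical helper: order-preserving alignment maximizing total similarity."""
    n, m = len(a), len(b)
    # best[i][j] = best total score achievable using a[:i] and b[:j].
    # Row 0 and column 0 stay at 0.0, so skipping leading sentences is free.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = similarity(a[i - 1], b[j - 1])
            # A pair below the threshold may not be matched at all.
            match = best[i - 1][j - 1] + s if s >= min_value else float("-inf")
            # Skipping a sentence on either side costs nothing.
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])

    # Walk back from the end to recover the matched index pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return pairs, best[n][m]


def exact(x: str, y: str) -> float:
    """Toy similarity: 1.0 for identical strings, 0.0 otherwise."""
    return 1.0 if x == y else 0.0


pairs, score = align_sketch(
    ["x", "hello", "y", "world"], ["hello", "world", "z"], exact, min_value=0.5
)
print(pairs, score)
```

In this sketch, unmatched sentences at the beginning, middle, or end contribute nothing to the score, which is one way to realize flexible start and end points. The package may handle these cases differently.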

Installation

To install the package, you can use pip:

pip install dtwsa

Usage

Here's a basic example of how to use the package:

from dtwsa import SentenceAligner
from dtwsa.metrics import WER_similarity

# Two lists of sentences to align
list_1 = [
    "Something which does not match",
    "Matching sentence number one",
    "Something which does not match",
    "Another matching sentence",
    "Something which does not match",
    "Random Sentence which should match",
    "This should be matched with something",
    "Yet another matching sentence",
    "Random Sentence which should match",
    "This should be matched with something",
    "Yet another matching sentence",
    "Something which does not match",
    "Something which does not match",
    "Something that matches again",
    "Something which does not match",
]

list_2 = [
    "Matching sentence number one",
    "Another matching sentence",
    "Random Sentence which should match",
    "This should be matched with something",
    "Yet another matching sentence",
    "Random Sentence which should match",
    "This should be matched with something",
    "Yet another matching sentence",
    "Something that matches again",
    "Something leftover",
]

# Create a SentenceAligner with the WER_similarity metric and a minimum matching value of 0.7.
# The minimum matching value prevents pairing sentences that are not similar enough:
# it is better to leave a sentence unmatched than to match it with a dissimilar one.

aligner = SentenceAligner(WER_similarity, min_matching_value=0.7)

# Align the sentences and get the alignment and score
alignment, score = aligner.align_sentences(list_1, list_2)

print(f"Alignment: {alignment}") # [(0, 0), (1, 1), (3, 2), (5, 3), (6, 4), (7, 5), (8, 6), (9, 7), (10, 8), (13, 9)]
print(f"Score: {score}") # 10.0

# Plot the alignment
aligner.visualize_alignment(list_1, list_2)

Visualization of the alignment

Features

  • Flexible sentence alignment using custom similarity functions
  • Predefined metrics like Word Error Rate (WER) similarity
  • Simple API for easy integration
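As an illustration of what a WER-based similarity could look like, the sketch below computes a word-level edit distance and turns it into a score between 0 and 1. The names `word_edit_distance` and `wer_similarity_sketch`, the lowercasing, and the normalization by the longer sentence are all assumptions; the package's own `WER_similarity` may be defined differently.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance computed over words instead of characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def wer_similarity_sketch(a: str, b: str) -> float:
    """Hypothetical metric: 1.0 for identical sentences, 0.0 for fully different ones."""
    ref, hyp = a.lower().split(), b.lower().split()
    if not ref and not hyp:
        return 1.0
    # Normalizing by the longer sentence keeps the result within [0, 1].
    return 1.0 - word_edit_distance(ref, hyp) / max(len(ref), len(hyp))


print(wer_similarity_sketch("Another matching sentence", "Yet another matching sentence"))
```

A custom similarity function would presumably have this same shape, taking two strings and returning a float. The README does not state the expected signature, so check the package source before passing your own function to `SentenceAligner`.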

TODO

  1. Improve efficiency of the alignment algorithm by limiting the maximum index distance between matched sentences, which reduces the number of candidate alignments.
  2. Improve efficiency of the alignment algorithm by implementing a version of PrunedDTW.
  3. Improve efficiency of the alignment algorithm through parallelization.
  4. Add new metrics for sentence comparison (e.g., BLEU score, cosine similarity).
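As a rough idea of what the cosine similarity metric from the list above might look like, here is a bag-of-words sketch using only the standard library. The function name `cosine_similarity_sketch` and the whitespace tokenization are assumptions; nothing like this is confirmed to exist in the package yet.

```python
import math
from collections import Counter


def cosine_similarity_sketch(a: str, b: str) -> float:
    """Hypothetical metric: cosine of the angle between word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only words present in both sentences contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    # An empty sentence has no direction, so treat it as dissimilar to everything.
    return dot / norm if norm else 0.0


print(cosine_similarity_sketch("Yet another matching sentence", "Another matching sentence"))
```

Unlike a WER-style metric, this one ignores word order, so it would suit cases where sentences are paraphrased or reordered.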
