A package with a simple 1D-DTW implementation for sentence alignment.

Project description

DTW-Sentence-Alignment

A simple, low-dependency package for aligning sentences by maximizing a chosen similarity metric.

Overview

DTW-Sentence-Alignment is a Python package for aligning two lists of sentences using the Dynamic Time Warping (DTW) algorithm. Sentences can be compared with a custom similarity function or with one of the predefined metrics, and the alignment is chosen to maximize the total score. Unlike many other DTW implementations, the alignment path does not have to start at (0,0) or end at (n,m), so unmatched sentences at the beginning or end of either list are allowed.
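To make the idea concrete, here is a minimal sketch of a score-maximizing alignment with free start and end points. It is a simplified dynamic program written for illustration, not the package's actual implementation; the function name `align_max_score`, its parameters, and the rule that pairs below `min_value` are never matched are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def align_max_score(
    a: Sequence[str],
    b: Sequence[str],
    sim: Callable[[str, str], float],
    min_value: float = 0.7,
) -> Tuple[List[Tuple[int, int]], float]:
    """Hypothetical sketch: monotone one-to-one alignment maximizing total similarity."""
    n, m = len(a), len(b)
    # best[i][j] = best total score achievable using a[:i] and b[:j].
    # Row 0 and column 0 stay at 0.0, so skipping leading items costs nothing.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sim(a[i - 1], b[j - 1])
            # Only pairs that reach the threshold may be matched.
            match = best[i - 1][j - 1] + s if s >= min_value else float("-inf")
            best[i][j] = max(best[i - 1][j], best[i][j - 1], match)

    # Backtrack from the corner; skipped items at either end are simply dropped,
    # so the first and last matched pairs need not be (0, 0) or (n - 1, m - 1).
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        s = sim(a[i - 1], b[j - 1])
        if s >= min_value and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, best[n][m]


def exact_match(x: str, y: str) -> float:
    """Toy similarity for the demo: 1.0 for identical strings, else 0.0."""
    return 1.0 if x == y else 0.0


pairs, score = align_max_score(
    ["x", "hello", "y", "world"], ["hello", "world", "z"], exact_match
)
print(pairs, score)  # [(1, 0), (3, 1)] 2.0
```

In this toy run the first matched pair is (1, 0) and the trailing "z" is left unmatched, which is the behavior the paragraph above describes.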

Installation

To install the package, use pip:

pip install dtwsa

Usage

Here's a basic example of how to use the package:

from dtwsa import SentenceAligner
from dtwsa.metrics import WER_similarity

# Align sentences
list_1 = [
    "Something which does not match",
    "Matching sentence number one",
    "Something which does not match",
    "Another matching sentence",
    "Something which does not match",
    "Random Sentence which should match",
    "This should be matched with something",
    "Yet another matching sentence",
    "Random Sentence which should match",
    "This should be matched with something",
    "Yet another matching sentence",
    "Something which does not match",
    "Something which does not match",
    "Something that matches again",
    "Something which does not match",
]

list_2 = [
    "Something which does not match",
    "Matching sentence number one",
    "Another matching sentence",
    "Random Sentence which should match",
    "This should be matched with something",
    "Yet another matching sentence",
    "Random Sentence which should match",
    "This should be matched with something",
    "Yet another matching sentence",
    "Something that matches again",
    "Something leftover",
]

# Create a SentenceAligner object with the WER_similarity metric and 0.7 as the minimum matching value
aligner = SentenceAligner(WER_similarity, min_matching_value=0.7)

# Align the sentences and get the alignment and score
alignment, score = aligner.align_sentences(list_1, list_2)

print(f"Alignment: {alignment}") # [(0, 0), (1, 1), (3, 2), (5, 3), (6, 4), (7, 5), (8, 6), (9, 7), (10, 8), (13, 9)]
print(f"Score: {score}") # 10.0

# Plot the alignment
aligner.visualize_alignment(list_1, list_2)

Visualization of the alignment
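For readers unfamiliar with WER-based similarity, the sketch below shows one common way to define it: one minus the word-level edit distance divided by the reference length, clipped at zero. This is an assumption about what such a metric typically computes; the package's `WER_similarity` may differ in tokenization, normalization, or clipping, and the function name `wer_similarity_sketch` is hypothetical.

```python
def wer_similarity_sketch(reference: str, hypothesis: str) -> float:
    """Hypothetical sketch: 1 - WER, clipped to the range [0, 1]."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Convention chosen for this sketch: two empty sentences are identical.
        return 1.0 if not hyp else 0.0

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr

    wer = prev[-1] / len(ref)
    return max(0.0, 1.0 - wer)


print(wer_similarity_sketch("Matching sentence number one",
                            "Matching sentence number one"))  # 1.0
print(wer_similarity_sketch("a b c d", "a b c x"))            # 0.75
```

Under this definition identical sentences score 1.0, which would be consistent with the example's total score of 10.0 for ten matched pairs if the score is the sum of pairwise similarities.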

Features

  • Flexible sentence alignment using custom similarity functions
  • Predefined metrics like Word Error Rate (WER) similarity
  • Simple API for easy integration
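As an illustration of a custom similarity function, the sketch below defines a word-overlap (Jaccard) similarity. It assumes that a custom metric is a plain callable taking two sentences and returning a float where higher means more similar; the package documentation above does not spell out this signature, so treat it as a guess. The name `jaccard_similarity` is hypothetical.

```python
def jaccard_similarity(sentence_a: str, sentence_b: str) -> float:
    """Hypothetical custom metric: shared words divided by all distinct words."""
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    if not tokens_a and not tokens_b:
        # Convention chosen for this sketch: two empty sentences are identical.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


print(jaccard_similarity("Another matching sentence",
                         "another matching sentence"))  # 1.0

# Assumed usage, mirroring the constructor call shown in the Usage section:
# aligner = SentenceAligner(jaccard_similarity, min_matching_value=0.5)
```

Because Jaccard ignores word order and case, a lower `min_matching_value` than the 0.7 used with WER may or may not be appropriate; the 0.5 above is only a placeholder.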

TODO

  1. Improve efficiency of the alignment algorithm
  2. Add new metrics for sentence comparison (e.g., BLEU score, cosine similarity)

Download files

Source Distribution

dtwsa-0.0.2.tar.gz (168.9 kB view details)

Uploaded Source

Built Distribution

dtwsa-0.0.2-py3-none-any.whl (5.1 kB view details)

Uploaded Python 3

File details

Details for the file dtwsa-0.0.2.tar.gz.

File metadata

  • Download URL: dtwsa-0.0.2.tar.gz
  • Size: 168.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for dtwsa-0.0.2.tar.gz
Algorithm Hash digest
SHA256 99a9234b15f952526a19fe701bd9201351b57d01ffc031d4c33aadb3b93a5ff7
MD5 c2cecb41d49aafc737eb7e07c54f4ff2
BLAKE2b-256 45291ce06227a70ac4fdee9442cb1279d6adce0787ba4f33e1a048292ae1b3ad

File details

Details for the file dtwsa-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: dtwsa-0.0.2-py3-none-any.whl
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for dtwsa-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 d7f6477da5fa6cc4947853a9700d25985712804aaf2d0f1bb8d0907e98780995
MD5 f4f4420d5b9535634be9f564f0d1313b
BLAKE2b-256 30d9bfe3349ccead5788715ba4a5a4e2aee5a9aac12a1a56169d20cd61f11c46
