ScienceBeam Alignment
ScienceBeam Utils
Provides sequence alignment utility functions to ScienceBeam projects.
Pre-requisites
- Python 2 or 3
API
SequenceMatcher
A mostly drop-in replacement for Python's SequenceMatcher is provided by fuzzywuzzy's StringMatcher.
In that respect, sciencebeam-alignment merely provides a wrapper with a fallback.
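The wrapper-with-fallback idea can be sketched roughly as follows. This is not the package's actual code: it assumes the fallback is Python's own difflib.SequenceMatcher, and get_matching_blocks here is a hypothetical helper name used only for illustration.

```python
import difflib

try:
    # Assumption: fuzzywuzzy and python-Levenshtein are installed.
    # fuzzywuzzy's StringMatcher needs the Levenshtein extension to import.
    from fuzzywuzzy.StringMatcher import StringMatcher as SequenceMatcherImpl
except ImportError:
    # Assumed fallback: the pure-Python matcher from the standard library.
    SequenceMatcherImpl = difflib.SequenceMatcher


def get_matching_blocks(a, b):
    """Hypothetical helper: return matching blocks as plain tuples."""
    # Both classes accept (isjunk, first_sequence, second_sequence) positionally.
    matcher = SequenceMatcherImpl(None, a, b)
    return [tuple(block) for block in matcher.get_matching_blocks()]
```

Calling get_matching_blocks('abcd', 'abxd') returns a list of (a_index, b_index, size) tuples, ending with a zero-length sentinel block, whichever implementation was imported.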
WordSequenceMatcher
A wrapper around the aforementioned SequenceMatcher that matches on word-level tokens only.
It currently only implements get_matching_blocks.
The main advantage is that it is much faster for long texts, because it doesn't have to match individual characters. It isn't recommended for short texts, where character-level alignment is probably more desirable.
Example match results:
>>> from sciencebeam_alignment.word_sequence_matcher import (
... WordSequenceMatcher
... )
>>> WordSequenceMatcher(a='word1', b='word2').get_matching_blocks()
[]
>>> WordSequenceMatcher(a='a word1 b', b='x word1 y').get_matching_blocks()
[(2, 2, 5)]
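To illustrate what word-level matching means, here is a minimal sketch built on the standard library's difflib. It is not the package's implementation: the whitespace tokenization and the word_matching_blocks name are assumptions made for this example, and it emits one block per matching word without a trailing sentinel.

```python
import difflib
import re


def word_matching_blocks(a, b):
    """Hypothetical helper: match whitespace-separated words, report character offsets."""
    # Record each word together with its start offset in the original text.
    a_tokens = [(m.group(), m.start()) for m in re.finditer(r'\S+', a)]
    b_tokens = [(m.group(), m.start()) for m in re.finditer(r'\S+', b)]
    # Compare the word lists instead of the individual characters.
    matcher = difflib.SequenceMatcher(
        None,
        [word for word, _ in a_tokens],
        [word for word, _ in b_tokens],
        autojunk=False,
    )
    blocks = []
    for a_index, b_index, size in matcher.get_matching_blocks():
        # The zero-length sentinel block contributes nothing to this loop.
        for offset in range(size):
            word, a_start = a_tokens[a_index + offset]
            _, b_start = b_tokens[b_index + offset]
            blocks.append((a_start, b_start, len(word)))
    return blocks
```

Because the matcher only sees a handful of word tokens rather than every character, the number of comparisons drops sharply for long texts, which is the speed advantage described above.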
GlobalSequenceMatcher and LocalSequenceMatcher
The GlobalSequenceMatcher and LocalSequenceMatcher implement the Needleman-Wunsch global alignment and the Smith-Waterman local alignment algorithms, respectively. The implementation is somewhat inspired by python-alignment.
Both implement get_matching_blocks to match Python's SequenceMatcher.
By passing in a scoring object, the results can be influenced (e.g. gaps can be penalized more).
An optimized implementation using Cython is also provided. The level of optimization depends on the type of the passed-in sequences and scoring; the fastest combination is integer sequences with simple scoring.
>>> from sciencebeam_alignment.align import LocalSequenceMatcher, SimpleScoring
>>> DEFAULT_SCORING = SimpleScoring(match_score=3, mismatch_score=-1, gap_score=-2)
>>> LocalSequenceMatcher(a='a word1 b', b='x word2 y', scoring=DEFAULT_SCORING).get_matching_blocks()
[(1, 1, 5), (7, 7, 1), (9, 9, 0)]
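For readers unfamiliar with the two algorithms, the following sketch shows how the match, mismatch and gap scores enter the dynamic programming recurrence. It only computes the best alignment score (no traceback, no matching blocks) and is not the package's code; alignment_score and its parameters are hypothetical names, with a simple linear gap penalty assumed.

```python
def alignment_score(a, b, match_score=3, mismatch_score=-1, gap_score=-2, local=False):
    """Hypothetical helper: best Needleman-Wunsch (global) or Smith-Waterman (local) score."""
    # First row: global alignment pays for leading gaps, local alignment starts at 0.
    prev = [0 if local else j * gap_score for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        # First column follows the same rule as the first row.
        curr = [0 if local else i * gap_score]
        for j in range(1, len(b) + 1):
            pair_score = match_score if a[i - 1] == b[j - 1] else mismatch_score
            score = max(
                prev[j - 1] + pair_score,  # align a[i-1] with b[j-1]
                prev[j] + gap_score,       # gap in b
                curr[j - 1] + gap_score,   # gap in a
            )
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global: score of the bottom-right cell. Local: best cell anywhere in the table.
    return best if local else prev[-1]
```

Making gap_score more negative discourages gaps in the result, which is the kind of influence a scoring object gives over the alignment.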
To check whether the fast implementation is enabled:
>>> from sciencebeam_alignment.align import native_enabled
>>> print(native_enabled)
True
Development
Development can be done either using Docker (default) or a virtual environment.
All commands are available via make.
Development using Docker
Build and run tests:
make build test
Or, for CI:
make ci-build-and-test
Development using a virtual environment
make targets with the dev- prefix are intended for use with the virtual environment.
This requires that you already have Python installed.
Setup (virtual environment)
make dev-venv
To update the dependencies:
make dev-install
Cython (virtual environment)
Compile code using Cython:
make dev-cython-clean dev-cython-compile
Tests (virtual environment)
make dev-test
Or:
make dev-watch
Source Distribution
Hashes for sciencebeam_alignment-0.0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f0fdffb40f6836708f9ef526ef71b465a7f4ff29f5d47fb7129e8ec7433b4f4b
MD5 | a974a9d9b3b91feaa6abf881ca77a312
BLAKE2b-256 | 3e3eb8d8496426dc0dad1e9221b117fd9154ae68301bf8622d21f6808354f9eb