Project Description
# **Aligner**

### What
**Aligner** is a tool for [FASTA](https://en.wikipedia.org/wiki/FASTA_format) DNA sequence alignment.

### How to use
**Aligner** is fairly simple to use. All you need is a standard FASTA text
file (*.txt).
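
A FASTA file is plain text in which each record starts with a `>` header line, followed by one or more lines of sequence. To illustrate what such a file contains, here is a minimal Python sketch that loads one into a dictionary. `read_fasta` is a hypothetical helper, not part of the **Aligner** package, and it assumes the record ID is the first word of each header line.

```python
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {record_id: sequence} dict (hypothetical helper)."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header: store the previous record first
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""  # ID = first word of header
            chunks = []
        elif current_id is not None:
            # Sequence data may be wrapped over several lines
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


example = """>read_1
ATTAGACCTG
>read_2
CCTGCCGGAA
"""
print(read_fasta(example))
# {'read_1': 'ATTAGACCTG', 'read_2': 'CCTGCCGGAA'}
```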

By the way, included in this package are two example files (fasta_example_1

To install, run:

```
pip install sequence_aligner
```

To use **Aligner** from the command line, run the following command:

```
aligner [-h] --file_path FILE_PATH [--storage_path STORAGE_PATH] [--results_name
```

`FILE_PATH` is the path to the FASTA sequence text file.

`FILE_PATH` is required; `STORAGE_PATH` and `FILE_NAME` are optional.
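
For readers scripting around the tool, the documented options can be mirrored with Python's standard `argparse` module. This is a hypothetical reconstruction, not the package's actual entry point: the usage line is cut off after `--results_name`, so the metavar `FILE_NAME` is taken from the description of the optional arguments, and the `None` defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented interface; the real
    # aligner entry point may define these options differently.
    parser = argparse.ArgumentParser(prog="aligner")
    parser.add_argument("--file_path", required=True,
                        help="path to the FASTA sequence text file")
    parser.add_argument("--storage_path", default=None,
                        help="directory for the results file (optional)")
    # Assumed metavar, matching the FILE_NAME wording in the README
    parser.add_argument("--results_name", metavar="FILE_NAME", default=None,
                        help="name of the results file (optional)")
    return parser


args = build_parser().parse_args(["--file_path", "reads.txt"])
print(args.file_path, args.storage_path, args.results_name)
# reads.txt None None
```

`argparse` adds `-h` automatically and prints required options such as `--file_path FILE_PATH` without brackets, which matches the shape of the usage line.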

By default, the result file will be stored in the following generated directory:

The file name is generated from the following pattern:
`sequence_read-<datetime here>.txt`, for example `sequence_read-2016-07-01T16.42.21.246183.txt`.
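
The example name looks like an ISO 8601 timestamp with dots in place of the colons. Assuming that is the format, here is a minimal sketch that reproduces it; `default_results_name` is a hypothetical name, not the package's own code.

```python
from datetime import datetime
from typing import Optional


def default_results_name(now: Optional[datetime] = None) -> str:
    """Build a name like sequence_read-2016-07-01T16.42.21.246183.txt.

    Dots replace the colons of an ISO 8601 time, which keeps the name
    valid on file systems that reject ':' (such as Windows).
    """
    now = now or datetime.now()
    # %f always emits 6 microsecond digits, as in the documented example
    return "sequence_read-{}.txt".format(now.strftime("%Y-%m-%dT%H.%M.%S.%f"))


print(default_results_name(datetime(2016, 7, 1, 16, 42, 21, 246183)))
# sequence_read-2016-07-01T16.42.21.246183.txt
```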

Once you run the command, your results are generated within seconds, and the location and name of your results file are printed.

For example:
```
Sequence Alignment results stored --> /tmp/sequence_results/sequence_read-2016-07-01T17.59.48.458859.txt
```

### How it works
To align the sequences, I first created a dictionary of all subsequences, with lengths ranging from greater than half of the sequence length up to the full length.
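
Here is a minimal sketch of such a dictionary. It assumes that the relevant subsequences are each read's prefixes and suffixes (the parts that can overlap the end of another sequence), and that "greater than half" means strictly longer than `len(seq) / 2`. `overlap_candidates` is a hypothetical helper, not the package's code.

```python
from typing import Dict, List


def overlap_candidates(sequences: Dict[str, str]) -> Dict[str, Dict[str, List[str]]]:
    """Map each read ID to its prefixes and suffixes longer than half the read.

    Lists run from the longest (the full read) down to just over half.
    """
    candidates: Dict[str, Dict[str, List[str]]] = {}
    for seq_id, seq in sequences.items():
        n = len(seq)
        lengths = range(n, n // 2, -1)  # n, n-1, ..., n//2 + 1
        candidates[seq_id] = {
            "prefixes": [seq[:k] for k in lengths],
            "suffixes": [seq[n - k:] for k in lengths],
        }
    return candidates


print(overlap_candidates({"read_1": "ATTAGA"}))
# {'read_1': {'prefixes': ['ATTAGA', 'ATTAG', 'ATTA'], 'suffixes': ['ATTAGA', 'TTAGA', 'TAGA']}}
```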

One sequence in the list of sequences was designated the "anchor

I reasoned that, at any given state of the anchor sequence, there would be one sequence with the greatest overlap. On each pass through the sequence list, I identify the sequence whose subsequence (from the dictionary mentioned above) has the greatest 'score', where the score is based on the amount of overlap with the anchor sequence. This subsequence is then merged into the anchor sequence. The iteration continues until all sequences in the sequence list have been merged into the anchor
## Caveats/Issues
* One issue I dealt with was speed. After many iterations of
refactoring, I was able to bring runtime down from ~45-50 min to ~10-11
* This program assumes that sequences overlap and that there are no

## Next Steps
* Improve efficiency
* Improve the 'matchiness' algorithm: right now, my 'score' is based only on the length of overlap at a given state of the anchor sequence.
More preprocessing could be done before alignment begins in order
to score the level of 'matchiness' between all pairs of sequences.
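
One possible shape for that preprocessing step, sketched under the assumption that 'matchiness' is measured as the longest suffix-prefix overlap between two reads: compute the score once for every ordered pair of reads, so a merge loop could look it up instead of recomputing it. `suffix_prefix_overlap` and `pairwise_overlaps` are hypothetical helpers.

```python
from itertools import permutations
from typing import Dict, Tuple


def suffix_prefix_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def pairwise_overlaps(sequences: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Precompute the overlap score for every ordered pair of read IDs."""
    return {
        (i, j): suffix_prefix_overlap(sequences[i], sequences[j])
        for i, j in permutations(sequences, 2)
    }


reads = {"r1": "ATTAGACCTG", "r2": "AGACCTGCCG", "r3": "CCTGCCGGAA"}
scores = pairwise_overlaps(reads)
print(scores[("r1", "r2")])  # 7
print(scores[("r2", "r3")])  # 7
```

The table costs one pass over all n·(n-1) ordered pairs up front; a threshold (for example, more than half the read length) could then be applied when choosing which pair to merge.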
Download Files

| File Name | Size | File Type | Upload Date |
|---|---|---|---|
| SequenceAligner-0.0.2.tar.gz | 5.8 kB | Source | Jul 6, 2016 |
