STRNaming STR Sequence Nomenclature

STRNaming

STRNaming is an algorithm for generating simple, informative names for Short Tandem Repeat (STR) sequences, such as those used in the field of forensic genetics, in a standardised and automated manner.

Requirements

STRNaming requires Python version 3.5 or later.

Installation

The recommended way to install STRNaming is with the pip package installer. If you have pip installed, run the following command:

pip install strnaming

Alternatively, STRNaming can be installed from an unpacked source distribution by running:

python setup.py install

Usage

This initial version of STRNaming allows generating allele names for sequence data using the ranges and sequence orientation of the "Flanking Region Report" of the Universal Analysis Software for the ForenSeq DNA Signature Prep Kit (Verogen).

Command-line interface

The command-line help can be accessed by running strnaming --help. In short, an STRNaming command looks like this:

strnaming name-sequences --ranges uas-frr inputfile.txt outputfile.txt

The input file should have a marker name and a sequence on each line, separated by whitespace (i.e., tabs or spaces).
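As a minimal sketch of the input format described above, the following Python snippet turns a list of (marker, sequence) pairs into lines suitable for an input file. The marker names and sequences are made-up placeholders, and the helper names format_input_lines and write_input_file are hypothetical; they are not part of STRNaming.

```python
# Placeholder data for illustration only; these are not real marker
# names or validated sequences.
EXAMPLE_RECORDS = [
    ("MarkerA", "TCTATCTATCTATCTA"),
    ("MarkerB", "AGATAGATAGAT"),
]


def format_input_lines(records):
    """Return one 'marker<TAB>sequence' line per (marker, sequence) pair."""
    lines = []
    for marker, sequence in records:
        # Whitespace separates the two fields, so neither may contain any.
        if any(ch.isspace() for ch in marker + sequence):
            raise ValueError("marker names and sequences must not contain whitespace")
        lines.append("{}\t{}\n".format(marker, sequence))
    return lines


def write_input_file(path, records):
    """Write the formatted records to a text file."""
    with open(path, "w") as handle:
        handle.writelines(format_input_lines(records))
```

Calling write_input_file("inputfile.txt", EXAMPLE_RECORDS) would produce a file that can be passed to the command shown earlier. A tab is used as the separator here, but any whitespace should work according to the description above.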

If no output file is given, the output is written to stdout, which normally shows up in your command line window. If no input file is given either, STRNaming will read input from stdin, allowing you to type the input one line at a time.

Programming interface

It is not recommended to import and use parts of this version of STRNaming directly from other Python code, because the internal API is not stable yet. Instead, use the subprocess module if you want to use STRNaming in your Python project at this time. As an added benefit, it will run in a concurrent process, meaning your code does not (necessarily) have to wait for STRNaming to finish.

To use STRNaming from other software, regardless of the programming language, run it as a separate process. Write a marker name, a whitespace character, the DNA sequence, and a newline character (\n) to its standard input stream (stdin), and STRNaming will write the same marker name, a tab character, the allele name, and a newline character to its standard output stream (stdout). Any errors are reported on the standard error stream (stderr) and cause the STRNaming process to terminate. If the --unbuffered command-line switch is specified, STRNaming flushes its output stream immediately after every line of output.
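The exchange over stdin and stdout might be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the position of --unbuffered among the other arguments is assumed, the helper names are hypothetical, and the code assumes STRNaming writes exactly one output line per input line as long as no error occurs.

```python
import subprocess

# Command pieces are taken from the usage text; where --unbuffered goes
# relative to the other arguments is an assumption.
COMMAND = ["strnaming", "name-sequences", "--ranges", "uas-frr", "--unbuffered"]


def format_request(marker, sequence):
    """Build one input line: marker, whitespace, sequence, newline."""
    return "{}\t{}\n".format(marker, sequence)


def parse_response(line):
    """Split one output line into (marker, allele_name) at the first tab."""
    marker, allele_name = line.rstrip("\n").split("\t", 1)
    return marker, allele_name


def name_sequences(records):
    """Send (marker, sequence) pairs to STRNaming and collect its answers."""
    process = subprocess.Popen(
        COMMAND,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text-mode pipes, available on Python 3.5+
    )
    results = []
    try:
        for marker, sequence in records:
            process.stdin.write(format_request(marker, sequence))
            process.stdin.flush()
            line = process.stdout.readline()
            if not line:
                # An empty read means the process closed stdout, which the
                # description above associates with an error on stderr.
                raise RuntimeError("STRNaming terminated; check its stderr output")
            results.append(parse_response(line))
    finally:
        try:
            process.stdin.close()
        except BrokenPipeError:
            pass  # the process has already exited
        process.wait()
    return results
```

Because stderr is not redirected here, any error message from STRNaming appears on the calling program's own stderr.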

Offline use

STRNaming will automatically download and cache portions of reference sequence from the Ensembl REST API (http://rest.ensembl.org). If you are running STRNaming on a system without internet access and you need a piece of reference sequence that was not bundled with the STRNaming package, STRNaming will display a message asking you to store the reference sequence manually in a specific location. To do so, run the following command (on a system with internet access) to download the sequence:

strnaming refseq-cache chr2:1489653..1489689

Upon success, the location of the downloaded cache files will be displayed. These are the files to be copied to the offline system for STRNaming to work.
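When several ranges need to be cached, the command line can be assembled programmatically. The sketch below assumes that the range argument always follows the chr<name>:<start>..<end> pattern seen in the example above; that general form is inferred from the single example, and the helper name refseq_cache_command is hypothetical.

```python
def refseq_cache_command(chromosome, start, end):
    """Build the argument list for one 'strnaming refseq-cache' call.

    The 'chr<name>:<start>..<end>' layout is inferred from the example
    in the text above and may not cover every accepted form.
    """
    if start > end:
        raise ValueError("start must not be greater than end")
    location = "chr{}:{}..{}".format(chromosome, start, end)
    return ["strnaming", "refseq-cache", location]
```

The resulting list can be passed to subprocess.run on the system that has internet access.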

Release Notes

Version 1.1.0 (15 July 2021)

Naming of some loci has been updated as a result of bug fixes and improvements to the algorithm. Scoring criteria have been updated to minimize unintended side-effects of these changes.

  • Fixed a major issue with HPRTB allele numbering: previously, the CE allele number calculated for a given sequence was one higher than it should be.
  • Allele names are now permitted to contain repeats of a unit that exceeds the dominant unit length of a locus. This change greatly improves naming of some complex Y-STRs.
  • Short repeat stretches that only partially overlap with a significant repeat of a longer unit are no longer discarded. This change may introduce short repeats adjacent to longer repeats of a longer unit, which were previously 'missed' by STRNaming.
  • Fixed a bug that disallowed making interruptions which could be filled exactly with an 'orphan' repeat, thereby forcing the use of a compatible 'anchor'.
  • Reference sequence analysis now guarantees that all repeat units in the final result are actually repeated.
  • Reference repeat units only found outside the reported range are now included in the list of preferred units when generating allele names. This change improves naming stability when a significant part of the reference STR structure lies outside the reported range.
  • STRNaming will no longer consider names that include an interruption of which the sequence is equal to an adjacent repeat unit (e.g., CCTA[2]CCTA[1]TCTA[2]).
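To illustrate the last item in the list above, the following sketch checks a bracketed repeat description such as CCTA[2]CCTA[1]TCTA[2] for a single-copy block that sits next to a block of the same unit. It is not STRNaming's internal logic: it assumes names consist only of UNIT[count] blocks, it treats any block with a count of 1 as an interruption, and the function names are hypothetical.

```python
import re

# Matches one 'UNIT[count]' block, e.g. 'CCTA[2]'.
BLOCK_PATTERN = re.compile(r"([ACGT]+)\[(\d+)\]")


def parse_blocks(name):
    """Return the (unit, count) pairs found in a bracketed description."""
    return [(unit, int(count)) for unit, count in BLOCK_PATTERN.findall(name)]


def has_redundant_interruption(name):
    """Report whether a single-copy block borders a block of the same unit."""
    blocks = parse_blocks(name)
    for (unit_a, count_a), (unit_b, count_b) in zip(blocks, blocks[1:]):
        if unit_a == unit_b and 1 in (count_a, count_b):
            return True
    return False
```

For the example from the list, has_redundant_interruption("CCTA[2]CCTA[1]TCTA[2]") returns True, because CCTA[1] could simply be merged into the preceding CCTA[2].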

New features:

  • The built-in reference sequence cache was introduced, along with the new mandatory ACTION command-line argument.
  • Colored output in HTML format is now available by using the --html command-line argument.
  • Reference sequence analysis results of almost the entire human genome have been embedded into the package.

Version 1.0.0 (21 December 2020)

Initial release of STRNaming.
