Reference-guided CLI tool for finding and annotating human rDNA units in FASTA sequences

andro

andro is a reference-guided command-line tool for finding and annotating human ribosomal DNA (rDNA) units in FASTA sequences. It is built around the KY962518-ROT reference, reports annotations in BED format, and can optionally generate dotplots for detected units.

The tool was developed as part of a bachelor's thesis project and is intended for exploratory analysis of human rDNA-containing assemblies.

Installation

Install the latest release from PyPI:

pip install andro

Or install from a local clone:

git clone https://github.com/dmelkovic/andro.git
cd andro
pip install .

Basic usage

Run andro with a FASTA file:

andro example.fa

By default, results are written to standard output in BED format. To write the annotations to a file:

andro example.fa -o annotations.bed

To generate a dotplot for each reported unit:

andro example.fa --plot ref --dir plots

Display all available options with:

andro --help

What andro reports

Given a FASTA file with one or more records, andro will:

  • find 5.8S rDNA candidates and extend them to full 45S regions when possible
  • find rDNA units anchored by detected 45S regions
  • extend detected 45S regions to full rDNA units when the surrounding sequence supports it
  • annotate major rDNA features
  • write annotations in BED format
  • optionally generate dotplots for each reported unit
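
Because the annotations are written in BED format, they can be loaded with a few lines of Python for downstream checks. The sketch below is illustrative and not part of andro: `BedInterval` and `read_bed` are hypothetical names, and it assumes only the standard BED layout (tab-separated, 0-based half-open `start`/`end`, optional name in the fourth column). The exact columns and feature names that andro emits are not described here, so the sample rows are invented placeholders.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class BedInterval(NamedTuple):
    """One BED row (hypothetical helper, not part of andro)."""
    chrom: str
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive
    name: Optional[str]   # fourth BED column, if present

    @property
    def length(self) -> int:
        # Half-open coordinates: length is simply end - start.
        return self.end - self.start


def read_bed(handle: TextIO) -> Iterator[BedInterval]:
    """Yield intervals from a BED stream, skipping blank and header lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 BED columns: {line!r}")
        name = fields[3] if len(fields) > 3 else None
        yield BedInterval(fields[0], int(fields[1]), int(fields[2]), name)


# Invented placeholder rows, NOT real andro output.
sample = "contig1\t100\t600\tfeature_a\n# comment\n\ncontig1\t700\t900\n"
for interval in read_bed(io.StringIO(sample)):
    print(interval.chrom, interval.start, interval.end, interval.length)
```

To read a file produced with `-o annotations.bed`, pass `open("annotations.bed")` instead of the in-memory sample.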

Design choices and limitations

Forward orientation

andro searches for rDNA units in the forward orientation relative to the KY962518-ROT reference. This keeps annotation coordinates consistent with the ordered 45S and IGS model used internally.

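If an assembly contains rDNA arrays in the reverse-complemented orientation, run andro on a reverse-complemented copy of that FASTA record as a separate input. Coordinates in the resulting annotations then refer to the reverse-complemented sequence, not to the original record.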
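
One way to prepare such a copy without extra dependencies is sketched below. It is not part of andro: all function names are hypothetical, the `_rc` suffix added to record names is an arbitrary choice, and `to_forward_coords` assumes half-open BED coordinates when mapping an interval found on the reverse-complemented copy back onto the original record.

```python
import io
from typing import Iterator, TextIO, Tuple

# IUPAC nucleotide codes and their complements, upper and lower case.
_CODES = "ACGTRYSWKMBDHVN"
_COMPS = "TGCAYRSWMKVHDBN"
_COMPLEMENT = str.maketrans(_CODES + _CODES.lower(), _COMPS + _COMPS.lower())


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_reverse_complement(src: TextIO, dst: TextIO, width: int = 60) -> None:
    """Write every record of src to dst, reverse-complemented."""
    for header, seq in read_fasta(src):
        name = header.split(" ", 1)[0]
        dst.write(f">{name}_rc\n")  # "_rc" suffix is an illustrative choice
        rc = reverse_complement(seq)
        for i in range(0, len(rc), width):
            dst.write(rc[i:i + width] + "\n")


def to_forward_coords(start: int, end: int, seq_len: int) -> Tuple[int, int]:
    """Map a half-open interval on the reverse-complemented copy
    back to coordinates on the original record."""
    return seq_len - end, seq_len - start


# Small in-memory demonstration with an invented record.
out = io.StringIO()
write_reverse_complement(io.StringIO(">rec1 demo\nAAAC\nGG\n"), out, width=4)
print(out.getvalue(), end="")
```

With real files, the same function can be called on `open("example.fa")` and an output handle such as `open("example_rc.fa", "w")`, after which `andro example_rc.fa` runs on the reverse-complemented records. Both file names are placeholders.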

Complete 45S regions by default

By default, andro reports only units where a complete 45S region is found. If a substantial part of the 45S region is missing, the sequence is not reported as an rDNA unit in the default mode.

The --partial option enables reporting of incomplete units. Partial-unit annotation is experimental: regions shorter than approximately 2500 bp are not reported, and incomplete annotations should be reviewed manually before being used downstream.

Reference

andro includes the KY962518-ROT reference sequence used for detection and annotation. Results should be interpreted relative to that reference and the feature model encoded in the package.

License

andro is distributed under the MIT License. See LICENSE for details.

Issues

Please report bugs and unexpected results at:

https://github.com/dmelkovic/andro/issues
