Reference-guided CLI tool for finding and annotating human rDNA units in FASTA sequences
andro
andro is a reference-guided command line tool for finding and annotating
human ribosomal DNA (rDNA) units in FASTA sequences. It is built around the
KY962518-ROT reference, reports annotations in BED format, and can optionally
generate dotplots for detected units.
The tool was developed as part of a bachelor's thesis project and is intended for exploratory analysis of human rDNA-containing assemblies.
Installation
Install the latest release from PyPI:
pip install andro
Or install from a local clone:
git clone https://github.com/dmelkovic/andro.git
cd andro
pip install .
Basic usage
Run andro with a FASTA file:
andro example.fa
By default, results are written to standard output in BED format. To write the annotations to a file:
andro example.fa -o annotations.bed
To generate a dotplot for each reported unit:
andro example.fa --plot ref --dir plots
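As background, a dotplot places one sequence on each axis and marks every pair of positions where the two sequences share an exact word (k-mer). Conserved stretches appear as diagonal lines and tandem repeats as parallel diagonals. The sketch below computes those dot coordinates in plain Python. It illustrates the general technique only and is not andro's plotting code. The function name `kmer_dotplot_points` and the default word size k=11 are assumptions chosen for illustration.

```python
from collections import defaultdict


def kmer_dotplot_points(query: str, reference: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, ref_pos) pairs where both sequences share an exact k-mer.

    Illustrative sketch of a generic k-mer dotplot, not andro's implementation.
    """
    query = query.upper()
    reference = reference.upper()

    # Index the start position of every k-mer in the reference.
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(reference) - k + 1):
        index[reference[j:j + k]].append(j)

    # Every query k-mer found in the reference contributes one dot per occurrence.
    # dict.get does not insert missing keys, so the index is not modified here.
    points = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            points.append((i, j))
    return points
```

The returned pairs can be passed to any scatter-plot routine (for example matplotlib's `scatter`) to draw the plot. Human rDNA units span tens of kilobases, so a larger k keeps the number of spurious dots manageable.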
Display all available options with:
andro --help
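If you call andro from a Python pipeline, the options described in this README can be assembled into an argument list and run with `subprocess`. This is a minimal sketch. `build_andro_command` and `run_andro` are hypothetical helpers and are not part of the andro package. They use only the flags documented here (`-o`, `--plot ref --dir`, `--partial`), so check `andro --help` for the authoritative option list.

```python
import shutil
import subprocess
from typing import Optional


def build_andro_command(
    fasta: str,
    output: Optional[str] = None,
    plot_dir: Optional[str] = None,
    partial: bool = False,
) -> list[str]:
    """Assemble an andro command line (hypothetical helper, not part of andro)."""
    cmd = ["andro", fasta]
    if output is not None:
        cmd += ["-o", output]  # write BED to a file instead of standard output
    if plot_dir is not None:
        cmd += ["--plot", "ref", "--dir", plot_dir]  # one dotplot per reported unit
    if partial:
        cmd.append("--partial")  # experimental: also report incomplete units
    return cmd


def run_andro(fasta: str, **options) -> str:
    """Run andro and return its standard output (BED text when no -o is given)."""
    if shutil.which("andro") is None:
        raise RuntimeError("andro is not on PATH; install it with `pip install andro`")
    result = subprocess.run(
        build_andro_command(fasta, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if andro exits with a non-zero status
    )
    return result.stdout
```

For example, `run_andro("example.fa")` returns the BED annotations as a string, while `run_andro("example.fa", output="annotations.bed")` writes them to a file as the `-o` example does.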
What andro reports
Given a FASTA file with one or more records, andro will:
- find 5.8S rDNA candidates and extend them to full 45S regions when possible
- find rDNA units anchored by detected 45S regions, extending each 45S region to a full rDNA unit when the surrounding sequence supports it
- annotate major rDNA features
- write annotations in BED format
- optionally generate dotplots for each reported unit
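BED is a tab-separated format whose first three columns are the sequence name, a 0-based start coordinate and an end coordinate that is exclusive, so an interval's length is simply end minus start. The sketch below reads BED text (for example andro's standard output) into small records. It relies only on those three standard columns plus an optional fourth name column. What andro writes in column 4 and any later columns is not specified here, so treat the `name` field as an assumption and inspect real output before relying on it. `BedRecord` and `parse_bed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BedRecord:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str   # column 4 if present; its meaning in andro output is an assumption

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed(text: str) -> list[BedRecord]:
    """Parse BED text into records, ignoring blank, comment and header lines."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), name))
    return records
```

For example, `records = parse_bed(open("annotations.bed").read())` loads a file written with `-o`, and `max(r.length for r in records)` gives the longest reported interval.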
Design choices and limitations
Forward orientation
andro searches for rDNA units in the forward orientation relative to the
KY962518-ROT reference. This keeps annotation coordinates consistent with the
ordered 45S and IGS model used internally.
If an assembly contains rDNA arrays in the reverse-complemented orientation,
run andro on a reverse-complemented copy of that FASTA record as a separate
input.
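One way to produce that reverse-complemented copy without extra tools is sketched below. It is a minimal pure-Python example that assumes plain (uncompressed) FASTA input and complements IUPAC ambiguity codes as well as A, C, G and T. Each output header gets a hypothetical `_revcomp` suffix on its ID so that BED results from the two orientations are not confused. All function names are illustrative and not part of andro.

```python
import io
from typing import Iterable, Iterator, TextIO

# Complement table for upper- and lower-case IUPAC nucleotide codes.
# Characters not listed (e.g. S, W) are left unchanged by str.translate.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBVDHNacgtrykmbvdhn",
    "TGCAYRMKVBHDNtgcayrmkvbhdn",
)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_reverse_complement(
    records: Iterable[tuple[str, str]], out: TextIO, width: int = 60
) -> None:
    """Write each record reverse-complemented, wrapping sequence lines at `width`."""
    for header, seq in records:
        ident, _, desc = header.partition(" ")
        # Hypothetical naming convention: tag the ID so both orientations stay distinct.
        new_header = f"{ident}_revcomp" + (f" {desc}" if desc else "")
        out.write(f">{new_header}\n")
        rc = reverse_complement(seq)
        for i in range(0, len(rc), width):
            out.write(rc[i:i + width] + "\n")


def reverse_complement_fasta_text(text: str, width: int = 60) -> str:
    """Convenience wrapper: FASTA text in, reverse-complemented FASTA text out."""
    out = io.StringIO()
    write_reverse_complement(read_fasta(io.StringIO(text)), out, width)
    return out.getvalue()


def map_interval_to_forward(start: int, end: int, length: int) -> tuple[int, int]:
    """Map a 0-based, end-exclusive interval on the reverse complement to the original record."""
    return length - end, length - start
```

To convert a file, open it and pass the handle to `read_fasta`, then write the result with `write_reverse_complement` to a new FASTA file that you run andro on. A BED interval `[start, end)` reported on the reverse-complemented record corresponds to `[L - end, L - start)` on the original record, where `L` is the record length, which is what `map_interval_to_forward` computes.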
Complete 45S regions by default
By default, andro reports only units where a complete 45S region is found. If
a substantial part of the 45S region is missing, the sequence is not reported
as an rDNA unit in the default mode.
The --partial option enables reporting of incomplete units. Partial-unit
annotation is experimental: regions shorter than approximately 2500 bp are not
reported, and incomplete annotations should be reviewed manually before being
used downstream.
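Because partial-unit output is meant for manual review, it can help to list exactly what `--partial` adds. The sketch below compares the BED output of a default run with that of a `--partial` run on the same input and returns the intervals that appear only in the latter. It assumes both runs annotate complete units identically, so any difference comes from the additional partial units. The helper names `bed_intervals` and `partial_only` are hypothetical.

```python
def bed_intervals(text: str) -> set[tuple[str, int, int]]:
    """Collect (chrom, start, end) from BED text, skipping blank, comment and header lines."""
    intervals = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.add((chrom, int(start), int(end)))
    return intervals


def partial_only(default_bed: str, partial_bed: str) -> list[tuple[str, int, int]]:
    """Intervals reported with --partial but absent from the default run, sorted by position."""
    return sorted(bed_intervals(partial_bed) - bed_intervals(default_bed))
```

For example, run `andro example.fa -o default.bed` and `andro example.fa --partial -o partial.bed`, then pass the contents of the two files to `partial_only` to get a review list.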
Reference
andro includes the KY962518-ROT reference sequence used for detection and
annotation. Results should be interpreted relative to that reference and the
feature model encoded in the package.
License
andro is distributed under the MIT License. See LICENSE for details.
Issues
Please report bugs and unexpected results at: