De Novo Decomposition of Satellite DNA Arrays into Monomers within Telomere-to-Telomere Assemblies
ArraySplitter: De Novo Decomposition of Satellite DNA Arrays
Decomposes satellite DNA arrays into monomers within telomere-to-telomere (T2T) assemblies. Ideal for analyzing centromeric and pericentromeric regions at the monomer level.
Status: In development. Optimized for arrays at the 100 Kb scale; longer arrays will work but take significantly longer to process.
Update: Since version 1.1.6, ArraySplitter successfully decomposes megabase-scale arrays. The largest arrays take around 5 minutes to process. Fortunately, only 41 arrays in the CHM13v20 assembly are larger than 1 Mb. ArraySplitter is currently single-threaded; I plan to add parallel processing to speed it up significantly.
Update: Monomer borders still require some polishing; I am working on it.
Update: To test ArraySplitter, I used the CHM13v20 assembly. Decomposing all arrays longer than 1 Kb (13K arrays) takes around 3 hours.
Installation
Prerequisites
- Python 3.6 or later
Installation with pip:
pip install arraysplitter
Usage
Basic Example
time arraysplitter -i chr1.arrays.fa -o chr1.arrays
Explanation
- -i chr1.arrays.fa: FASTA file of satDNA arrays.
- -o chr1.arrays: Prefix for the output FASTA containing decomposed monomers (separated by spaces).
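The output can be inspected with a few lines of Python. The sketch below is not part of ArraySplitter. It assumes only what is stated above: each record is a FASTA entry whose sequence holds monomers separated by spaces. It also assumes that a single monomer is never wrapped across two lines. The function name read_monomers and the example records are hypothetical.

```python
from collections import Counter
from io import StringIO


def read_monomers(handle):
    """Yield (array_id, [monomer, ...]) for each FASTA record.

    Assumes monomers are separated by whitespace and that no monomer
    is split across two lines.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, " ".join(chunks).split()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, " ".join(chunks).split()


# Hypothetical records, used only to illustrate the assumed layout.
example = StringIO(">array1\nACGTAC ACGTAA ACGTAC\n>array2\nGGCAT GGCAT\n")
for array_id, monomers in read_monomers(example):
    # Report the monomer count and the most common monomer length.
    lengths = Counter(len(m) for m in monomers)
    print(array_id, len(monomers), lengths.most_common(1)[0][0])
```

To read a real result, replace the StringIO object with an open file handle for the FASTA written under the chosen output prefix.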
All Options
arraysplitter --help
Rotating monomers to start with the same sequence
We found that different arrays of the same repeat family can be decomposed slightly differently. To make them comparable, ArraySplitter can rotate monomers to start with the same sequence.
arraysplitter_rotate -i arrays.fa -o arrays.norm.fa
Explanation
- -i arrays.fa: FASTA file of monomers.
- -o arrays.norm.fa: Output FASTA file with rotated monomers.
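To illustrate what rotating a monomer means, here is a minimal sketch. It is not the algorithm used by arraysplitter_rotate, which is not described here. It treats a monomer as circular and rotates it so that it begins with a chosen anchor sequence. The function rotate_to_anchor and the anchor ACAC are hypothetical.

```python
def rotate_to_anchor(monomer, anchor):
    """Rotate a circular monomer so that it starts with `anchor`.

    Returns the monomer unchanged if the anchor is empty, longer than
    the monomer, or absent from every rotation.
    """
    if not monomer or not anchor or len(anchor) > len(monomer):
        return monomer
    # Doubling the sequence exposes occurrences that span the junction.
    doubled = monomer + monomer
    pos = doubled.find(anchor)
    if pos == -1 or pos >= len(monomer):
        return monomer
    return monomer[pos:] + monomer[:pos]


# Two phase-shifted copies of the same hypothetical repeat unit.
for monomer in ("GTACAC", "CACGTA"):
    print(monomer, "->", rotate_to_anchor(monomer, "ACAC"))
```

Both inputs rotate to ACACGT, so monomers that were cut at different phases of the same repeat become directly comparable.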
Contact
For questions or support: ad3002@gmail.com