De Novo Decomposition of Satellite DNA Arrays into Monomers within Telomere-to-Telomere Assemblies

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

ArraySplitter: De Novo Decomposition of Satellite DNA Arrays

Decomposes satellite DNA arrays into monomers within telomere-to-telomere (T2T) assemblies. Ideal for analyzing centromeric and pericentromeric regions at the monomer level.

Status: In development. Optimized for arrays at the 100 Kb scale; longer arrays will work but take significantly longer to process.

Update: As of version 1.1.6, ArraySplitter successfully decomposes arrays at the megabase scale. The largest arrays take around 5 minutes to process. Fortunately, there are only 41 arrays larger than 1 Mb in the CHM13v20 assembly. Processing is currently single-threaded; I plan to add parallel processing to speed it up significantly.

Update: Monomer borders still require some polishing; I am working on it.

Update: To test ArraySplitter, I used the CHM13v20 assembly. Decomposing all arrays longer than 1 Kb (about 13K arrays) takes around 3 hours.

Installation

Prerequisites

Python 3.6 or later

Installation with pip:

pip install arraysplitter

Usage

Basic Example

time arraysplitter -i chr1.arrays.fa -o chr1.arrays

This creates a FASTA file with monomers separated by spaces.

Explanation

-i chr1.arrays.fa: FASTA file of satDNA arrays.
-o chr1.arrays: Prefix for the output FASTA containing decomposed monomers (separated by spaces).
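
The space-separated output can be read back with a few lines of Python. This is a minimal sketch, not part of ArraySplitter: it assumes each record has a ">" header followed by sequence lines in which monomers are separated by whitespace, and that any line breaks fall on monomer borders. The function name read_monomers and the in-memory sample are hypothetical.

```python
import io
from typing import Dict, Iterable, List


def read_monomers(handle: Iterable[str]) -> Dict[str, List[str]]:
    """Map each FASTA header to the list of monomers in that record.

    Assumption: monomers are separated by whitespace in the sequence lines.
    """
    arrays: Dict[str, List[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]  # header without the leading ">"
            arrays[name] = []
        elif name is not None:
            arrays[name].extend(line.split())
    return arrays


# Hypothetical sample standing in for a decomposed output file
sample = io.StringIO(">array1\nTTTCA TTTCG TTTCA\n>array2\nGGAAT GGAAT\n")
monomers = read_monomers(sample)
for name, units in monomers.items():
    print(name, len(units), units)
```

To read a real file, pass an open file handle instead of the StringIO sample.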

All Options

arraysplitter --help

Rotating monomers to start with the same sequence

We found that different arrays of the same repeat family can be decomposed slightly differently. To make them comparable, ArraySplitter can rotate monomers to start with the same sequence.

arraysplitter_rotate -i arrays.fa -o arrays.norm.fa

You can also specify the sequence to start with:

arraysplitter_rotate -i arrays.fa -o arrays.norm.fa -s TTTC

Explanation

-i arrays.fa: FASTA file of monomers.
-o arrays.norm.fa: Output FASTA file with rotated monomers.
-s TTTC: Sequence that the rotated monomers should start with (optional).
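
To illustrate what rotating a monomer means, here is a minimal sketch that treats a monomer as circular and rotates it so that it begins with a chosen sequence. It is an illustration of the idea only, not ArraySplitter's implementation; the function name rotate_to_start and the example sequences are hypothetical.

```python
def rotate_to_start(monomer: str, start: str) -> str:
    """Rotate a monomer, treated as circular, so that it begins with `start`.

    The monomer is returned unchanged if `start` does not occur in it,
    even across the wrap-around point.
    """
    if not monomer or not start or len(start) > len(monomer):
        return monomer
    # Doubling the monomer exposes occurrences that span the end and the start
    doubled = monomer + monomer
    pos = doubled.find(start)
    if pos == -1:
        return monomer
    return doubled[pos:pos + len(monomer)]


# Two hypothetical decompositions of the same repeat, offset from each other
for unit in ["CATTTCGG", "TCGGCATT"]:
    print(unit, "->", rotate_to_start(unit, "TTTC"))
```

Both example monomers rotate to the same string, which is what makes differently decomposed arrays comparable.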

Extracting and counting monomers

Finally, you can extract and count monomers from the arrays:

arraysplitter_extract -i arrays.norm.fa -o arrays.norm

This creates a file with monomer length, monomer frequency, and monomer sequence (ordered by frequency). For example, for the arrays.norm.fa file above, the output looks like this (note that in this sample the second column matches the length of the sequence in the third column):

514     10      ATCCCATTCC
514     10      GATTGGAGTG
514     6       TCCTTT
514     5       TGCTG
514     10      ATTGAATGGA
514     10      ATGCAATGGA
514     5       TCCTA
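
The counting step can be sketched in Python as follows. This is a minimal illustration, not ArraySplitter's implementation: it assumes the monomers are already available as a list of strings and emits rows in the column order stated in the text (length, frequency, sequence), most frequent first. The function name count_monomers and the example list are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_monomers(monomers: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (length, frequency, sequence) rows, most frequent first."""
    counts = Counter(monomers)
    # Sort by descending frequency, then by sequence to break ties predictably
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(len(seq), freq, seq) for seq, freq in ranked]


# Hypothetical monomers pooled from one or more decomposed arrays
example = ["TTTCA", "TTTCG", "TTTCA", "GGAAT", "TTTCA", "GGAAT"]
rows = count_monomers(example)
for length, freq, seq in rows:
    print(f"{length}\t{freq}\t{seq}")
```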

Contact

For questions or support: ad3002@gmail.com

Release history

- 1.2.3 (this version): Mar 6, 2024
- 1.2.2: Mar 6, 2024
- 1.2.1: Mar 6, 2024
- 1.1.13: Mar 6, 2024
- 1.1.12: Feb 22, 2024
- 1.1.11: Feb 21, 2024
- 1.1.10: Feb 21, 2024
- 1.1.9: Feb 21, 2024
- 1.1.7: Feb 21, 2024
- 1.1.6: Feb 21, 2024
- 1.1.5: Feb 21, 2024
- 1.1.4: Feb 21, 2024
- 1.1.3: Feb 21, 2024
- 1.1.2: Feb 21, 2024
- 1.1.1: Feb 21, 2024
- 1.1.0: Feb 21, 2024
- 1.0.9: Feb 21, 2024
- 1.0.8: Feb 21, 2024
- 1.0.7: Feb 21, 2024

Download files

Source Distribution

ArraySplitter-1.2.3.tar.gz (18.3 kB)

Uploaded Mar 6, 2024 Source

Hashes for ArraySplitter-1.2.3.tar.gz
Algorithm	Hash digest
SHA256	`1887ba979c8800e0d8b842e0ad425aeafd520084e5b22a3e715d90bd451cd36f`
MD5	`72dc54b28b29bd173106b7ccac2e0b56`
BLAKE2b-256	`e6783f734a9f91765c6d5e02f2eb6a4f83e3096438de9b8f8a2c2eeb2f157159`