ZSeeker is a CLI tool that estimates the propensity of B-DNA to form Z-DNA structures.

ZSeeker

==============

Installation

pip install ZSeeker

CLI Usage

ZSeeker --fasta ./test_GCA_f.fasta --n_jobs 1
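
The same invocation can be assembled from Python when a run needs to be scripted. The sketch below is illustrative only: `build_zseeker_command` is a hypothetical helper, not part of ZSeeker, and it uses only flags that appear in the command-line help (`--fasta`, `--n_jobs`, `--threshold`, `--output_dir`). The `results` directory name is a placeholder.

```python
import shlex


def build_zseeker_command(fasta, **options):
    """Return the argv list for a ZSeeker run (hypothetical helper)."""
    command = ["ZSeeker", "--fasta", str(fasta)]
    for name, value in options.items():
        if value is True:
            # Switches that take no value, e.g. --total_sequence_scoring
            command.append(f"--{name}")
        elif value is not False and value is not None:
            # Options that take a value, e.g. --n_jobs 1
            command.extend([f"--{name}", str(value)])
    return command


command = build_zseeker_command(
    "./test_GCA_f.fasta", n_jobs=1, threshold=50, output_dir="results"
)
print(shlex.join(command))
```

Once ZSeeker is installed, the list can be passed to `subprocess.run(command, check=True)`.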

Example: In-code usage

from zseeker.zdna_calculator import ZDNACalculatorSeq, Params
# Define parameters
params = Params(
    GC_weight=5.0,
    AT_weight=0.5,
    GT_weight=1.1,
    AC_weight=1.3,
    mismatch_penalty_starting_value=4,
    mismatch_penalty_linear_delta=2,
    mismatch_penalty_type='linear',
    threshold=10,
    consecutive_AT_scoring=[1, 2, 2],
    display_sequence_score=1,
    drop_threshold=50,
    total_sequence_scoring=False
)

# Create a ZDNACalculatorSeq instance with the input sequence
zdna_calculator = ZDNACalculatorSeq(data="ACGTACGTACGT", params=params)

# Calculate subarrays above threshold
subarrays = zdna_calculator.subarrays_above_threshold()

# Print results
print(subarrays)

Command-line Help

usage: ZSeeker [-h] [--fasta FASTA] [--GC_weight GC_WEIGHT]
               [--AT_weight AT_WEIGHT] [--GT_weight GT_WEIGHT]
               [--AC_weight AC_WEIGHT]
               [--mismatch_penalty_starting_value MISMATCH_PENALTY_STARTING_VALUE]
               [--mismatch_penalty_linear_delta MISMATCH_PENALTY_LINEAR_DELTA]
               [--mismatch_penalty_type {linear,exponential}]
               [--n_jobs N_JOBS] [--threshold THRESHOLD]
               [--consecutive_AT_scoring CONSECUTIVE_AT_SCORING]
               [--display_sequence_score {0,1}]
               [--output_dir OUTPUT_DIR]
               [--gff_file GFF_FILE]
               [--drop_threshold DROP_THRESHOLD]
               [--total_sequence_scoring]

Given a fasta file and the corresponding parameters it calculates the
ZDNA for each sequence present.

options:
  -h, --help            show this help message and exit
  --fasta FASTA         Path to file analyzed
  --GC_weight GC_WEIGHT
                        Weight given to GC and CG transitions.
                        Default = 7.0
  --AT_weight AT_WEIGHT
                        Weight given to AT and TA transitions.
                        Default = 0.5
  --GT_weight GT_WEIGHT
                        Weight given to GT and TG transitions.
                        Default = 1.25
  --AC_weight AC_WEIGHT
                        Weight given to AC and CA transitions.
                        Default = 1.25
  --mismatch_penalty_starting_value MISMATCH_PENALTY_STARTING_VALUE
                        Penalty applied to the first non
                        purine/pyrimidine transition encountered.
                        Default = 3
  --mismatch_penalty_linear_delta MISMATCH_PENALTY_LINEAR_DELTA
                        Only applies if penalty type is set to
                        linear. Determines the rate of increase of
                        the penalty for every subsequent non
                        purine/pyrimidine transition. Default = 3
  --mismatch_penalty_type {linear,exponential}
                        Method of scaling the penalty for contiguous
                        non purine/pyrimidine transition. Default =
                        linear
  --n_jobs N_JOBS       Number of threads to use. Defaults to -1,
                        which uses the maximum available threads on
                        CPU
  --threshold THRESHOLD
                        Scoring threshold for a sequence to be
                        considered potentially Z-DNA forming and
                        returned by the program. This parameter is
                        also used for determining how big the scoring
                        drop within a sequence should be, before it
                        is split into two separate Z-DNA candidate
                        sequences. Default=50
  --consecutive_AT_scoring CONSECUTIVE_AT_SCORING
                        Consecutive AT repeats form a hairpin
                        structure instead of Z-DNA. In order to
                        reflect that, a penalty array is defined,
                        which provides the score adjustment for the
                        first and the subsequent TA appearances. The
                        last element will be applied to every
                        subsequent TA appearance. For more
                        information see documentation. Default =
                        (0.5, 0.5, 0.5, 0.5, 0.0, 0.0, -5.0, -100.0)
  --display_sequence_score {0,1}
  --output_dir OUTPUT_DIR
  --gff_file GFF_FILE   Optional GFF file for gene annotation. Only
                        'gene' features are used.
  --drop_threshold DROP_THRESHOLD
                        Drop threshold used within subarrays
                        detection logic. Default = 50.
  --total_sequence_scoring
                        If set, compute only a single
                        transitions-based total score per
                        sequence (one row each). Skips subarray
                        detection altogether.
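
Two of the options above describe simple schedules that are easier to follow with numbers. The sketch below is not ZSeeker's implementation, and both function names are hypothetical. It assumes that the linear penalty for the k-th consecutive mismatch (counting from 0) is the starting value plus k times the delta; the help text only says the delta sets the rate of increase. It also reads `--consecutive_AT_scoring` as a lookup table whose last element repeats, as the help text describes. The defaults are those listed in the help.

```python
def linear_mismatch_penalty(run_index, starting_value=3, linear_delta=3):
    """Penalty for the run_index-th consecutive mismatch (0-based).

    Assumed form: starting_value + linear_delta * run_index.
    """
    return starting_value + linear_delta * run_index


def consecutive_at_score(
    appearance_index,
    scoring=(0.5, 0.5, 0.5, 0.5, 0.0, 0.0, -5.0, -100.0),
):
    """Score adjustment for the appearance_index-th consecutive TA (0-based).

    Indices past the end of the array reuse its last element.
    """
    return scoring[min(appearance_index, len(scoring) - 1)]


print([linear_mismatch_penalty(i) for i in range(4)])
# [3, 6, 9, 12]
print([consecutive_at_score(i) for i in range(10)])
# [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, -5.0, -100.0, -100.0, -100.0]
```

Under this reading, the default array tolerates a short run of TA steps and then penalises long AT repeats sharply, which is consistent with the stated aim of excluding hairpin-forming stretches.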

Example output file

Chromosome  Start  End   Z-DNA Score  Sequence
Z1          0.0    15.0  87.0         TGCGTGCGCGCGCGCG
Z2          0.0    15.0  87.0         GCGCCCGCGCGCGCGC
Z3          0.0    11.0  71.0         GCGCGCGCGCGT
Z4          0.0    11.0  65.0         GCGCGTGCGCGC
Z5          0.0    10.0  70.0         CGCGCGCGCGC
Z6          0.0    15.0  63.0         GCACGCACACGCGCGT
Z7          0.0    10.0  70.0         GCGCGCGCGCG
Z8          0.0    13.0  61.0         CGCACGCGCACGCA
Z9          0.0    11.0  59.0         CGCGCGCGCACA
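
For downstream processing, the rows above can be read into typed records. This is a minimal sketch that assumes whitespace-separated data rows as displayed here; the actual file may use a different delimiter. `ZDNAHit` and `parse_hit` are hypothetical names, not part of ZSeeker. The final check reflects a pattern in the rows shown: `End - Start + 1` equals the sequence length, which suggests that both coordinates are inclusive.

```python
from typing import NamedTuple


class ZDNAHit(NamedTuple):
    chromosome: str
    start: int
    end: int
    score: float
    sequence: str


def parse_hit(line):
    """Parse one data row (not the header) of the example output."""
    chromosome, start, end, score, sequence = line.split()
    # Start and End are printed as floats ("15.0"), so convert via float first.
    return ZDNAHit(chromosome, int(float(start)), int(float(end)), float(score), sequence)


rows = [
    "Z1 0.0 15.0 87.0 TGCGTGCGCGCGCGCG",
    "Z3 0.0 11.0 71.0 GCGCGCGCGCGT",
]
hits = [parse_hit(row) for row in rows]
for hit in hits:
    # True when the inclusive span matches the reported sequence length
    print(hit.chromosome, hit.end - hit.start + 1 == len(hit.sequence))
```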

Example output file with annotations

Chromosome Start End Z-DNA Score Sequence gene_start gene_end gene_id gene_biotype strand distance distance_from_TSS distance_from_TES
AE004438.1 364 391 63.0 ACGGTGCCGCAGCGGCCGTGTCGCCAGC 362 812 gene-VNG_6001H protein_coding - 0 420 2
AE004438.1 2317 2335 51.5 GCGGCGAGTCGCCGTCGCG 1904 3719 gene-VNG_6007H protein_coding - 0 1383 413
AE004438.1 3528 3538 52.75 ACGTGCGCGCG 1904 3719 gene-VNG_6007H protein_coding - 0 180 1624
AE004438.1 12771 12814 109.25 GCTGTCGCTGTCGGCGGCGGCTGCCGCCGACGCGACAGCGTCGC 12846 13380 gene-VNG_6015H protein_coding - 32 565 32
AE004438.1 13178 13195 56.0 ACGGCGCGTCAGCGGCGT 12846 13380 gene-VNG_6015H protein_coding - 0 184 332
AE004438.1 13533 13552 52.75 ACGGCGCACCGCCAGCGTGT 12846 13380 gene-VNG_6015H protein_coding - 153 154 687
AE004438.1 13853 13872 70.0 CGTCGGCGCACGCGCCGACG 14307 15582 gene-VNG_6016H protein_coding + 435 435 1709
AE004438.1 14960 14971 51.25 GCGCGGTCGCGC 14307 15582 gene-VNG_6016H protein_coding + 0 653 610
AE004438.1 15105 15126 61.0 CGCGTCGTCGGCGTCCGCGACG 14307 15582 gene-VNG_6016H protein_coding + 0 798 455
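
In the rows above, the `distance` column matches the gap between the Z-DNA interval and the annotated gene: 0 when the two overlap, otherwise the number of positions between the nearer ends. The sketch below reproduces that relationship for the rows shown. It is inferred from this example output rather than taken from ZSeeker's code, and `interval_gap` is a hypothetical name.

```python
def interval_gap(start, end, gene_start, gene_end):
    """Gap between [start, end] and [gene_start, gene_end]; 0 if they overlap."""
    if end < gene_start:
        return gene_start - end      # region lies before the gene
    if start > gene_end:
        return start - gene_end      # region lies after the gene
    return 0


# (Start, End, gene_start, gene_end, distance) taken from the rows above
examples = [
    (364, 391, 362, 812, 0),
    (12771, 12814, 12846, 13380, 32),
    (13533, 13552, 12846, 13380, 153),
    (13853, 13872, 14307, 15582, 435),
]
for start, end, gene_start, gene_end, expected in examples:
    print(interval_gap(start, end, gene_start, gene_end) == expected)
```

The `distance_from_TSS` and `distance_from_TES` columns depend on strand and are not reconstructed here.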

ZSeeker web application

The web version of ZSeeker can be found at: ZSeeker web application

A dockerized version for local deployment is available in this repository: ZSeeker web application dockerized
