Skip to main content

ZSeeker is a cli tool to find the propensity of B-DNA to form Z-DNA structures.

Project description

ZDNA tool

Installation

pip install ZSeeker

CLI Usage

ZSeeker --fasta ./test_GCA_f.fasta --n_jobs 1

Example: In Code usage

from zseeker.zdna_calculator import ZDNACalculatorSeq, Params
# Define parameters
params = Params(
    GC_weight=5.0,
    AT_weight=0.5,
    GT_weight=1.1,
    AC_weight=1.3,
    mismatch_penalty_starting_value=4,
    mismatch_penalty_linear_delta=2,
    mismatch_penalty_type='linear',
    threshold=10,
    consecutive_AT_scoring=[1, 2, 2],
    display_sequence_score=1
)

# Create a ZDNACalculatorSeq instance and nput sequence
zdna_calculator = ZDNACalculatorSeq(data="ACGTACGTACGT", params=params)

# Calculate subarrays above threshold
subarrays = zdna_calculator.subarrays_above_threshold()

# Print results
print(subarrays)

Command-line Help

usage: ZSeeker [-h] [--fasta FASTA] [--GC_weight GC_WEIGHT]
               [--AT_weight AT_WEIGHT] [--GT_weight GT_WEIGHT]
               [--AC_weight AC_WEIGHT]
               [--mismatch_penalty_starting_value MISMATCH_PENALTY_STARTING_VALUE]
               [--mismatch_penalty_linear_delta MISMATCH_PENALTY_LINEAR_DELTA]
               [--mismatch_penalty_type {linear,exponential}]
               [--n_jobs N_JOBS] [--threshold THRESHOLD]
               [--consecutive_AT_scoring CONSECUTIVE_AT_SCORING]
               [--display_sequence_score {0,1}]
               [--output_dir OUTPUT_DIR]
               [--gff_file GFF_FILE]

Given a fasta file and the corresponding parameters it calculates the
ZDNA for each sequence present.

options:
  -h, --help            show this help message and exit
  --fasta FASTA         Path to file analyzed
  --GC_weight GC_WEIGHT
                        Weight given to GC and CG transitions.
                        Default = 7.0
  --AT_weight AT_WEIGHT
                        Weight given to AT and TA transitions.
                        Default = 0.5
  --GT_weight GT_WEIGHT
                        Weight given to GT and TG transitions.
                        Default = 1.25
  --AC_weight AC_WEIGHT
                        Weight given to AC and CA transitions.
                        Default = 1.25
  --mismatch_penalty_starting_value MISMATCH_PENALTY_STARTING_VALUE
                        Penalty applied to the first non
                        purine/pyrimidine transition encountered.
                        Default = 3
  --mismatch_penalty_linear_delta MISMATCH_PENALTY_LINEAR_DELTA
                        Only applies if penalty type is set to
                        linear. Determines the rate of increase of
                        the penalty for every subsequent non
                        purine/pyrimidine transition. Default = 3
  --mismatch_penalty_type {linear,exponential}
                        Method of scaling the penalty for contiguous
                        non purine/pyrimidine transition. Default =
                        linear
  --n_jobs N_JOBS       Number of threads to use. Defaults to -1,
                        which uses the maximum available threads on
                        CPU
  --threshold THRESHOLD
                        Scoring threshold for a for a sequence to be
                        considered potentially Z-DNA forming and
                        returned by the program. This parameter is
                        also used for determining how big the scoring
                        drop within a sequence should be, before it
                        is split into two separate Z-DNA candidate
                        sequences. Default=50
  --consecutive_AT_scoring CONSECUTIVE_AT_SCORING
                        Consecutive AT repeats form a hairpin
                        structure instead of Z-DNA. In order to
                        reflect that, a penalty array is defined,
                        which provides the score adjustment for the
                        first and the subsequent TA appearances. The
                        last element will be applied to every
                        subsequent TA appearance. For more
                        information see documentation. Default =
                        (0.5, 0.5, 0.5, 0.5, 0.0, 0.0, -5.0, -100.0)
  --display_sequence_score {0,1}
  --output_dir OUTPUT_DIR
  --gff Optional GFF file for gene annotation. Only 'gene' features are used.

Example output file

Chromosome Start End Z-DNA Score Sequence
Z1 0.0 15.0 87.0 TGCGTGCGCGCGCGCG
Z2 0.0 15.0 87.0 GCGCCCGCGCGCGCGC
Z3 0.0 11.0 71.0 GCGCGCGCGCGT
Z4 0.0 11.0 65.0 GCGCGTGCGCGC
Z5 0.0 10.0 70.0 CGCGCGCGCGC
Z6 0.0 15.0 63.0 GCACGCACACGCGCGT
Z7 0.0 10.0 70.0 GCGCGCGCGCG
Z8 0.0 13.0 61.0 CGCACGCGCACGCA
Z9 0.0 11.0 59.0 CGCGCGCGCACA

Example output file with annotations

Chromosome Start End Z-DNA Score Sequence gene_start gene_end gene_id gene_biotype strand distance distance_from_TSS distance_from_TES
AE004438.1 364 391 63.0 ACGGTGCCGCAGCGGCCGTGTCGCCAGC 362 812 gene-VNG_6001H protein_coding - 0 420 2
AE004438.1 2317 2335 51.5 GCGGCGAGTCGCCGTCGCG 1904 3719 gene-VNG_6007H protein_coding - 0 1383 413
AE004438.1 3528 3538 52.75 ACGTGCGCGCG 1904 3719 gene-VNG_6007H protein_coding - 0 180 1624
AE004438.1 12771 12814 109.25 GCTGTCGCTGTCGGCGGCGGCTGCCGCCGACGCGACAGCGTCGC 12846 13380 gene-VNG_6015H protein_coding - 32 565 32
AE004438.1 13178 13195 56.0 ACGGCGCGTCAGCGGCGT 12846 13380 gene-VNG_6015H protein_coding - 0 184 332
AE004438.1 13533 13552 52.75 ACGGCGCACCGCCAGCGTGT 12846 13380 gene-VNG_6015H protein_coding - 153 154 687
AE004438.1 13853 13872 70.0 CGTCGGCGCACGCGCCGACG 14307 15582 gene-VNG_6016H protein_coding + 435 435 1709
AE004438.1 14960 14971 51.25 GCGCGGTCGCGC 14307 15582 gene-VNG_6016H protein_coding + 0 653 610
AE004438.1 15105 15126 61.0 CGCGTCGTCGGCGTCCGCGACG 14307 15582 gene-VNG_6016H protein_coding + 0 798 455

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zseeker-1.6.tar.gz (26.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ZSeeker-1.6-py3-none-any.whl (27.1 kB view details)

Uploaded Python 3

File details

Details for the file zseeker-1.6.tar.gz.

File metadata

  • Download URL: zseeker-1.6.tar.gz
  • Upload date:
  • Size: 26.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.15

File hashes

Hashes for zseeker-1.6.tar.gz
Algorithm Hash digest
SHA256 c68f513ed9d2eff27ee70c7f42305a598087916e2ffc5340b8660c630208ae92
MD5 ad839fdb285471314187a2464b8dc254
BLAKE2b-256 7754d1b8620258c021d202d6d2086af1dfbc7a66dbbb0ce1a8b8c75c3ca84324

See more details on using hashes here.

File details

Details for the file ZSeeker-1.6-py3-none-any.whl.

File metadata

  • Download URL: ZSeeker-1.6-py3-none-any.whl
  • Upload date:
  • Size: 27.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.15

File hashes

Hashes for ZSeeker-1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 e3bbe4d0765fb08288a635c42878485385b9f7d36725904ae60da68cd89e6dac
MD5 fa556eb939bfd785cd7288cb5f3e4783
BLAKE2b-256 ebe8a79477eb29fd7f4f7f8368d34b1ee90140495b04c977a0c857600c66c410

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page