Universal Read Analysis of DIMErs

URAdime

URAdime (Universal Read Analysis of DIMErs) is a Python package for analyzing primer sequences in sequencing data to identify dimers and chimeras.

Installation

pip install uradime

Usage

URAdime can be used both as a command-line tool and as a Python package.

Command Line Interface

# Basic usage
uradime -b input.bam -p primers.tsv -o results/my_analysis

# Full options (each option is described below the command)
uradime -b input.bam -p primers.tsv -o results/my_analysis \
    -t 8 -m 1000 -c 100 -u \
    --max-distance 2 --window-size 20 --ignore-amplicon-size \
    --check-termini --terminus-length 10 --downsample 5.0 -v

# Options:
#   -b input.bam              Input BAM file
#   -p primers.tsv            Primer file (tab-separated)
#   -o results/my_analysis    Output prefix
#   -t 8                      Number of threads
#   -m 1000                   Maximum reads to process (0 for all)
#   -c 100                    Chunk size for parallel processing
#   -u                        Process only unaligned reads
#   --max-distance 2          Maximum Levenshtein distance for matching
#   --window-size 20          Allowed padding at the 5' ends of the reads. It sometimes needs to be large (for example, because of universal tails), but setting it too large can cause unexpected results
#   --ignore-amplicon-size    Useful for short-read sequencing such as Illumina, where the paired read length is not the size of the actual amplicon
#   --check-termini           Turn off the check for partial matches at read termini
#   --terminus-length 10      Length of the terminus to check for partial matches
#   --downsample 5.0          Percentage of reads to randomly sample from the BAM file (0.1-100.0)
#   -v                        Verbose output
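
To make the --max-distance and --window-size options more concrete, here is a minimal pure-Python sketch of fuzzy primer matching near the start of a read. It is an illustration only, not URAdime's actual implementation: the function names levenshtein and find_primer_near_start are hypothetical, and the real tool may search and score matches differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def find_primer_near_start(read, primer, max_distance=2, window_size=20):
    """Hypothetical helper: look for `primer` starting within the first
    `window_size` bases of `read`.

    Returns (offset, distance) for the best match whose edit distance is
    at most `max_distance`, or None if there is no such match.
    """
    best = None
    last_start = min(window_size, len(read) - len(primer))
    for offset in range(0, last_start + 1):
        candidate = read[offset:offset + len(primer)]
        distance = levenshtein(candidate, primer)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (offset, distance)
    return best


# A primer preceded by a 4-base tail is found at offset 4 with no mismatches.
hit = find_primer_near_start("GGGG" + "ATCGATCGATCG" + "TTTT", "ATCGATCGATCG")
```

In this sketch a larger window lets the primer sit further from the read start (for example behind a universal tail), but it also means more candidate positions are accepted, which is one way an overly large window could produce unexpected matches.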

Python Package

from uradime import bam_to_fasta_parallel, create_analysis_summary, load_primers

# Load and analyze BAM file
result_df = bam_to_fasta_parallel(
    bam_path="your_file.bam",
    primer_file="primers.tsv",
    num_threads=4
)

# Load primers for analysis
primers_df, _ = load_primers("primers.tsv")

# Create analysis summary
summary_df, matched_pairs, mismatched_pairs = create_analysis_summary(result_df, primers_df)

Input Files

Primer File Format (TSV)

The primer file should be tab-separated with the following columns:

  • Name: Primer pair name
  • Forward: Forward primer sequence
  • Reverse: Reverse primer sequence
  • Size: Expected amplicon size

Example:

Name    Forward             Reverse             Size
Pair1   ATCGATCGATCG       TAGCTAGCTAGC       100
Pair2   GCTAGCTAGCTA       CGATTCGATCGA       150
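
Before running an analysis, it can help to check that the primer file has the four columns listed above. The following is a sketch using only the Python standard library; read_primer_table is a hypothetical helper and is not part of the uradime package, whose own load_primers function may validate input differently. The set of accepted bases (IUPAC nucleotide codes) is an assumption.

```python
import csv
import io

REQUIRED_COLUMNS = ("Name", "Forward", "Reverse", "Size")
# Assumption: primers use IUPAC nucleotide codes only.
VALID_BASES = set("ACGTUNRYSWKMBDHV")


def read_primer_table(handle):
    """Parse a tab-separated primer table and return a list of dicts.

    Raises ValueError on missing columns, invalid bases, or a
    non-integer Size value.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in fieldnames]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    primers = []
    for line_number, row in enumerate(reader, start=2):
        record = {"Name": (row["Name"] or "").strip()}
        for column in ("Forward", "Reverse"):
            sequence = (row[column] or "").strip().upper()
            if not sequence or not set(sequence) <= VALID_BASES:
                raise ValueError(f"line {line_number}: invalid {column} sequence")
            record[column] = sequence
        try:
            record["Size"] = int((row["Size"] or "").strip())
        except ValueError:
            raise ValueError(f"line {line_number}: Size must be an integer") from None
        primers.append(record)
    return primers


# Usage with an in-memory table; for a real file, pass open("primers.tsv", newline="").
example = (
    "Name\tForward\tReverse\tSize\n"
    "Pair1\tATCGATCGATCG\tTAGCTAGCTAGC\t100\n"
    "Pair2\tGCTAGCTAGCTA\tCGATTCGATCGA\t150\n"
)
primers = read_primer_table(io.StringIO(example))
```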

Output Files

The tool generates several CSV files with the analysis results:

  • *_summary.csv: Overall analysis summary
  • *_matched_pairs.csv: Reads with matching primer pairs
  • *_mismatched_pairs.csv: Reads with mismatched primer pairs
  • *_wrong_size_pairs.csv: Reads with correct primer pairs but wrong size
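
One way to picture how a read might end up in one of these files is sketched below. This is an assumption-laden illustration, not URAdime's actual decision logic: the function classify_read, the "incomplete" category, and the 10% size tolerance are all hypothetical, and it assumes that matched pairs are those that also have the expected size.

```python
def classify_read(forward_pair, reverse_pair, read_length, expected_sizes,
                  size_tolerance=0.1):
    """Hypothetical classifier for a single read.

    forward_pair / reverse_pair: name of the primer pair whose forward /
    reverse primer was found on the read, or None if none was found.
    expected_sizes: mapping of primer pair name to expected amplicon size.
    size_tolerance: allowed relative deviation from the expected size
    (an assumed value, not taken from URAdime).
    """
    if forward_pair is None or reverse_pair is None:
        return "incomplete"
    if forward_pair != reverse_pair:
        # Primers from two different pairs on one read, e.g. a chimera or dimer.
        return "mismatched_pair"
    expected = expected_sizes[forward_pair]
    if abs(read_length - expected) > size_tolerance * expected:
        return "wrong_size_pair"
    return "matched_pair"


expected_sizes = {"Pair1": 100, "Pair2": 150}
labels = [
    classify_read("Pair1", "Pair1", 102, expected_sizes),
    classify_read("Pair1", "Pair2", 120, expected_sizes),
    classify_read("Pair1", "Pair1", 40, expected_sizes),
]
```

Under these assumptions, a very short read carrying both primers of the same pair lands in the wrong-size category, which is the pattern typically expected of a primer dimer.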

Requirements

  • Python ≥3.7
  • pysam
  • pandas
  • biopython
  • python-Levenshtein
  • tqdm
  • numpy

License

This project is licensed under the GNU General Public License (GPL).
