Simple functions for manipulating sequences and secondary structures in pandas DataFrame format.

seq_tools

A Python package for manipulating and analyzing nucleic acid sequences (DNA and RNA) in pandas DataFrames.

Features

  • Batch operations: Work with sequences in pandas DataFrames for efficient processing
  • Sequence manipulation: Convert between DNA/RNA, reverse complement, add sequences
  • Structure prediction: Fold RNA sequences using ViennaRNA
  • Analysis tools: Calculate molecular weights, extinction coefficients, edit distances
  • CLI: Command-line tools for quick sequence operations
  • Python API: Full programmatic access to all functionality

Installation

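pip install rna_seq_tools

The distribution is named rna_seq_tools on PyPI, while the import name is seq_tools and the command-line entry point is seq-tools.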

Quick Start

Command Line Interface

# Get help
seq-tools --help

# Convert RNA to DNA
seq-tools to-dna "AUCG"

# Fold RNA sequence
seq-tools fold "GGGGUUUUCCCC"

# Calculate molecular weight
seq-tools mw "ATCG"

Python API

import pandas as pd
from seq_tools import sequences_to_dataframe, fold, get_molecular_weight_df, to_rna_df

# Create a DataFrame from sequences
sequences = ["ATCG", "GCTA", "AAAA"]
df = sequences_to_dataframe(sequences)

# Convert to RNA
df = to_rna_df(df)

# Fold RNA sequences
df = fold(df)

# Calculate molecular weights
df = get_molecular_weight_df(df, "RNA", double_stranded=False)

print(df)

Single Sequence Functions

For single sequence operations, import from the sequence module:

from seq_tools.sequence import to_dna, to_rna, get_reverse_complement, get_molecular_weight

# Convert sequences
rna_seq = to_rna("ATCG")  # Returns "AUCG"
dna_seq = to_dna("AUCG")  # Returns "ATCG"

# Reverse complement
rc = get_reverse_complement("ATCG", "DNA")  # Returns "CGAT"

# Molecular weight
mw = get_molecular_weight("ATCG", "DNA")  # Returns 1307.80

CLI Commands

add

Add a sequence to the 5' and/or 3' end of sequences.

seq-tools add -p5 "AAAA" "GGGGUUUUCCCC"
seq-tools add -p5 "AAAA" -p3 "CCCC" input.csv
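
As an illustration of what this command does to each sequence, the operation amounts to string concatenation. The sketch below is plain Python and does not call seq_tools; the helper name add_flanks and its parameters are hypothetical.

```python
def add_flanks(seq: str, p5: str = "", p3: str = "") -> str:
    """Prepend p5 to the 5' end and append p3 to the 3' end of seq.

    Hypothetical helper for illustration; not part of seq_tools.
    """
    return p5 + seq + p3


library = ["GGGGUUUUCCCC", "AUCG"]
# Apply the same flanking sequences to every member of the library
extended = [add_flanks(s, p5="AAAA", p3="CCCC") for s in library]
print(extended)
```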

ec

Calculate the extinction coefficient for each sequence.

seq-tools ec "GGGGUUUUCCCC"
seq-tools ec input.csv -nt RNA -ds  # RNA, double-stranded

edit-distance

Calculate the average edit distance of a sequence library.

seq-tools edit-distance input.csv
seq-tools edit-distance input.csv --parallel --workers 4
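
To make the metric concrete, here is a minimal standard-library sketch of an average edit distance. It assumes the average is taken over all unordered pairs of sequences, which this README does not state; the function names are hypothetical, and the package itself lists editdistance as a requirement rather than using code like this.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def average_edit_distance(seqs: list[str]) -> float:
    """Mean edit distance over all unordered pairs (assumed definition)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


print(average_edit_distance(["AAAA", "AAAT", "AATT"]))
```

The number of pairs grows quadratically with library size, which is presumably why the command offers --parallel and --workers.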

fold

Fold RNA sequences using ViennaRNA.

seq-tools fold "GGGGUUUUCCCC"
seq-tools fold input.csv

mw

Calculate the molecular weight for each sequence.

seq-tools mw "ATCG"
seq-tools mw input.csv -nt DNA -ds  # DNA, double-stranded

rc

Calculate reverse complement for each sequence.

seq-tools rc "ATCG"
seq-tools rc input.csv -nt DNA
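
For readers unfamiliar with the operation, a reverse complement can be sketched in a few lines of plain Python. This is an illustration only, not the package's implementation; the function name and the nucleic_acid parameter are hypothetical, and only uppercase unambiguous bases are handled.

```python
# Translation tables mapping each base to its Watson-Crick partner
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str, nucleic_acid: str = "DNA") -> str:
    """Complement every base, then reverse so the result reads 5' to 3'."""
    table = DNA_COMPLEMENT if nucleic_acid == "DNA" else RNA_COMPLEMENT
    return seq.translate(table)[::-1]


print(reverse_complement("ATCG"))         # CGAT
print(reverse_complement("AUCG", "RNA"))  # CGAU
```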

to-dna

Convert RNA sequences to DNA (replace U with T).

seq-tools to-dna "AUCG"
seq-tools to-dna input.csv -o output.csv

to-dna-template

Convert RNA sequences to DNA template with T7 promoter.

seq-tools to-dna-template "AUCG"
seq-tools to-dna-template input.csv

to-rna

Convert DNA sequences to RNA (replace T with U).

seq-tools to-rna "ATCG"
seq-tools to-rna input.csv
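
The to-dna and to-rna conversions are described as single-character substitutions, so they can be sketched directly with str.replace. The function names below are hypothetical stand-ins, not the seq_tools API, and the sketch assumes uppercase input.

```python
def dna_to_rna(seq: str) -> str:
    """Replace every T with U."""
    return seq.replace("T", "U")


def rna_to_dna(seq: str) -> str:
    """Replace every U with T."""
    return seq.replace("U", "T")


print(dna_to_rna("ATCG"))  # AUCG
print(rna_to_dna("AUCG"))  # ATCG
```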

transcribe

Transcribe DNA template sequences to RNA (removes T7 promoter).

seq-tools transcribe input.csv
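
The relationship between to-dna-template and transcribe can be sketched as a pair of inverse functions. The promoter below is the 17-nt T7 consensus sequence immediately upstream of the transcription start site; the exact promoter string the package prepends or removes is an assumption here, as are the function names.

```python
# Assumed promoter; seq_tools may use a different or longer variant.
T7_PROMOTER = "TAATACGACTCACTATA"


def to_dna_template(rna: str) -> str:
    """Convert RNA to DNA and prepend the assumed T7 promoter."""
    return T7_PROMOTER + rna.replace("U", "T")


def transcribe(template: str) -> str:
    """Strip the assumed T7 promoter and convert the remainder to RNA."""
    if not template.startswith(T7_PROMOTER):
        raise ValueError("template does not start with the assumed T7 promoter")
    return template[len(T7_PROMOTER):].replace("T", "U")


template = to_dna_template("GGGAUCG")
print(template)
print(transcribe(template))
```

Raising an error on a missing promoter is a design choice of this sketch; a library could equally skip such rows or return them unchanged.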

trim

Trim 5'/3' ends of sequences.

seq-tools trim input.csv --start 5 --end 3
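
A possible reading of --start and --end is "number of nucleotides to remove from the 5' and 3' ends". Under that assumption, which this README does not confirm, trimming reduces to a slice. The function below is a hypothetical sketch, not the package's code.

```python
def trim(seq: str, start: int = 0, end: int = 0) -> str:
    """Drop `start` nucleotides from the 5' end and `end` from the 3' end."""
    # Compute the stop index explicitly: seq[start:-end] would return an
    # empty string when end == 0.
    stop = max(len(seq) - end, 0)
    return seq[start:stop]


print(trim("AAAAAGGGGCCC", start=5, end=3))  # GGGG
```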

to-fasta

Generate FASTA file from CSV.

seq-tools to-fasta input.csv output.fasta
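
The conversion itself is simple to sketch with the standard library. The example below assumes the CSV has columns called name and sequence; those column names are an assumption, since this README does not specify the expected input layout, and the function is a hypothetical illustration rather than the package's implementation.

```python
import csv
import io


def csv_to_fasta(csv_text: str) -> str:
    """Turn CSV rows into FASTA records: a '>name' header line, then the sequence."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [f">{row['name']}\n{row['sequence']}" for row in reader]
    return "\n".join(records) + "\n"


example = "name,sequence\nseq_0,GGGGUUUUCCCC\nseq_1,AUCG\n"
print(csv_to_fasta(example), end="")
```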

to-opool

Generate oligo pool file (Excel) from CSV.

seq-tools to-opool input.csv "pool_name" output.xlsx

DataFrame Functions

The package provides DataFrame-level functions, grouped here by purpose:

  • Conversion: to_dna_df(), to_rna_df(), to_dna_template_df()
  • Analysis: get_molecular_weight_df(), get_extinction_coeff(), get_length()
  • Structure: fold() - predict RNA secondary structures
  • Manipulation: add(), trim(), get_reverse_complement_df()
  • Generation: generate_random_sequences(), generate_mutated_sequences()
  • Validation: has_t7_promoter(), has_5p_sequence(), has_3p_sequence()
  • File I/O: to_fasta(), to_opool()

See the notebooks directory for detailed examples.

Requirements

  • Python 3.9+
  • pandas
  • numpy
  • ViennaRNA (for structure prediction)
  • editdistance
  • click
  • tabulate

Tutorial Notebooks

Interactive Jupyter notebooks are available in the notebooks/ directory:

  • 01_introduction.ipynb: Package overview and quick start
  • 02_sequence_operations.ipynb: Working with individual sequences
  • 03_structure_analysis.ipynb: RNA folding and structure analysis
  • 04_dataframe_operations.ipynb: Batch processing with DataFrames
  • 05_advanced_features.ipynb: Advanced features and workflows

See the notebooks README for more details.

Development

# Clone the repository
git clone https://github.com/jyesselm/seq_tools.git
cd seq_tools

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in editable mode
pip install -e .

# Run tests
pytest test/ -v

License

This project is licensed under a Non-Commercial License. Commercial use is prohibited. See LICENSE file for details.

For commercial licensing inquiries, please contact jyesselm@unl.edu.

Author

Joe Yesselman - jyesselm@unl.edu

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
