Simple functions for manipulating sequences and secondary structures in pandas DataFrame format.
seq_tools
A Python package for manipulating and analyzing nucleic acid sequences (DNA and RNA) in pandas DataFrames.
Features
- Batch operations: Work with sequences in pandas DataFrames for efficient processing
- Sequence manipulation: Convert between DNA/RNA, reverse complement, add sequences
- Structure prediction: Fold RNA sequences using ViennaRNA
- Analysis tools: Calculate molecular weights, extinction coefficients, edit distances
- CLI: Command-line tools for quick sequence operations
- Python API: Full programmatic access to all functionality
Installation
pip install rna_seq_tools
Quick Start
Command Line Interface
# Get help
seq-tools --help
# Convert RNA to DNA
seq-tools to-dna "AUCG"
# Fold RNA sequence
seq-tools fold "GGGGUUUUCCCC"
# Calculate molecular weight
seq-tools mw "ATCG"
Python API
import pandas as pd
from seq_tools import sequences_to_dataframe, fold, get_molecular_weight_df, to_rna_df
# Create a DataFrame from sequences
sequences = ["ATCG", "GCTA", "AAAA"]
df = sequences_to_dataframe(sequences)
# Convert to RNA
df = to_rna_df(df)
# Fold RNA sequences
df = fold(df)
# Calculate molecular weights
df = get_molecular_weight_df(df, "RNA", double_stranded=False)
print(df)
Single Sequence Functions
For single sequence operations, import from the sequence module:
from seq_tools.sequence import to_dna, to_rna, get_reverse_complement, get_molecular_weight
# Convert sequences
rna_seq = to_rna("ATCG") # Returns "AUCG"
dna_seq = to_dna("AUCG") # Returns "ATCG"
# Reverse complement
rc = get_reverse_complement("ATCG", "DNA") # Returns "CGAT"
# Molecular weight
mw = get_molecular_weight("ATCG", "DNA") # Returns 1307.80
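To make the conversions above concrete, here is a minimal plain-Python sketch of the same transformations. It is not the seq_tools source code: the `*_sketch` function names are hypothetical, and the real functions may validate input or handle lowercase and ambiguous bases differently.

```python
# Illustrative only: plain-Python equivalents, not the seq_tools implementation.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna_sketch(seq: str) -> str:
    """Replace every T with U."""
    return seq.replace("T", "U")


def to_dna_sketch(seq: str) -> str:
    """Replace every U with T."""
    return seq.replace("U", "T")


def reverse_complement_sketch(seq: str, nucleic_acid: str = "DNA") -> str:
    """Complement each base, then reverse the result."""
    table = DNA_COMPLEMENT if nucleic_acid == "DNA" else RNA_COMPLEMENT
    return seq.translate(table)[::-1]


print(to_rna_sketch("ATCG"))                     # AUCG
print(to_dna_sketch("AUCG"))                     # ATCG
print(reverse_complement_sketch("ATCG", "DNA"))  # CGAT
```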
CLI Commands
add
Add a sequence to the 5' and/or 3' end of sequences.
seq-tools add -p5 "AAAA" "GGGGUUUUCCCC"
seq-tools add -p5 "AAAA" -p3 "CCCC" input.csv
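As a rough illustration of what adding 5' and 3' sequences means, the sketch below concatenates flanks onto a sequence. The function name `add_flanks` is hypothetical and only mirrors the `-p5` and `-p3` options; it is not the package's own code.

```python
def add_flanks(seq: str, p5: str = "", p3: str = "") -> str:
    """Return seq with p5 prepended (5' end) and p3 appended (3' end)."""
    return f"{p5}{seq}{p3}"


print(add_flanks("GGGGUUUUCCCC", p5="AAAA"))             # AAAAGGGGUUUUCCCC
print(add_flanks("GGGGUUUUCCCC", p5="AAAA", p3="CCCC"))  # AAAAGGGGUUUUCCCCCCCC
```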
ec
Calculate the extinction coefficient for each sequence.
seq-tools ec "GGGGUUUUCCCC"
seq-tools ec input.csv -nt RNA -ds # RNA, double-stranded
edit-distance
Calculate the average edit distance of a sequence library.
seq-tools edit-distance input.csv
seq-tools edit-distance input.csv --parallel --workers 4
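One plausible reading of "average edit distance" is the mean Levenshtein distance over all pairs of sequences in the library. The sketch below implements that reading in pure Python. This is an assumption: the package depends on the editdistance library and may define the average differently. The function names are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def average_pairwise_distance(sequences: list[str]) -> float:
    """Mean edit distance over all unordered pairs (assumed definition)."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


library = ["ATCG", "ATCC", "GTCG"]
print(average_pairwise_distance(library))
```

Under this definition the number of comparisons grows quadratically with library size, which is likely why a parallel mode is useful for large libraries.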
fold
Fold RNA sequences using ViennaRNA.
seq-tools fold "GGGGUUUUCCCC"
seq-tools fold input.csv
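ViennaRNA reports secondary structures in dot-bracket notation, where matching parentheses mark base pairs and dots mark unpaired positions. Assuming the fold output is a dot-bracket string, the sketch below converts one into a list of paired positions. The helper name and the example hairpin are illustrative; the hairpin is not a verified prediction for any particular sequence.

```python
def dot_bracket_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) base pairs encoded in a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Hypothetical 12-nt hairpin: 4 base pairs closing a 4-nt loop.
example_structure = "((((....))))"
print(dot_bracket_pairs(example_structure))
# [(0, 11), (1, 10), (2, 9), (3, 8)]
```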
mw
Calculate the molecular weight for each sequence.
seq-tools mw "ATCG"
seq-tools mw input.csv -nt DNA -ds # DNA, double-stranded
rc
Calculate reverse complement for each sequence.
seq-tools rc "ATCG"
seq-tools rc input.csv -nt DNA
to-dna
Convert RNA sequences to DNA (replace U with T).
seq-tools to-dna "AUCG"
seq-tools to-dna input.csv -o output.csv
to-dna-template
Convert RNA sequences to DNA template with T7 promoter.
seq-tools to-dna-template "AUCG"
seq-tools to-dna-template input.csv
to-rna
Convert DNA sequences to RNA (replace T with U).
seq-tools to-rna "ATCG"
seq-tools to-rna input.csv
transcribe
Transcribe DNA template sequences to RNA (removes T7 promoter).
seq-tools transcribe input.csv
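The to-dna-template and transcribe commands are described as inverse operations: one prepends a T7 promoter to the DNA form of an RNA sequence, the other removes it and returns RNA. The sketch below shows that round trip. The promoter string is an assumption (a commonly cited minimal T7 promoter); the sequence seq_tools actually uses may differ, and the function names are hypothetical.

```python
# ASSUMPTION: a commonly cited minimal T7 promoter; seq_tools may use another.
T7_PROMOTER = "TAATACGACTCACTATA"


def to_dna_template_sketch(rna_seq: str) -> str:
    """Convert RNA to DNA and prepend the assumed T7 promoter."""
    return T7_PROMOTER + rna_seq.replace("U", "T")


def transcribe_sketch(dna_template: str) -> str:
    """Strip the assumed T7 promoter and convert the remainder to RNA."""
    if not dna_template.startswith(T7_PROMOTER):
        raise ValueError("template does not start with the assumed T7 promoter")
    return dna_template[len(T7_PROMOTER):].replace("T", "U")


template = to_dna_template_sketch("GGGGUUUUCCCC")
print(template)                     # TAATACGACTCACTATAGGGGTTTTCCCC
print(transcribe_sketch(template))  # GGGGUUUUCCCC
```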
trim
Trim 5'/3' ends of sequences.
seq-tools trim input.csv --start 5 --end 3
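Assuming `--start` and `--end` give the number of nucleotides to remove from the 5' and 3' ends respectively, trimming reduces to a string slice. The hypothetical helper below computes the stop index explicitly, because a naive `seq[start:-end]` returns an empty string when `end` is 0.

```python
def trim_sketch(seq: str, start: int = 0, end: int = 0) -> str:
    """Drop `start` nucleotides from the 5' end and `end` from the 3' end."""
    if start < 0 or end < 0:
        raise ValueError("start and end must be non-negative")
    stop = max(len(seq) - end, 0)  # avoids the seq[start:-0] pitfall
    return seq[start:stop]


print(trim_sketch("AAAAAGGGGUUUUCCCCUUU", start=5, end=3))  # GGGGUUUUCCCC
print(trim_sketch("ATCG", start=1))                         # TCG
```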
to-fasta
Generate FASTA file from CSV.
seq-tools to-fasta input.csv output.fasta
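For orientation, a FASTA file pairs a `>` header line with a sequence line for each record. The sketch below performs that conversion with the standard library only. The column names `name` and `sequence` are assumptions about the CSV layout, and the helper name is hypothetical.

```python
import csv
import io


def csv_to_fasta(csv_text: str, name_col: str = "name", seq_col: str = "sequence") -> str:
    """Build FASTA text from CSV text; column names are assumed, not confirmed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(f">{row[name_col]}\n{row[seq_col]}\n" for row in reader)


example_csv = "name,sequence\nseq_0,ATCG\nseq_1,GCTA\n"
print(csv_to_fasta(example_csv), end="")
# >seq_0
# ATCG
# >seq_1
# GCTA
```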
to-opool
Generate oligo pool file (Excel) from CSV.
seq-tools to-opool input.csv "pool_name" output.xlsx
DataFrame Functions
The package provides comprehensive DataFrame operations:
- Conversion: to_dna_df(), to_rna_df(), to_dna_template_df()
- Analysis: get_molecular_weight_df(), get_extinction_coeff(), get_length()
- Structure: fold() - predict RNA secondary structures
- Manipulation: add(), trim(), get_reverse_complement_df()
- Generation: generate_random_sequences(), generate_mutated_sequences()
- Validation: has_t7_promoter(), has_5p_sequence(), has_3p_sequence()
- File I/O: to_fasta(), to_opool()
See the notebooks directory for detailed examples.
Requirements
- Python 3.9+
- pandas
- numpy
- ViennaRNA (for structure prediction)
- editdistance
- click
- tabulate
Tutorial Notebooks
Interactive Jupyter notebooks are available in the notebooks/ directory:
- 01_introduction.ipynb: Package overview and quick start
- 02_sequence_operations.ipynb: Working with individual sequences
- 03_structure_analysis.ipynb: RNA folding and structure analysis
- 04_dataframe_operations.ipynb: Batch processing with DataFrames
- 05_advanced_features.ipynb: Advanced features and workflows
See the notebooks README for more details.
Development
# Clone the repository
git clone https://github.com/jyesselm/seq_tools.git
cd seq_tools
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in editable mode
pip install -e .
# Run tests
pytest test/ -v
License
This project is licensed under a Non-Commercial License. Commercial use is prohibited. See LICENSE file for details.
For commercial licensing inquiries, please contact jyesselm@unl.edu.
Author
Joe Yesselman - jyesselm@unl.edu
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.