simple functions for manipulating sequences and secondary structures in pandas dataframe format

seq_tools

a short python tool for working with sequences in dataframes

how to install

pip install rna_seq_tools

how to use

seq_tools is a Python package that contains a few functions for working with sequences in dataframes. Each command accepts either a single sequence or a csv file. For a single sequence, the results are printed to the terminal. For a csv file, the results are written to a new csv, named "output.csv" by default; use the -o flag to change the name.
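
The csv workflow described above can be pictured with a minimal sketch. This is not the package's own code: it uses the standard-library csv module instead of pandas so that it runs without dependencies, the helper name process_csv is hypothetical, and the "sequence" column name is an assumption inferred from the example output further down.

```python
import csv
import io


def process_csv(in_handle, out_handle, func, column="sequence"):
    """Apply func to one column of a csv and write every row to a new csv.

    The column name "sequence" is an assumption, not a documented default.
    """
    reader = csv.DictReader(in_handle)
    writer = csv.DictWriter(
        out_handle, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = func(row[column])
        writer.writerow(row)


# in-memory stand-ins for an input csv and for "output.csv"
source = io.StringIO("name,sequence\nseq_0,GGGGUUUUCCCC\nseq_1,AAAAUUUU\n")
result = io.StringIO()
process_csv(source, result, lambda s: s.replace("U", "T"))
print(result.getvalue())
```

With pandas the same idea is usually a one-liner such as df["sequence"].apply(func) followed by df.to_csv("output.csv", index=False).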

$ seq_tools --help
Usage: seq_tools [OPTIONS] COMMAND [ARGS]...

  a set scripts to manipulate sequences in csv files

Options:
  --help  Show this message and exit.

Commands:
  add              add a sequence to 5' and/or 3'
  ec               calculate the extinction coefficient for each sequence
  edit-distance    calculate the edit distance of a library
  fold             fold rna sequences
  mw               calculate the molecular weight for each sequence
  rc               calculate reverse complement for each sequence
  to-dna           convert rna sequence(s) to dna
  to-dna-template  convert rna sequence(s) to dna template, includes T7...
  to-fasta         generate fasta file from csv
  to-opool         generate oligo pool file from csv
  to-rna           convert rna sequence(s) to dna
  transcribe       convert dna sequence(s) to rna
  trim             trim 5'/3' ends of sequences

add

Adds a flanking sequence to the 5' and/or 3' end of each input sequence. In the example below, -p5 prepends AAAA to the 5' end.

$ seq_tools add -p5 "AAAA" "GGGGUUUUCCCC"
SEQ_TOOLS.get_input_dataframe - INFO - reading sequence GGGGUUUUCCCC
SEQ_TOOLS.handle_output - INFO - output->
name                     seq
sequence    AAAAGGGGUUUUCCCC
Name: 0, dtype: object

ec

Calculate the extinction coefficient for each sequence.

$ seq-tools ec "GGGGUUUUCCCC"
SEQ_TOOLS.get_input_dataframe - INFO - reading sequence GGGGUUUUCCCC
SEQ_TOOLS.handle_ntype - INFO - determining nucleic acid type: RNA
SEQ_TOOLS.handle_output - INFO - output->
name                         seq
sequence            GGGGUUUUCCCC
extinction_coeff          109500
Name: 0, dtype: object

edit-distance

Calculate the edit distance of a library, i.e. how different each sequence is, on average, from the rest of the library.

$ seq-tools edit-distance test/resources/test.csv
SEQ_TOOLS.edit_distance - INFO - edit distance: 17.666666666666668
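
To make the idea of an average edit distance concrete, here is a small sketch. It assumes Levenshtein distance (insertions, deletions, substitutions) averaged over all unique pairs of sequences; whether seq_tools uses exactly this metric and this averaging is not stated here, and the function names are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def mean_pairwise_distance(seqs):
    """Average edit distance over every unique pair in the library."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


# made-up three-sequence library, for illustration only
library = ["GGGGUUUUCCCC", "GGGGAAAACCCC", "GGGGUUUUCCCA"]
print(mean_pairwise_distance(library))
```

A higher value means the library members are less similar to one another.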

fold

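Fold RNA sequences. The output reports the predicted structure in dot-bracket notation, the minimum free energy (mfe) and the ensemble defect (ens_defect).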

$ seq-tools fold "GGGGUUUUCCCC"
SEQ_TOOLS.get_input_dataframe - INFO - reading sequence GGGGUUUUCCCC
SEQ_TOOLS.handle_output - INFO - output->
name                   seq
sequence      GGGGUUUUCCCC
structure     ((((....))))
mfe                   -5.9
ens_defect            0.38
Name: 0, dtype: object

to-dna

Convert all sequences to DNA, i.e. replace U with T.

$ seq_tools to-dna "GGGGUUUUCCCC"
SEQ_TOOLS.get_input_dataframe - INFO - reading sequence GGGGUUUUCCCC
SEQ_TOOLS.to_dna - INFO - converted sequence: GGGGTTTTCCCC
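
The conversion itself is a plain character substitution. The sketch below shows that substitution in both directions; the function names are hypothetical stand-ins and are not taken from the seq_tools API.

```python
def to_dna(seq: str) -> str:
    """Convert an RNA sequence to DNA by replacing every U with T."""
    return seq.replace("U", "T")


def to_rna(seq: str) -> str:
    """Convert a DNA sequence to RNA by replacing every T with U."""
    return seq.replace("T", "U")


print(to_dna("GGGGUUUUCCCC"))
print(to_rna("GGGGTTTTCCCC"))
```

For a whole dataframe column, pandas offers the equivalent df["sequence"].str.replace("U", "T"), where the "sequence" column name is an assumption based on the output shown in the other examples.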

other non-command-line usage

structure representation

from seq_tools import SequenceStructure
# pair a sequence with its dot-bracket secondary structure
struct = SequenceStructure("GGGGUUUUCCCC", "((((....))))")
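
For readers unfamiliar with dot-bracket notation, the sketch below shows how such a string maps to base pairs. It is a standalone illustration: base_pairs is a hypothetical helper, not a method of SequenceStructure, and it handles only "(", ")" and unpaired characters (no pseudoknot brackets).

```python
def base_pairs(structure: str):
    """Return sorted 0-based (i, j) index pairs for a dot-bracket string."""
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
        # any other character (e.g. ".") is treated as unpaired
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


sequence = "GGGGUUUUCCCC"
for i, j in base_pairs("((((....))))"):
    print(i, j, sequence[i], sequence[j])
```

Each "(" is matched with the nearest unmatched ")" to its right, so the hairpin above pairs the four G residues with the four C residues.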
