
simple functions for manipulating sequences and secondary structures in pandas dataframe format


seq_tools


a short python tool for working with sequences in dataframes

how to install

pip install rna_seq_tools

how to use

seq_tools is a Python package that contains a few functions for working with sequences in dataframes. If the input is a single sequence, the results are printed. If the input is a CSV file, a new CSV file is created with the results. The default output file is "output.csv"; this can be changed with the -o flag.
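As a rough illustration of the CSV-in, CSV-out behavior described above, the sketch below applies a function to every sequence in a CSV and writes the result to a new CSV. It is not the package's implementation: it uses the standard-library csv module instead of pandas to stay dependency-free, and the helper name apply_to_csv and the column name "sequence" are assumptions made for this example.

```python
import csv
import io


def apply_to_csv(in_file, out_file, func, column="sequence"):
    """Apply func to one column of a CSV and write all rows to out_file.

    The column name "sequence" is an assumption for this sketch.
    """
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(
        out_file, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = func(row[column])
        writer.writerow(row)


# In-memory files stand in for an input CSV and "output.csv".
src = io.StringIO("name,sequence\nseq_0,GGGGUUUUCCCC\n")
dst = io.StringIO()
apply_to_csv(src, dst, lambda s: s.replace("U", "T"))
print(dst.getvalue())
```

With real files, the same function could be called with two handles from open(), for example one for the input CSV and one for "output.csv".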

$ seq_tools --help
Usage: seq_tools [OPTIONS] COMMAND [ARGS]...

  a set scripts to manipulate sequences in csv files

Options:
  --help  Show this message and exit.

Commands:
  add              add a sequence to 5' and/or 3'
  ec               calculate the extinction coefficient for each sequence
  edit-distance    calculate the edit distance of a library
  fold             fold rna sequences
  mw               calculate the molecular weight for each sequence
  rc               calculate reverse complement for each sequence
  to-dna           convert rna sequence(s) to dna
  to-dna-template  convert rna sequence(s) to dna template, includes T7...
  to-fasta         generate fasta file from csv
  to-opool         generate oligo pool file from csv
  to-rna           convert rna sequence(s) to dna
  transcribe       convert dna sequence(s) to rna
  trim             trim 5'/3' ends of sequences

add

Adds a sequence to the 5' and/or 3' end of a sequence.

$ seq_tools add -p5 "AAAA" "GGGGUUUUCCCC"
SEQ_TOOLS.get_input_dataframe - INFO - reading sequence GGGGUUUUCCCC
SEQ_TOOLS.handle_output - INFO - output->
name                     seq
sequence    AAAAGGGGUUUUCCCC
Name: 0, dtype: object
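The operation itself is plain string concatenation. A minimal Python sketch follows, assuming a hypothetical helper add_flanks (this name is not part of seq_tools) whose p5 and p3 arguments mirror the 5' and 3' additions:

```python
def add_flanks(seq, p5="", p3=""):
    """Return seq with p5 prepended (5' end) and p3 appended (3' end)."""
    return p5 + seq + p3


print(add_flanks("GGGGUUUUCCCC", p5="AAAA"))
```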

ec

Calculate the extinction coefficient for each sequence.

$ seq-tools ec "GGGGUUUUCCCC"
SEQ_TOOLS.get_input_dataframe - INFO - reading sequence GGGGUUUUCCCC
SEQ_TOOLS.handle_ntype - INFO - determining nucleic acid type: RNA
SEQ_TOOLS.handle_output - INFO - output->
name                         seq
sequence            GGGGUUUUCCCC
extinction_coeff          109500
Name: 0, dtype: object

edit-distance

Calculate the edit distance of a library, i.e. how different, on average, each sequence is from the rest of the library.

$ seq-tools edit-distance test/resources/test.csv
SEQ_TOOLS.edit_distance - INFO - edit distance: 17.666666666666668
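One way to compute such a number is sketched below. It assumes the metric is the Levenshtein distance averaged over all unordered pairs of sequences; the exact definition used by seq_tools may differ, and the function names here are illustrative only.

```python
from itertools import combinations


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_edit_distance(seqs):
    """Average Levenshtein distance over all unordered pairs (assumed metric)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


print(mean_edit_distance(["AAAA", "AAAU", "AAUU"]))
```

This pairwise approach is quadratic in the number of sequences, so for large libraries a sampled subset of pairs may be more practical.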

fold

Fold rna sequences.

$ seq-tools fold "GGGGUUUUCCCC"
SEQ_TOOLS.get_input_dataframe - INFO - reading sequence GGGGUUUUCCCC
SEQ_TOOLS.handle_output - INFO - output->
name                   seq
sequence      GGGGUUUUCCCC
structure     ((((....))))
mfe                   -5.9
ens_defect            0.38
Name: 0, dtype: object

to-dna

Convert all sequences to DNA, i.e. replace U with T.

$ seq_tools to-dna "GGGGUUUUCCCC"
SEQ_TOOLS.get_input_dataframe - INFO - reading sequence GGGGUUUUCCCC
SEQ_TOOLS.to_dna - INFO - converted sequence: GGGGTTTTCCCC
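The conversion amounts to a character substitution. A minimal sketch in plain Python, with the inverse shown for comparison; the function names to_dna and to_rna are chosen for this example and are not confirmed to match the package's internal functions:

```python
def to_dna(seq):
    """Convert an RNA sequence to DNA by replacing U with T."""
    return seq.upper().replace("U", "T")


def to_rna(seq):
    """Convert a DNA sequence to RNA by replacing T with U."""
    return seq.upper().replace("T", "U")


print(to_dna("GGGGUUUUCCCC"))  # GGGGTTTTCCCC
```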

other non-command-line usage

structure representation

from seq_tools import SequenceStructure
struct = SequenceStructure("GGGGUUUUCCCC", "((((....))))")
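The second argument is a secondary structure in dot-bracket notation, where matching parentheses mark base pairs and dots mark unpaired positions. The sketch below shows how such a string can be decoded into a pair table. It does not use or describe the SequenceStructure API; pair_table is a hypothetical helper written only for illustration.

```python
def pair_table(structure):
    """Map each position to its pairing partner (0-based), or -1 if unpaired."""
    pairs = [-1] * len(structure)
    stack = []  # positions of "(" still waiting for a partner
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


print(pair_table("((((....))))"))
# [11, 10, 9, 8, -1, -1, -1, -1, 3, 2, 1, 0]
```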
