A collection of useful methods for working with various bioinformatics data, software output files, etc.
A Python module with submodules tailored to specific tools and analyses that I use across projects. The documentation below is only an introduction; most functions are documented with docstrings.
Methods for working with bed files. I make heavy use of pybedtools.
This submodule contains some useful methods and classes for dealing with data through CGHub. I’ll go through the classes and methods.
A GTFuseBam object is a single bam file from CGHub mounted with GTFuse. You can mount and unmount the bam file as you’d like.
This object takes a GTFuseBam object and a set of intervals, obtains the reads that fall in those intervals from the CGHub bam file, and writes them to a local bam file.
This is an engine that runs in the background and obtains reads from intervals for a given set of samples. The main process that runs the engine shares the thread with your Python session, but I use the multiprocessing module to farm out different bam files to separate worker processes, so you can obtain reads from multiple bam files simultaneously. This class can be extended.
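The farming-out pattern described above can be sketched roughly as follows. This is only an illustration of the general multiprocessing idea, not the engine's actual code: `fetch_reads` and `run_engine` are hypothetical names, the worker does no real CGHub or GTFuse work, and the explicit "fork" start method is an assumption that holds on Unix-like systems only.

```python
import multiprocessing


def fetch_reads(sample):
    """Hypothetical stand-in for the per-bam work (mount, extract reads, unmount).

    Here it only returns the sample name and the local bam path it would write.
    """
    return sample, "{}.local.bam".format(sample)


def run_engine(samples, num_workers=2):
    """Process several samples at once, one worker process per bam file."""
    # Assumption: the "fork" start method, which is available on Unix-like
    # systems only. Other platforms need a different start method and an
    # `if __name__ == "__main__":` guard around the call site.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=num_workers) as pool:
        # pool.map preserves input order, so results line up with `samples`.
        return dict(pool.map(fetch_reads, samples))
```

Calling `run_engine(["sampleA", "sampleB"])` returns a dict mapping each sample name to its (hypothetical) local bam path. A real engine would also need to report failures for individual bam files.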
This engine extends the ReadsFromIntervalsEngine and performs variant calling after obtaining the reads. It is currently implemented to work in the Frazer lab computing environment, although it would be easy to adapt to a different computing environment.
Class that wraps the results of a variant calling job for a tumor/normal pair.
This submodule has some methods that are useful for dealing with the output from the RNA-seq expression estimation tool eXpress.
Functions for parsing the Gencode gene annotation into various files that are easier to work with.
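As a rough idea of what this kind of parsing involves, here is a minimal sketch that turns the `gene` records of a GTF file into a small lookup table. The function names `parse_gtf_attributes` and `gene_table` are made up for this example and are not this submodule's API. The sketch assumes the standard nine-column, tab-separated GTF layout and attribute values that contain no semicolons.

```python
def parse_gtf_attributes(field):
    """Parse the ninth GTF column into a dict.

    For keys that repeat (e.g. "tag"), only the first value is kept.
    """
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        # Each item looks like: key "value"  (or: key value, for numbers).
        key, _, value = item.partition(" ")
        attrs.setdefault(key, value.strip().strip('"'))
    return attrs


def gene_table(gtf_lines):
    """Return {gene_id: (gene_name, gene_type)} for 'gene' records only."""
    genes = {}
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_gtf_attributes(cols[8])
        genes[attrs["gene_id"]] = (attrs.get("gene_name"), attrs.get("gene_type"))
    return genes
```

Because `gene_table` accepts any iterable of lines, it can be fed an open file handle directly, e.g. `gene_table(open(path))`, where `path` points at a decompressed annotation file.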
Some methods that are generally useful.
Methods for working with MuTect output.
Provides extended functionality on top of pysam.
Useful tools for working with DNA variants.