
My Python library of classes and functions that help me work.




This is my personal library of Python classes and functions, many of
which have bioinformatics applications. The library changes constantly
and at a whim, so approach with caution if you want to use it. Over
time, however, parts appear to be settling into a stable configuration.



A pure Python implementation of the Smith-Waterman local alignment
algorithm.
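
As a rough illustration of the technique (the function name, signature
and scoring defaults here are assumptions, not the library's API), a
minimal Smith-Waterman scorer with a linear gap penalty might look like:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Classic Smith-Waterman dynamic programming: cells are clamped at
    zero so alignments can restart anywhere, and the answer is the
    maximum cell in the whole matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + score,  # (mis)match
                          h[i - 1][j] + gap,        # gap in b
                          h[i][j - 1] + gap)        # gap in a
            best = max(best, h[i][j])
    return best
```

A full implementation would also keep traceback pointers to recover the
aligned substrings, not just the score.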


A C++ and pure Python implementation of a sequence generation algorithm.
The generated sequence will have a specified dinucleotide frequency.
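
One common way to generate such sequences is a first-order Markov
chain whose transition weights come from the desired dinucleotide
counts. The sketch below assumes that approach; the function name and
input format are hypothetical, not the library's interface:

```python
import random
from collections import defaultdict

def generate_sequence(dinucleotide_counts, length, seed=None):
    """Sample a sequence whose dinucleotide usage follows the given
    counts, e.g. {('A', 'C'): 3, ('C', 'A'): 1, ...}.
    """
    rng = random.Random(seed)
    # Build weighted transition lists: state -> possible next bases.
    transitions = defaultdict(list)
    for (first, second), count in dinucleotide_counts.items():
        transitions[first].extend([second] * count)
    state = rng.choice(sorted(transitions))
    sequence = [state]
    for _ in range(length - 1):
        state = rng.choice(transitions[state])
        sequence.append(state)
    return ''.join(sequence)
```

Note the sketch assumes every base that appears as the first member of
a dinucleotide also has at least one outgoing transition.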


An implementation of intervals and points for genomic coordinates.
Useful for representing gene models.
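
To sketch the idea (class name and attributes are illustrative
assumptions, not the library's actual classes), a half-open genomic
interval with an overlap test might look like:

```python
class Interval:
    """A half-open interval [start, stop) on a chromosome."""

    def __init__(self, chromosome, start, stop):
        self.chromosome = chromosome
        self.start = start
        self.stop = stop

    def __len__(self):
        return self.stop - self.start

    def overlaps(self, other):
        # Intervals overlap only on the same chromosome, and when
        # each one starts before the other ends.
        return (self.chromosome == other.chromosome
                and self.start < other.stop
                and other.start < self.stop)
```

Half-open coordinates make lengths and adjacency checks simple, which
is why formats like BED use them.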


A class to read genetic codes and translate DNA sequences into protein
sequences.
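
The core of any such translator is a codon table lookup. The sketch
below uses a deliberately partial table for brevity (the library reads
complete genetic-code tables; the names here are assumptions):

```python
# Partial codon table for illustration only; a real genetic code
# maps all 64 codons.
CODON_TABLE = {
    'ATG': 'M', 'TTT': 'F', 'AAA': 'K',
    'TAA': '*', 'TAG': '*', 'TGA': '*',
}

def translate(dna):
    """Translate a DNA string codon by codon.

    Unknown codons become 'X'; a trailing partial codon is ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], 'X'))
    return ''.join(protein)
```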


A class to convert protein names between the one- and three-letter
codes and the full name.
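
The conversion itself is a pair of inverse lookup tables. A minimal
sketch, covering only a few residues (the library covers all twenty
plus the full names; these names are illustrative, not its API):

```python
# A few one-letter -> three-letter mappings for illustration.
ONE_TO_THREE = {'A': 'Ala', 'G': 'Gly', 'K': 'Lys', 'W': 'Trp'}
# The reverse table is derived automatically to keep the two in sync.
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}

def to_three(sequence):
    """Expand a one-letter sequence into concatenated three-letter codes."""
    return ''.join(ONE_TO_THREE[aa] for aa in sequence)
```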


A class that calculates k-mers for a given sequence. The class behaves
like a dict, but calculates new k-mers on the fly.
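
The "dict that computes on the fly" behaviour can be had with
``dict.__missing__``. A sketch of the idea (class name and keying by
``k`` are assumptions, not the library's design):

```python
from collections import Counter

class KmerCounter(dict):
    """Maps k -> Counter of k-mers, computed lazily on first access."""

    def __init__(self, sequence):
        super().__init__()
        self.sequence = sequence

    def __missing__(self, k):
        # Called only when counts for this k have not been cached yet.
        counts = Counter(self.sequence[i:i + k]
                         for i in range(len(self.sequence) - k + 1))
        self[k] = counts
        return counts
```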


A class that calculates skews for a given sequence. The class behaves
like a dict, but calculates new skews on the fly.
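
As a concrete (assumed) example of a skew, the cumulative GC skew at
each prefix of a sequence can be cached lazily the same way; the class
below is a sketch, not the library's implementation:

```python
class GcSkew(dict):
    """Maps position -> cumulative G minus C count up to that position."""

    def __init__(self, sequence):
        super().__init__()
        self.sequence = sequence

    def __missing__(self, position):
        # Computed once per position, then cached in the dict itself.
        prefix = self.sequence[:position]
        skew = prefix.count('G') - prefix.count('C')
        self[position] = skew
        return skew
```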


Several collections, mostly for holding intervals. If only intervals
need to be held, use the IntervalTree; otherwise, the MultiDimensionMap
may be more appropriate.


Classes for working with files


A pure Python implementation of graphs


Intended to be my own code for indexing files, but it is still very
unstable and immature.


A class for intervals and interval operations

Classes for parsing and working with several file formats


Classes for working with iterators

Various classes, mostly unused and out-of-date



An implementation of the reservoir sampling algorithm. It can also be
run from the command line to sample lines from files. To sample 50
lines from a file called input_file.txt, run::

    python -m lhc.random.reservoir input_file.txt 50
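
For reference, reservoir sampling (Algorithm R) keeps a uniform sample
of ``k`` items from a stream of unknown length in a single pass. A
minimal sketch of the algorithm itself (function name and signature are
assumptions, not the module's API):

```python
import random

def reservoir_sample(iterable, k, seed=None):
    """Return k items sampled uniformly from the iterable in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

If the stream holds fewer than ``k`` items, every item is returned.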


Really old code. Probably the NIPALS and PCA algorithms are of most use.


Unit tests! These should be mostly up-to-date now.


A sorter for very large iterators. The iterator will be split into
chunks which are then sorted individually and then merged into a single
sorted iterator.
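
The chunk-sort-merge idea can be sketched in a few lines with
``heapq.merge``. This in-memory version illustrates the algorithm only;
a sorter for truly large inputs would spill sorted chunks to temporary
files, and the function name here is an assumption:

```python
import heapq
from itertools import islice

def sorted_chunks(iterator, chunk_size):
    """Sort an iterator by sorting fixed-size chunks, then merging them."""
    iterator = iter(iterator)
    chunks = []
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        chunks.append(sorted(chunk))
    # heapq.merge lazily interleaves the already-sorted chunks.
    return heapq.merge(*chunks)
```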


A basic tokeniser. Users define which characters belong to which
classes, and the tokeniser splits strings into substrings in which all
characters belong to the same class.


>>> tokeniser = Tokeniser({'word': 'abcdefghijklmnopqrstuvwxyz',
...                        'number': '0123456789',
...                        'space': ' \t'})
>>> for token in tokeniser.tokenise('there were 1000 bottles on the wall'):
...     print(token)
Token(type='word', value='there')
Token(type='space', value=' ')
Token(type='word', value='were')
Token(type='space', value=' ')
Token(type='number', value='1000')


Files for lhc-python, version 2.0.3: ``lhc_python-2.0.3-py3.6.egg``
(271.9 kB, Python 3.6 egg) and ``lhc-python-2.0.3.tar.gz`` (67.5 kB,
source).

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page