receptor_utils
Some tools I find useful for working with Ig receptor sequences.
Installation
git clone https://github.com/williamdlees/receptor_utils
pip install receptor_utils
The module requires Biopython.
(will be on PyPI soon)
Overview
Please refer to the files themselves for slightly more detailed documentation.
simple_bio_seq
Contains some convenience functions that are backed by Biopython but simplified for my use case. It takes the following approach to keep things simple (at the expense of some flexibility and scalability):
- store sequences as strings, use dicts for collections
- convert sequences to upper case on input
- coerce iterators into lists for ease of debugging
from receptor_utils import simple_bio_seq as simple
seqs = simple.read_fasta('seqfile.fasta') # read sequences into a dict with names as keys
seq = simple.read_single_fasta('seqfile.fasta') # reads the first or only sequence into a string
seq = simple.reverse_complement(seq)
See the file for other functions.
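To make the three conventions above concrete, here is a minimal pure-Python sketch of a reader that follows them. It is not the module's implementation (which is backed by Biopython); the function name `fasta_to_dict` is hypothetical, and using the full header line as the key is an assumption.

```python
def fasta_to_dict(lines):
    """Parse FASTA-formatted lines into a dict of name -> sequence string.

    Hypothetical helper, for illustration only: sequences are stored as
    plain strings, upper-cased on input, and collected in a dict.
    """
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the whole header (minus '>') is used as the key.
            name = line[1:]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


example = """>seq1
acgt
ACGG
>seq2
ttga
""".splitlines()

records = fasta_to_dict(example)
```

Because the result is an ordinary dict of strings, it can be inspected directly in a debugger or printed without any Biopython objects in the way.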
novel_allele_name
Contains the function name_novel(), which will generate a name for a 'previously undocumented' allele, given its sequence. The name will consist of the name of the nearest allele in a reference set provided to the function, suffixed by the SNPs that differentiate it, for example:
IGHV1-69*01_a29g_c113t
Numbering of V-sequences uses the IMGT alignment. The naming convention follows that used by TIgGER and VDJbase.
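The naming scheme can be illustrated with a short sketch. This is not name_novel() itself: it assumes the nearest reference allele has already been found and that both sequences are already gapped to the same alignment, so that the 1-based string position is the alignment position. The function name `snp_suffixed_name`, the toy sequences, and the choice to skip gapped positions are all assumptions made for illustration.

```python
def snp_suffixed_name(ref_name, ref_seq, novel_seq, gap_char="."):
    """Build a name like IGHV1-69*01_a29g from two aligned sequences.

    Hypothetical helper: each SNP is written as <ref base><position><new base>
    in lower case, using 1-based positions in the aligned (gapped) sequence.
    """
    snps = []
    pairs = zip(ref_seq.upper(), novel_seq.upper())
    for pos, (ref_base, new_base) in enumerate(pairs, start=1):
        # Assumption: positions that are gapped in either sequence are skipped.
        if ref_base == gap_char or new_base == gap_char:
            continue
        if ref_base != new_base:
            snps.append(f"{ref_base.lower()}{pos}{new_base.lower()}")
    return "_".join([ref_name] + snps)


# Toy sequences, far shorter than a real V-gene.
toy_ref = "CAG...GTG"
toy_novel = "CAA...GTC"
novel_name = snp_suffixed_name("IGHV1-69*01", toy_ref, toy_novel)
```

With these toy inputs, `novel_name` is `IGHV1-69*01_g3a_g9c`. Note that the gap positions still count towards the numbering, which is what numbering against an alignment implies.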
number_ighv
Contains various functions for working with V-sequences according to the IMGT numbering scheme.
The most useful is gap_sequence(), which will gap the provided V-sequence by using the closest sequence in a reference set as a template.
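The idea of gapping from a template can be sketched as follows. This is a simplified illustration rather than the logic of gap_sequence(): it assumes the template has already been chosen, that the ungapped sequence starts at the same position as the template, and that there are no insertions or deletions between the two. The name `gap_like_template` is hypothetical.

```python
def gap_like_template(ungapped, gapped_template, gap_char="."):
    """Insert gaps into `ungapped` wherever `gapped_template` has them.

    Hypothetical helper: walks along the template, copying a gap for each
    template gap and the next base of the input for each template base.
    """
    bases = iter(ungapped)
    out = []
    for ch in gapped_template:
        if ch == gap_char:
            out.append(gap_char)
        else:
            base = next(bases, None)
            if base is None:
                # Input is shorter than the template: stop here.
                break
            out.append(base)
    # Any bases beyond the end of the template are appended ungapped.
    out.extend(bases)
    return "".join(out)


gapped = gap_like_template("CAAGTC", "CAG...GTG")
```

Here `gapped` is `CAA...GTC`: the bases come from the input sequence and the gap positions come from the template.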
Example scripts
These may be useful in their own right, but also show how to use some of the functions mentioned above.
extract_refs.py
A script which uses simple_bio_seq to extract files for particular loci and species from an IMGT reference file.
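As a rough sketch of the kind of filtering involved (not the script's actual code), the function below selects records for one species and locus from a dict of sequences keyed by header. It assumes IMGT-style pipe-delimited headers in which the second field is the allele name and the third is the species; the function name, the accession numbers and the sequences in the example are made up for illustration.

```python
def select_imgt_records(records, species, locus):
    """Return {allele_name: sequence} for the given species and locus.

    Hypothetical helper. Assumes headers of the form
    accession|allele name|species|... and matches the species exactly.
    """
    selected = {}
    for header, seq in records.items():
        fields = header.split("|")
        if len(fields) < 3:
            continue  # not an IMGT-style header
        allele, rec_species = fields[1], fields[2]
        if rec_species == species and allele.startswith(locus):
            selected[allele] = seq
    return selected


# Made-up accessions and sequences, for illustration only.
toy_records = {
    "X00001|IGHV1-18*01|Homo sapiens|F|V-REGION": "CAGGTT",
    "X00002|IGKV1-5*01|Homo sapiens|F|V-REGION": "GACATC",
    "X00003|IGHV1-5*01|Mus musculus|F|V-REGION": "CAGGTC",
}

human_ighv = select_imgt_records(toy_records, "Homo sapiens", "IGHV")
```

The exact species match is a deliberate simplification; species fields that carry a strain suffix would need a looser comparison.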
gap_inferred.py
A script which will gap a set of sequences listed in a FASTA file, using the closest sequences discovered from a reference set.
identical_seqs.py
A script which uses simple_bio_seq to list identical sequences and sub-sequences in a FASTA file.
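A minimal sketch of this kind of comparison, working on a dict of names to sequence strings, might look like the following. It is an illustration under simple assumptions (exact string matching, all pairs compared), not the script itself, and the name `find_identical_and_contained` is hypothetical.

```python
def find_identical_and_contained(seqs):
    """Compare every pair of sequences in a {name: sequence} dict.

    Hypothetical helper. Returns two lists of name pairs:
    - identical: (a, b) where the two sequences are equal
    - contained: (short, long) where `short` occurs inside `long`
    """
    identical = []
    contained = []
    names = sorted(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if seqs[a] == seqs[b]:
                identical.append((a, b))
            elif seqs[a] in seqs[b]:
                contained.append((a, b))
            elif seqs[b] in seqs[a]:
                contained.append((b, a))
    return identical, contained


toy_seqs = {"a": "ACGT", "b": "ACGT", "c": "CG", "d": "TTTT"}
identical_pairs, contained_pairs = find_identical_and_contained(toy_seqs)
```

The pairwise loop is quadratic in the number of sequences, which is fine for a reference set but would need rethinking for a large repertoire.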