receptor_utils

Some tools I find useful for working with Ig receptor sequences.

Installation

pip install receptor-utils

The module requires Biopython.

Overview

The source files themselves contain slightly more detailed documentation.

simple_bio_seq

Contains some convenience functions that are backed by Biopython but simplified for my use case. It uses the following approach to keep things simple (at the expense of some flexibility/scalability):

  • store sequences as strings, use dicts for collections
  • convert sequences to upper case on input
  • coerce iterators into lists for ease of debugging

For example:

from receptor_utils import simple_bio_seq as simple
seqs = simple.read_fasta('seqfile.fasta')  # read sequences into a dict with names as keys
seq = simple.read_single_fasta('seqfile.fasta')  # read the first or only sequence into a string
seq = simple.reverse_complement(seq)  # replace seq with its reverse-complement

See the file for other functions.

novel_allele_name

Contains the function name_novel(), which will generate a name for a 'previously undocumented' allele, given its sequence. The name will consist of the name of the nearest allele in a reference set provided to the function, suffixed by the SNPs that differentiate it, for example:

IGHV1-69*01_a29g_c113t

Numbering of V-sequences uses the IMGT alignment. The naming convention follows that used by TIgGER and VDJbase.
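
To illustrate the naming convention only, here is a minimal sketch that builds a SNP-suffixed name from two sequences that are already aligned to the same length. The function snp_style_name() and its arguments are hypothetical and are not part of receptor_utils; the real name_novel() also finds the nearest reference allele and handles IMGT gapping, which this sketch does not attempt.

```python
def snp_style_name(ref_name, ref_seq, novel_seq):
    """Build a name such as REF_a29g from two aligned, equal-length sequences.

    Hypothetical helper for illustration. Positions are 1-based and counted
    along the alignment, so gap characters ('.') still occupy a position.
    """
    if len(ref_seq) != len(novel_seq):
        raise ValueError("sequences must be aligned to the same length")

    snps = []
    pairs = zip(ref_seq.upper(), novel_seq.upper())
    for position, (ref_base, novel_base) in enumerate(pairs, start=1):
        # Skip matching positions and positions where either side is a gap
        if ref_base == novel_base or '.' in (ref_base, novel_base):
            continue
        snps.append(f"{ref_base.lower()}{position}{novel_base.lower()}")

    return '_'.join([ref_name] + snps)


# Toy sequences, far shorter than a real V-gene
print(snp_style_name("IGHV1-69*01", "ACGTACGT", "ACGAACGT"))
```

A sequence identical to the reference simply gets the reference name back, with no suffix.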

number_ighv

Contains various functions for working with V-sequences according to the IMGT numbering scheme. The most useful is gap_sequence() which will gap the provided V-sequence by using the closest sequence in a reference set as a template.
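
The idea of gapping against a template can be sketched as follows. This is a simplified illustration, not the implementation of gap_sequence(): the function gap_like_template() is hypothetical, and it assumes the template has already been chosen and that the query has no insertions or deletions relative to it.

```python
def gap_like_template(ungapped_seq, gapped_template):
    """Insert '.' into ungapped_seq wherever gapped_template has '.'.

    Hypothetical helper: assumes no indels between query and template.
    """
    bases = iter(ungapped_seq.upper())
    gapped = []
    for template_char in gapped_template:
        if template_char == '.':
            gapped.append('.')      # copy the gap from the template
        else:
            base = next(bases, None)
            if base is None:
                break               # query is shorter than the template
            gapped.append(base)
    gapped.extend(bases)            # query is longer: keep the remainder
    return ''.join(gapped)


# Toy example: the template carries a two-position gap
print(gap_like_template("acgtac", "AC..GTAC"))
```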

Example scripts

These may be useful in their own right, but also show how to use some of the functions mentioned above. Once the package is installed, you should be able to run them at the command line without the .py extension. For example, to see the help for extract_refs, type

$ extract_refs --help

rev_comp

Return the reverse-complement of the specified nucleotide sequence.

name_allele

Return a 'TIgGER-style' name for an allele sequence (reference allele name suffixed with SNPs), given a reference set.

extract_refs

A script which uses simple_bio_seq to extract files for particular loci and species from an IMGT reference file.
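
As a rough sketch of this kind of extraction, the code below filters FASTA records by species and locus. It assumes IMGT-style pipe-delimited headers, in which the second field is the allele name and the third is the species. The functions parse_fasta() and extract_locus(), and the records in the example, are made up for illustration; the script's actual options and logic may differ.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of header -> upper-case sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            header = line[1:]
            records[header] = ''
        elif header is not None:
            records[header] += line.upper()
    return records


def extract_locus(records, species, locus):
    """Keep records whose species matches and whose allele name starts with locus."""
    selected = {}
    for header, seq in records.items():
        fields = header.split('|')
        if len(fields) < 3:
            continue  # not an IMGT-style header
        allele_name, header_species = fields[1], fields[2]
        if header_species == species and allele_name.startswith(locus):
            selected[allele_name] = seq
    return selected


# Made-up records in the assumed header layout
example = """>X00001|IGHV1-1*01|Homo sapiens|F|V-REGION|
acgtacgt
>X00002|IGKV1-1*01|Homo sapiens|F|V-REGION|
ttttcccc
>X00003|IGHV1-1*01|Mus musculus|F|V-REGION|
ggggaaaa
"""
human_ighv = extract_locus(parse_fasta(example), "Homo sapiens", "IGHV")
print(human_ighv)
```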

identical_seqs

A script which uses simple_bio_seq to list identical sequences and sub-sequences in a fasta file.
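
One way such a comparison could work, given sequences held in a dict keyed by name (the form read_fasta() returns), is sketched below. The function find_identical_and_subsequences() is hypothetical and is not taken from the script; it uses a simple quadratic search, which is fine for small reference sets.

```python
from collections import defaultdict


def find_identical_and_subsequences(seqs):
    """Return (groups of names with identical sequences, (shorter, longer) name pairs)."""
    by_sequence = defaultdict(list)
    for name, seq in seqs.items():
        by_sequence[seq.upper()].append(name)

    # Names that share exactly the same sequence
    identical = [sorted(names) for names in by_sequence.values() if len(names) > 1]

    # Distinct sequences that are wholly contained in a longer one;
    # each distinct sequence is represented by the first name that carries it
    contained = []
    unique = list(by_sequence)
    for shorter in unique:
        for longer in unique:
            if len(shorter) < len(longer) and shorter in longer:
                contained.append((by_sequence[shorter][0], by_sequence[longer][0]))

    return identical, contained


seqs = {"a": "ACGT", "b": "acgt", "c": "TACGTA", "d": "GGGG"}
identical, contained = find_identical_and_subsequences(seqs)
print(identical)
print(contained)
```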

gap_inferred

A script which will gap a set of sequences listed in a FASTA file, using the closest sequences discovered from a reference set. The script will do its best to warn of issues with the reference sequences and with the gapped sequences it provides: please use the warnings to check that things are ok.

Sequences to be gapped are assumed to be complete at the 5' end. If necessary they should be padded with dots at the 5' end, so that the first nucleotide is at the correct position in the full-length sequence (in exactly the same way that IMGT puts dots at the start of a reference sequence that is incomplete at the 5' end).
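
For instance, a sequence known to be missing its first three nucleotides could be padded like this before gapping. The helper pad_five_prime() is hypothetical, and it assumes you already know how many nucleotides are missing.

```python
def pad_five_prime(partial_seq, missing_nt):
    """Prefix a 5'-truncated sequence with one dot per missing nucleotide."""
    if missing_nt < 0:
        raise ValueError("missing_nt must not be negative")
    return '.' * missing_nt + partial_seq.upper()


print(pad_five_prime("acgt", 3))
```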

make_igblast_ndm

A script which uses a set of IMGT-gapped V-sequences to create the ndm file required by IgBLAST for a custom organism.

annotate_j

Given a set of J sequences, identify the correct frame and location of the CDR3 end, by searching for the GxG motif.
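
The frame search can be sketched as follows: translate the J sequence in each of the three forward frames and look for a G, any residue, G pattern. This is an illustration under stated assumptions rather than the script's own logic. The names translate() and find_gxg() are hypothetical, frames containing a stop codon are skipped (my assumption), and the test sequence is only a J-like example. Biopython could do the translation; a small codon table is used here so the sketch has no dependencies.

```python
import re

# Standard genetic code, built in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq, frame):
    """Translate seq from the given 0-based frame; unknown codons become 'X'."""
    seq = seq.upper()
    return ''.join(
        CODON_TABLE.get(seq[i:i + 3], 'X')
        for i in range(frame, len(seq) - 2, 3)
    )


def find_gxg(j_seq):
    """Return (frame, 0-based nucleotide index of the motif start), or None."""
    for frame in range(3):
        protein = translate(j_seq, frame)
        if '*' in protein:
            continue  # assumption: the correct frame has no stop codon
        match = re.search(r'G.G', protein)
        if match:
            return frame, frame + 3 * match.start()
    return None


j_like = "ACTACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG"
result = find_gxg(j_like)
print(result)
```

The returned index marks the first nucleotide of the motif; how that maps to the CDR3 end depends on the convention you follow.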
