Quality control pipeline for antibody libraries

Introduction

AbSeq

AbSeq is a comprehensive bioinformatic pipeline for the analysis of sequencing datasets generated from antibody libraries, and abseqPy is one of its packages. Given FASTQ or FASTA files (paired-end or single-end), abseqPy generates clonotype tables, V-(D)-J germline annotations, functional rates, and diversity estimates in a combination of CSV and HDF files. More specialized analyses for antibody libraries, such as primer specificity, sequence motif analysis, and restriction site analysis, are also on the list.

This program is intended to be used in conjunction with abseqR, a reporting and statistical analysis package for the data generated by abseqPy. Although abseqPy works fine without abseqR, users are strongly encouraged to install the R package as well in order to take advantage of the pipeline's interactive HTML reporting capabilities. abseqR's project page shows a few examples of the type of analysis AbSeq provides; the full documentation can be found in abseqR's vignette.

Developers

  • AbSeq is developed by Monther Alhamdoosh and JiaHong Fong
  • For comments and suggestions, email m.hamdoosh <at> gmail <dot> com

Prerequisites

abseqPy depends on a few external programs, which must be properly installed and configured before running abseqPy.

abseqPy runs on Python 2.7. Python 3.6 support is underway.

Seamless installation of dependencies

This is the recommended way of installing abseqPy's external dependencies.

A Python script, install_dependencies.py, is available that downloads and installs all the necessary external dependencies.

This script assumes the following is already available:

To install external dependencies into a folder named ~/.local/abseq:

$ mkdir -p ~/.local/abseq
$ python install_dependencies.py ~/.local/abseq

This script does not install abseqPy itself, only its external dependencies.

This script works with Python 2 and 3, and ~/.local/abseq can be replaced with any directory. However:

  • this directory will be there to stay, so choose wisely
  • the installation script will place more than just binaries in this directory; it will also contain databases and internal files

As soon as the installation succeeds, an onscreen message will prompt users to update their environment variables to include the dependencies installed in ~/.local/abseq.

Manual installation of dependencies

This section is for when one:

  1. finds that the installation script failed
  2. is feeling adventurous

Refer to this document for a detailed guide.

abseqPy installation

This section demonstrates how to install abseqPy.

Install from pip

$ pip install abseqPy

Install from source

$ git clone https://github.com/malhamdoosh/abseqPy.git
$ cd abseqPy
$ pip install .
$ abseq --version

The abseq command should now be available on your command line.
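A quick way to confirm that the command is visible is to look it up on the PATH. The snippet below is a minimal sketch: the helper name find_on_path is hypothetical, and it relies on shutil.which, which needs Python 3.3 or newer (so run it with a Python 3 interpreter, not the Python 2.7 that abseqPy itself uses).

```python
import shutil


def find_on_path(command="abseq"):
    """Return the full path of `command` if it is on PATH, else None.

    Hypothetical helper for illustration; it is not part of abseqPy.
    """
    return shutil.which(command)


if __name__ == "__main__":
    location = find_on_path()
    if location is None:
        print("abseq not found on PATH; check the installation "
              "and your environment variables")
    else:
        print("abseq found at " + location)
```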

Installing abseqPy also installs other Python packages. Consider using a Python virtual environment to avoid overwriting existing packages. See virtualenv or conda.

Usage

Basic usage

To get up and running, the following command is often sufficient:

$ abseq -f1 <read 1> -f2 <read 2> -o results --threads 4 --task all

-f2 is required only for paired-end sequencing experiments.
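When abseq is launched from a script, the same command can be assembled as an argument list. The sketch below uses only the flags from the command above; the helper name build_abseq_command and its default values are illustrative and not part of abseqPy.

```python
def build_abseq_command(read1, read2=None, outdir="results", threads=4, task="all"):
    """Return the argument list for a basic abseq run.

    Hypothetical helper: only the flags from the basic usage line are used.
    """
    cmd = ["abseq", "-f1", read1]
    if read2 is not None:
        # -f2 is added only for paired-end experiments
        cmd += ["-f2", read2]
    cmd += ["-o", outdir, "--threads", str(threads), "--task", task]
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.call(...) to run it
    print(" ".join(build_abseq_command("PCR1_R1.fastq.gz", "PCR1_R2.fastq.gz")))
```

Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.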

Advanced usage

Besides command line options, abseq also accepts -y <file> or --yaml <file>, which reads parameters defined in file. This allows multiple samples to be analyzed at the same time, each with shared or independent abseq parameters.

The basic YAML syntax of file is key: val, where key is the name of an abseq "long" option without its leading dashes (see footnote 1, and abseq --help for all the "long" option names) and val is the value supplied to that option. Additional samples are specified one after another, separated by triple dashes (---).

Example

Assuming a file named example.yml has the following content:

# sample one, PCR1
name: PCR1
file1: fastq/PCR1_R1.fastq.gz
file2: fastq/PCR1_R2.fastq.gz
---
# sample two, PCR2
name: PCR2
file1: fastq/PCR2_R1.fastq.gz
file2: fastq/PCR2_R2.fastq.gz
bitscore: 300                 # override the defaults' 350 for this sample only
task: abundance               # override the defaults' "all" for this sample only
detailedComposition: ~        # enables detailedComposition (-dc) for this sample only
---
# more samples can go here
---
# "defaults" is the only special key allowed.
# It is not in abseq's options, but is used here
# to denote default values to be used for ALL samples
# if they're not specified.
defaults:
    task: all
    outdir: results
    threads: 7
    bitscore: 350
    sstart: 1-3

then executing abseq -y example.yml is equivalent to running two instances of abseq simultaneously, with the parameters in the defaults field applied to both samples unless a sample overrides them. The equivalent commands are:

$ abseq --task all --outdir results --threads 7 --bitscore 350 --sstart 1-3 \
>   --name PCR1 --file1 fastq/PCR1_R1.fastq.gz --file2 fastq/PCR1_R2.fastq.gz
$ abseq --task abundance --outdir results --threads 7 --bitscore 300 --sstart 1-3 \
>   --name PCR2 --file1 fastq/PCR2_R1.fastq.gz --file2 fastq/PCR2_R2.fastq.gz \
>   --detailedComposition 

Using --yaml is recommended because it is self-documenting, reproducible, and simple to run.
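The way defaults combine with per-sample values can be sketched in a few lines of Python. This illustrates the behavior described above and is not abseqPy's own implementation. The function names are hypothetical, and the input is assumed to be a list of already-parsed YAML documents (for instance from PyYAML's yaml.safe_load_all), where an empty document is None and a ~ value is None.

```python
def expand_samples(documents):
    """Merge the special "defaults" mapping into every sample mapping.

    Values set on a sample take precedence over the defaults.
    """
    defaults = {}
    samples = []
    for doc in documents:
        if not doc:
            continue  # empty (comment-only) documents carry no sample
        if "defaults" in doc:
            defaults = doc["defaults"]
        else:
            samples.append(doc)
    # Merge after the loop so that "defaults" may appear anywhere in the file
    return [dict(defaults, **sample) for sample in samples]


def to_command(params):
    """Render one merged sample as an abseq command line with long options."""
    parts = ["abseq"]
    for key in sorted(params):
        parts.append("--" + key)
        if params[key] is not None:  # None marks a flag that takes no value
            parts.append(str(params[key]))
    return " ".join(parts)


if __name__ == "__main__":
    documents = [
        {"name": "PCR1", "file1": "fastq/PCR1_R1.fastq.gz",
         "file2": "fastq/PCR1_R2.fastq.gz"},
        {"name": "PCR2", "file1": "fastq/PCR2_R1.fastq.gz",
         "file2": "fastq/PCR2_R2.fastq.gz", "bitscore": 300,
         "task": "abundance", "detailedComposition": None},
        None,  # the "more samples can go here" document
        {"defaults": {"task": "all", "outdir": "results", "threads": 7,
                      "bitscore": 350, "sstart": "1-3"}},
    ]
    for sample in expand_samples(documents):
        print(to_command(sample))
```

Options are printed in alphabetical order here, so the output differs from the commands above only in argument order.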

Gotchas

  1. In the above example, specifying threads: 7 in the defaults key of example.yml runs each sample with 7 threads; that is, abseqPy will run a total of 7 × (number of samples) processes.
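The arithmetic behind this gotcha is simple, but it is worth checking before launching a large batch. In the sketch below both function names are hypothetical, and the second helper, which splits a fixed CPU budget evenly across samples, is a suggestion rather than something abseqPy does for you.

```python
def total_processes(threads_per_sample, n_samples):
    """Total processes when every sample runs with its own thread count."""
    return threads_per_sample * n_samples


def threads_for_budget(cpu_budget, n_samples):
    """Largest per-sample threads value that stays within cpu_budget (at least 1)."""
    return max(1, cpu_budget // n_samples)


if __name__ == "__main__":
    print(total_processes(7, 2))     # 14 processes for example.yml
    print(threads_for_budget(8, 2))  # 4 threads per sample on an 8-core budget
```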

Help

Invoking abseq -h on the command line displays the options abseqPy accepts.

Supported platforms

abseqPy works on most Linux distros, macOS, and Windows.

Some features are disabled on Windows due to software incompatibilities:

  • Upstream clustering in --task 5utr
  • Sequence logo generation in --task diversity
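A wrapper script that has to run on several platforms can warn about these gaps up front. The following is a minimal sketch: the mapping simply restates the two bullets above, and the function name and the optional system argument are hypothetical.

```python
import platform

# Features listed above as disabled on Windows, keyed by --task value
WINDOWS_DISABLED = {
    "5utr": "upstream clustering",
    "diversity": "sequence logo generation",
}


def disabled_feature(task, system=None):
    """Return the feature of `task` that is unavailable on this platform, or None."""
    if system is None:
        system = platform.system()  # "Windows", "Linux", "Darwin", ...
    if system == "Windows":
        return WINDOWS_DISABLED.get(task)
    return None


if __name__ == "__main__":
    for task in ("5utr", "diversity"):
        feature = disabled_feature(task)
        if feature is not None:
            print("--task " + task + ": " + feature + " is disabled on Windows")
```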

1 Long option names are option names with a double-dash prefix; for example, --help is a long option while -h is not.
