Quality control pipeline for antibody libraries
Introduction
AbSeq is a comprehensive bioinformatic pipeline for the analysis of sequencing datasets generated from antibody libraries, and abseqPy is one of its packages. Given FASTQ or FASTA files (paired-end or single-end), abseqPy generates clonotype tables, V-(D)-J germline annotations, functional rates, and diversity estimates in a combination of CSV and HDF files. More specialized analyses for antibody libraries, such as primer specificity, sequence motif analysis, and restriction site analysis, are also among its analyses.
This program is intended to be used in conjunction with abseqR, a reporting and statistical analysis package for the data generated by abseqPy. Although abseqPy works without abseqR, it is highly recommended that users also install the R package to take advantage of the pipeline's interactive HTML reporting capabilities. abseqR's project page shows a few examples of the type of analysis AbSeq provides; the full documentation can be found in abseqR's vignette.
Developers
AbSeq is developed by Monther Alhamdoosh and JiaHong Fong. For comments and suggestions, email m.hamdoosh <at> gmail <dot> com
Prerequisites
abseqPy depends on several external software packages, which should be properly installed and configured before running abseqPy.
abseqPy runs on Python 2.7. Python 3.6 support is underway.
Seamless installation of dependencies
This is the recommended way of installing abseqPy's external dependencies.
A Python script is available here that downloads and installs all the necessary external dependencies.
This script assumes the following are already available:
- perl
- git
- python
- Java JRE version 1.6 or higher
- C/C++ compilers (not required for Windows)
- make (not required for Windows)
- CMake (not required for Windows)
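Before running the installation script, it can help to confirm that these tools are reachable on PATH. The following is a minimal sketch, not part of abseqPy: it assumes Python 3 (for shutil.which), and the executable names in REQUIRED_TOOLS are assumptions that may differ on your system (for example, python3 instead of python). C/C++ compilers are left out because their executable names vary.

```python
import shutil

# Executable names are assumptions; adjust them for your system.
# Compilers are omitted because their names differ (gcc, clang, cl, ...).
REQUIRED_TOOLS = ["perl", "git", "python", "java", "make", "cmake"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

This only checks that an executable exists; it does not verify versions, such as the Java 1.6 minimum.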
To install external dependencies into a folder named ~/.local/abseq:
$ mkdir -p ~/.local/abseq
$ python install_dependencies.py ~/.local/abseq
This script does not install abseqPy itself, only its external dependencies.
This script works with Python 2 and 3, and ~/.local/abseq can be replaced with any directory.
However:
- this directory will be there to stay, so choose wisely
- the installation script will place more than just binaries in this directory; it will also contain databases and internal files
As soon as the installation succeeds, users will be prompted with an onscreen message to update their environment variables to include the installed dependencies in ~/.local/abseq.
Manual installation of dependencies
This section is for when one:
- finds that the installation script failed
- is feeling adventurous
Refer to this document for a detailed guide.
abseqPy installation
This section demonstrates how to install abseqPy.
Install from pip
$ pip install abseqPy
Install from source
$ git clone https://github.com/malhamdoosh/abseqPy.git
$ cd abseqPy
$ pip install .
$ abseq --version
The abseq command should now be available on your command line.
Installing abseqPy also installs other Python packages; consider using a Python virtual environment to prevent overriding existing packages. See virtualenv or conda.
Usage
Basic usage
To get up and running, the following command is often sufficient:
$ abseq -f1 <read 1> -f2 <read 2> -o results --threads 4 --task all
-f2 is only required for paired-end sequencing experiments.
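When abseq is launched from a script, the optional -f2 argument can be handled by building the argument list conditionally. This is a minimal sketch: build_abseq_command is a hypothetical helper, not part of abseqPy, and it only uses the options shown in the command above.

```python
def build_abseq_command(read1, read2=None, outdir="results", threads=4, task="all"):
    """Build the basic abseq argument list (hypothetical helper).

    -f2 is added only when a second read file is given, i.e. for
    paired-end experiments.
    """
    cmd = ["abseq", "-f1", read1]
    if read2 is not None:
        cmd += ["-f2", read2]
    cmd += ["-o", outdir, "--threads", str(threads), "--task", task]
    return cmd


# Single-end: no -f2 in the resulting list.
single = build_abseq_command("reads.fastq.gz")
# Paired-end: -f2 follows -f1.
paired = build_abseq_command("R1.fastq.gz", "R2.fastq.gz")
print(" ".join(paired))
```

The returned list can be handed to subprocess.check_call to run the command once abseq is installed.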
Advanced usage
Besides calling abseq with command line options, abseq also supports -y <file> or --yaml <file>, which reads parameters defined in file. This enables multiple samples to be analyzed at the same time, each having shared or independent abseq parameters.
The basic YAML syntax of file is key: val, where key is an abseq "long" [1] option (see abseq --help for all the "long" option names) and val is the value supplied to the "long" option. Additional samples are specified one after another, separated by triple dashes ---.
Example
Assuming a file named example.yml has the following content:
# sample one, PCR1
name: PCR1
file1: fastq/PCR1_R1.fastq.gz
file2: fastq/PCR1_R2.fastq.gz
---
# sample two, PCR2
name: PCR2
file1: fastq/PCR2_R1.fastq.gz
file2: fastq/PCR2_R2.fastq.gz
bitscore: 300 # override the defaults' 350 for this sample only
task: abundance # override the defaults' "all" for this sample only
detailedComposition: ~ # enables detailedComposition (-dc) for this sample only
---
# more samples can go here
---
# "defaults" is the only special key allowed.
# It is not in abseq's options, but is used here
# to denote default values to be used for ALL samples
# if they're not specified.
defaults:
    task: all
    outdir: results
    threads: 7
    bitscore: 350
    sstart: 1-3
then executing abseq -y example.yml is equivalent to simultaneously running 2 instances of abseq, with the parameters in the defaults field applied to both samples. Here are the equivalent commands:
$ abseq --task all --outdir results --threads 7 --bitscore 350 --sstart 1-3 \
> --name PCR1 --file1 fastq/PCR1_R1.fastq.gz --file2 fastq/PCR1_R2.fastq.gz
$ abseq --task abundance --outdir results --threads 7 --bitscore 300 --sstart 1-3 \
> --name PCR2 --file1 fastq/PCR2_R1.fastq.gz --file2 fastq/PCR2_R2.fastq.gz \
> --detailedComposition
Using --yaml is recommended because it is self-documenting, reproducible, and simple to run.
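The way per-sample values override the defaults can be illustrated in a few lines of Python. This is only a sketch of the merging idea, not abseqPy's actual implementation: the YAML documents are written out as plain dicts (with ~ represented as None) so that no YAML library is needed, and merge_with_defaults and to_command are hypothetical helper names. It relies on dicts preserving insertion order (Python 3.7+).

```python
def merge_with_defaults(defaults, sample):
    """Return the defaults overridden by the sample's own values."""
    merged = dict(defaults)
    merged.update(sample)
    return merged


def to_command(params):
    """Turn a parameter dict into an abseq argument list.

    A value of None (YAML "~") becomes a bare flag with no value.
    """
    cmd = ["abseq"]
    for key, value in params.items():
        cmd.append("--" + key)
        if value is not None:
            cmd.append(str(value))
    return cmd


# Values copied from example.yml above.
defaults = {"task": "all", "outdir": "results", "threads": 7,
            "bitscore": 350, "sstart": "1-3"}
pcr2 = {"name": "PCR2",
        "file1": "fastq/PCR2_R1.fastq.gz",
        "file2": "fastq/PCR2_R2.fastq.gz",
        "bitscore": 300,
        "task": "abundance",
        "detailedComposition": None}

print(" ".join(to_command(merge_with_defaults(defaults, pcr2))))
```

For the PCR2 sample this prints the same options as the second equivalent command: bitscore and task take the sample's values, while outdir, threads, and sstart come from defaults.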
Gotchas
- In the above example, specifying threads: 7 in the defaults key of example.yml will run each sample with 7 threads; that is, abseqPy will be running with 7 * number of samples total processes.
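The arithmetic behind this gotcha is easy to check before launching a large run. The sketch below is not part of abseqPy: total_processes restates the multiplication described above, and threads_for_budget is a hypothetical rule of thumb for picking a per-sample thread count that stays within a chosen CPU budget.

```python
def total_processes(threads_per_sample, n_samples):
    """Total processes when every sample runs with its own threads."""
    return threads_per_sample * n_samples


def threads_for_budget(cpu_budget, n_samples):
    """Largest per-sample thread count that fits the budget (at least 1)."""
    return max(1, cpu_budget // n_samples)


# example.yml above: 2 samples with threads: 7
print(total_processes(7, 2))     # 14
# With a budget of 8 CPUs shared by 2 samples
print(threads_for_budget(8, 2))  # 4
```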
Help
Invoking abseq -h on the command line will display the options abseqPy uses.
Supported platforms
abseqPy works on most Linux distros, macOS, and Windows.
Some features are disabled when running on Windows due to software incompatibility:
- Upstream clustering in --task 5utr
- Sequence logo generation in --task diversity
[1] Long option names are option names with a double-dash prefix; for example, --help is a long option while -h is not.