


# abstar

VDJ assignment and antibody sequence annotation. Scalable from a single sequence to billions of sequences.

### install

`pip install abstar`

### use

To run abstar on a single FASTA or FASTQ file:

`abstar -i <input-file> -o <output-directory> -t <temp-directory>`

To iteratively run abstar on all files in an input directory:

`abstar -i <input-directory> -o <output-directory> -t <temp-directory>`

To run abstar using the included test data as input:

`abstar -o <output-directory> -t <temp-directory> --use-test-data`

When using the abstar test data, note that although the test data file contains 1,000 sequences, one of the test sequences is not a valid antibody recombination. Only 999 sequences should be processed successfully.

When using BaseSpace as the input data source, you can optionally provide all of the required directories:

`abstar -i <input-directory> -o <output-directory> -t <temp-directory> -b`

Or you can simply provide a single project directory, and all required directories will be created in the project directory:

`abstar -p <project_directory> -b`
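For scripted runs, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of abstar itself: the helper names `build_abstar_command` and `run_abstar` are hypothetical, and only the flags documented in this section (`-i`, `-o`, `-t`, `-p`, `-b`, `--use-test-data`) are used. It assumes Python 3 for `shutil.which`.

```python
import shutil
import subprocess


def build_abstar_command(input_path=None, output_dir=None, temp_dir=None,
                         project_dir=None, use_test_data=False, basespace=False):
    """Assemble an abstar argument list from the flags documented above.

    Hypothetical helper: abstar does not ship this function.
    """
    cmd = ["abstar"]
    if project_dir is not None:
        # A single project directory replaces the separate directories.
        cmd += ["-p", project_dir]
    else:
        if output_dir is None or temp_dir is None:
            raise ValueError("output_dir and temp_dir are required without project_dir")
        if input_path is not None:
            cmd += ["-i", input_path]
        cmd += ["-o", output_dir, "-t", temp_dir]
    if use_test_data:
        cmd.append("--use-test-data")
    if basespace:
        cmd.append("-b")
    return cmd


def run_abstar(**kwargs):
    """Run abstar as a subprocess; raises if the executable is not on PATH."""
    cmd = build_abstar_command(**kwargs)
    if shutil.which("abstar") is None:
        raise RuntimeError("abstar executable not found on PATH")
    return subprocess.run(cmd, check=True)
```

Calling `run_abstar(output_dir="out", temp_dir="tmp", use_test_data=True)` would then correspond to the test-data command shown above.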

### additional options

`-l, --log` Change the log directory location. Default is the parent directory of `<output_directory>`.

`-m, --merge` Input directory should contain paired FASTQ (or gzipped FASTQ) files. Paired files will be merged with PANDAseq prior to processing with AbAnalysis.

`-b, --basespace` Download a sequencing run from BaseSpace, which is Illumina’s cloud storage environment. Since Illumina sequencers produce paired-end reads, `--merge` is implied.

`-u N, --uaid N` Sequences contain a unique antibody ID (UAID, or molecular barcode) of length N. The UAID will be parsed from the beginning of each input sequence and added to the JSON output. Negative values result in the UAID being parsed from the end of the sequence.

`-s, --species` Select the species from which the input sequences are derived. Supported options are ‘human’, ‘mouse’, and ‘macaque’. Default is ‘human’.

`-c, --cluster` Runs abstar in distributed mode on a Celery cluster.

`-h, --help` Prints detailed information about all runtime options.

`-D, --debug` Much more verbose logging.
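The sign convention of the `-u N, --uaid N` option can be illustrated with a short sketch. This is not abstar's internal code: `parse_uaid` is a hypothetical function that only mirrors the documented behavior (a positive N reads from the start of the sequence, a negative N reads from the end). Whether abstar also trims the UAID from the sequence before annotation is not stated here, so the sketch returns only the UAID.

```python
def parse_uaid(sequence, n):
    """Return the UAID of length abs(n) from a sequence.

    Hypothetical illustration of the documented -u/--uaid convention:
    n > 0 takes the first n characters, n < 0 takes the last abs(n).
    """
    if n == 0:
        return ""
    if abs(n) > len(sequence):
        raise ValueError("UAID length exceeds sequence length")
    # Python slicing handles both directions: seq[:n] and seq[n:] for n < 0.
    return sequence[:n] if n > 0 else sequence[n:]


uaid_start = parse_uaid("ACGTACGTTTGG", 4)   # barcode at the 5' end
uaid_end = parse_uaid("ACGTACGTTTGG", -4)    # barcode at the 3' end
```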

### helper scripts

A few helper scripts are included with abstar:

- `batch_mongoimport` automates the import of multiple JSON output files into a MongoDB database.
- `build_abstar_germline_db` creates abstar germline databases from IMGT-gapped FASTA files of V, D and J gene segments.
- `make_basespace_credfile` makes a credentials file for BaseSpace, which is required if downloading sequences from BaseSpace with abstar. Developer credentials are required, and the process for obtaining them is explained [here](
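To give a rough idea of what batch-importing JSON output involves, the sketch below builds one `mongoimport` command per JSON file in a directory. It is not the implementation of `batch_mongoimport`: the function name `build_mongoimport_commands` is hypothetical, and naming each collection after its file is an assumption made for illustration. Only the standard `mongoimport` options `--db`, `--collection`, and `--file` are used.

```python
import os


def build_mongoimport_commands(json_dir, db_name):
    """Build one mongoimport argument list per .json file in json_dir.

    Hypothetical helper; the collection-per-file naming is an assumption.
    """
    commands = []
    for name in sorted(os.listdir(json_dir)):
        if not name.endswith(".json"):
            continue  # skip logs and other non-JSON files
        collection = os.path.splitext(name)[0]
        commands.append([
            "mongoimport",
            "--db", db_name,
            "--collection", collection,
            "--file", os.path.join(json_dir, name),
        ])
    return commands
```

Each returned list could be passed to `subprocess.run` on a machine where MongoDB's command-line tools are installed.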

### requirements

- Python 2.7, 3.5+
- abutils
- biopython
- celery
- mock (Python 2.7 only)
- nwalign3 (nwalign for Python 2.7)
- pymongo
- pytest
- scikit-bio (<=0.4.2 for Python 2.7)

All of the above dependencies can be installed with pip, and will be installed automatically when installing abstar with pip. If you’re new to Python, a great way to get started is to install the Anaconda Python distribution, which includes pip as well as a ton of useful scientific Python packages.

- Sequence merging requires PANDAseq.
- `batch_mongoimport` requires MongoDB.
- BaseSpace downloading requires the BaseSpace Python SDK.

Release file: abstar-0.3.4.tar.gz (38.4 MB), source distribution, uploaded Jul 26, 2018.
