SPARSE indexes reference genomes from public databases into hierarchical clusters and uses them to predict the origins of metagenomic reads.

Project description

Strain Prediction and Analysis using Representative SEquences (SPARSE)

SPARSE indexes >100,000 reference genomes from public databases into hierarchical clusters and uses them to predict the origins of metagenomic reads.

License: GPL v3


SPARSE runs on Unix and requires Python version 2.7 or later.

System modules (Ubuntu 16.04):

  • pip
  • gfortran
  • llvm
  • libncurses5-dev
  • cmake
  • xvfb-run (for malt, optional)

3rd-party software:

  • samtools (>=1.2)
  • mash (>=1.1.1)
  • bowtie2 (>=2.3.2)
  • malt (>=0.4.0) (optional)
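
Before installing, it can help to confirm that the required third-party tools are reachable. The following is a minimal sketch, not part of SPARSE: the helper name `missing_tools` is hypothetical, the executable names are assumed to match the tool names listed above, and `shutil.which` requires Python 3.3 or later. It only checks that each tool is on the PATH and does not verify the minimum versions.

```python
import shutil

# Assumption: each tool is installed under an executable of the same name.
REQUIRED_TOOLS = ("samtools", "mash", "bowtie2")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default is sufficient.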

See requirements.txt for Python module dependencies.

Installation via pip

pip install meta-sparse

Installation from source (Ubuntu)

sudo apt-get update
sudo apt-get install gfortran llvm libncurses5-dev cmake python-pip samtools bowtie2
git clone
cd SPARSE/EM && make
pip install -r requirements.txt 

Updating SPARSE

To update SPARSE, change to the installation directory and pull the latest version:

git pull

Quick Start

See the full documentation for details.

  1. Download reference database

We provide a pre-compiled database based on RefSeq (dated 14.10.2017) for download. Please download the complete folder refseq_20171014/ and do not change its internal folder structure. The database can be unpacked by running:

cd refseq_20171014 && sh untar.bash

This pre-compiled database contains four default mapping databases, which can be specified in the next step: representative, subpopulation, Virus, Eukaryota.

To update the database or build a custom database, please refer to the full documentation.

  2. Predict read origins

The following command maps and evaluates all reads in both FASTQ files against the specified mapping databases.

python predict --dbname refseq_20171014 --mapDB representative,subpopulation,Virus,Eukaryota --r1 read1.fq.gz --r2 read2.fq.gz --workspace <workspace_name>

For single-end reads, only --r1 needs to be specified. All output files are stored in the respective workspace.
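
When the prediction step is scripted over many samples, the argument list can be assembled programmatically. The sketch below is illustrative only: `build_predict_args` is a hypothetical helper, not part of SPARSE, and it uses only the subcommand and flags shown in the command above. It omits `--r2` for single-end reads.

```python
def build_predict_args(dbname, map_dbs, workspace, r1, r2=None):
    """Assemble the arguments for the predict step as a list of strings.

    map_dbs is a list of mapping database names; it is joined with commas
    because --mapDB takes a comma-separated value in the command above.
    """
    args = ["predict", "--dbname", dbname, "--mapDB", ",".join(map_dbs), "--r1", r1]
    if r2 is not None:
        # Paired-end input: add the second read file.
        args += ["--r2", r2]
    args += ["--workspace", workspace]
    return args


if __name__ == "__main__":
    example = build_predict_args(
        "refseq_20171014",
        ["representative", "subpopulation", "Virus", "Eukaryota"],
        "my_workspace",  # hypothetical workspace name
        "read1.fq.gz",
        "read2.fq.gz",
    )
    print(" ".join(example))
```

The entry point that precedes these arguments is left to the caller, since it depends on how SPARSE was installed.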

  3. Create a report

python report <workspace_name>

The report will be stored in <workspace_name>/profile.txt

  4. Extract reference-specific reads

The following command extracts all reads specific to the provided reference ids, which can be found in the output of step 2.

python extract --dbname refseq_20171014 --workspace <workspace_name> --ref_id <comma delimited indices>
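
The `--ref_id` value is a comma-delimited list, so a collection of reference ids has to be joined before it is passed on. A minimal sketch, assuming the ids are available as integers or strings; `build_extract_args` is a hypothetical helper, not part of SPARSE, and it uses only the subcommand and flags from the command above.

```python
def build_extract_args(dbname, workspace, ref_ids):
    """Assemble the arguments for the extract step as a list of strings."""
    ref_ids = list(ref_ids)
    if not ref_ids:
        raise ValueError("at least one reference id is required")
    # Join the ids into the comma-delimited form expected by --ref_id.
    joined = ",".join(str(ref_id) for ref_id in ref_ids)
    return ["extract", "--dbname", dbname, "--workspace", workspace, "--ref_id", joined]


if __name__ == "__main__":
    # The ids below are placeholders, not real reference indices.
    print(" ".join(build_extract_args("refseq_20171014", "my_workspace", [1, 2, 3])))
```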


SPARSE has not been formally published yet. If you use SPARSE, please cite the preprint:

Zhemin Zhou, Nina Luhmann, Nabil-Fareed Alikhan, Christopher Quince, Mark Achtman, 'Accurate Reconstruction of Microbial Strains Using Representative Reference Genomes' bioRxiv 215707; doi:


Files for meta-sparse, version 0.1.5

  • meta-sparse-0.1.5.tar.gz (27.5 MB), source distribution
