Fetch, process, analyze, and aggregate microbiome sequencing data with SRA Toolkit and QIIME2.

q2sra

Conventional microbiome bioinformatics workflows are riddled with inefficiencies: users must navigate a variety of fragmented tools, command-line utilities, and file management systems. In contemporary research settings, where multiple people contribute to a single project, inconsistencies in how data are processed often arise, complicating downstream aggregation and analysis. The q2sra package addresses these obstacles by providing a streamlined, centralized, and standardized framework for microbiome data analysis with QIIME 2.

Installation

$ pip install q2sra

Prerequisites

Installing QIIME 2 with Conda

$ wget https://data.qiime2.org/distro/core/qiime2-2023.7-py38-linux-conda.yml
$ conda env create \
    -n qiime2-2023.7 \
    --file qiime2-2023.7-py38-linux-conda.yml
$ rm qiime2-2023.7-py38-linux-conda.yml
$ conda activate qiime2-2023.7

Installing SRA Toolkit

Follow NCBI's SRA Toolkit installation instructions.
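
Before creating a project, it can help to confirm that both prerequisites are reachable from your shell. The sketch below is a minimal check using Python's standard shutil.which. The executable names qiime, prefetch, and fasterq-dump are assumptions about what q2sra calls (this README does not list them), and check_prerequisites is a hypothetical helper, not part of q2sra. Note that qiime is only on PATH while the QIIME 2 Conda environment is active.

```python
import shutil

# Executables ASSUMED (not confirmed by the q2sra docs) to be needed at runtime:
# `qiime` from QIIME 2, and `prefetch` / `fasterq-dump` from SRA Toolkit.
REQUIRED_TOOLS = ("qiime", "prefetch", "fasterq-dump")


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return a mapping of tool name -> True if the executable is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    status = check_prerequisites()
    missing = [tool for tool, found in status.items() if not found]
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```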

Creating a q2sra Project

To create a project, instantiate q2sra.Proj with the project name as its only argument.

>>> from q2sra import Proj
>>> proj = Proj('my_proj')

q2sra Project Attributes

  • name (str, default None) - Project name
  • fields (list of str, default []) - Metadata fields
  • nsamples (int, default 30) - Maximum number of samples drawn from each study
  • paired (bool, default True) - Whether to use both forward and reverse reads (True) or forward reads only (False)
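
For a mental model of these attributes, here is a hypothetical dataclass that mirrors the documented names, types, and defaults. ProjConfig and cap_samples are illustrative only and are not part of q2sra; in particular, keeping the first nsamples sample IDs is an assumption, since the README does not say how samples are chosen from a study.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjConfig:
    """Hypothetical stand-in for q2sra.Proj's documented attributes (not the real class)."""

    name: str
    fields: List[str] = field(default_factory=list)  # metadata field names; fresh list per instance
    nsamples: int = 30  # maximum number of samples taken from each study
    paired: bool = True  # True: forward + reverse reads; False: forward reads only

    def cap_samples(self, sample_ids: List[str]) -> List[str]:
        """Keep at most `nsamples` IDs (first-N selection is an assumption)."""
        return sample_ids[: self.nsamples]


cfg = ProjConfig("my_proj")
print(cfg)  # ProjConfig(name='my_proj', fields=[], nsamples=30, paired=True)
```

Using default_factory=list gives every project its own fields list, avoiding the shared-mutable-default pitfall of writing fields = [] directly.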

Adding Metadata Fields

q2sra.Proj.add_field(field: str, required: bool = False) -> None

Arguments

  • field - Name of field
  • required - Whether the field is required [default=False]

Example Run

>>> proj.add_field('Phylum')
>>> proj.add_field('Country', required=True)

Saving a Project

>>> proj.save()

Output

<proj name>.pkl - a pickle file storing the project's attributes.

Loading a Pre-configured Project

Any existing q2sra project saved in .pkl format (see previous step) can later be loaded to perform additional actions (adding more studies, merging runs, etc.).

q2sra.Proj.load(name: str) -> Proj

Arguments

  • name - Name of project to load

Example Run

>>> proj = Proj.load('my_proj')
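
The README only says that a project is stored as <proj name>.pkl. As a rough illustration of that save/load round trip, the sketch below pickles a dictionary of attributes with Python's standard pickle module. ProjState, save_state, and load_state are hypothetical; whether q2sra pickles the whole Proj object or only its attributes is not documented.

```python
import os
import pickle
import tempfile
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProjState:
    """Hypothetical stand-in for a project's attributes (not q2sra's class)."""
    name: str
    fields: List[str] = field(default_factory=list)
    nsamples: int = 30
    paired: bool = True


def save_state(state: ProjState, directory: str = ".") -> str:
    """Pickle the attributes as a plain dict to <directory>/<name>.pkl; return the path."""
    path = os.path.join(directory, f"{state.name}.pkl")
    with open(path, "wb") as fh:
        pickle.dump(asdict(state), fh)
    return path


def load_state(name: str, directory: str = ".") -> ProjState:
    """Rebuild a ProjState from <directory>/<name>.pkl."""
    path = os.path.join(directory, f"{name}.pkl")
    with open(path, "rb") as fh:
        return ProjState(**pickle.load(fh))


with tempfile.TemporaryDirectory() as tmp:
    original = ProjState("my_proj", fields=["Phylum", "Country"])
    save_state(original, tmp)
    restored = load_state("my_proj", tmp)
    print(restored == original)  # True
```

Because unpickling can execute arbitrary code, load only .pkl files you created or trust.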

Adding Studies

q2sra.Proj.run(study_name: str, accession: str, include: list = [], exclude: list = []) -> None

Arguments

  • study_name - Name of study
  • accession - Study accession number in the NCBI SRA database
  • include - List of substrings that must be included when filtering .fastq files [default=[]]
  • exclude - List of substrings that must be excluded when filtering .fastq files [default=[]]
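
The include/exclude semantics are not spelled out beyond "substrings". One plausible reading, sketched below, keeps a file only if its name contains every include substring and none of the exclude substrings. That all-of/none-of rule, the hypothetical filter_fastq helper, the .fastq/.fastq.gz suffix check, and the example file names are all assumptions for illustration; the _1/_2 suffixes mimic the forward/reverse naming that fasterq-dump commonly produces.

```python
from typing import Iterable, List, Optional


def filter_fastq(filenames: Iterable[str],
                 include: Optional[List[str]] = None,
                 exclude: Optional[List[str]] = None) -> List[str]:
    """Keep FASTQ names containing every `include` substring and no `exclude` substring.

    The all-of / none-of semantics are an assumption for this sketch.
    """
    include = include or []
    exclude = exclude or []
    kept = []
    for name in filenames:
        if not name.endswith((".fastq", ".fastq.gz")):
            continue  # ignore non-FASTQ files
        if all(s in name for s in include) and not any(s in name for s in exclude):
            kept.append(name)
    return kept


files = ["SRR001_1.fastq", "SRR001_2.fastq", "SRR002_1.fastq", "SRR002_2.fastq", "notes.txt"]
print(filter_fastq(files, include=["_1"]))     # ['SRR001_1.fastq', 'SRR002_1.fastq']
print(filter_fastq(files, exclude=["SRR002"]))  # ['SRR001_1.fastq', 'SRR001_2.fastq']
```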

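Example Run (with user input)

For each metadata field, run() prompts for a value; fields added with required=True are marked [Required].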

>>> proj.run('takagi_2022', 'PRJNA809527')
Phylum: Chordata
[Required] Country: Japan
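
The prompts in this example suggest that run() asks for one value per metadata field and flags the required ones. A minimal sketch of that interaction follows. It assumes blank answers to required fields are re-prompted (the README does not say how they are handled); prompt_metadata is hypothetical, and its ask parameter lets the prompts be scripted instead of read from the keyboard.

```python
from typing import Callable, Dict, List, Set


def prompt_metadata(fields: List[str], required: Set[str],
                    ask: Callable[[str], str] = input) -> Dict[str, str]:
    """Ask for one value per field; re-prompt until each required field is non-empty."""
    values = {}
    for name in fields:
        label = f"[Required] {name}: " if name in required else f"{name}: "
        answer = ask(label).strip()
        while name in required and not answer:
            answer = ask(label).strip()  # required fields may not be left blank
        values[name] = answer
    return values


# Scripted answers stand in for keyboard input; the blank "" is rejected for Country.
answers = iter(["Chordata", "", "Japan"])
meta = prompt_metadata(["Phylum", "Country"], {"Country"}, ask=lambda _prompt: next(answers))
print(meta)  # {'Phylum': 'Chordata', 'Country': 'Japan'}
```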

Aggregating Studies

Once enough studies have been added, the per-study metadata files, QIIME 2 feature tables, and representative sequences can be merged for downstream analysis:

>>> proj.merge()
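
Conceptually, merging the metadata means stacking each study's sample rows into one table. The sketch below does that for tab-separated QIIME 2 metadata using only the standard csv module: columns are unioned, missing values are left blank, and a study column records where each row came from. merge_metadata, the study column, and the other_study example are hypothetical. Feature tables and representative sequences would normally be merged with QIIME 2 itself (for example qiime feature-table merge and qiime feature-table merge-seqs), which this sketch does not attempt.

```python
import csv
import io
from typing import Dict, List


def merge_metadata(tables: Dict[str, str]) -> str:
    """Stack per-study TSV metadata, union the columns, and add a `study` column.

    `tables` maps study name -> TSV text whose first column is the sample ID.
    """
    rows: List[Dict[str, str]] = []
    columns: List[str] = []
    for study, text in tables.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for col in reader.fieldnames or []:
            if col not in columns:
                columns.append(col)  # preserve first-seen column order
        for row in reader:
            row["study"] = study
            rows.append(row)
    if "study" not in columns:
        columns.append("study")
    out = io.StringIO()
    # restval="" leaves a blank cell where a study lacks a column
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


merged = merge_metadata({
    "takagi_2022": "sample-id\tPhylum\tCountry\nSRR1\tChordata\tJapan\n",
    "other_study": "sample-id\tCountry\nSRR9\tKenya\n",
})
print(merged)
```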


Download files


Source Distribution

q2sra-1.0.0.tar.gz (10.3 kB)

Uploaded Source

Built Distribution


q2sra-1.0.0-py3-none-any.whl (8.9 kB)

Uploaded Python 3

File details

Details for the file q2sra-1.0.0.tar.gz.

File metadata

  • Download URL: q2sra-1.0.0.tar.gz
  • Upload date:
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.15

File hashes

Hashes for q2sra-1.0.0.tar.gz
  • SHA256: 34df97b8038187eb2cd8945e075fc2d7edf62c63cf0d284a7e0f0a1d38ebbd91
  • MD5: aad06a9d5b63a1cd385c67aadd0af27f
  • BLAKE2b-256: 1edd325c14b3980dc38995fac35119ae3d3b5f11b13ed16fbbe9f1ea089c9f30


File details

Details for the file q2sra-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: q2sra-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.15

File hashes

Hashes for q2sra-1.0.0-py3-none-any.whl
  • SHA256: f67e701a4f737d31ffee54339eb988965fbbdd34f6a555635e5a1aa2f41aac4a
  • MD5: 3ba2769eb1590627bf21ca0599b09e7f
  • BLAKE2b-256: 8f37864573a58284eddc2aecfff697bee54bc1012fdf82a11737270b9ff2aab0

