Create BIOM-format tables from Kraken output.

Project description

Create BIOM-format tables (http://biom-format.org) from Kraken output (http://ccb.jhu.edu/software/kraken/).

Installation

From PyPI:

$ pip install kraken-biom

From GitHub:

$ pip install git+http://github.com/smdabdoub/kraken-biom.git

From source:

$ python setup.py install

Requirements

  • biom-format >= 2.1.5

Documentation

The program takes as input one or more files produced by the kraken-report tool. Each file is parsed, and the counts for each OTU (operational taxonomic unit) are recorded along with the database ID (e.g. NCBI) and lineage. The extracted data are then stored in a BIOM table where each count is linked to the Sample and OTU it belongs to. Sample IDs are extracted from the input filenames (everything up to the ‘.’).
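The parsing step described above can be sketched roughly as follows. This is an illustration only, not the kraken-biom source: it assumes the six tab-separated columns of a kraken-report file (percentage, clade reads, directly assigned reads, rank code, NCBI taxonomy ID, indented name), and it assumes that "everything up to the ‘.’" means the first dot in the file name. The names `ReportRow`, `sample_id_from_path`, and `parse_report` are hypothetical.

```python
import os
from typing import Iterable, List, NamedTuple


class ReportRow(NamedTuple):
    clade_reads: int   # reads rooted at this taxon (column 2)
    direct_reads: int  # reads assigned directly to this taxon (column 3)
    rank: str          # rank code such as "D", "G", "S", or "-"
    taxid: str         # database (NCBI) taxonomy ID
    name: str          # scientific name with indentation stripped


def sample_id_from_path(path: str) -> str:
    """Hypothetical helper: the file name up to its first '.' (an assumption)."""
    return os.path.basename(path).split(".", 1)[0]


def parse_report(lines: Iterable[str]) -> List[ReportRow]:
    """Parse kraken-report lines into rows, skipping blank or short lines."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        _pct, clade, direct, rank, taxid, name = fields[:6]
        rows.append(
            ReportRow(int(clade), int(direct), rank.strip(), taxid.strip(), name.strip())
        )
    return rows
```

Calling `parse_report(open("S1.txt"))` would then yield one row per taxon, and `sample_id_from_path("S1.txt")` would give the sample ID under the first-dot assumption.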

OTUs are defined by the --max and --min arguments. By default these are set to Order and Species, respectively. This means that counts assigned directly to an Order, Family, or Genus are recorded under the associated OTU ID, and counts assigned at or below the Species level are assigned to the OTU ID for the species. Setting a minimum rank below Species is not yet available.
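One way to picture the max/min rule is the sketch below. It is not taken from the kraken-biom code; `otu_for` is a hypothetical function, the lineage is assumed to be a root-to-leaf list of (rank code, taxonomy ID) pairs, and the handling of unranked ("-") nodes is a simplifying assumption. The taxonomy IDs are illustrative.

```python
# Rank codes accepted by --max and --min, from highest to lowest.
RANKS = ["D", "P", "C", "O", "F", "G", "S"]


def otu_for(lineage, max_rank="O", min_rank="S"):
    """Return the taxonomy ID of the OTU that receives a read's count.

    lineage: list of (rank_code, taxid) from the root down to the taxon
    the read was assigned to. Returns None when the read sits above
    max_rank and is therefore not recorded.
    """
    hi, lo = RANKS.index(max_rank), RANKS.index(min_rank)
    otu = None
    for rank, taxid in lineage:
        if rank not in RANKS:
            continue  # assumption: unranked nodes fall back to the nearest ranked ancestor
        if hi <= RANKS.index(rank) <= lo:
            otu = taxid  # keep the deepest taxon that lies within [max, min]
    return otu


# Illustrative lineage ending in a strain-level (unranked) node.
strain = [("D", "2"), ("P", "1239"), ("C", "91061"), ("O", "1385"),
          ("F", "90964"), ("G", "1279"), ("S", "1280"), ("-", "93061")]
```

With the defaults, `otu_for(strain)` resolves to the species entry, a read assigned at Family keeps its own ID, and a read assigned at Phylum returns `None`.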

The BIOM format currently has two major versions. Version 1.0 is based on JSON (JavaScript Object Notation). Version 2.x is based on HDF5 (Hierarchical Data Format v5). The output format can be specified with the --fmt option. A tab-separated (tsv) output format is also available; the resulting file will not contain most of the metadata, but can be opened by spreadsheet programs.

Version 2 of the BIOM format is used by default for output, but requires the Python library ‘h5py’. If the library is not installed, kraken-biom will automatically switch to version 1.0. The output can optionally be compressed with gzip (--gzip) for version 1.0 and TSV files. Version 2 files are automatically compressed.
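The fallback and compression rules can be summarized in a small sketch. The function names `resolve_format` and `output_name` are hypothetical and only mirror the behaviour described above; the actual program may implement this differently.

```python
import importlib.util


def resolve_format(requested="hdf5", h5py_available=None):
    """Return the format that would actually be written.

    Falls back from HDF5 (BIOM 2.x) to JSON (BIOM 1.0) when h5py is missing.
    """
    if h5py_available is None:
        # Check for h5py without importing it.
        h5py_available = importlib.util.find_spec("h5py") is not None
    if requested == "hdf5" and not h5py_available:
        return "json"
    return requested


def output_name(path, fmt, gzip=False):
    """Append '.gz' only for formats that are not already compressed."""
    if gzip and fmt != "hdf5":
        return path + ".gz"
    return path
```

For example, `output_name("table.tsv", "tsv", gzip=True)` gives `table.tsv.gz`, matching the compressed TSV usage example, while an HDF5 path is left unchanged.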

Currently the taxonomy for each OTU ID is stored as row metadata in the BIOM table using the standard seven-level QIIME format: k__K; p__P; … s__S. If you would like another format supported, please file an issue or send a pull request (note the contribution guidelines).
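As a rough illustration of the seven-level layout, the sketch below builds such a string from a mapping of Kraken rank codes to names. The helper `qiime_taxonomy` is hypothetical, and both the mapping of Kraken's Domain (D) to the `k__` prefix and the empty entries for missing ranks are assumptions made for this example.

```python
# Kraken rank codes paired with the QIIME-style prefixes (assumed mapping).
RANK_PREFIXES = [("D", "k"), ("P", "p"), ("C", "c"), ("O", "o"),
                 ("F", "f"), ("G", "g"), ("S", "s")]


def qiime_taxonomy(lineage):
    """Format a {rank_code: name} mapping as 'k__K; p__P; ... s__S'.

    Ranks absent from the mapping are emitted with an empty name.
    """
    return "; ".join(
        f"{prefix}__{lineage.get(rank, '')}" for rank, prefix in RANK_PREFIXES
    )
```

A lineage known only down to Phylum, such as `{"D": "Bacteria", "P": "Firmicutes"}`, would be rendered with the remaining five levels left blank.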

usage: kraken-biom.py [-h] [--max {D,P,C,O,F,G,S}] [--min {D,P,C,O,F,G,S}]
                      [-o OUTPUT_FP] [--fmt {hdf5,json,tsv}] [--gzip]
                      [--version] [-v]
                      kraken_reports [kraken_reports ...]

Usage examples

  1. Basic usage with default parameters:

    $ kraken-biom.py S1.txt S2.txt

This produces a compressed BIOM 2.1 file: table.biom

  2. BIOM v1.0 output:

    $ kraken-biom.py S1.txt S2.txt --fmt json

Produces a BIOM 1.0 file: table.biom

  3. Compressed TSV output:

    $ kraken-biom.py S1.txt S2.txt --fmt tsv --gzip -o table.tsv

Produces a TSV file: table.tsv.gz

  4. Change the max and min OTU levels to Class and Genus:

    $ kraken-biom.py S1.txt S2.txt --max C --min G

Program arguments

positional arguments:

kraken_reports        Results files from the kraken-report tool.

optional arguments:

-h, --help            show this help message and exit
--max {D,P,C,O,F,G,S}
                      Assigned reads will be recorded only if they are at or
                      below max rank. Default: O.
--min {D,P,C,O,F,G,S}
                      Reads assigned at and below min rank will be recorded
                      as being assigned to the min rank level. Default: S.
-o OUTPUT_FP, --output_fp OUTPUT_FP
                      Path to the BIOM-format file. By default, the table
                      will be in the HDF5 BIOM 2.x format. Users can output
                      to a different format using the --fmt option. The
                      output can also be gzipped using the --gzip option.
                      Default path is: ./table.biom
--fmt {hdf5,json,tsv}
                      Set the output format of the BIOM table. Default is
                      HDF5.
--gzip                Compress the output BIOM table with gzip. HDF5 BIOM
                      (v2.x) files are internally compressed by default, so
                      this option is not needed when specifying --fmt hdf5.
--version             show program's version number and exit
-v, --verbose         Prints status messages during program execution.
