Giraffe_View is specially designed to provide a comprehensive assessment of the accuracy of long-read sequencing datasets obtained from both the PacBio and Nanopore platforms.

Giraffe

Giraffe is specially designed to provide a comprehensive assessment of the accuracy of long-read sequencing datasets obtained from both the Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) platforms, offering four distinct functions.

estimate Calculation of estimated read accuracy (Q score), length, and GC content.

observe Calculation of observed read accuracy, mismatch proportion, and homopolymer identification (e.g. AAAA).

gcbias Calculation of the relationship between GC content and sequencing depth.

modbin Calculation of the distribution of modification (e.g. 5mC or 6mA methylation) at the regional level.
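As an illustration of what the estimate function reports, the sketch below derives an estimated accuracy, a mean Q score, the read length, and the GC content from a single FASTQ record. It uses the standard Phred conversion (error probability = 10^(-Q/10)) and assumes Phred+33 encoding; the function name `estimate_read` is hypothetical, and this is not necessarily how Giraffe computes these values internally.

```python
import math

def estimate_read(seq: str, qual: str, phred_offset: int = 33):
    """Return (estimated accuracy, mean Q score, length, GC fraction) for one read.

    Hypothetical helper for illustration; assumes Phred+33 quality encoding.
    """
    # Convert each quality character to a Phred score, then to an error probability.
    error_probs = [10 ** (-(ord(c) - phred_offset) / 10) for c in qual]
    # Average the error probabilities (not the Q scores) across the read.
    mean_error = sum(error_probs) / len(error_probs)
    accuracy = 1 - mean_error
    q_score = -10 * math.log10(mean_error)
    # GC content as the fraction of G and C bases in the read.
    upper = seq.upper()
    gc = (upper.count("G") + upper.count("C")) / len(upper)
    return accuracy, q_score, len(seq), gc

accuracy, q_score, length, gc = estimate_read("ACGTGGCC", "IIIIIIII")
print(f"accuracy={accuracy:.4f} Q={q_score:.1f} length={length} GC={gc:.2f}")
```

Averaging error probabilities rather than Q scores avoids overstating the quality of reads that mix very high and very low quality bases.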

Installation

Installation by Conda

# install on the current environment
conda install -c raymond_liu giraffe_view -y

# install on a new environment
conda create -n giraffe -c raymond_liu giraffe_view -y

Installation by PyPI

Before using this tool, you need to install additional dependencies for read processing: samtools, minimap2, and bedtools. The following commands install these dependencies.

# Tested versions
# samtools 1.17
# minimap2 2.17-r941
# bedtools 2.30.0

# install on the current environment
conda install -c bioconda -c conda-forge samtools minimap2 bedtools -y

# install on a new environment
conda create -n giraffe -c bioconda -c conda-forge python==3.9 samtools==1.17 minimap2==2.17 bedtools==2.30.0 -y && conda activate giraffe

To install this tool, please use the following command.

pip install Giraffe-View

Quick usage

Giraffe can be run with a one-button command or by executing individual functions.

ONE-button pattern

# Run the "estimate", "observe", and "gcbias" functions with FASTQ files
giraffe --read <read table> --ref <reference> --cpu <number of processes or threads>

# Run the "estimate", "observe", and "gcbias" functions with unaligned SAM/BAM files
giraffe --read <unaligned SAM/BAM table> --ref <reference> --cpu <number of processes or threads>

# Example for input table (sample_ID data_type file_path)
sample_A ONT /home/user/data/S1.fastq
sample_B ONT /home/user/data/S2.fastq
sample_C ONT /home/user/data/S3.fastq
...

Here the data_type can be ONT for ONT DNA reads, ONT_RNA for ONT direct RNA sequencing reads, or Pacbio for PacBio DNA reads.
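Before launching a run, it can help to check that the input table is well formed. The sketch below parses the three-column layout shown above and rejects unknown data types. It assumes the columns are separated by whitespace and that file paths contain no spaces; `parse_read_table` is a hypothetical helper, not part of Giraffe.

```python
VALID_TYPES = {"ONT", "ONT_RNA", "Pacbio"}

def parse_read_table(text: str):
    """Parse 'sample_ID data_type file_path' rows into a list of tuples.

    Hypothetical helper; assumes whitespace-separated columns.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
        sample_id, data_type, path = fields
        if data_type not in VALID_TYPES:
            raise ValueError(f"line {line_no}: unknown data_type {data_type!r}")
        rows.append((sample_id, data_type, path))
    return rows

table = """sample_A ONT /home/user/data/S1.fastq
sample_B ONT /home/user/data/S2.fastq
"""
rows = parse_read_table(table)
print(rows)
```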

Estimate function

# For the FASTQ reads
giraffe estimate --read <read table> 

# For the unaligned SAM/BAM files
giraffe estimate --unaligned <unaligned SAM/BAM table>

Observe function

# For FASTQ reads
giraffe observe --read <read table> --ref <reference>

# For unaligned SAM/BAM files
giraffe observe --unaligned <unaligned SAM/BAM table> --ref <reference>

# For aligned SAM/BAM files
giraffe observe --aligned <aligned SAM/BAM table>

Note: If you use aligned SAM/BAM files as input, run the mapping step with secondary alignments disabled (--secondary=no) and the MD tag enabled (--MD), so that the alignments contain no secondary records and carry the MD tag.
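The homopolymer identification that the observe function performs (e.g. AAAA) can be pictured with a small regular-expression sketch. The minimum run length of 4 and the function name `find_homopolymers` are assumptions made for illustration; Giraffe's own criteria may differ.

```python
import re

def find_homopolymers(seq: str, min_len: int = 4):
    """Return (start, end, run) for each run of one base repeated at least min_len times.

    Hypothetical helper; coordinates are 0-based and end-exclusive.
    """
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(seq.upper())]

runs = find_homopolymers("ACAAAAGTTTTTC")
print(runs)
```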

GCbias function

giraffe gcbias --ref <reference> --aligned <aligned SAM/BAM table>
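To make the relationship between GC content and sequencing depth concrete, the sketch below splits a reference into fixed-size windows, computes the GC percentage and the mean depth of each window, and averages depth over windows with the same GC percentage. The window size, the per-base depth list, and the name `gc_depth_profile` are all assumptions for illustration and do not describe Giraffe's implementation.

```python
from collections import defaultdict

def gc_depth_profile(reference: str, depths: list, window: int = 4):
    """Return {GC percent: mean depth} over non-overlapping windows.

    Hypothetical helper; `depths` holds one depth value per reference base.
    """
    by_gc = defaultdict(list)
    for start in range(0, len(reference) - window + 1, window):
        chunk = reference[start:start + window].upper()
        gc_percent = round(100 * (chunk.count("G") + chunk.count("C")) / window)
        mean_depth = sum(depths[start:start + window]) / window
        by_gc[gc_percent].append(mean_depth)
    # Average the window depths that share a GC percentage.
    return {gc: sum(v) / len(v) for gc, v in sorted(by_gc.items())}

profile = gc_depth_profile("AATTGGCCATGC", [10] * 4 + [20] * 4 + [15] * 4)
print(profile)
```

A flat profile would indicate little GC bias, while depth that rises or falls with GC percentage would indicate bias.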

Modbin function

giraffe modbin --methyl <methylation table> --region <target region>

# Example for methylation file (Chrom Start End Value):
contig_A 132 133 0.92
contig_A 255 256 0.27
contig_A 954 955 0.52
...
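The regional summary that modbin produces can be sketched as follows: methylation records in the four-column layout above are assigned to equal-width bins inside a target region, and the values in each bin are averaged. The region tuple, the number of bins, the use of the mean, and the name `bin_methylation` are assumptions for illustration; the actual format of the target region file is not described here.

```python
def bin_methylation(records, region, n_bins: int = 2):
    """Return the mean methylation value per bin (None for empty bins).

    Hypothetical helper; records are (chrom, start, end, value) tuples and
    region is (chrom, start, end) with an end-exclusive interval.
    """
    chrom, region_start, region_end = region
    bin_size = (region_end - region_start) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for rec_chrom, start, _end, value in records:
        if rec_chrom != chrom or not (region_start <= start < region_end):
            continue  # record lies outside the target region
        idx = min(int((start - region_start) / bin_size), n_bins - 1)
        sums[idx] += value
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

records = [
    ("contig_A", 132, 133, 0.92),
    ("contig_A", 255, 256, 0.27),
    ("contig_A", 954, 955, 0.52),
]
bins = bin_methylation(records, ("contig_A", 0, 1000), n_bins=2)
print(bins)
```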

Example

Here, we provide demo datasets for testing Giraffe. The following command downloads them and runs the demo.

giraffe_run_demo

The demo datasets include three E. coli files: a 4.2 MB reference, 79 MB of R10.4.1 reads, and 121 MB of R9.4.1 reads. Two methylation files are also included, from zebrafish blood (23 MB) and kidney (19 KB). The demo takes about 7 minutes and 20 seconds and uses at most 391 MB of memory. It tests the one-command pattern and the four individual functions.

Tool showcase

The one-command pattern generates a summary in HTML format. If the scale of the X or Y axis is not reasonable, the giraffe_plot script can be used to replot the figure.

Documentation

For more details about the usage of Giraffe and the interpretation of its results, please refer to the documentation.
