
A computational workflow designed to recover plastid genomes from metagenomes.

Project description


Figure: ChloroScan workflow overview (docs/source/_static/images/new_ChloroScan_workflow.drawio.png)

This workflow is designed to recover chloroplast genomes from metagenomic datasets.

Installation

Before installing, make sure working installations of mamba and conda are available on your server; we recommend mamba 1.4.2 and conda 23.3.1. Installation instructions are available at: https://github.com/conda-forge/miniforge.

To install the workflow, use pip3. Setting up the virtual environment requires Python >=3.9 and <3.12; we recommend Python 3.10.
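Because pip3 installs into whichever interpreter it belongs to, it can help to confirm that interpreter is in the supported range before installing. The snippet below is a minimal sketch and not part of ChloroScan; the function name `python_supported` is our own, and it only encodes the >=3.9, <3.12 bounds stated above.

```python
import sys

# Supported range stated above: >=3.9 and <3.12 (3.10 recommended).
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 12)


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version lies in [3.9, 3.12)."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_supported():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside >=3.9,<3.12; consider Python 3.10.")
```

Run it with the same `python3` that your `pip3` belongs to (for example, inside the conda environment you plan to use).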

pip3 install chloroscan

This installs the latest version of ChloroScan. Detailed workflow instructions are available at: https://andyargueasae.github.io/chloroscan/index.html. The website also contains a Chinese version of the documentation with identical content.
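To confirm the install succeeded and see which version pip3 picked, you can query the package metadata from the same interpreter. This sketch uses the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `chloroscan` is assumed to match the pip3 package name used above.

```python
from importlib import metadata


def installed_version(dist_name="chloroscan"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version or "chloroscan is not installed in this environment")
```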

Machine/OS Requirements

ChloroScan has only been tested on Linux (x86_64); running it on IOS systems is not recommended. ChloroScan can be installed on servers in HPC clusters, and using a GPU is recommended to speed up runs.

Note: In our testing, the current version of ChloroScan does not support NVIDIA H100 GPUs because of CUDA version incompatibilities. We are working on an update to address this.
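Before launching on a GPU node, you may want to check which NVIDIA GPUs are visible. The sketch below calls `nvidia-smi --query-gpu=name --format=csv,noheader` (a standard nvidia-smi query) and flags H100 cards, which the note above reports as unsupported. The helper names `gpu_names` and `has_h100` are hypothetical, and matching on "H100" in the reported model name is our assumption about how the card shows up.

```python
import shutil
import subprocess


def gpu_names():
    """List GPU model names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def has_h100(names):
    """True if any model name mentions H100 (hyphenated or not)."""
    return any("H100" in name.upper().replace("-", "") for name in names)


if __name__ == "__main__":
    names = gpu_names()
    if not names:
        print("No NVIDIA GPU reported by nvidia-smi.")
    elif has_h100(names):
        print("H100 detected; the current ChloroScan release is reported not to support it.")
    else:
        print("GPUs found:", ", ".join(names))
```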

Database configuration

Before running ChloroScan, some packages and datasets must be installed so that CAT taxonomy prediction runs properly. ChloroScan also uses a marker gene database during binning; you don't need to do anything for this one, as it is loaded when you build the conda environments. To download our curated Uniref90-algae plastid protein database, use the link: https://doi.org/10.26188/27990278.

To avoid authentication issues, we recommend downloading with the pyfigshare command-line tool. Information about this tool is available at: https://pypi.org/project/pyfigshare/. Note that pyfigshare requires Python > 3.0.

Before downloading the files, set up your own figshare account and add an API token to the file ~/.figshare/token on your server.
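If you prefer not to create the token file by hand, a few lines of Python can do it. This is a minimal sketch under two assumptions: that the file should contain only the token string (no trailing newline), and that owner-only permissions (0600) are a sensible precaution; neither is a documented pyfigshare requirement. `save_figshare_token` is a hypothetical helper name.

```python
import os
import stat
from pathlib import Path


def save_figshare_token(token, home=None):
    """Store a figshare API token at <home>/.figshare/token and return its path.

    Assumption: the file holds just the token string, with no trailing newline.
    Restricting it to owner read/write (0600) is our suggestion, not a
    documented pyfigshare requirement.
    """
    base = Path(home) if home is not None else Path.home()
    token_dir = base / ".figshare"
    token_dir.mkdir(parents=True, exist_ok=True)
    token_path = token_dir / "token"
    token_path.write_text(token.strip())
    os.chmod(token_path, stat.S_IRUSR | stat.S_IWUSR)
    return token_path


if __name__ == "__main__":
    from getpass import getpass

    saved = save_figshare_token(getpass("figshare API token: "))
    print(f"Token written to {saved}")
```

Reading the token with `getpass` keeps it out of your shell history.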

Then run:

figshare download -o CAT_db.tar.gz 27990278

Note: The CAT database is 47 GB as a tar.gz archive and nearly 85 GB once extracted, so please make sure you have enough disk space. Setting up the conda environments requires a further 15 GB of disk space.
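These figures add up: while the archive is being unpacked, both the 47 GB archive and the ~85 GB extracted database sit on disk, so together with 15 GB for the conda environments the peak is roughly 47 + 85 + 15 = 147 GB, dropping to about 100 GB if you delete the archive afterwards. The sketch below checks free space against those numbers; it assumes everything lives on one filesystem and treats 1 GB as 10^9 bytes, and the helper names are hypothetical.

```python
import shutil

GB = 10**9  # decimal gigabytes (assumption); the quoted sizes are approximate anyway

# Approximate sizes quoted in the note above.
CAT_ARCHIVE_GB = 47
CAT_EXTRACTED_GB = 85
CONDA_ENVS_GB = 15


def required_gb(peak=True):
    """Disk space (GB) needed for the CAT database plus conda environments.

    peak=True counts the archive and its extracted contents side by side,
    as they are during unpacking; peak=False assumes the archive was deleted.
    """
    total = CAT_EXTRACTED_GB + CONDA_ENVS_GB
    if peak:
        total += CAT_ARCHIVE_GB
    return total


def enough_space(path=".", peak=True):
    """Return (ok, free_gb) for the filesystem that holds `path`."""
    free_gb = shutil.disk_usage(path).free / GB
    return free_gb >= required_gb(peak), free_gb


if __name__ == "__main__":
    ok, free = enough_space(".")
    print(f"Free: {free:.1f} GB, needed at peak: {required_gb()} GB ->",
          "OK" if ok else "not enough space")
```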

Sample data to try

To try ChloroScan, we recommend downloading our synthetic metagenome data with the command:

figshare download -o simulated_metagenomes.tar.gz 28748540
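The downloads are gzip-compressed tar archives. `tar -xzf <archive>` unpacks them from the shell; if you prefer to stay in Python, the standard-library `tarfile` module does the same. The sketch below is illustrative only: `extract_archive` is a hypothetical helper, and it makes no assumption about the archives' internal layout beyond reporting their top-level entries.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest="."):
    """Unpack a .tar.gz archive into `dest` and return its top-level entry names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # The "data" filter (in recent Python patch releases) refuses
            # absolute paths and links pointing outside `dest`.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
        # Members were already read by extractall, so this adds no extra pass.
        names = (Path(member.name) for member in tar.getmembers())
        return sorted({p.parts[0] for p in names if p.parts})


if __name__ == "__main__":
    print(extract_archive("simulated_metagenomes.tar.gz", "simulated_metagenomes"))
```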

There are also some real metagenome datasets (modified to keep them lightweight) available at: https://figshare.unimelb.edu.au/articles/dataset/ChloroScan_test_data/30218614.

To download:

figshare download -o real_test_samples.tar.gz 30218614
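If you want all three archives in one go, the commands above can be scripted. This sketch only reuses the `figshare download -o <file> <id>` form and the IDs given in this README; it defaults to a dry run that just prints the commands, and `download_commands` and `download_all` are hypothetical helper names.

```python
import shutil
import subprocess

# Output names and figshare IDs taken from the commands above.
DATASETS = {
    "CAT_db.tar.gz": "27990278",
    "simulated_metagenomes.tar.gz": "28748540",
    "real_test_samples.tar.gz": "30218614",
}


def download_commands(datasets=DATASETS):
    """Build one `figshare download -o <file> <id>` command per dataset."""
    return [["figshare", "download", "-o", name, article_id]
            for name, article_id in datasets.items()]


def download_all(datasets=DATASETS, dry_run=True):
    """Print each command and, unless dry_run is True, run it in turn."""
    commands = download_commands(datasets)
    if not dry_run and shutil.which("figshare") is None:
        raise RuntimeError("figshare CLI not found; install pyfigshare first")
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    download_all(dry_run=True)  # set dry_run=False to actually download
```

Dropping the `CAT_db.tar.gz` entry from `DATASETS` is an easy way to fetch only the sample data.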

Credit

ChloroScan is developed by:

Yuhao Tong is the primary developer. If you want to contact us, please email:

yuhtong@student.unimelb.edu.au

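Installation

Before downloading, make sure mamba and conda are installed on your server. We recommend mamba 1.4.2 and conda 23.3.1. For installation instructions, see: https://github.com/conda-forge/miniforge.

To install the workflow, use pip3. Setting up the virtual environment requires Python >=3.9 and <3.12; we recommend Python 3.10.

pip3 install chloroscan==0.1.5

Detailed workflow instructions are available at: https://andyargueasae.github.io/chloroscan/index.html. The website also contains a Chinese version of the documentation with identical content.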

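Machine/OS Requirements

ChloroScan has only been tested on Linux (x86_64); running it on IOS systems is not recommended. ChloroScan can be installed on servers in HPC clusters, and using a GPU is recommended to speed up runs.

Note: In our testing, the current version of ChloroScan does not support NVIDIA H100 GPUs because of CUDA version incompatibilities. We are working on an update to improve this.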

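Database configuration

Before running ChloroScan, some packages and datasets need to be installed for it to run correctly.

ChloroScan includes a marker gene database used during binning; you don't need to do anything, as it is loaded when you build the conda environments. To download our curated Uniref90-algae plastid protein database, use the following link: https://doi.org/10.26188/27990278

To avoid authentication issues, we recommend downloading with the pyfigshare command-line tool. Information about this tool is available at: https://pypi.org/project/pyfigshare/. Python > 3.0 is required to install pyfigshare.

Before downloading the files, set up your own figshare account and add an API token to the file ~/.figshare/token on your server. Then run: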

figshare download -o CAT_db.tar.gz 27990278

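Note: The CAT database is 47 GB as a tar.gz archive and about 85 GB after extraction; please make sure you have enough disk space. Setting up the conda environments also requires 15 GB of disk space.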

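Sample data to try

To try ChloroScan, we recommend downloading our synthetic metagenome data with the following command:

figshare download -o simulated_metagenomes.tar.gz 28748540

Some real metagenome datasets are also available (modified so that they take up less disk space, though the original data information has not been removed) at: https://figshare.unimelb.edu.au/articles/dataset/ChloroScan_test_data/30218614

They can be downloaded as follows: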

figshare download -o real_test_samples.tar.gz 30218614

Contact

Please contact us at:

yuhtong@student.unimelb.edu.au



Download files

Download the file for your platform.

Source Distribution

chloroscan-0.2.3.tar.gz (100.5 kB)

Uploaded Source

Built Distribution


chloroscan-0.2.3-py3-none-any.whl (128.5 kB)

Uploaded Python 3

File details

Details for the file chloroscan-0.2.3.tar.gz.

File metadata

  • Download URL: chloroscan-0.2.3.tar.gz
  • Upload date:
  • Size: 100.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.4 CPython/3.12.3 Linux/6.17.0-1010-azure

File hashes

Hashes for chloroscan-0.2.3.tar.gz
Algorithm Hash digest
SHA256 48761bcb1fae35c54acd2253f25bb9fd1e6c55942f2ee5ce2d9ac16e2a6eddb6
MD5 6f526779c35f1f4c940fe8926ff56747
BLAKE2b-256 7f05e2117949181c66781574d3d8d2c50745246387256a01c548408b43190dcf


File details

Details for the file chloroscan-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: chloroscan-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 128.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.4 CPython/3.12.3 Linux/6.17.0-1010-azure

File hashes

Hashes for chloroscan-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 5ef83f3d4cbf3309bbc0ebc6c37f538be072d65efd1fb423f12d9aa345ba14fd
MD5 dfd41648b1c91b1373417f72f0e9068d
BLAKE2b-256 d5cf3768c31c49963d76d9ad9fcae990b6edce03463f3025165531d9064bdeee
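If you download either release file manually, you can compare it against the SHA256 digests listed above. The sketch below streams the file through `hashlib.sha256` so it works for large files too; `sha256_of` and `verify` are hypothetical helper names, and the expected digests are copied from this listing for release 0.2.3 only.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the listing above (release 0.2.3).
EXPECTED_SHA256 = {
    "chloroscan-0.2.3.tar.gz": "48761bcb1fae35c54acd2253f25bb9fd1e6c55942f2ee5ce2d9ac16e2a6eddb6",
    "chloroscan-0.2.3-py3-none-any.whl": "5ef83f3d4cbf3309bbc0ebc6c37f538be072d65efd1fb423f12d9aa345ba14fd",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=None):
    """Compare a file's SHA256 with `expected`, looked up by file name if omitted."""
    if expected is None:
        expected = EXPECTED_SHA256[Path(path).name]
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    print(verify("chloroscan-0.2.3-py3-none-any.whl"))
```

pip's own hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) is an alternative if you install from a requirements file.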

