🦎 IGUA

Iterative Gene clUster Analysis, a high-throughput method for gene cluster family identification.

🗺️ Overview

IGUA is a method for high-throughput content-agnostic identification of Gene Cluster Families (GCFs) from gene clusters of genomic and metagenomic origin. It assigns GCFs in three successive clustering iterations:

  • Fragment mapping identification: Reduce the input sequence space by identifying which gene clusters are fragments of each other.
  • Nucleotide deduplication: Find similar gene clusters in genomic space, using linear clustering with lower sequence identity and coverage.
  • Protein representation: Compute a numerical representation of gene clusters in terms of protein composition, using representatives from a protein sequence clustering, to identify more distant relatives not captured by the previous step.

Compared to similar methods such as BiG-SLiCE or BiG-SCAPE, IGUA does not use Pfam domains to represent gene cluster composition, and instead uses representatives from an unsupervised clustering. This allows IGUA to accurately account for proteins that may not be covered by Pfam, and avoids a costly annotation step. The resulting protein representatives can later be annotated independently to transfer annotations to the GCFs.
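To make the idea of a compositional representation more concrete, here is a minimal sketch in plain Python. It assumes each protein has already been assigned to a representative by some protein clustering, and simply counts representatives per gene cluster. The function name, the toy identifiers and the use of raw counts are illustrative assumptions; IGUA's own representation may weight or encode proteins differently.

```python
from collections import Counter


def composition_matrix(clusters, protein_to_rep):
    """Count, for each gene cluster, how many of its proteins map to each representative.

    clusters: dict mapping a gene cluster name to the list of its protein names.
    protein_to_rep: dict mapping a protein name to its cluster representative.
    Returns (row names, column names, rows) with rows as lists of counts.
    """
    names = sorted(clusters)
    reps = sorted(set(protein_to_rep.values()))
    rows = []
    for name in names:
        counts = Counter(protein_to_rep[protein] for protein in clusters[name])
        rows.append([counts.get(rep, 0) for rep in reps])
    return names, reps, rows


# Hypothetical toy input: two gene clusters sharing one representative.
clusters = {"bgc1": ["p1", "p2", "p3"], "bgc2": ["p4", "p5"]}
protein_to_rep = {"p1": "repA", "p2": "repB", "p3": "repB", "p4": "repA", "p5": "repC"}

names, reps, matrix = composition_matrix(clusters, protein_to_rep)
for name, row in zip(names, matrix):
    print(name, dict(zip(reps, row)))
```

Because no domain database is involved, any protein that ends up in a protein cluster contributes a column, whether or not it has a known annotation.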

🔧 Installing

Bioconda

IGUA and all of its dependencies are available via Bioconda and can be installed using e.g., conda or pixi:

  1. First, set up Bioconda with Pixi or Conda.

  2. Then, install IGUA using the appropriate method:

With conda:

$ conda install igua

With pixi:

$ pixi add igua

Apptainer, Docker, and Singularity

IGUA (and all of its dependencies) can be run in a container with e.g., Docker, Apptainer, or Singularity, using the images published by BioContainers.

An example using Apptainer (using IGUA v0.1.0):

$ apptainer pull docker://quay.io/biocontainers/igua:0.1.0--py39h5b94c0b_0

pip

IGUA can be downloaded directly from PyPI, which hosts pre-compiled distributions for Linux, macOS and Windows. Simply install with pip:

$ pip install igua

Note that you will need to install MMseqs2 yourself through other means.
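After a pip install, it can be useful to check that MMseqs2 is reachable before starting a run. The sketch below only looks for an executable on the PATH using the standard library; it assumes the binary is named mmseqs, which is the usual name for MMseqs2 installations, and the helper name is made up for this example.

```python
import shutil


def find_mmseqs(executable="mmseqs"):
    """Return the full path of the MMseqs2 executable, or None if it is not on the PATH."""
    return shutil.which(executable)


path = find_mmseqs()
if path is None:
    print("MMseqs2 was not found on the PATH; install it before running IGUA.")
else:
    print(f"Found MMseqs2 at {path}")
```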

💡 Running

📥 Inputs

The gene clusters to pass to IGUA must be in GenBank format, with gene annotations inside of CDS features. Several GenBank files can be passed to the same pipeline run.

$ igua -g clusters1.gbk -g clusters2.gbk ...

The GenBank locus identifier will be used as the name of each gene cluster. This may cause problems with gene clusters obtained with some tools, such as antiSMASH. If the input contains duplicate identifiers, the first gene cluster with a given identifier will be used, and a warning will be displayed.
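Duplicate identifiers can be spotted before running the pipeline. The following sketch scans the LOCUS lines of GenBank files with plain Python and reports names that occur more than once. It is a rough pre-flight check, not the parser IGUA uses, and the helper names and toy records are assumptions made for the example.

```python
from collections import Counter


def locus_names(lines):
    """Yield the name field (second whitespace-separated token) of every LOCUS line."""
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                yield fields[1]


def duplicated(names):
    """Return the sorted list of names seen more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def duplicated_in_files(paths):
    """Collect locus names across several GenBank files and report duplicates."""
    names = []
    for path in paths:
        with open(path) as handle:
            names.extend(locus_names(handle))
    return duplicated(names)


# Toy header lines standing in for real GenBank records.
example = [
    "LOCUS       contig_1   12000 bp    DNA     linear   BCT",
    "FEATURES             Location/Qualifiers",
    "LOCUS       contig_1    8000 bp    DNA     linear   BCT",
    "LOCUS       contig_2    9500 bp    DNA     linear   BCT",
]
print(duplicated(locus_names(example)))
```

Calling duplicated_in_files with the same paths given to -g lists the identifiers for which only the first gene cluster would be kept.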

📤 Outputs

The main output of IGUA is a TSV file which assigns a Gene Cluster Family to each gene cluster found in the input. The GCF identifiers are arbitrary, and the prefix can be changed with the --prefix flag. The table will also record the original file from which each record was obtained to facilitate resource management. The table is written to the filename given with the --output flag.
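As an illustration of how the table can be consumed downstream, the sketch below groups gene clusters by GCF using only the standard library. The column names cluster_id, gcf_id and filename, as well as the rows themselves, are assumptions made for this example; check the header of the actual output file and adjust the arguments accordingly.

```python
import csv
import io
from collections import defaultdict


def group_by_gcf(handle, cluster_column="cluster_id", gcf_column="gcf_id"):
    """Map each GCF identifier to the list of gene clusters assigned to it."""
    families = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        families[row[gcf_column]].append(row[cluster_column])
    return dict(families)


# Hypothetical table contents; column names and identifiers are illustrative only.
example = io.StringIO(
    "cluster_id\tgcf_id\tfilename\n"
    "bgc1\tGCF0000001\tclusters1.gbk\n"
    "bgc2\tGCF0000001\tclusters2.gbk\n"
    "bgc3\tGCF0000002\tclusters2.gbk\n"
)
families = group_by_gcf(example)
for gcf, members in families.items():
    print(gcf, len(members), members)
```

With a real run, replace the in-memory example with open("gcfs.tsv", newline=""), where gcfs.tsv stands for whatever filename was given to --output.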

The sequences of the representative proteins extracted from each cluster can be saved to a FASTA file with the --features flag. These proteins are used for compositional representation of gene clusters, and can be used to transfer annotations to the GCF representatives. The final compositional matrix for each GCF representative, which can be useful for computing distances between GCFs, can be saved as an anndata sparse matrix to a filename given with the --compositions flag.
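When IGUA is driven from a script, the flags described so far can be assembled programmatically. This sketch only builds and prints the command line; it uses the -g, --output, --features and --compositions flags mentioned in this section, while the helper name and all filenames are placeholders chosen for the example.

```python
import shlex


def build_igua_command(genbank_files, output, features=None, compositions=None):
    """Assemble an igua command line as a list of arguments."""
    command = ["igua"]
    for path in genbank_files:
        command += ["-g", path]          # one -g per GenBank file
    command += ["--output", output]      # GCF assignment table (TSV)
    if features is not None:
        command += ["--features", features]          # representative proteins (FASTA)
    if compositions is not None:
        command += ["--compositions", compositions]  # compositional matrix (anndata)
    return command


command = build_igua_command(
    ["clusters1.gbk", "clusters2.gbk"],
    output="gcfs.tsv",
    features="features.fa",
    compositions="compositions.h5ad",
)
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with filenames that contain spaces.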

📝 Workspace

MMseqs2 needs a fast scratch space to work with intermediate files while running linear clustering. By default, IGUA will use a temporary folder obtained with tempfile.TemporaryDirectory, which typically lies inside /tmp. To use a different folder, use the --workdir flag.
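To see where the default scratch folder would land on a given machine, the standard library can be queried directly. The sketch below shows the default location chosen by tempfile and how a different parent folder changes it; the make_workdir helper is only an illustration of the tempfile behaviour, not IGUA's code.

```python
import os
import tempfile


def make_workdir(parent=None):
    """Create a temporary directory, inside `parent` if given, else in the system default."""
    return tempfile.TemporaryDirectory(dir=parent)


# The system default honours the TMPDIR environment variable when it is set.
print("default temporary folder:", tempfile.gettempdir())

workdir = make_workdir()
print("scratch folder created at:", workdir.name)
print("parent folder:", os.path.dirname(workdir.name))
workdir.cleanup()  # remove the folder and its contents
```

If the default location is a small or slow filesystem, point --workdir to a folder on faster or larger storage instead.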

🫧 Clustering

By default, IGUA will use average linkage clustering and a relative distance threshold of 0.8, which corresponds to clusters inside a GCF differing by at most an estimated 20% at the amino-acid level. These two options can be changed with the --clustering-method and --clustering-distance flags.
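For readers unfamiliar with average linkage, here is a deliberately naive sketch of the technique in plain Python: groups are merged greedily by their mean pairwise distance until the closest pair is farther apart than the threshold. It is meant to illustrate the method only; the toy distance matrix is invented, and how IGUA computes and scales its distances is not reproduced here.

```python
def average_linkage(distances, threshold):
    """Naive average-linkage agglomerative clustering with a distance cut-off.

    distances: square symmetric matrix (list of lists) of pairwise distances.
    Returns a sorted list of groups, each a sorted list of item indices.
    """
    groups = [[i] for i in range(len(distances))]

    def group_distance(a, b):
        # Mean of all pairwise distances between members of the two groups.
        total = sum(distances[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(groups) > 1:
        # Find the closest pair of groups.
        best, x, y = min(
            (group_distance(groups[x], groups[y]), x, y)
            for x in range(len(groups))
            for y in range(x + 1, len(groups))
        )
        if best > threshold:
            break  # no remaining pair is close enough to merge
        groups[x] = sorted(groups[x] + groups[y])
        del groups[y]
    return sorted(groups)


# Invented toy distances: items 0 and 1 are close, as are items 2 and 3.
toy = [
    [0.00, 0.10, 0.90, 0.95],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.95, 0.90, 0.20, 0.00],
]
print(average_linkage(toy, threshold=0.8))
```

This version recomputes group distances at every step and is only suitable for small examples; production implementations update distances incrementally.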

Additionally, the precision of the distance matrix used for the clustering can be lowered to reduce memory usage, using single or half precision floating point numbers instead of the double precision used by default. Use the --clustering-precision flag to control numerical precision.
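A back-of-the-envelope estimate shows why precision matters. The sketch below assumes the distance matrix is stored in condensed form, with one value per unordered pair of items, which is an assumption about the layout rather than a statement about IGUA's internals. The dictionary keys are labels for this example, not necessarily the values accepted by --clustering-precision.

```python
# Bytes per value for IEEE 754 half, single and double precision floats.
BYTES_PER_VALUE = {"half": 2, "single": 4, "double": 8}


def condensed_matrix_bytes(n, precision="double"):
    """Memory needed for n * (n - 1) / 2 pairwise distances at the given precision."""
    return n * (n - 1) // 2 * BYTES_PER_VALUE[precision]


for precision in ("double", "single", "half"):
    size = condensed_matrix_bytes(100_000, precision)
    print(f"{precision:>6}: {size / 1e9:.1f} GB for 100,000 items")
```

Since the number of pairs grows quadratically with the number of items, halving the precision halves the memory footprint but does not change the scaling.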

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the GNU General Public License v3.0.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory and the Leiden University Medical Center in the Zeller team.
