A package to calculate and visualise approximate cluster identities for a large number of short nucleotide sequences using minimisers.

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.6
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Approximate Cluster Identities (ACI)

A Python package to visualise the approximate within- and between-cluster identities of short sequences, where the clusters have been assigned by a tool such as mmseqs2, cd-hit or panaroo.

Installation

pip install approximate-cluster-identities

Usage

aci -h

Create visualisations of approximate between and within cluster nucleotide identities for short sequences.

positional arguments:
  input_fasta           Input FASTA file of all sequences.
  input_json            Input JSON file with cluster assignments ({<sequence header>: <cluster assignment>}).

optional arguments:
  -h, --help            show this help message and exit
  --clusterGML CLUSTERGML
                        Output path of GML clustering file to view with Cytoscape or similar.
  --distanceTable DISTANCETABLE
                        Output path of CSV of distances (may take a long time).
  --clusterPlot CLUSTERPLOT
                        Output path of jointplot to visualise between and within cluster identities.
  --kmerSize KMERSIZE   Kmer size (default: 9).
  --windowSize WINDOWSIZE
                        Minimiser window size (default: 20).
  --threshold THRESHOLD
                        Jaccard similarity threshold (default: 0.9).
  --threads THREADS     Threads for sketching and jaccard distance calculations (default: 1).
  --shorter             Assess identity relative to the shorter sequence.
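The input_json positional argument expects a mapping from each sequence header to its cluster assignment. As a minimal sketch, assuming your clustering tool writes a two-column tab-separated table of `<representative>\t<member>` pairs (the layout of the `_cluster.tsv` file written by mmseqs2 `easy-cluster`), the hypothetical helpers below build that JSON and use each representative's header as the cluster label. Whether aci expects the full FASTA header line or only the sequence ID is not stated here, so check that the keys match your FASTA headers.

```python
import csv
import json
from typing import Dict


def clusters_from_tsv(tsv_path: str) -> Dict[str, str]:
    """Map each member header to its representative's header (hypothetical helper)."""
    assignments: Dict[str, str] = {}
    with open(tsv_path, newline="") as handle:
        # QUOTE_NONE so that quote characters inside headers are kept verbatim.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            representative, member = row[0], row[1]
            # Assumption: the representative's header serves as the cluster label.
            assignments[member] = representative
    return assignments


def write_cluster_json(tsv_path: str, json_path: str) -> None:
    """Write {<sequence header>: <cluster assignment>} as JSON (hypothetical helper)."""
    with open(json_path, "w") as out:
        json.dump(clusters_from_tsv(tsv_path), out, indent=2)
```

For example, `write_cluster_json("clusters_cluster.tsv", "clusters.json")` (illustrative file names) produces a file that can be passed to aci as input_json alongside the FASTA file.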

Methods

We calculate approximate sequence identities from pairwise Jaccard distances between sets of minimisers. Each minimiser is a k-mer of length --kmerSize, and one minimiser is sampled from every window of --windowSize consecutive k-mers as the window slides along each input sequence. Increasing --windowSize decreases the number of minimisers per sequence, which reduces the sensitivity of the identity calculations but makes the programme faster. The tool is designed to give you an idea of how variable a large number of short sequences are within and between clusters, so that you can choose an appropriate sequence clustering tool and parameters.
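The minimiser sampling and Jaccard comparison described above can be illustrated with a short Python sketch. It is not aci's implementation: it assumes forward-strand k-mers only and takes the lexicographically smallest k-mer as each window's minimiser (real tools usually rank k-mers by a hash and may use canonical k-mers). It reports Jaccard similarity, where distance = 1 - similarity. The `shorter` option is a hypothetical stand-in for --shorter that divides the shared minimisers by the size of the smaller set; how aci actually implements that flag is an assumption here.

```python
from typing import Set


def minimisers(seq: str, k: int = 9, w: int = 20) -> Set[str]:
    """Smallest k-mer in every window of w consecutive k-mers (illustrative ordering)."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()  # sequence shorter than k
    if len(kmers) < w:
        # Sequence shorter than one full window: keep its single minimiser.
        return {min(kmers)}
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}


def jaccard(a: Set[str], b: Set[str], shorter: bool = False) -> float:
    """|A & B| / |A | B|, or |A & B| / min(|A|, |B|) when shorter=True (assumed --shorter analogue)."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    if shorter:
        return shared / min(len(a), len(b))
    return shared / len(a | b)
```

With `w=1` every k-mer is kept, so `jaccard(minimisers(s1, w=1), minimisers(s2, w=1))` compares full k-mer sets; raising `w` keeps fewer minimisers per sequence, which is the speed versus sensitivity trade-off described above.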

Example output

Example cluster plots for the data in test/, generated with --windowSize 1 and --windowSize 100.

Window size = 1

[Figure panels: mean identities, mode identities, median identities, range of identities]

Window size = 100

[Figure panels: mean identities, mode identities, median identities, range of identities]


Release history

- 0.1.6: Apr 3, 2024
- 0.1.5: Apr 3, 2024
- 0.1.3 (this version): May 8, 2023
- 0.1.2: May 8, 2023
- 0.1.1: May 8, 2023
- 0.1: May 8, 2023

Download files


Source Distribution

approximate_cluster_identities-0.1.3.tar.gz (9.3 kB), uploaded May 8, 2023

Built Distribution


approximate_cluster_identities-0.1.3-py3-none-any.whl (10.4 kB), uploaded May 8, 2023, for Python 3

File details

Details for the file approximate_cluster_identities-0.1.3.tar.gz.

File metadata

Download URL: approximate_cluster_identities-0.1.3.tar.gz
Upload date: May 8, 2023
Size: 9.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.11.3

File hashes

Hashes for approximate_cluster_identities-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`bfb3d10741f535a99bbf1d1104327a5764d3f902dcc03cbf98cdaec73222f415`
MD5	`0605146f846eb09010506aa7f95d4822`
BLAKE2b-256	`7ad6d575ed0bda027c6ea178d6525bd62cf2115f9b3a7ce8356b3222acda0061`


File details

Details for the file approximate_cluster_identities-0.1.3-py3-none-any.whl.

File metadata

Download URL: approximate_cluster_identities-0.1.3-py3-none-any.whl
Upload date: May 8, 2023
Size: 10.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.11.3

File hashes

Hashes for approximate_cluster_identities-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`058d13d7fa5ad445cf1bf1135974491f7b889d2950d5c3cc2e06e76563ce7ee9`
MD5	`0630e1e9c5cfcc12e4b75195aff8dd0e`
BLAKE2b-256	`3cc90791ed66c6b71e7003dc627fab9e5232f5bda9136160f169d2d73005e74e`

