
Algorithm for clustering protein multiple sequence alignments using normalized mutual information.

Project description

PSICalc Algorithm Package

This package clusters Multiple Sequence Alignments (MSAs) using normalized mutual information to examine protein subdomains. A complete data visualization tool for psicalc is available on the releases page.

As an example:

import psicalc as pc

file = "<your_fasta_file>"  # e.g. "PF02517_seed.txt"

data = pc.read_txt_file_format(file)  # read the FASTA file

data = pc.durston_schema(data, 1)  # label columns with indices starting at 1

# If you have multiple sequences or labels, merge them first
data = pc.merge_sequences([data], ['HIST'])

result = pc.find_clusters(1, data)  # samples every column against the MSA

# Optionally write dictionary to csv
pc.write_output_data(1, result)

The program runs and writes a CSV or XLSX file containing the strongest clusters found in the provided MSA.
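To give a feel for the quantity the clustering is built on, here is a minimal, standalone sketch of normalized mutual information between two alignment columns. It is not taken from the psicalc source: the function names are hypothetical, and the normalization used here (twice the mutual information divided by the sum of the two column entropies) is an assumption, since the package may normalize differently.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of symbols."""
    n = len(symbols)
    return sum((c / n) * log2(n / c) for c in Counter(symbols).values())


def normalized_mutual_information(col_a, col_b):
    """NMI of two equal-length columns (hypothetical helper, not psicalc API).

    Assumed normalization: 2 * I(A;B) / (H(A) + H(B)).
    Returns 0.0 when both columns are invariant, where the ratio is undefined.
    """
    h_a = entropy(col_a)
    h_b = entropy(col_b)
    h_ab = entropy(list(zip(col_a, col_b)))  # joint entropy of paired residues
    mutual_information = h_a + h_b - h_ab
    denominator = h_a + h_b
    return 0.0 if denominator == 0 else 2 * mutual_information / denominator


# Two columns that vary together score high; independent columns score low.
covarying = normalized_mutual_information("AAGG", "CCTT")
independent = normalized_mutual_information("AAGG", "ACAC")
print(covarying, independent)
```

Under this normalization the score ranges from 0 (no shared information) to 1 (each column fully determines the other).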

Our initial publication can be found here: https://academic.oup.com/bioinformaticsadvances/article/2/1/vbac058/6671262

Following our initial publication, the program was found to associate invariant columns with variable columns in some cases. The invariant columns were determined to be the cause: because of their low entropy, invariant or nearly invariant positions offer little information for meaningful clustering. Therefore, in version 0.5.1 and later, we have added the ability to filter out low-entropy columns using a sliding scale from 0 to 0.25 (0-25%) entropy, where entropy is determined by the number of different amino acids found in a column together with the number of occurrences of each amino acid. Invariant columns (i.e., those with only one amino acid) have an entropy of 0. A report of the columns removed due to low entropy is included with the output data file. As a result of these changes, results produced with the latest version will not match those in our initial paper, but should represent clusters based on meaningful relationships. In all cases, researchers are advised to inspect the outputs to confirm that the associations are meaningful.
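The low-entropy filter described above can be sketched as follows. This is an illustration only, not the package's implementation: the function names, the dictionary-of-columns input, the normalization by log2(20) (so that the score falls between 0 and 1 for the 20 standard amino acids), and the "less than or equal to" comparison are all assumptions. Gap characters are treated like any other symbol here.

```python
from collections import Counter
from math import log2


def normalized_entropy(column, alphabet_size=20):
    """Shannon entropy of a column, scaled to [0, 1] (assumed normalization)."""
    n = len(column)
    bits = sum((c / n) * log2(n / c) for c in Counter(column).values())
    return bits / log2(alphabet_size)


def filter_low_entropy_columns(columns, threshold=0.1):
    """Split columns into kept and removed (hypothetical helper).

    columns:   dict mapping a column label to its string of residues
    threshold: cutoff in the 0 to 0.25 range mentioned above
    Returns (kept, removed): a dict of retained columns and a list of the
    labels that were dropped, which could feed a removal report.
    """
    kept, removed = {}, []
    for label, column in columns.items():
        if normalized_entropy(column) <= threshold:
            removed.append(label)
        else:
            kept[label] = column
    return kept, removed


columns = {1: "AAAA", 2: "AAAG", 3: "ACGT"}
kept, removed = filter_low_entropy_columns(columns, threshold=0.2)
print("kept:", list(kept), "removed:", removed)
```

An invariant column always scores 0 and is dropped at any threshold under these assumptions, while raising the threshold toward 0.25 also drops nearly invariant columns such as column 2 in the example.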


