A simple Protein Amino-Acid Analyzer CLI application

Protein Amino Acids Analyzer

Overview

The Protein Amino Acids Analyzer is a command-line application that takes a protein sequence (a string of one-letter amino acid codes) as input and writes a JSON file containing insights about the sequence.

Features

  • Amino Acid Composition: Counts the number of each type of amino acid.
  • Proportions: Calculates the proportion of each amino acid in the sequence (its count divided by the sequence length).
  • Molecular Weight: Calculates the total molecular weight of the protein.
  • Isoelectric Point (pI): Estimates the isoelectric point of the protein.
  • Hydrophobicity: Sums the Kyte-Doolittle hydrophobicity values of all residues to give an overall measure for the sequence.
  • Detailed Residue Information: Provides molecular weight, isoelectric point, and hydrophobicity for each amino acid residue in the sequence.
  • Amino Acid Composition Plot: Generates a plot of the amino acid composition.

Implementation Steps

  1. Input Handling: Accepts a protein sequence as input from the command line.
  2. Amino Acid Composition Calculation: Uses a dictionary to count occurrences of each amino acid.
  3. Proportions Calculation: Calculates the proportion of each amino acid in the sequence.
  4. Molecular Weight Calculation: Sums the weights of individual amino acids using a predefined weight table.
  5. Isoelectric Point Calculation: Uses an algorithm to estimate the pI based on the amino acid composition.
  6. Hydrophobicity Calculation: Sums the hydrophobicity values of individual amino acids using the Kyte-Doolittle scale.
  7. Residue Information Calculation: Calculates detailed information for each residue, including molecular weight, isoelectric point, and hydrophobicity.
  8. Plot Generation: Generates a bar plot for the amino acid composition.

Usage

Prerequisites

Make sure you have Python 3.x installed on your machine.

Installation

Clone this repository or download the protein_aa_analyze.py file directly.

Install the required dependency: pip install matplotlib

Execution

To run the script, use the following command: python protein_aa_analyze.py "PROTEIN_SEQUENCE" OUTPUT_FILE.json [--plot OUTPUT_PLOT.png]

  • PROTEIN_SEQUENCE: The protein sequence to analyze (e.g., "ACDEFGHIKLMNPQRSTVWY").
  • OUTPUT_FILE.json: The name of the JSON file where the results will be saved.
  • --plot OUTPUT_PLOT.png: (Optional) The name of the plot file where the amino acid composition will be saved.

Example

python protein_aa_analyze.py "ACDEFGHIKLMNPQRSTVWY" output.json --plot composition_plot.png

This will analyze the sequence "ACDEFGHIKLMNPQRSTVWY", save the insights in a file named output.json, and save the amino acid composition plot in composition_plot.png.

Output Format

The JSON output has the following structure (example values for the sequence "ACDEFGHIKLMNPQRSTVWY"; the residues list is truncated):

{
  "sequence": "ACDEFGHIKLMNPQRSTVWY",
  "protein_molecular_weight": {
    "value": 2373.4,
    "unit": "Daltons",
    "reference": "https://www.sigmaaldrich.com/BE/en/technical-documents/technical-article/protein-biology/protein-structural-analysis/amino-acid-reference-chart"
  },
  "protein_isoelectric_point": {
    "value": 6.8,
    "unit": "pH",
    "reference": "https://www.sigmaaldrich.com/BE/en/technical-documents/technical-article/protein-biology/protein-structural-analysis/amino-acid-reference-chart"
  },
  "protein_hydrophobicity": {
    "value": -5.7,
    "unit": "Arbitrary units with The Kyte-Doolittle scale",
    "reference": "https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/650/Hydrophobicity_scales.html#:~:text=The%20Kyte%2DDoolittle%20scale%20is,on%20the%20window%20size%20used."
  },
  "amino_acid_composition": {
    "A": 1, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 1, "I": 1, "K": 1, "L": 1,
    "M": 1, "N": 1, "P": 1, "Q": 1, "R": 1, "S": 1, "T": 1, "V": 1, "W": 1, "Y": 1
  },
  "amino_acid_proportions": {
    "A": 0.05, "C": 0.05, "D": 0.05, "E": 0.05, "F": 0.05, "G": 0.05, "H": 0.05, "I": 0.05,
    "K": 0.05, "L": 0.05, "M": 0.05, "N": 0.05, "P": 0.05, "Q": 0.05, "R": 0.05, "S": 0.05,
    "T": 0.05, "V": 0.05, "W": 0.05, "Y": 0.05
  },
  "residues": [
    {
      "position": 1,
      "index": 0,
      "amino_acid": "A",
      "molecular_weight": 89.1,
      "isoelectric_point": 0.345,
      "hydrophobicity": 1.8
    },
    {
      "position": 2,
      "index": 1,
      "amino_acid": "C",
      "molecular_weight": 121.2,
      "isoelectric_point": 0.345,
      "hydrophobicity": 2.5
    },
    ...
  ]
}
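
To consume the results programmatically, the file can be read back with Python's standard json module. The sketch below is an illustration, not part of the tool: load_report and EXPECTED_KEYS are hypothetical names, and the function only checks that the top-level keys listed above are present.

```python
import json

# Top-level keys shown in the output structure above.
EXPECTED_KEYS = {
    "sequence",
    "protein_molecular_weight",
    "protein_isoelectric_point",
    "protein_hydrophobicity",
    "amino_acid_composition",
    "amino_acid_proportions",
    "residues",
}


def load_report(path: str) -> dict:
    """Load an output file and verify that the documented top-level keys exist."""
    with open(path, encoding="utf-8") as handle:
        report = json.load(handle)
    missing = EXPECTED_KEYS - set(report)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    return report
```

For example, load_report("output.json")["protein_molecular_weight"]["value"] returns the protein's molecular weight in Daltons.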

Code Explanation

ProteinAnalyzer Class

__init__(self, sequence)

  • Initializes the protein sequence.
  • Defines the amino acids to analyze.
  • Initializes a counter for each amino acid to 0.
  • Initializes a proportions dictionary with every amino acid set to 0.

calculate_amino_acid_composition(self)

  • Calculates the number of each type of amino acid in the sequence.
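
The constructor and composition step can be sketched as follows. This is a minimal illustration assuming upper-case one-letter codes for the 20 standard amino acids; ProteinAnalyzerSketch is a hypothetical name, and rejecting unknown letters is an assumption, since the README does not say how the tool handles them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


class ProteinAnalyzerSketch:
    """Hypothetical stand-in for ProteinAnalyzer (constructor + composition only)."""

    def __init__(self, sequence: str) -> None:
        # Normalize to upper case so "acd" and "ACD" are treated alike (assumption).
        self.sequence = sequence.upper()
        self.amino_acids = AMINO_ACIDS
        # One counter per amino acid, all starting at 0.
        self.composition = {aa: 0 for aa in self.amino_acids}
        # One proportion per amino acid, all starting at 0.
        self.proportions = {aa: 0.0 for aa in self.amino_acids}

    def calculate_amino_acid_composition(self) -> dict:
        # Start from fresh counters so calling this twice does not double-count.
        counts = {aa: 0 for aa in self.amino_acids}
        for residue in self.sequence:
            if residue not in counts:
                # Behaviour for unknown letters is an assumption of this sketch.
                raise ValueError(f"unknown amino acid code: {residue!r}")
            counts[residue] += 1
        self.composition = counts
        return counts
```

ProteinAnalyzerSketch("AAC").calculate_amino_acid_composition() gives "A": 2 and "C": 1, with 0 for the other 18 amino acids.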

calculate_amino_acid_proportions(self)

  • Calculates the proportion of each type of amino acid in the sequence and rounds it to 3 decimal places.
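
A minimal sketch of the proportion step, written as a standalone function (a hypothetical shape; in the tool it is a method). Each count is divided by the total number of residues and rounded to 3 decimal places; returning zeros for an empty sequence is an assumption.

```python
def calculate_amino_acid_proportions(composition: dict) -> dict:
    """Proportion of each amino acid = count / total residues, rounded to 3 places."""
    total = sum(composition.values())
    if total == 0:
        # Empty sequence: avoid division by zero (assumed behaviour, not documented).
        return {aa: 0.0 for aa in composition}
    return {aa: round(count / total, 3) for aa, count in composition.items()}
```

For the 20-residue example sequence, every amino acid occurs once, so each proportion is 1/20 = 0.05, matching the example output.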

calculate_molecular_weight(self)

  • Calculates the total molecular weight of the protein based on a predefined weight table and rounds it to 2 decimal places.
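
The weight-table approach can be sketched as below. The masses are commonly tabulated average masses of the free amino acids, assumed here rather than copied from the tool. The README does not say whether the tool subtracts the water lost at each peptide bond, so this sketch exposes that as a hypothetical subtract_water flag; the plain sum described above corresponds to subtract_water=False.

```python
# Average masses (Da) of the free amino acids, rounded to 2 decimals.
AVERAGE_MASS = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "E": 147.13, "Q": 146.15, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
WATER_MASS = 18.015  # Da, released once per peptide bond


def calculate_molecular_weight(sequence: str, subtract_water: bool = True) -> float:
    """Sum residue masses; optionally remove one water per peptide bond."""
    sequence = sequence.upper()
    total = sum(AVERAGE_MASS[aa] for aa in sequence)
    if subtract_water and len(sequence) > 1:
        # A chain of n residues has n - 1 peptide bonds.
        total -= WATER_MASS * (len(sequence) - 1)
    return round(total, 2)
```

Subtracting water gives the mass of the actual polypeptide; the plain sum overestimates it by about 18 Da per peptide bond.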

calculate_isoelectric_point(self)

  • Estimates the isoelectric point (pI) of the protein using an algorithm based on the amino acid composition.
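
The README does not name the pI algorithm. One common approach, sketched here as an assumption, computes the net charge with the Henderson-Hasselbalch equation and bisects on pH until the charge reaches zero. The pKa values are illustrative; published pKa sets differ, and so will the resulting pI.

```python
# Illustrative pKa values (assumed); other published sets give slightly different pI.
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the peptide at a given pH."""
    counts = {aa: sequence.count(aa) for aa in "KRHDECY"}
    counts["N_TERM"] = 1  # one free amino terminus
    counts["C_TERM"] = 1  # one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def calculate_isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge is zero."""
    sequence = sequence.upper()
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return round((low + high) / 2, 2)
```

Bisection works because the net charge decreases monotonically as pH rises, so there is exactly one zero crossing between pH 0 and 14.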

calculate_hydrophobicity(self)

  • Provides a measure of the overall hydrophobicity of the sequence using the Kyte-Doolittle scale and rounds it to 2 decimal places.
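
A minimal sketch of the summed Kyte-Doolittle score as a standalone function (the function shape is assumed; the values are the published Kyte-Doolittle hydropathy indices).

```python
# Kyte-Doolittle hydropathy index for each amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def calculate_hydrophobicity(sequence: str) -> float:
    """Sum of Kyte-Doolittle values over the sequence, rounded to 2 places."""
    return round(sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()), 2)
```

Because the sum grows with sequence length, dividing it by the number of residues (the GRAVY score) is easier to compare across proteins of different sizes.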

calculate_residues_info(self)

  • Calculates detailed information for each residue, including molecular weight, isoelectric point, and hydrophobicity.
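
A sketch of the per-residue records, matching the keys in the JSON output above. RESIDUE_PROPERTIES is a hypothetical lookup of commonly tabulated values (average mass, pI of the free amino acid, Kyte-Doolittle index), so its numbers need not match what the tool itself writes.

```python
# Hypothetical lookup: (average mass in Da, pI of free amino acid, Kyte-Doolittle index).
RESIDUE_PROPERTIES = {
    "A": (89.09, 6.00, 1.8),   "R": (174.20, 10.76, -4.5),
    "N": (132.12, 5.41, -3.5), "D": (133.10, 2.77, -3.5),
    "C": (121.16, 5.07, 2.5),  "E": (147.13, 3.22, -3.5),
    "Q": (146.15, 5.65, -3.5), "G": (75.07, 5.97, -0.4),
    "H": (155.16, 7.59, -3.2), "I": (131.17, 6.02, 4.5),
    "L": (131.17, 5.98, 3.8),  "K": (146.19, 9.74, -3.9),
    "M": (149.21, 5.74, 1.9),  "F": (165.19, 5.48, 2.8),
    "P": (115.13, 6.30, -1.6), "S": (105.09, 5.68, -0.8),
    "T": (119.12, 5.60, -0.7), "W": (204.23, 5.89, -0.9),
    "Y": (181.19, 5.66, -1.3), "V": (117.15, 5.96, 4.2),
}


def calculate_residues_info(sequence: str) -> list:
    """Build one record per residue with the keys used in the JSON output."""
    residues = []
    for index, aa in enumerate(sequence.upper()):
        mass, pi, hydro = RESIDUE_PROPERTIES[aa]
        residues.append({
            "position": index + 1,  # 1-based position in the sequence
            "index": index,         # 0-based Python index
            "amino_acid": aa,
            "molecular_weight": mass,
            "isoelectric_point": pi,
            "hydrophobicity": hydro,
        })
    return residues
```

Keeping both position and index, as the output does, lets readers use biological (1-based) numbering while code can index the string directly.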

plot_amino_acid_composition(self, output_file)

  • Generates a bar plot for the amino acid composition.
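
A minimal matplotlib sketch of the bar plot. It is written as a standalone function that takes the composition dictionary (the tool's version is a method taking only output_file); the figure size, labels and title are assumptions.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render straight to a file, no display needed
import matplotlib.pyplot as plt


def plot_amino_acid_composition(composition: dict, output_file: str) -> Path:
    """Save a bar chart of amino acid counts to output_file (e.g. a .png)."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(list(composition.keys()), list(composition.values()))
    ax.set_xlabel("Amino acid")
    ax.set_ylabel("Count")
    ax.set_title("Amino acid composition")
    fig.tight_layout()
    fig.savefig(output_file)  # format inferred from the file extension
    plt.close(fig)  # release the figure's memory
    return Path(output_file)
```

Selecting the Agg backend keeps the script usable on servers and in CI, where no display is available.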

main() Function

  1. ArgumentParser: Uses argparse to handle command-line arguments.

    • sequence: The protein sequence to analyze.
    • output: The name of the JSON output file.
    • --plot: (Optional) The name of the plot file where the amino acid composition will be saved.
  2. Sequence Analysis: Displays the received sequence and output file. Instantiates the ProteinAnalyzer class and calculates the insights. Displays the calculated insights.

  3. Saving Results: Saves the insights in a JSON file. Displays a confirmation or error message. If the --plot argument is provided, generates and saves the plot.
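
The argument handling and saving steps can be sketched with argparse and json as below. build_parser, analyze and main(argv) are hypothetical names; analyze is a stand-in that computes only the composition, and plotting is left out. In a script, main() would be called as sys.exit(main()) under an if __name__ == "__main__": guard.

```python
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Protein Amino Acids Analyzer (sketch)")
    parser.add_argument("sequence", help="protein sequence, e.g. ACDEFGHIKLMNPQRSTVWY")
    parser.add_argument("output", help="name of the JSON output file")
    parser.add_argument("--plot", help="optional file name for the composition plot")
    return parser


def analyze(sequence: str) -> dict:
    # Hypothetical stand-in for ProteinAnalyzer: only composition is computed here.
    sequence = sequence.upper()
    composition = {aa: sequence.count(aa) for aa in "ACDEFGHIKLMNPQRSTVWY"}
    return {"sequence": sequence, "amino_acid_composition": composition}


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    print(f"Sequence: {args.sequence}")
    print(f"Output file: {args.output}")
    insights = analyze(args.sequence)
    print(json.dumps(insights, indent=2))
    try:
        with open(args.output, "w", encoding="utf-8") as handle:
            json.dump(insights, handle, indent=2)
    except OSError as error:
        print(f"Error saving results: {error}", file=sys.stderr)
        return 1
    print(f"Results saved to {args.output}")
    if args.plot:
        # Plotting is omitted in this sketch.
        print(f"Plot requested: {args.plot}")
    return 0
```

Accepting an optional argv list lets the same main() be driven from tests without touching sys.argv.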

Authors

This script was created by Iman Jouiad.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Download files

Source Distribution

protein_aa_analyze-1.0.1.tar.gz (6.1 kB)

Uploaded Source

Built Distribution

protein_aa_analyze-1.0.1-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file protein_aa_analyze-1.0.1.tar.gz.

File metadata

  • Download URL: protein_aa_analyze-1.0.1.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for protein_aa_analyze-1.0.1.tar.gz
Algorithm    Hash digest
SHA256       3be52efd06572fab621bf48ae25faade233bfa88b72e086026f8d490fdc1e60f
MD5          a7943f3345c32724eecb39f33774c206
BLAKE2b-256  a71f96e4aa7a683e07d9ed948ef4511197dfe40b745c2f6044290c7ae4108e9c

File details

Details for the file protein_aa_analyze-1.0.1-py3-none-any.whl.

File hashes

Hashes for protein_aa_analyze-1.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       2b96ca67154c305745abd91aef44730b4ae3f82b62487e4c3e1c406e891e6e2c
MD5          5c3fb37354d83e770c7167d31ae675e8
BLAKE2b-256  747020fa8cc6daec10e64a8441b083b928d2d983c7d011dd1a5b366b7cdf89d9
