A simple Protein Amino-Acid Analyzer CLI application

Protein Insight Generator

Overview

The Protein Insight Generator is a command-line application that takes a protein sequence as a string and writes a JSON object containing various insights about the sequence.

Features

  • Amino Acid Composition: Counts the number of each type of amino acid.
  • Proportions: Calculates the proportion of each amino acid in the sequence.
  • Molecular Weight: Calculates the total molecular weight of the protein.
  • Isoelectric Point (pI): Estimates the isoelectric point of the protein.
  • Hydrophobicity: Provides a measure of the overall hydrophobicity of the sequence.
  • Detailed Residue Information: Provides molecular weight, isoelectric point, and hydrophobicity for each amino acid residue in the sequence.
  • Amino Acid Composition Plot: Generates a plot of the amino acid composition.

Implementation Steps

  1. Input Handling: Accepts a protein sequence as input from the command line.
  2. Amino Acid Composition Calculation: Uses a dictionary to count occurrences of each amino acid.
  3. Proportions Calculation: Calculates the proportion of each amino acid in the sequence.
  4. Molecular Weight Calculation: Sums the weights of individual amino acids using a predefined weight table.
  5. Isoelectric Point Calculation: Uses an algorithm to estimate the pI based on the amino acid composition.
  6. Hydrophobicity Calculation: Sums the hydrophobicity values of individual amino acids using a predefined scale (e.g., Kyte-Doolittle scale).
  7. Residue Information Calculation: Calculates detailed information for each residue, including molecular weight, isoelectric point, and hydrophobicity.
  8. Plot Generation: Generates a bar plot for the amino acid composition.

Usage

Prerequisites

Make sure you have Python 3.x installed on your machine.

Installation

Clone this repository or download the protein_aa_analyze.py file directly.

Install the required dependencies:

pip install matplotlib

Execution

To run the script, use the following command:

python protein_aa_analyze.py "PROTEIN_SEQUENCE" OUTPUT_FILE.json [--plot OUTPUT_PLOT.png]

  • PROTEIN_SEQUENCE: The protein sequence to analyze (e.g., "ACDEFGHIKLMNPQRSTVWY").
  • OUTPUT_FILE.json: The name of the JSON file where the results will be saved.
  • --plot OUTPUT_PLOT.png: (Optional) The name of the plot file where the amino acid composition will be saved.

Example

python protein_aa_analyze.py "ACDEFGHIKLMNPQRSTVWY" output.json --plot composition_plot.png

This will analyze the sequence "ACDEFGHIKLMNPQRSTVWY", save the insights in a file named output.json, and save the amino acid composition plot in composition_plot.png.

Output Format

The JSON output will have the following structure:

{
  "sequence": "ACDEFGHIKLMNPQRSTVWY",
  "protein_molecular_weight": {
    "value": 2373.4,
    "unit": "Daltons",
    "reference": "https://www.sigmaaldrich.com/BE/en/technical-documents/technical-article/protein-biology/protein-structural-analysis/amino-acid-reference-chart"
  },
  "protein_isoelectric_point": {
    "value": 6.8,
    "unit": "pH",
    "reference": "https://www.sigmaaldrich.com/BE/en/technical-documents/technical-article/protein-biology/protein-structural-analysis/amino-acid-reference-chart"
  },
  "protein_hydrophobicity": {
    "value": -5.7,
    "unit": "Arbitrary units with The Kyte-Doolittle scale",
    "reference": "https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/650/Hydrophobicity_scales.html#:~:text=The%20Kyte%2DDoolittle%20scale%20is,on%20the%20window%20size%20used."
  },
  "amino_acid_composition": {
    "A": 1, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 1, "I": 1, "K": 1, "L": 1,
    "M": 1, "N": 1, "P": 1, "Q": 1, "R": 1, "S": 1, "T": 1, "V": 1, "W": 1, "Y": 1
  },
  "amino_acid_proportions": {
    "A": 0.05, "C": 0.05, "D": 0.05, "E": 0.05, "F": 0.05, "G": 0.05, "H": 0.05, "I": 0.05,
    "K": 0.05, "L": 0.05, "M": 0.05, "N": 0.05, "P": 0.05, "Q": 0.05, "R": 0.05, "S": 0.05,
    "T": 0.05, "V": 0.05, "W": 0.05, "Y": 0.05
  },
  "residues": [
    {
      "position": 1,
      "index": 0,
      "amino_acid": "A",
      "molecular_weight": 89.1,
      "isoelectric_point": 0.345,
      "hydrophobicity": 1.8
    },
    {
      "position": 2,
      "index": 1,
      "amino_acid": "C",
      "molecular_weight": 121.2,
      "isoelectric_point": 0.345,
      "hydrophobicity": 2.5
    },
    ...
  ]
}
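
Because the result is plain JSON, it can be read back with the standard json module. The following is a minimal sketch that assumes the structure shown above; the helper name summarize_insights is hypothetical and not part of the script.

```python
import json


def summarize_insights(insights):
    """Build a one-line summary from a parsed insights dictionary."""
    weight = insights["protein_molecular_weight"]
    pi = insights["protein_isoelectric_point"]
    return (
        f"{len(insights['sequence'])} residues, "
        f"{weight['value']} {weight['unit']}, pI {pi['value']}"
    )


# A trimmed-down document that uses the same keys as the output shown above.
raw = """
{
  "sequence": "ACDEFGHIKLMNPQRSTVWY",
  "protein_molecular_weight": {"value": 2373.4, "unit": "Daltons"},
  "protein_isoelectric_point": {"value": 6.8, "unit": "pH"}
}
"""

print(summarize_insights(json.loads(raw)))
```

In practice the string would come from the output file, for example via json.load on an open file handle.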

Code Explanation

ProteinAnalyzer Class

__init__(self, sequence)

  • Initializes the protein sequence.
  • Defines the amino acids to analyze.
  • Initializes a counter for each amino acid to 0.
  • Initializes a proportions dictionary for each amino acid to 0.

calculate_amino_acid_composition(self)

  • Calculates the number of each type of amino acid in the sequence.
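
The counting step could look roughly like the sketch below. The function name count_amino_acids and the constant AMINO_ACIDS are illustrative, and upper-casing the input is an assumption; the script itself may handle case and non-standard letters differently.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def count_amino_acids(sequence):
    """Count each standard amino acid, reporting 0 for those that are absent."""
    counts = Counter(sequence.upper())  # assumption: input is case-insensitive
    # Letters outside AMINO_ACIDS are silently ignored in this sketch.
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


composition = count_amino_acids("ACDEFGHIKLMNPQRSTVWY")
print(composition["A"], composition["W"])
```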

calculate_amino_acid_proportions(self)

  • Calculates the proportion of each type of amino acid in the sequence and rounds it to 3 decimal places.
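
As a rough illustration, proportions can be derived from the composition dictionary as below. The function name amino_acid_proportions is hypothetical, and dividing by the number of counted residues (rather than the raw sequence length) is an assumption.

```python
def amino_acid_proportions(composition):
    """Convert per-residue counts into fractions rounded to 3 decimal places."""
    total = sum(composition.values())
    if total == 0:
        # Avoid division by zero for an empty sequence.
        return {aa: 0.0 for aa in composition}
    return {aa: round(count / total, 3) for aa, count in composition.items()}


print(amino_acid_proportions({"A": 1, "C": 2}))
```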

calculate_molecular_weight(self)

  • Calculates the total molecular weight of the protein based on a predefined weight table and rounds it to 2 decimal places.
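
A minimal sketch of the table lookup and sum follows. The table is deliberately a two-entry subset that reuses the A and C values from the example output; the script's full table is not reproduced here. The sketch does not subtract the mass of water lost in peptide bonds, and the source does not say whether the script does.

```python
# Illustrative subset only: the A and C values are taken from the example output.
RESIDUE_WEIGHTS = {"A": 89.1, "C": 121.2}


def molecular_weight(sequence, weights=RESIDUE_WEIGHTS):
    """Sum per-residue weights and round to 2 decimal places.

    Raises KeyError if the sequence contains a residue missing from the table.
    """
    return round(sum(weights[aa] for aa in sequence), 2)


print(molecular_weight("AC"))
```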

calculate_isoelectric_point(self)

  • Estimates the isoelectric point (pI) of the protein using an algorithm based on the amino acid composition.
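
The source does not say which algorithm the script uses. One common approach, sketched here as an assumption, is to compute the net charge with the Henderson-Hasselbalch equation and bisect on pH until the charge is close to zero. The pKa values below are one commonly used set and are assumptions; the names net_charge and estimate_pi are hypothetical.

```python
# Assumed pKa values; the script's own table is not shown in this document.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def net_charge(sequence, ph):
    """Net charge of the chain at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # C-terminal carboxyl
    for aa in sequence:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def estimate_pi(sequence, tolerance=0.001):
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: the pI is higher
        else:
            high = mid  # negatively charged or neutral: the pI is lower
    return round((low + high) / 2, 2)


print(estimate_pi("ACDEFGHIKLMNPQRSTVWY"))
```

With a different pKa table the estimate shifts, so results from this sketch are not expected to match the script's output exactly.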

calculate_hydrophobicity(self)

  • Provides a measure of the overall hydrophobicity of the sequence using a predefined scale (e.g., Kyte-Doolittle scale) and rounds it to 2 decimal places.
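
The summation can be sketched as follows, using the published Kyte-Doolittle hydropathy values. Whether the script's table matches these values exactly is an assumption, and the function name total_hydrophobicity is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def total_hydrophobicity(sequence, scale=KYTE_DOOLITTLE):
    """Sum the per-residue hydropathy values and round to 2 decimal places."""
    return round(sum(scale[aa] for aa in sequence), 2)


print(total_hydrophobicity("AC"))
```

Positive totals indicate a predominantly hydrophobic sequence and negative totals a predominantly hydrophilic one.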

calculate_residues_info(self)

  • Calculates detailed information for each residue, including molecular weight, isoelectric point, and hydrophobicity.
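
A per-residue list with the same position and index fields as the example output could be built roughly like this. The function name residues_info and the lookup tables passed in are illustrative. The per-residue isoelectric point is left out because the source does not state how it is derived.

```python
def residues_info(sequence, weights, hydrophobicity):
    """Describe each residue with a 1-based position and a 0-based index."""
    return [
        {
            "position": index + 1,
            "index": index,
            "amino_acid": aa,
            "molecular_weight": weights.get(aa),      # None if not in the table
            "hydrophobicity": hydrophobicity.get(aa),  # None if not in the table
        }
        for index, aa in enumerate(sequence)
    ]


# Small illustrative tables using the A and C values from the example output.
info = residues_info("AC", {"A": 89.1, "C": 121.2}, {"A": 1.8, "C": 2.5})
print(info[1])
```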

plot_amino_acid_composition(self, output_file)

  • Generates a bar plot for the amino acid composition.
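
A bar plot of this kind can be produced with matplotlib along the following lines. This is a sketch, not the script's code: the function name plot_composition, the figure size, and the axis labels are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def plot_composition(composition, output_file):
    """Save a bar plot of amino acid counts to output_file."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(list(composition.keys()), list(composition.values()))
    ax.set_xlabel("Amino acid")
    ax.set_ylabel("Count")
    ax.set_title("Amino acid composition")
    fig.tight_layout()
    fig.savefig(output_file)
    plt.close(fig)  # release the figure's memory
```

The output format follows the file extension, so passing a name ending in .png writes a PNG image.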

main() Function

  1. ArgumentParser: Uses argparse to handle command-line arguments.
    • sequence: The protein sequence to analyze.
    • output: The name of the JSON output file.
    • --plot: (Optional) The name of the plot file where the amino acid composition will be saved.
  2. Sequence Analysis: Prints the received sequence and output file name, instantiates the ProteinAnalyzer class, calculates the insights, and prints them.
  3. Saving Results: Saves the insights to a JSON file and prints a confirmation or error message. If the --plot argument is provided, generates and saves the plot.
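
The command-line interface described above maps naturally onto argparse. The sketch below mirrors the documented arguments; the function name build_parser and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Create a parser with two positional arguments and an optional --plot."""
    parser = argparse.ArgumentParser(description="Analyze a protein sequence.")
    parser.add_argument("sequence", help="protein sequence to analyze")
    parser.add_argument("output", help="name of the JSON output file")
    parser.add_argument(
        "--plot",
        metavar="OUTPUT_PLOT",
        default=None,
        help="optional file name for the amino acid composition plot",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["ACDEFGHIKLMNPQRSTVWY", "output.json", "--plot", "composition_plot.png"]
)
print(args.sequence, args.output, args.plot)
```

When --plot is omitted, args.plot is None, which gives a simple check for skipping plot generation.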

Authors

This script was created by Iman Jouiad.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
