Compute core epitopes from multiple overlapping peptides.

epicore

This tool is an adaptation of plateau.

General purpose

The tool can be used to identify and quantify shared consensus epitopes.

Installation

Install with pip

pip install epicore

Install with bioconda

conda install bioconda::epicore

How to use

To compute the consensus epitopes enter the following command:

epicore --reference_proteome <PROTEOME_FILE> --out_dir <OUT_DIR> generate-epicore-csv --min_epi_length <MIN_EPI_LENGTH> --min_overlap <MIN_OVERLAP> --max_step_size <MAX_STEP_SIZE> --seq_column <SEQ_COLUMN> --protacc_column <PROTACC_COLUMN> --delimiter <DELIMITER> [--intensity_column <INTENSITY_COLUMN> --start_column <START_COLUMN> --end_column <END_COLUMN> --mod_pattern <MOD_PATTERN> --report --html] --evidence_file <EVIDENCE_FILE>

Replace EVIDENCE_FILE with the path to your evidence file and PROTEOME_FILE with the path to the proteome FASTA file that was used to generate the evidence file. You can find more detailed information about the input data here.
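
If you call the tool from a script, the command above can be assembled as an argument list. The following is only a sketch: the helper name build_epicore_command, the file names, the column names and the numeric values are placeholders chosen for illustration, not recommended settings. Only flags that appear in the command above are used, in the same order.

```python
import shlex


def build_epicore_command(proteome, out_dir, evidence, *, min_epi_length,
                          min_overlap, max_step_size, seq_column,
                          protacc_column, delimiter):
    """Assemble the generate-epicore-csv call as an argument list."""
    return [
        "epicore",
        "--reference_proteome", proteome,
        "--out_dir", out_dir,
        "generate-epicore-csv",
        "--min_epi_length", str(min_epi_length),
        "--min_overlap", str(min_overlap),
        "--max_step_size", str(max_step_size),
        "--seq_column", seq_column,
        "--protacc_column", protacc_column,
        "--delimiter", delimiter,
        "--evidence_file", evidence,
    ]


# All values below are placeholders for illustration only.
cmd = build_epicore_command(
    "proteome.fasta", "results", "evidence.csv",
    min_epi_length=11, min_overlap=11, max_step_size=5,
    seq_column="sequence", protacc_column="accessions", delimiter=";",
)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))

# To actually run it (requires epicore to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with delimiters such as ; or \.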

To visualize the landscape of a protein you can use the following command:

epicore --reference_proteome <PROTEOME_FILE> --out_dir <OUT_DIR> plot-landscape --epicore_csv <EPICORE_RESULT> --protacc <PROTACC>

Replace EPICORE_RESULT with the file epicore_result.csv, which can be generated by using the generate-epicore-csv command.

Input

The description of each parameter can be found in the table below. Parameters enclosed in square brackets are optional. Parameters highlighted with 🟢 are required for the plot-landscape command. Parameters highlighted with 🔴 are required for the generate-epicore-csv command. The tool supports any search engine output that contains a sequence column and a protein accession column.

Parameter Description
🔴 max_step_size Defines the maximal step size between two peptides for them to still be grouped into the same epitope. If the start positions of two peptides differ by more than that number, the peptides are only grouped together if they overlap by a minimum of min_overlap amino acids.
🔴 min_overlap Defines the minimal overlap between two peptides for them to still be grouped into the same epitope if their start positions differ by more than max_step_size.
🔴 min_epi_length Defines the minimum epitope length, that is, the minimal length a core epitope must have. If the whole sequence of an epitope is shorter than the minimum epitope length, the core is defined as the whole sequence.
🔴 seq_column Defines the column header in the input evidence file that contains the peptide sequences.
🔴 protacc_column Defines the column header in the input evidence file that contains the protein accessions of proteins that contain the peptide of the row.
[start_column] This is an optional parameter. It defines the column header in the input evidence file that contains the start position of the peptide in the different proteins. Setting this parameter reduces the runtime.
[end_column] This is an optional parameter. It defines the column header in the input evidence file that contains the end position of the peptide in the different proteins. Setting this parameter reduces the runtime.
[intensity_column] This is an optional parameter. It defines the column header in the input evidence file of the column that contains the intensity of a peptide sequence.
🔴 out_dir Defines the directory in which the results will be saved.
[mod_pattern] Defines how modifications of a peptide are separated from the sequence in the sequence column. Provide a comma-separated string, where the element before the comma specifies the start of a modification and the element after the comma specifies the end of a modification in the sequences of the sequence column. For example, in AAAPAIM/+15.99\SY the modification is enclosed by / and \, so the mod_pattern parameter should be /,\ in that case. All parts of a sequence inside () and [] are interpreted as modifications by default. If your input file uses these delimiters, you do not need to provide the mod_pattern parameter.
🔴 delimiter Defines the delimiter that separates multiple values in one cell in the input evidence file.
[report] If set, a report is generated.
[html] If set, an HTML version of the generated plots is created.
🟢 protacc Defines the proteins for which the core epitopes and landscape should be visualized. Separate multiple accessions with commas.
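
To illustrate the mod_pattern description in the table above, here is a minimal sketch of how modifications could be stripped from a sequence. The function strip_modifications is hypothetical and is not part of epicore; it only mirrors the documented behaviour ((...) and [...] are treated as modifications by default, plus one optional user-defined delimiter pair).

```python
import re


def strip_modifications(sequence, mod_pattern=None):
    """Return the sequence with everything between modification delimiters removed."""
    # Delimiter pairs that the table describes as defaults.
    pairs = [("(", ")"), ("[", "]")]
    if mod_pattern:
        # mod_pattern is "START,END", for example "/,\\" for / and \.
        start, end = mod_pattern.split(",")
        pairs.append((start, end))
    for start, end in pairs:
        # Non-greedy match from a start delimiter to the next end delimiter.
        pattern = re.escape(start) + r".*?" + re.escape(end)
        sequence = re.sub(pattern, "", sequence)
    return sequence


print(strip_modifications("AAAPAIM/+15.99\\SY", "/,\\"))  # AAAPAIMSY
print(strip_modifications("AAAPAIM(Oxidation)SY"))        # AAAPAIMSY
```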

evidence file

The evidence file is the output file of a search engine. The following file types are supported: csv, tsv, xlsx.

proteome file

The proteome file should contain the proteome used for the identification of the peptide sequences. The file should follow the FASTA format.
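
As a rough illustration of why the proteome is needed, the sketch below reads FASTA records and looks up where a peptide occurs in a protein. This is not epicore's implementation. The helper names, the toy protein, the choice of taking the accession as the header text up to the first whitespace, and the 0-based inclusive positions are all assumptions made for the example.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping accession to sequence."""
    proteins = {}
    accession = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the accession is the header up to the first whitespace.
            accession = line[1:].split()[0]
            proteins[accession] = ""
        elif accession is not None:
            proteins[accession] += line
    return proteins


def find_occurrences(peptide, sequence):
    """Return all (start, end) positions of peptide in sequence (0-based, inclusive)."""
    hits = []
    pos = sequence.find(peptide)
    while pos != -1:
        hits.append((pos, pos + len(peptide) - 1))
        pos = sequence.find(peptide, pos + 1)
    return hits


# Toy record for illustration only.
fasta = [">P1 toy protein", "MKTAYIAKQR", "MKTAY"]
proteome = read_fasta(fasta)
occurrences = find_occurrences("MKTAY", proteome["P1"])
print(occurrences)  # [(0, 4), (10, 14)]
```

Searching every protein for every peptide in this way is what the optional start_column and end_column parameters make unnecessary when the positions are already known.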

Output files

The generate-epicore-csv command produces three csv files (epitopes.csv, epicore_result.csv, pep_cores_mapping.csv), two plots (epitope_intensity_hist.svg, length_distributions.svg) and one optional html report.

The plot-landscape command produces protein landscape visualizations. One example can be found here. The number of plots is defined by the number of accessions provided with the protacc parameter.

epitopes.csv

The csv contains one epitope per row.

column description
whole_epitopes The sequence of the entire epitope.
consensus_epitopes The sequence of the core epitope.
landscape The landscape of the epitope.
grouped_peptides_sequence A list containing the peptide sequences that contribute to the epitope.
relative_core_intensity The relative core intensity of the epitope. The relative intensity of an epitope is the intensity of the epitope divided by the sum of all intensities in the provided evidence file.
core_epitopes_intensity The total core intensity of the epitope. The intensity of an epitope is computed as the sum of the intensities of peptides that contribute to that epitope.
accession A list containing the accessions of proteins in which the epitope occurs.
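
The two intensity columns above can be illustrated with a small calculation. This sketch assumes toy peptide intensities and a hypothetical mapping of epitopes to their contributing peptides; the function and variable names are not taken from epicore.

```python
def epitope_intensities(peptide_intensity, epitope_peptides):
    """Return {epitope: (core_intensity, relative_core_intensity)}."""
    # Sum of all intensities in the (toy) evidence data.
    total = sum(peptide_intensity.values())
    result = {}
    for epitope, peptides in epitope_peptides.items():
        # Intensity of an epitope: sum over its contributing peptides.
        core = sum(peptide_intensity[p] for p in peptides)
        result[epitope] = (core, core / total)
    return result


# Toy data for illustration only.
peptide_intensity = {"pep1": 10.0, "pep2": 30.0, "pep3": 60.0}
epitope_peptides = {"epitopeA": ["pep1", "pep2"], "epitopeB": ["pep3"]}

intensities = epitope_intensities(peptide_intensity, epitope_peptides)
print(intensities)
# {'epitopeA': (40.0, 0.4), 'epitopeB': (60.0, 0.6)}
```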

epicore_result.csv

The csv contains one protein per row. The different columns contain the following information:

column description
accession The protein accession.
sequence A list of sequences of peptides mapped to the protein.
start A list containing the start positions of the peptides in the protein.
end A list containing the end positions of the peptides in the protein.
grouped peptides start The start positions of all peptides grouped together to epitopes.
grouped peptides end The end positions of all peptides grouped together to epitopes.
grouped peptides sequence The peptide sequences that contribute to the same epitope grouped together.
sequence group mapping A list mapping each peptide onto its epitope.
core_epitopes_intensity A list containing the intensity of each epitope. The intensity of an epitope is computed as the sum of the intensities of peptides that contribute to that epitope.
relative_core_intensity A list containing the relative intensity of each epitope. The relative intensity of an epitope is the intensity of the epitope divided by the sum of all intensities in the provided evidence file.
landscape A list containing the landscapes of each epitope.
whole epitopes A list containing the whole epitopes.
core epitopes A list containing the core epitopes.
core epitopes start A list containing the start positions of the cores in the protein.
core epitopes end A list containing the end positions of the cores in the protein.

pep_cores_mapping.csv

The pep_cores_mapping.csv contains all the information from the initial evidence file. In addition there are the following columns:

column description
whole_epitopes A list of all sequences of epitopes to which the peptide of the row contributes.
core_epitopes A list of all core sequences of epitopes to which the peptide of the row contributes.
proteome_occurrence A list containing protein accessions and sequence positions at which the core epitope occurs in the proteome.
total_core_intensity A list containing the intensity of each epitope. The intensity of an epitope is computed as the sum of the intensities of peptides that contribute to that epitope.
relative_core_intensity A list containing the relative intensity of each epitope. The relative intensity of an epitope is the intensity of the epitope divided by the sum of all intensities in the provided evidence file.

epitope_intensity_hist.svg

The plot visualizes how many peptides contribute to a core epitope.

length_distributions.svg

The plot visualizes the length distribution of the original peptides and the computed core epitopes.

report.html

The report file summarizes some of the results. Among other things it includes two histograms visualizing the peptide and epitope length distribution and shows the ten epitopes with the highest number of mapped peptides.

landscape visualization

An example landscape visualization of the protein sp|P62736|ACTA_HUMAN, generated with the plot-landscape command. The height indicates how many peptides are mapped to a position in the proteome. The different colors indicate different epitopes. Lighter areas of a color indicate how many peptides are associated with the epitope. The more intense regions indicate the core epitope.

Workflow

  1. Identification of the location of all peptides in the proteome.
  2. Group peptides whose start positions do not differ by more than max_step_size amino acids or whose overlap is larger than min_overlap. max_step_size and min_overlap are parameters that can be specified by the user.
  3. Identify epitope sequences as the sequence of each peptide group.
  4. For each epitope sequence, identify the core epitope sequence. The core epitope sequence is defined as the sequence region that has the highest peptide mapping count while having a minimum length of min_epi_length amino acids.
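
Steps 2 to 4 can be sketched for the peptides of a single protein as follows. This is an illustrative approximation, not epicore's code. It assumes inclusive (start, end) positions, compares each peptide only with the previous one in sorted order, treats "overlap" as at least min_overlap residues, and finds the core by lowering the coverage threshold until a contiguous region of at least min_epi_length residues appears.

```python
def group_peptides(peptides, max_step_size, min_overlap):
    """Group (start, end) peptides of one protein into epitope groups."""
    groups = []
    for start, end in sorted(peptides):
        if groups:
            prev_start, prev_end = groups[-1][-1]
            overlap = prev_end - start + 1
            if start - prev_start <= max_step_size or overlap >= min_overlap:
                groups[-1].append((start, end))
                continue
        groups.append([(start, end)])
    return groups


def landscape(group):
    """Return the first position and the per-position peptide counts."""
    first = min(s for s, _ in group)
    last = max(e for _, e in group)
    counts = [0] * (last - first + 1)
    for s, e in group:
        for pos in range(s, e + 1):
            counts[pos - first] += 1
    return first, counts


def core_region(group, min_epi_length):
    """Return (start, end) of the best-covered region of sufficient length."""
    first, counts = landscape(group)
    if len(counts) <= min_epi_length:
        # Epitope shorter than the minimum: the core is the whole epitope.
        return first, first + len(counts) - 1
    for threshold in range(max(counts), 0, -1):
        best_len, best_start, run_start = 0, None, None
        # A trailing 0 closes any run that reaches the last position.
        for i, count in enumerate(counts + [0]):
            if count >= threshold:
                if run_start is None:
                    run_start = i
            elif run_start is not None:
                if i - run_start > best_len:
                    best_len, best_start = i - run_start, run_start
                run_start = None
        if best_len >= min_epi_length:
            return first + best_start, first + best_start + best_len - 1
    return first, first + len(counts) - 1


# Toy peptide positions and parameter values for illustration only.
peptides = [(1, 10), (3, 12), (5, 14), (30, 40)]
groups = group_peptides(peptides, max_step_size=5, min_overlap=11)
cores = [core_region(g, min_epi_length=9) for g in groups]
print(groups)  # [[(1, 10), (3, 12), (5, 14)], [(30, 40)]]
print(cores)   # [(3, 12), (30, 40)]
```

In the first group, positions 5 to 10 are covered by all three peptides, but that region is only 6 residues long, so the sketch widens the core to positions 3 to 12, which are covered by at least two peptides.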

Citation

Epicore is an adaptation of the tool developed by Álvaro-Benito et al. [1].
[1] Álvaro-Benito, Miguel, et al. "Quantification of HLA-DM-dependent major histocompatibility complex of class II immunopeptidomes by the peptide landscape antigenic epitope alignment utility." Frontiers in Immunology 9 (2018): 872.
