
Compute core epitopes from multiple overlapping peptides.


epicore

This tool is an adaptation of PLAtEAU.

General purpose

Epicore identifies shared consensus epitopes from multiple overlapping peptides.

Installation

Install with pip

pip install epicore

Install with bioconda

Follow the conda docs to install conda, then run:

conda install bioconda::epicore

How to use

To compute the consensus epitopes, run the following command:

epicore --reference_proteome <PROTEOME_FILE> --out_dir <OUT_DIR> generate-epicore-csv --min_epi_length <MIN_EPI_LENGTH> --min_overlap <MIN_OVERLAP> --max_step_size <MAX_STEP_SIZE> --seq_column <SEQ_COLUMN> --protacc_column <PROTACC_COLUMN> --delimiter <DELIMITER> --evidence_file <EVIDENCE_FILE> [--start_column <START_COLUMN>] [--end_column <END_COLUMN>] --sample_column <SAMPLE_COLUMN> --condition_column <CONDITION_COLUMN> [--strict] [--included] [--mapping] [--qc] [--html] [--max_group_len <MAX_GROUP_LEN>]

Replace <EVIDENCE_FILE> with the path to your evidence file and <PROTEOME_FILE> with the path to a protein database that contains all protein accessions included in the evidence file. More detailed information about the input data is given in the Input section below.

To visualize the landscape of a protein, use the following command:

epicore --reference_proteome <PROTEOME_FILE> --out_dir <OUT_DIR> plot-landscape --epicore_csv <EPICORE_RESULT> --protacc <PROTACC>

Replace <EPICORE_RESULT> with the path to epicore_result.csv, which is produced by the generate-epicore-csv command, and <PROTEOME_FILE> with the protein database that was passed to generate-epicore-csv. Specify the protein you want to visualize after --protacc and enclose the protein accession in quotation marks.

Input

The table below describes each parameter. Parameters enclosed in square brackets are optional. Parameters marked with 🟢 belong to the plot-landscape command; parameters marked with 🔴 belong to the generate-epicore-csv command. The tool supports any search engine output that contains a peptide sequence, protein accession, sample and condition column.

Parameter Description
🔴 max_step_size Parameter for the default (loose) mode. Two peptides whose start positions differ by less than the specified value are always assigned to the same group.
🔴 min_overlap Parameter for all modes. Specifies the minimal overlap two peptides must share to be assigned to the same group.
🔴 min_epi_length Parameter for all modes. Specifies the minimal length of the identified consensus sequences.
🔴 seq_column Column header in the input evidence file that contains the peptide sequences.
🔴 protacc_column Column header in the input evidence file that contains the accessions of the proteins containing the peptide in that row.
[🔴 start_column] Column header in the input evidence file that contains the start position of the peptide in that row. Providing both the start and end columns reduces the computation time.
[🔴 end_column] Column header in the input evidence file that contains the end position of the peptide in that row.
🔴 sample_column Column header in the input evidence file that contains the sample of the peptide in that row.
🔴 condition_column Column header in the input evidence file that contains the condition of the peptide in that row.
🔴 out_dir Specifies the output directory.
🔴 delimiter Delimiter that separates multiple values within one cell of the input evidence file.
[🔴 strict] If set, the strict mode is run. It ensures that the defined minimal overlap holds between all peptides in a peptide group. Peptides shorter than the specified minimal overlap are an exception.
[🔴 included] If set, all peptides that lie within the protein region of another peptide group are added to that peptide group.
[🔴 qc] If set, the two QC plots consensus_sequence_coverage.png and intern_extern.svg are generated.
[🔴 mapping] If set, the pep_cores_mapping file is generated.
[🔴 html] If set, HTML versions of the generated plots are created.
[🔴 max_group_len] If specified, all peptide groups are kept shorter than the given length. Peptides that themselves exceed this length are an exception.
🟢 protacc Accessions of the proteins whose consensus landscape is visualized. Separate multiple accessions with commas.
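The following minimal sketch shows how seq_column, protacc_column and delimiter relate to a tab-separated evidence file, using only Python's csv module. The helper read_evidence and the column names Sequence and Proteins are hypothetical, xlsx input is not covered, and this is not epicore's own parser.

```python
import csv
import io


def read_evidence(text, seq_column, protacc_column, delimiter, sep="\t"):
    """Return (row_index, peptide, accessions) tuples from tabular evidence text.

    `sep` separates the columns of the file, while `delimiter` separates
    multiple protein accessions inside a single cell (hypothetical helper).
    """
    rows = []
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    for index, row in enumerate(reader):
        accessions = [acc.strip() for acc in row[protacc_column].split(delimiter) if acc.strip()]
        rows.append((index, row[seq_column], accessions))
    return rows


evidence = "Sequence\tProteins\nPEPTIDEK\tP12345;Q67890\nSAMPLER\tP12345\n"
print(read_evidence(evidence, "Sequence", "Proteins", ";"))
# [(0, 'PEPTIDEK', ['P12345', 'Q67890']), (1, 'SAMPLER', ['P12345'])]
```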

evidence file

The evidence file is the output file of a search engine. The following file types are supported: csv, tsv, xlsx.

proteome file

The proteome file should contain all proteins that appear in the protein accession column of the evidence file. The file has to be in FASTA format.
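To illustrate why every accession of the evidence file must be present in the proteome file, here is a minimal sketch of locating peptides in FASTA proteins (Workflow step 1). It is not epicore's implementation: read_fasta and locate are hypothetical helpers, the accession is assumed to be the first whitespace-separated token of each FASTA header, and positions are 0-based with exclusive ends.

```python
from __future__ import annotations


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {accession: sequence}.

    Assumption: the accession is the first whitespace-separated token of the header.
    """
    proteins: dict[str, str] = {}
    accession = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                proteins[accession] = "".join(chunks)
            accession = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if accession is not None:
        proteins[accession] = "".join(chunks)
    return proteins


def locate(peptide: str, accession: str, proteome: dict[str, str]) -> list[tuple[int, int]] | None:
    """Return all (start, end) occurrences of peptide in the protein,
    or None if the accession is missing from the proteome."""
    sequence = proteome.get(accession)
    if sequence is None:
        return None  # such a peptide cannot be placed and would have to be dropped
    hits = []
    start = sequence.find(peptide)
    while start != -1:
        hits.append((start, start + len(peptide)))
        start = sequence.find(peptide, start + 1)  # allow overlapping matches
    return hits


proteome = read_fasta(">P1 demo\nMKTAYIAKQR\nQISFVKSHFSRQ\n")
print(locate("KQRQ", "P1", proteome))  # [(7, 11)]
print(locate("KQRQ", "P9", proteome))  # None
```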

Output files

results of generate-epicore-csv
  |_consensus_sequence_coverage.png (optional)
  |_epicore.log
  |_epicore_result.csv
  |_epitope_intensity_hist.svg
  |_epitopes.csv
  |_intern_extern.svg (optional)
  |_length_distributions.svg
  |_pep_cores_mapping.tsv (optional)

The plot-landscape command produces protein landscape visualizations. An example is described in the landscape visualization section below.

epicore.log

The log file contains information about the run. It lists all peptides that were removed because their proteins do not appear in the reference proteome. It also includes the number of identified peptide groups, the specified parameters and the epicore version.

epitopes.csv

The csv contains one epitope per row.

column description
whole_epitopes The sequence of the entire peptide group.
consensus_epitopes The identified consensus sequence.
landscape The landscape of the epitope.
grouped_peptides_sequence A list containing the sequences of the peptides that contribute to the peptide group.
grouped_peptides_sample A list containing the samples of the peptides from the grouped_peptides_sequence column.
grouped_peptides_condition A list containing the conditions of the peptides from the grouped_peptides_sequence column.
grouped_peptides_start A list containing the start positions of the peptides from the grouped_peptides_sequence column.
grouped_peptides_end A list containing the end positions of the peptides from the grouped_peptides_sequence column.
core_epitopes_start The start position of the consensus sequence.
core_epitopes_end The end position of the consensus sequence.
accession A list containing the accessions of proteins in which the peptide group occurs.

epicore_result.csv

The csv contains one protein per row. The different columns contain the following information:

column description
accession The protein accession.
sequence A list of sequences of peptides mapped to the protein.
start A list containing the start positions of the peptides in the protein.
end A list containing the end positions of the peptides in the protein.
peptide_index A list containing the row number of the peptides in the evidence file.
sample A list containing the sample of the peptides in the protein.
condition A list containing the condition of the peptides in the protein.
grouped peptides start The start positions of all peptides summarized into groups.
grouped peptides end The end positions of all peptides summarized into groups.
grouped peptides sequence The peptide sequences summarized into peptide groups.
grouped peptides sample The samples of all peptides summarized into groups.
grouped peptides condition The conditions of all peptides summarized into groups.
landscape A list containing the landscapes of each peptide group.
whole_epitopes A list containing the entire sequences of all peptide groups in the protein.
consensus_epitopes A list containing the consensus sequences of all peptide groups in the protein.
core_epitopes_start A list containing the start positions of the consensus sequences in the protein.
core_epitopes_end A list containing the end positions of the consensus sequences in the protein.
proteome_occurrence A list containing the occurrences of the consensus sequences in the protein.

pep_cores_mapping.csv

The pep_cores_mapping.csv contains all the information from the initial evidence file. In addition, it contains the following columns:

column description
entire_epitope_sequence A list of the entire sequences of all epitopes to which the peptide in that row contributes.
consensus_epitope_sequence A list of the consensus sequences of all epitopes to which the peptide in that row contributes.
proteome_occurrence A list containing protein accessions and sequence positions at which the consensus epitope sequence occurs in the proteome.

consensus_sequence_coverage.svg

A histogram of the consensus sequence coverage of all peptides in the input. For each peptide in a peptide group, the consensus sequence coverage is the fraction of the group's consensus sequence that is covered by the peptide.
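A minimal sketch of the coverage definition for single peptides, assuming 0-based, end-exclusive coordinates (epicore's own start and end columns may follow a different convention). consensus_coverage is a hypothetical helper, not part of epicore's API.

```python
def consensus_coverage(pep_start: int, pep_end: int, core_start: int, core_end: int) -> float:
    """Fraction of the consensus region [core_start, core_end) covered by the
    peptide [pep_start, pep_end)."""
    covered = max(0, min(pep_end, core_end) - max(pep_start, core_start))
    return covered / (core_end - core_start)


# A consensus sequence spanning positions 10-19 and three peptides of its group
core = (10, 20)
for peptide in [(5, 15), (8, 25), (12, 16)]:
    print(peptide, consensus_coverage(*peptide, *core))
# (5, 15) 0.5
# (8, 25) 1.0
# (12, 16) 0.4
```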

epitope_intensity_hist.svg

The plot visualizes how many peptides contribute to each peptide group.

length_distributions.svg

The plot visualizes the length distribution of the original peptides and the computed peptide groups.

intern_extern.svg

The plot shows, for each peptide, the intern versus the extern ratio. The intern ratio is defined as the maximal overlap of a peptide with any other peptide within the same group. The extern ratio is defined as the maximal overlap of a peptide with the peptides of an adjacent peptide group.
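The two quantities can be sketched for the peptide groups of one protein as follows. This is an illustration, not epicore's code: intervals are 0-based and end-exclusive, the groups are assumed to be sorted along the protein so that adjacent groups are neighbors in the list, and overlaps are returned as residue counts. Whether epicore normalizes them (for example by peptide length) is an open assumption here.

```python
from __future__ import annotations


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of residues shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def intern_extern(groups: list[list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """For each peptide, return (intern, extern): its maximal overlap with another
    peptide of its own group and with any peptide of an adjacent group."""
    result = []
    for gi, group in enumerate(groups):
        # Peptides of the groups directly before and after this one
        neighbors = [p for j in (gi - 1, gi + 1) if 0 <= j < len(groups) for p in groups[j]]
        for pi, pep in enumerate(group):
            intern = max((overlap(pep, other) for oj, other in enumerate(group) if oj != pi), default=0)
            extern = max((overlap(pep, other) for other in neighbors), default=0)
            result.append((intern, extern))
    return result


groups = [[(0, 10), (3, 12)], [(11, 20)]]
print(intern_extern(groups))  # [(7, 0), (7, 1), (0, 1)]
```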

landscape visualization

An example landscape visualization of a protein generated with the plot-landscape command:

The height indicates how many peptides are mapped to each position of the protein. Each color corresponds to a different epitope. The lighter area of a color shows how many peptides are associated with the epitope, and the more intense region marks the core epitope.
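The height of a landscape can be sketched as a per-position peptide count over the region spanned by a peptide group. The helper landscape is hypothetical and assumes 0-based, end-exclusive intervals; it illustrates the idea behind the plotted height, not epicore's plotting code.

```python
from __future__ import annotations


def landscape(peptides: list[tuple[int, int]]) -> tuple[int, list[int]]:
    """Return (region_start, counts), where counts[i] is the number of peptides
    covering position region_start + i."""
    region_start = min(s for s, _ in peptides)
    region_end = max(e for _, e in peptides)
    counts = [0] * (region_end - region_start)
    for s, e in peptides:
        for pos in range(s, e):
            counts[pos - region_start] += 1
    return region_start, counts


print(landscape([(2, 6), (3, 8), (4, 6)]))  # (2, [1, 2, 3, 3, 1, 1])
```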

Workflow

  1. Compute the positions of the peptides in the proteome.
  2. Group peptides based on their overlap. Three modes are available: Strict, Included and Loose. The strict mode requires all peptides assigned to the same group to share a minimal overlap of min_overlap. In the included mode, the peptide groups are extended by also adding peptides that are completely covered by the region of a peptide group. The loose mode groups peptides when their start positions differ by no more than max_step_size or when their overlap is larger than min_overlap. max_step_size and min_overlap are user-defined parameters.
  3. Refine the peptide groups by splitting them at positions where their landscape has a minimum. This step only applies to the loose mode.
  4. Identify the consensus sequences of each peptide group. The consensus sequence is defined as the sequence region that has the highest landscape value while having a minimum length of min_epi_length. min_epi_length is a parameter that can be specified by the user.
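Steps 2 and 4 of the loose mode can be sketched as follows. This is an illustration under stated assumptions, not epicore's code: each peptide is compared with the previously added peptide of the current group, the thresholds are applied as `<= max_step_size` and `>= min_overlap`, the consensus region is grown from the first landscape maximum toward the higher neighboring position until it reaches min_epi_length, and the splitting of step 3 is omitted. Intervals are 0-based and end-exclusive; all function names are hypothetical.

```python
from __future__ import annotations


def group_loose(peptides: list[tuple[int, int]], max_step_size: int, min_overlap: int) -> list[list[tuple[int, int]]]:
    """Group (start, end) intervals of one protein with assumed loose-mode rules."""
    groups: list[list[tuple[int, int]]] = []
    for start, end in sorted(peptides):
        if groups:
            prev_start, prev_end = groups[-1][-1]  # previously added peptide
            overlap = min(prev_end, end) - max(prev_start, start)
            if start - prev_start <= max_step_size or overlap >= min_overlap:
                groups[-1].append((start, end))
                continue
        groups.append([(start, end)])
    return groups


def landscape(group: list[tuple[int, int]]) -> tuple[int, list[int]]:
    """Number of peptides covering each position of the group's region."""
    region_start = min(s for s, _ in group)
    counts = [0] * (max(e for _, e in group) - region_start)
    for s, e in group:
        for pos in range(s, e):
            counts[pos - region_start] += 1
    return region_start, counts


def consensus_region(counts: list[int], min_epi_length: int) -> tuple[int, int]:
    """Grow the first maximal plateau until it spans min_epi_length positions."""
    peak = max(counts)
    lo = counts.index(peak)
    hi = lo + 1
    while hi < len(counts) and counts[hi] == peak:
        hi += 1
    while hi - lo < min_epi_length and (lo > 0 or hi < len(counts)):
        left = counts[lo - 1] if lo > 0 else -1
        right = counts[hi] if hi < len(counts) else -1
        if right >= left:
            hi += 1  # extend toward the higher (or equal) right neighbor
        else:
            lo -= 1
    return lo, hi


def consensus_epitopes(peptides, max_step_size, min_overlap, min_epi_length):
    """Return the (start, end) of the consensus region of every peptide group."""
    result = []
    for group in group_loose(peptides, max_step_size, min_overlap):
        region_start, counts = landscape(group)
        lo, hi = consensus_region(counts, min_epi_length)
        result.append((region_start + lo, region_start + hi))
    return result


peptides = [(0, 12), (2, 14), (4, 15), (30, 40), (31, 42)]
print(group_loose(peptides, max_step_size=3, min_overlap=8))
# [[(0, 12), (2, 14), (4, 15)], [(30, 40), (31, 42)]]
print(consensus_epitopes(peptides, max_step_size=3, min_overlap=8, min_epi_length=9))
# [(4, 13), (31, 40)]
```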

Citation

Epicore is an adaptation of the tool developed by Álvaro-Benito et al. [1].
[1] Álvaro-Benito, Miguel, et al. "Quantification of HLA-DM-dependent major histocompatibility complex of class II immunopeptidomes by the peptide landscape antigenic epitope alignment utility." Frontiers in Immunology 9 (2018): 872.
