MetaPont: A tool to bridge the gap between metagenomic tool output and its analysis.
MetaPont
MetaPont - A tool to bridge the gap between the output of metagenomic tools and the analysis of the data
MetaPont is designed to work specifically with the output files generated by the HuwsLab Metagenome Workflow (github.com/TheHuwsLab/Metagenomic_Workflow). The directory structure must follow the format generated by the workflow:
```
.
└── samples
    ├── E1.0
    │   ├── E1.0_eggnog_mapper
    │   │   ├── E1.0_eggnogmapper.success
    │   │   ├── E1.0_pyrodigal_eggnog_mapped.emapper.annotations
    │   │   ├── E1.0_pyrodigal_eggnog_mapped.emapper.annotations.xlsx
    │   │   ├── E1.0_pyrodigal_eggnog_mapped.emapper.decorated.gff
    │   │   ├── E1.0_pyrodigal_eggnog_mapped.emapper.hits
    │   │   └── E1.0_pyrodigal_eggnog_mapped.emapper.seed_orthologs
    │   ├── E1.0_kraken2
    │   │   ├── E1.0_kraken2_report_mpa.txt
    │   │   ├── E1.0_kraken2_report.txt
    │   │   └── E1.0_kraken2.txt
    │   └── E1.0_readmapped
    │       ├── E1.0_bowtie2_db.3.bt2
    │       ├── E1.0_readmapped_cds_summary.txt
    │       ├── E1.0_readmapped_cds_summary.txt_new
    │       └── E1.0_readmapped_contig_summary.txt
    ├── E2.0
    │   ├── E2.0_eggnog_mapper
    │   │   ├── E2.0_eggnogmapper.success
    │   │   ├── E2.0_pyrodigal_eggnog_mapped.emapper.annotations
    │   │   ├── E2.0_pyrodigal_eggnog_mapped.emapper.annotations.xlsx
    │   │   ├── E2.0_pyrodigal_eggnog_mapped.emapper.decorated.gff
    │   │   ├── E2.0_pyrodigal_eggnog_mapped.emapper.hits
    │   │   └── E2.0_pyrodigal_eggnog_mapped.emapper.seed_orthologs
    │   ├── E2.0_kraken2
    │   │   ├── E2.0_kraken2_report_mpa.txt
    │   │   ├── E2.0_kraken2_report.txt
    │   │   └── E2.0_kraken2.txt
    │   └── E2.0_readmapped
    │       ├── E2.0_bowtie2_db.3.bt2
    │       ├── E2.0_readmapped_cds_summary.txt
    │       └── E2.0_readmapped_contig_summary.txt
    ├── E3.0
    │   ├── E3.0_eggnog_mapper
    │   │   ├── E3.0_eggnogmapper.success
    │   │   ├── E3.0_pyrodigal_eggnog_mapped.emapper.annotations
    │   │   ├── E3.0_pyrodigal_eggnog_mapped.emapper.annotations.xlsx
    │   │   ├── E3.0_pyrodigal_eggnog_mapped.emapper.decorated.gff
    │   │   ├── E3.0_pyrodigal_eggnog_mapped.emapper.hits
    │   │   └── E3.0_pyrodigal_eggnog_mapped.emapper.seed_orthologs
    │   ├── E3.0_kraken2
    │   │   ├── E3.0_kraken2_report_mpa.txt
    │   │   ├── E3.0_kraken2_report.txt
    │   │   └── E3.0_kraken2.txt
    │   └── E3.0_readmapped
    │       ├── E3.0_readmapped_cds_summary.txt
    │       └── E3.0_readmapped_contig_summary.txt
    └── Per_Sample_Contig_Outputs
        ├── E1.0_Contigs.tsv
        ├── E2.0_Contigs.tsv
        └── E3.0_Contigs.tsv
```
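A quick pre-flight check can catch layout problems before any of the tools are run. The sketch below is our own illustration, not part of MetaPont: `check_layout` and `missing_subdirs` are hypothetical helpers, and the three expected subdirectory suffixes are simply read off the example tree above.

```python
from pathlib import Path

# Hypothetical helper, not part of MetaPont. The suffixes mirror the example
# tree: sample <name> is expected to contain <name>_eggnog_mapper,
# <name>_kraken2 and <name>_readmapped.
EXPECTED_SUFFIXES = ("_eggnog_mapper", "_kraken2", "_readmapped")


def missing_subdirs(sample_dir: Path) -> list[str]:
    """Return the expected workflow subdirectories absent from sample_dir."""
    name = sample_dir.name
    return [
        name + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (sample_dir / (name + suffix)).is_dir()
    ]


def check_layout(samples_dir: Path, prefix: str) -> dict[str, list[str]]:
    """Map each sample directory whose name starts with prefix to its missing subdirectories."""
    report = {}
    for child in sorted(samples_dir.iterdir()):
        if child.is_dir() and child.name.startswith(prefix):
            report[child.name] = missing_subdirs(child)
    return report
```

For the tree above, `check_layout(Path("samples"), "E")` would return an empty list for every sample; the output directory `Per_Sample_Contig_Outputs` is skipped because it does not start with the prefix.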
Features (current aims of the project; still under development)
- Targeted Functional Analysis: Search for specific functional IDs (e.g., GO terms) within the `_Final_Contig.tsv` files provided by the HuwsLab Metagenome Workflow (https://github.com/TheHuwsLab/Metagenome_Workflow).
- Taxonomic Breakdown: Extract genus-level taxonomy information and calculate the proportion of each genus in the dataset.
- Batch Processing: Analyse all `_Contig.tsv` files in a specified directory.
- Customisable Output: Save results in a format suitable for downstream analysis.
Installation
Prerequisites
Ensure you have the following installed:
- Python 3.10 or later
Installation via pip
MetaPont is distributed on PyPI and can be installed with pip:
```
pip install MetaPont
```
Usage
MetaPont-Combine (or metapont-combine): Aggregate the eggNOG-mapper (emapper), Kraken2 and read-mapping results for each sample.
```
MetaPont-Combine -h
usage: MetaPont_Combine.py [-h] -d PARENT_DIRECTORY_PATH [-p PREFIX]

MetaPont: Combine emapper-kraken-reads

options:
  -h, --help            show this help message and exit
  -d PARENT_DIRECTORY_PATH, --parent_directory_path PARENT_DIRECTORY_PATH
                        Directory containing sample directories to analyse.
  -p PREFIX, --prefix PREFIX
                        Default - 'PN': Default directory name prefix to
                        identify sample directories to analyse..
```
The output will be saved in a new directory called Per_Sample_Contig_Outputs within the specified parent directory.
See Per_Sample_Contig_Outputs for example output files.
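The per-sample tables can then be loaded for further analysis. This is a minimal sketch assuming each `<sample>_Contigs.tsv` (the naming used in the example tree) is tab-separated with a header row; `load_per_sample_outputs` is a hypothetical helper, and the column names it returns depend on the MetaPont version that wrote the files.

```python
import csv
from pathlib import Path


def load_per_sample_outputs(parent_dir: Path) -> dict[str, list[dict[str, str]]]:
    """Read every <sample>_Contigs.tsv under parent_dir/Per_Sample_Contig_Outputs.

    Hypothetical helper: assumes tab-separated files with a header row.
    Returns {sample name: list of row dictionaries}.
    """
    out_dir = parent_dir / "Per_Sample_Contig_Outputs"
    tables = {}
    for tsv in sorted(out_dir.glob("*_Contigs.tsv")):
        sample = tsv.name[: -len("_Contigs.tsv")]
        with tsv.open(newline="") as handle:
            tables[sample] = list(csv.DictReader(handle, delimiter="\t"))
    return tables
```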
Contig-Coverage-Summary: Generate contig coverage summary from read mapping outputs
```
Contig-Coverage-Summary -h
usage: Contig_Coverage_Summary.py [-h] --root_dir ROOT_DIR --prefix PREFIX
                                  [--read-length READ_LENGTH]
                                  [--output OUTPUT]

MetaPont v0.0.9- Contig-Coverage-Summary: Aggregate readmapping contig
summaries and compute overview stats per sample.

options:
  -h, --help            show this help message and exit
  --root_dir ROOT_DIR, -d ROOT_DIR
                        Root directory containing sample folders (use
                        `root_dir` path).
  --prefix PREFIX, -p PREFIX
                        Comma-separated directory tags to search for (e.g.
                        E,L,P).
  --read-length READ_LENGTH, -r READ_LENGTH
                        Optional average read length to compute estimated
                        coverage.
  --output OUTPUT, -o OUTPUT
                        Output CSV path (default:
                        `root_dir/readmap_overview.csv`).
```
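When an average read length is supplied, a common way to estimate depth is mapped reads × read length ÷ contig length. The sketch below shows that arithmetic, plus how a comma-separated `--prefix` value such as `E,L,P` splits into tags. It assumes, but has not confirmed, that Contig-Coverage-Summary uses this standard estimate; `parse_tags` and `estimated_coverage` are hypothetical names.

```python
def parse_tags(prefix_arg: str) -> list[str]:
    """Split a comma-separated --prefix value such as 'E,L,P' into tags (hypothetical helper)."""
    return [tag.strip() for tag in prefix_arg.split(",") if tag.strip()]


def estimated_coverage(mapped_reads: int, read_length: float, contig_length: int) -> float:
    """Approximate mean depth: total mapped bases divided by contig length.

    Assumed standard estimate; the tool's exact formula may differ.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return mapped_reads * read_length / contig_length
```

For example, 1,000 reads of 150 bp mapped to a 50,000 bp contig give an estimated coverage of 3.0×.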
Report-Contig-Lineage: Generate contig lineage report from kraken2 outputs
```
Report-Contig-Lineage -h
usage: Report_Contig_Lineage.py [-h] -d DIR_PATH [--output OUTPUT]
                                [-s SEPARATE_TAXA] [-r REMOVE_TAXA]

MetaPont v0.0.9- Reporter-Contig-Lineage: Report contig lineage read counts
across samples, grouping by specified taxa substrings.

Required Arguments:
  -d DIR_PATH           Define the directory path containing the files

Optional Arguments:
  -s SEPARATE_TAXA, --separate-taxa SEPARATE_TAXA
                        Comma-separated list of taxa to separate (e.g.
                        d__Bacteria,d__Archaea). If omitted, defaults are
                        used.
  -r REMOVE_TAXA, --remove-taxa REMOVE_TAXA
                        Comma-separated list of taxa to remove. If omitted,
                        defaults are used.
```
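Grouping read counts by taxa substrings can be pictured as follows. This is a sketch under several assumptions: lineage strings are `|`-separated (as in Kraken2 MPA-style reports), each lineage is assigned to the first separating taxon it contains, unmatched lineages are pooled under a hypothetical `Other` group, and lineages containing a removed taxon are dropped. MetaPont's real defaults and grouping rules may differ, and `group_lineage_reads` is a hypothetical name.

```python
from collections import defaultdict


def group_lineage_reads(
    lineage_reads: dict[str, int],
    separate_taxa: list[str],
    remove_taxa: list[str],
) -> dict[str, int]:
    """Sum reads per separating taxon, dropping lineages that match a removed taxon.

    Lineages matching none of separate_taxa are pooled under 'Other' (an assumption).
    """
    totals: dict[str, int] = defaultdict(int)
    for lineage, reads in lineage_reads.items():
        # Substring matching, as the help text describes "taxa substrings".
        if any(taxon in lineage for taxon in remove_taxa):
            continue
        group = next((t for t in separate_taxa if t in lineage), "Other")
        totals[group] += reads
    return dict(totals)
```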
Extract-By-Function Command-line Arguments
```
Extract-By-Function -h
usage: Extract_By_Function.py [-h] -d DIRECTORY -f FUNCTION_ID -o OUTPUT
                              [-m MIN_PROPORTION] [-top TOP_TAXA]

MetaPont v0.0.9: Extract-By-Function - Identify taxa contributing to a
specific function.

options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing TSV files to analyse.
  -f FUNCTION_ID, --function_id FUNCTION_ID
                        Specific function ID to search for (e.g.,
                        'GO:0016597').
  -o OUTPUT, --output OUTPUT
                        Output file to save results.
  -m MIN_PROPORTION, --min_proportion MIN_PROPORTION
                        Minimum proportion threshold for taxa to be included
                        in the output.
  -top TOP_TAXA, --top_taxa TOP_TAXA
                        Top n taxa to be included in the output.
```
The Extract-By-Function tool provides the following command-line options:

| Option | Description | Required | Default |
|---|---|---|---|
| -d, --directory | Directory containing `_Final_Contig.tsv` files to analyse. | Yes | None |
| -f, --function_id | Functional ID to search for (e.g., `GO:0016597`). | Yes | None |
| -m, --min_proportion | Minimum proportion needed for a taxon to be reported. | Either -m or -top | None |
| -top, --top_taxa | Number of taxa to report. | Either -m or -top | None |
| -o, --output | Output file name to save results. | Yes | None |

Note: Either -m or -top is required.
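The interaction between `-m` and `-top` can be sketched as follows. `select_taxa` is a hypothetical helper, and its behaviour when both options are supplied (apply the threshold first, then truncate to the top n) is an assumption, since the help text only states that one of the two is required.

```python
from typing import Optional


def select_taxa(
    proportions: dict[str, float],
    min_proportion: Optional[float] = None,
    top_n: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Return (taxon, proportion) pairs sorted by descending proportion.

    At least one of min_proportion (-m) or top_n (-top) must be given. If both
    are given, this sketch applies the threshold first and then truncates.
    """
    if min_proportion is None and top_n is None:
        raise ValueError("either min_proportion (-m) or top_n (-top) is required")
    ranked = sorted(proportions.items(), key=lambda item: item[1], reverse=True)
    if min_proportion is not None:
        ranked = [(taxon, p) for taxon, p in ranked if p >= min_proportion]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked
```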
Example
To report the top 3 taxa contributing to the functional ID GO:0016597 across all `_Final_Contig.tsv` files in the `test_data/Final_Contig/` directory:
```
Extract-By-Function -d .../test_data/Final_Contig/ -f GO:0016597 -top 3 -o .../test_data/Final_Contig/Extract_By_Function_Out/results.tsv
```
Output
The tool generates a tab-delimited output file with the following columns:
- Sample: Name of the processed sample.
- Taxa: Genus-level taxonomic assignment extracted from the `Lineage` column.
- Reads Assigned (Function): Number of reads assigned to contigs of the stated taxon that carry the given functional ID.
- Proportion (Function): Proportion of the sample's reads assigned to the given functional ID that come from contigs of the stated taxon.
- Proportion (Total Reads): Proportion of all reads in the sample that are assigned to contigs of the stated taxon carrying the given functional ID.
Example output:

Function ID: GO:0016597

| Sample | Taxa | Reads Assigned (Function) | Proportion (Function) | Proportion (Total Reads) |
|---|---|---|---|---|
| PN0536_0001_S1_Final_Contig.tsv | Lactobacillus | 111963 | 0.602 | 0.004 |
| PN0536_0003_S83_Final_Contig.tsv | Lactobacillus | 20072 | 0.457 | 0.001 |
| PN0536_0002_S2_Final_Contig.tsv | Acutalibacter | 145222 | 0.795 | 0.005 |
| PN0536_0004_S3_Final_Contig.tsv | Lactobacillus | 40076 | 0.404 | 0.002 |
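The two proportion columns differ only in their denominators: reads carrying the function versus all reads in the sample. The sketch below assumes per-contig matches have already been reduced to `(genus, reads)` pairs; `summarise_function_hits` is a hypothetical helper, and rounding to three decimal places mirrors the example output rather than documented behaviour.

```python
from collections import Counter


def summarise_function_hits(
    hits: list[tuple[str, int]], sample_total_reads: int
) -> list[tuple[str, int, float, float]]:
    """Build (Taxa, Reads Assigned, Proportion (Function), Proportion (Total Reads)) rows.

    hits holds one (genus, reads) pair per contig carrying the functional ID.
    Rows are ordered by descending read count.
    """
    reads_per_taxon = Counter()
    for genus, reads in hits:
        reads_per_taxon[genus] += reads
    function_reads = sum(reads_per_taxon.values())
    rows = []
    for genus, reads in reads_per_taxon.most_common():
        rows.append(
            (
                genus,
                reads,
                round(reads / function_reads, 3),  # share of the function's reads
                round(reads / sample_total_reads, 3),  # share of all sample reads
            )
        )
    return rows
```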
Extract-Function-By-Taxa:
```
usage: Extract_Function_By_Taxa.py [-h] -d DIRECTORY -t TAXON -f FUNCTION -o
                                   OUTPUT

MetaPont: Extract Reads Proportions for a Specific Taxon and Function

options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing TSV files to analyse.
  -t TAXON, --taxon TAXON
                        Target taxon to search for (e.g., 'g__Escherichia').
  -f FUNCTION, --function FUNCTION
                        Target function to extract (e.g., 'EC:2.7.11.1').
  -o OUTPUT, --output OUTPUT
                        Output file to save results.
```
Workflow (unfinished)
- The script reads `_Final_Contig.tsv` files from the specified directory.
- For each file, it searches for occurrences of the given functional ID within specific columns.
- Matches are associated with genus-level taxonomic information extracted from the `Lineage` column.
- Taxon proportions are calculated and saved to the output file.
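The first three workflow steps might look like this in Python. The functional column names (`GOs`, `EC`, `KEGG_ko`, borrowed from eggNOG-mapper annotations) are placeholders, because the columns MetaPont actually searches are not listed here; `find_function_hits` and `genus_from_lineage` are hypothetical helpers. Comma-separated values are matched as whole tokens so that, for example, a search for `2.7.1.1` does not also match `2.7.1.10`.

```python
import csv
from pathlib import Path

# Placeholder column names (eggNOG-mapper style); not confirmed for MetaPont.
FUNCTION_COLUMNS = ("GOs", "EC", "KEGG_ko")


def genus_from_lineage(lineage: str) -> str:
    """Return the g__ rank from a '|' or ';' separated lineage, else 'Unclassified'."""
    for rank in lineage.replace(";", "|").split("|"):
        rank = rank.strip()
        if rank.startswith("g__"):
            return rank[3:] or "Unclassified"
    return "Unclassified"


def find_function_hits(tsv_path: Path, function_id: str) -> list[str]:
    """Return the genus of every contig row whose functional columns contain function_id."""
    genera = []
    with tsv_path.open(newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            tokens = []
            for column in FUNCTION_COLUMNS:
                # Missing columns or empty cells contribute no tokens.
                tokens.extend(t.strip() for t in (row.get(column) or "").split(","))
            if function_id in tokens:
                genera.append(genus_from_lineage(row.get("Lineage") or ""))
    return genera
```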
Extract-By-Taxa Command-line Arguments
```
Extract-By-Taxa -h
usage: Extract_By_Taxa.py [-h] -d DIRECTORY -t TAXON -o OUTPUT -func
                          FUNCTIONAL_CLASSES [-top TOP_FUNCTIONS]

MetaPont: Extract Top Functions by Taxon

options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Directory containing TSV files to analyse.
  -t TAXON, --taxon TAXON
                        Target taxon to search for (e.g., 'g__Bacillus').
  -o OUTPUT, --output OUTPUT
                        Output file to save results.
  -func FUNCTIONAL_CLASSES, --functional_classes FUNCTIONAL_CLASSES
                        Which functional classes to report (e.g. GO,EC,KEGG
                        etc).
  -top TOP_FUNCTIONS, --top_functions TOP_FUNCTIONS
                        Top n functions to include in the output for each
                        sample (default: 3).
```
The Extract-By-Taxa tool provides the following command-line options:

| Option | Description | Required | Default |
|---|---|---|---|
| -d, --directory | Directory containing `_Final_Contig.tsv` files to analyse. | Yes | None |
| -t, --taxon | Taxon to search for (e.g., `g__Bacillus`). | Yes | None |
| -func, --functional_classes | Functional classes to report (e.g. GO,EC,KEGG). | Yes | None |
| -top, --top_functions | Number of functions to report per sample. | No | 3 |
| -o, --output | Output file name to save results. | Yes | None |
Example
To list the top reported GO functions (the default of 3 per sample) for the taxon `g__Bacillus` across all `_Final_Contig.tsv` files in the `test_data/Final_Contig` directory:
```
Extract-By-Taxa -d .../test_data/Final_Contig -t g__Bacillus -o .../test_data/Final_Contig/Extract_By_Taxa/results.tsv -func GO
```
Output
The tool generates a tab-delimited output file with the following columns:
- Sample: Name of the processed sample.
- Function: One of the top-n functions reported for the chosen taxon in that sample.
- Num of Assignments: Number of times the function has been assigned across all contigs classified as the chosen taxon.
Example output:

Selected Taxon: g__Bacillus

| Sample | Function | Num of Assignments |
|---|---|---|
| PN0536_0001_S1 | GO:0008150 | 296 |
| PN0536_0001_S1 | GO:0003674 | 285 |
| PN0536_0001_S1 | GO:0005575 | 254 |
| PN0536_0003_S83 | GO:0005575 | 45 |
| PN0536_0003_S83 | GO:0008150 | 44 |
| PN0536_0003_S83 | GO:0003674 | 43 |
| PN0536_0002_S2 | GO:0005575 | 5 |
| PN0536_0002_S2 | GO:0008150 | 5 |
| PN0536_0002_S2 | GO:0005623 | 4 |
| PN0536_0004_S3 | GO:0008150 | 4 |
| PN0536_0004_S3 | GO:0003674 | 3 |
| PN0536_0004_S3 | GO:0005488 | 3 |
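Counting the top functions for a taxon reduces to a frequency count per sample. In this sketch `top_functions` is a hypothetical helper; rows are assumed to be dictionaries with a `Lineage` column and a comma-separated function column (whose name is a placeholder), and `-` is treated as a missing annotation, as in eggNOG-mapper output. Taxon matching is a plain substring test, so `g__Bacillus` would also match a lineage containing `g__Bacillus_A`.

```python
from collections import Counter


def top_functions(
    rows: list[dict[str, str]],
    taxon: str,
    function_column: str,
    top_n: int = 3,
) -> list[tuple[str, int]]:
    """Count function IDs on contigs whose Lineage contains taxon; return the top_n."""
    counts = Counter()
    for row in rows:
        if taxon not in (row.get("Lineage") or ""):
            continue
        for function_id in (row.get(function_column) or "").split(","):
            function_id = function_id.strip()
            # Skip empty cells and the '-' placeholder for "no annotation".
            if function_id and function_id != "-":
                counts[function_id] += 1
    return counts.most_common(top_n)
```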
Large File Handling (potential failure point)
The script raises Python's `csv.field_size_limit` so that exceptionally large fields in `.tsv` files can be parsed instead of triggering a field-size error.
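A widely used idiom for raising this limit is shown below; it is a sketch and not necessarily the exact code MetaPont uses. `sys.maxsize` can overflow the C `long` that `csv.field_size_limit` accepts on some platforms (notably Windows), so the value is reduced until it is accepted. `raise_csv_field_limit` is a hypothetical name.

```python
import csv
import sys


def raise_csv_field_limit() -> int:
    """Raise the csv module's per-field size limit as far as the platform allows."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # sys.maxsize does not fit in a C long on some platforms; shrink and retry.
            limit //= 10
```

Because the limit is global to the `csv` module, calling this once at start-up is enough for every subsequent reader.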