Pansa
Pansa is a command-line tool and Python package for building a syntelog-based pan-genome matrix.
Overview
- Extract Longest Protein: Extract the longest amino acid sequence for each gene from the DNA sequence file using the GFF3 annotation file for subsequent synteny identification.
- Run DAGchainer: Perform synteny analysis to identify syntenic gene pairs.
- Merge Syntelog Results: Merge syntenic results to generate the final pan-genome matrix.
- Polish Matrix: Generate a more readable and simplified pan-genome matrix, and export it in various formats (PNG, SVG, PDF).
- Other Functions: Includes reading parameters from configuration files, controlling log output levels, and more.
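The first step keeps one protein per gene, the longest one. The sketch below illustrates only that selection rule. It is not Pansa's implementation: the function name and the `(gene_id, transcript_id, protein)` input shape are hypothetical, and the tie-breaking choice (keep the first isoform seen) is an assumption.

```python
def longest_protein_per_gene(records):
    """Return {gene_id: (transcript_id, protein)} keeping the longest protein.

    `records` is an iterable of (gene_id, transcript_id, protein) tuples.
    This input shape is hypothetical, not Pansa's internal format.
    """
    best = {}
    for gene_id, transcript_id, protein in records:
        current = best.get(gene_id)
        # Replace only when strictly longer, so ties keep the first isoform seen.
        if current is None or len(protein) > len(current[1]):
            best[gene_id] = (transcript_id, protein)
    return best


# Toy data: geneA has two isoforms, geneB has one.
records = [
    ("geneA", "geneA_T001", "MKT"),
    ("geneA", "geneA_T002", "MKTAYIAK"),
    ("geneB", "geneB_T001", "MSD"),
]
selected = longest_protein_per_gene(records)
for gene_id, (transcript_id, protein) in selected.items():
    print(gene_id, transcript_id, len(protein))
```

Using one representative protein per gene keeps the later synteny step from counting several isoforms of the same gene as separate matches.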
Installation
You can install Pansa using pip:
pip install pansa
Usage
We recommend creating a new folder and running every step inside it to avoid unexpected errors.
Extract Longest Protein
pansa extract_longest_protein <config_file> --verbose <ERROR|INFO|DEBUG>
Run DAGchainer
pansa blastp_2_dagchainer <config_file> --thread <number_of_threads> --D <distance> --g <gap_length> --A <aligned_pairs> --evalue <evalue_threshold> --verbose <ERROR|INFO|DEBUG>
Merge Syntelog Results
pansa merge <config_file> --output <output_file> --verbose <ERROR|INFO|DEBUG>
Polish Matrix
pansa polish <config_file> --pan_matrix <pan_matrix_file> --outpng --outsvg --outpdf --verbose <ERROR|INFO|DEBUG>
Input File Format
Configuration File Format
Each line of the configuration file describes one sample, using three whitespace-separated fields:
<sample_name> <fasta_path> <gff3_path>
For example:
Mo17 data/Zm-Mo17-REFERENCE-CAU-2.0.fa data/Zm-Mo17-REFERENCE-CAU-2.0_Zm00014ba.gff3
B73 data/Zm-B73-REFERENCE-NAM-5.0.fa data/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3
Oh7B data/Zm-Oh7B-REFERENCE-NAM-1.0.fa data/Zm-Oh7B-REFERENCE-NAM-1.0_Zm00038ab.1.gff3
or:
Mo17 Mo17/Zm-Mo17-REFERENCE-CAU-2.0.fa Mo17/Zm-Mo17-REFERENCE-CAU-2.0_Zm00014ba.gff3
B73 B73/Zm-B73-REFERENCE-NAM-5.0.fa B73/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3
Oh7B Oh7B/Zm-Oh7B-REFERENCE-NAM-1.0.fa Oh7B/Zm-Oh7B-REFERENCE-NAM-1.0_Zm00038ab.1.gff3
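Before running the pipeline, it can help to check that every line of the configuration file has exactly three fields. The sketch below is a hypothetical validator, not part of Pansa: `Sample` and `parse_config` are illustrative names, and it assumes fields are separated by whitespace and that blank lines can be ignored.

```python
from pathlib import Path
from typing import NamedTuple


class Sample(NamedTuple):
    """One configuration line: sample name, FASTA path, GFF3 path."""
    name: str
    fasta: Path
    gff3: Path


def parse_config(text):
    """Parse configuration text into a list of Sample records."""
    samples = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines (an assumption, see lead-in)
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields, got {len(fields)}"
            )
        name, fasta, gff3 = fields
        samples.append(Sample(name, Path(fasta), Path(gff3)))
    return samples


example = """\
Mo17 data/Zm-Mo17-REFERENCE-CAU-2.0.fa data/Zm-Mo17-REFERENCE-CAU-2.0_Zm00014ba.gff3
B73 data/Zm-B73-REFERENCE-NAM-5.0.fa data/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3
"""
samples = parse_config(example)
for sample in samples:
    print(sample.name, sample.fasta, sample.gff3)
```

Because the fields are split on whitespace, this check would reject paths that contain spaces.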
Example
Suppose you have a configuration file config.txt with the following content:
sample1 /data/sample1.fasta /data/sample1.gff3
sample2 /data/sample2.fasta /data/sample2.gff3
- Extract the longest protein sequence:
pansa extract_longest_protein config.txt --verbose INFO
- Run DAGchainer:
pansa blastp_2_dagchainer config.txt --thread 8 --D 1000000 --g 40000 --A 5 --evalue 1e-5 --verbose INFO
- Merge syntelog results:
pansa merge config.txt --output SG_test --verbose INFO
- Polish the matrix:
pansa polish config.txt --pan_matrix SG_test --outpng --outsvg --outpdf --verbose INFO
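The four commands above can also be chained from a Python script. The sketch below is one possible wrapper, not something Pansa provides: `build_pipeline` and `run_pipeline` are hypothetical helpers, and the only subcommands, flags, and values used are the ones shown in the example above. `run_pipeline` is defined but not called here, since it requires Pansa and its input files to be available.

```python
import subprocess


def build_pipeline(config, matrix, threads=8):
    """Return the four pansa commands as argument lists, in run order."""
    verbose = ["--verbose", "INFO"]
    return [
        ["pansa", "extract_longest_protein", config, *verbose],
        ["pansa", "blastp_2_dagchainer", config, "--thread", str(threads),
         "--D", "1000000", "--g", "40000", "--A", "5",
         "--evalue", "1e-5", *verbose],
        ["pansa", "merge", config, "--output", matrix, *verbose],
        ["pansa", "polish", config, "--pan_matrix", matrix,
         "--outpng", "--outsvg", "--outpdf", *verbose],
    ]


def run_pipeline(commands, workdir):
    """Run each command inside workdir, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, cwd=workdir, check=True)


commands = build_pipeline("config.txt", "SG_test")
for cmd in commands:
    print(" ".join(cmd))
```

Passing a dedicated `workdir` to `run_pipeline` matches the recommendation to run all steps in a separate folder.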
Log Output Levels
You can control the log output level using the --verbose parameter. The available options are: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET.
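These six names coincide with the level names in Python's standard `logging` module. Assuming Pansa passes the value through to that module (an assumption, not confirmed by this README), the sketch below shows the numeric threshold each name corresponds to. `level_value` is a hypothetical helper.

```python
import logging

# The level names listed for --verbose, in the order given above.
LEVEL_NAMES = ["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]


def level_value(name):
    """Map a level name to the numeric value used by the logging module."""
    if name not in LEVEL_NAMES:
        raise ValueError(f"unknown log level: {name!r}")
    return getattr(logging, name)


for name in LEVEL_NAMES:
    print(f"{name:<8} {level_value(name)}")
```

In the `logging` module, a message is emitted only when its level is at or above the configured threshold, so DEBUG produces the most output and CRITICAL the least.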
Developer Information
For more details and technical support, please contact the developers or visit the project homepage.
E-mail: caocao@cau.edu.cn