A collection of Python scripts for converting common bioinformatics file formats
Genome Format Converters
Author: Benjamin Narh-Madey
Affiliation: Hittinger Lab, Laboratory of Genetics, University of Wisconsin-Madison
A collection of Python scripts for converting common bioinformatics file formats. Each script follows a simple, uniform interface: you point it to an input directory, and it writes converted files to an output directory.
Features
- Uniform interface: all scripts accept --input-dir and --output-dir arguments.
- Batch processing: convert all files of a given type in a directory at once.
- Lightweight: only requires a few well‑maintained Python libraries.
- Well tested: each script has been tested on small example datasets.
Installation
Clone the repository:
    git clone https://github.com/K-nie/genome-format-converters.git
    cd genome-format-converters
Install the required dependencies:
    pip install -r requirements.txt
Note: Scripts that work with BAM or VCF files also need pysam (included in requirements.txt). BLAST+ is optional and must be installed separately; you only need it if you generate the BLAST tabular input files yourself.
Usage
All scripts are used in the same way. After installing the package (pip install genome-format-converters), run the tool from the command line with the gfc command followed by a subcommand. The general syntax is:
    gfc SUBCOMMAND --input-dir INPUT_DIR --output-dir OUTPUT_DIR [options]
The input directory should contain the files you want to convert. The output directory is created if it doesn't exist. Each subcommand processes all files with recognised extensions in the input directory.
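The directory-in, directory-out behaviour described above can be pictured with a short Python sketch. This is not the package's own code: convert_directory, copy_uppercase, and their parameters are hypothetical names chosen for illustration, and the extension filter is an assumption about how "recognised extensions" might be matched.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def convert_directory(
    input_dir: str,
    output_dir: str,
    extensions: Iterable[str],
    out_suffix: str,
    convert_file: Callable[[Path, Path], None],
) -> List[Path]:
    """Run convert_file on every file in input_dir with a recognised extension.

    Hypothetical helper: the real gfc subcommands may be organised differently.
    """
    in_path = Path(input_dir)
    out_path = Path(output_dir)
    # Create the output directory (and any parents) if it does not exist yet.
    out_path.mkdir(parents=True, exist_ok=True)

    wanted = {ext.lower() for ext in extensions}
    written: List[Path] = []
    for source in sorted(in_path.iterdir()):
        # Skip subdirectories and files whose extension is not recognised.
        if not source.is_file() or source.suffix.lower() not in wanted:
            continue
        target = out_path / (source.stem + out_suffix)
        convert_file(source, target)
        written.append(target)
    return written


def copy_uppercase(source: Path, target: Path) -> None:
    """Toy converter used only to demonstrate the loop."""
    target.write_text(source.read_text().upper())
```

Calling convert_directory("./fasta", "./out", [".fa", ".fasta"], ".txt", copy_uppercase) would write one output file per matching input file and return the list of paths written.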
Getting help
Run gfc --help to see all available subcommands, or gfc SUBCOMMAND --help for the detailed options of one subcommand. For example, to get help for the gff3-to-gtf subcommand:
    gfc gff3-to-gtf --help
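For readers curious how a command with subcommands and shared options can be laid out, here is a minimal argparse sketch. It only mirrors the interface documented here and is not taken from the package: build_parser is a hypothetical name, only two subcommands are shown, and the default values for --min-length and --min-identity are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a gfc-like parser (illustrative only, not the package's code)."""
    parser = argparse.ArgumentParser(
        prog="gfc", description="Genome format converters (sketch)"
    )
    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    # Options shared by every directory-based subcommand.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--input-dir", required=True)
    common.add_argument("--output-dir", required=True)

    subparsers.add_parser(
        "gff3-to-gtf", parents=[common], help="Convert GFF3 to GTF"
    )

    blast = subparsers.add_parser(
        "blast-to-links", parents=[common], help="Convert BLAST tabular to link TSV"
    )
    # Defaults below are assumptions made for this sketch.
    blast.add_argument("--min-length", type=int, default=0)
    blast.add_argument("--min-identity", type=float, default=0.0)
    return parser
```

With this layout, each subcommand gets its own --help output automatically, which matches the behaviour described above.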
Command Reference
Annotation Format Conversions
| Subcommand | Description | Example |
|---|---|---|
| gff3-to-gtf | Convert GFF3 to GTF | gfc gff3-to-gtf --input-dir ./gff_files --output-dir ./gtf_output |
| gff3-to-bed | Convert GFF3 to 6‑column BED | gfc gff3-to-bed --input-dir ./gff_files --output-dir ./bed_output |
| genbank-to-gff3 | Convert GenBank to GFF3 | gfc genbank-to-gff3 --input-dir ./gbk_files --output-dir ./gff3_output |
| gff3-to-table | Convert GFF3 to tab‑separated feature table | gfc gff3-to-table --input-dir ./gff_files --output-dir ./table_output |
| gff3-to-protein | Extract protein sequences from GFF3 + FASTA | gfc gff3-to-protein --input-dir ./data --output-dir ./proteins |
| fasta-gff-to-gbk | Convert paired FASTA and GFF3 files to GenBank | gfc fasta-gff-to-gbk --input-dir ./data --output-dir ./gbk_output |
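The main subtlety in a GFF3 to BED conversion is the coordinate system: GFF3 uses 1-based, inclusive coordinates, while BED uses 0-based, half-open intervals. The sketch below shows that shift for a single feature line. It is an illustration under stated assumptions, not the gff3-to-bed implementation: gff3_line_to_bed6 is a hypothetical name, the BED name column is assumed to come from the ID attribute, and the GFF3 score is passed through unchanged.

```python
from typing import Optional


def gff3_line_to_bed6(line: str) -> Optional[str]:
    """Convert one GFF3 feature line to a BED6 line (illustrative sketch)."""
    # Skip blank lines, comments, and directives such as "##gff-version 3".
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None
    seqid, _source, ftype, start, end, score, strand, _phase, attributes = fields

    # Parse "key=value;key=value" attributes into a dictionary.
    attrs = dict(item.split("=", 1) for item in attributes.split(";") if "=" in item)
    # Assumption: use the ID attribute as the BED name, else the feature type.
    name = attrs.get("ID", ftype)

    bed_start = int(start) - 1  # 1-based inclusive -> 0-based start
    bed_end = int(end)          # inclusive end equals half-open end
    # Simplification: a missing score becomes 0; other scores pass through.
    bed_score = "0" if score == "." else score
    bed_strand = strand if strand in ("+", "-") else "."
    return "\t".join(
        [seqid, str(bed_start), str(bed_end), name, bed_score, bed_strand]
    )
```

A gene spanning positions 100 to 200 in GFF3 therefore becomes the BED interval 99 to 200, which still covers 101 bases.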
Sequence Format Conversions
| Subcommand | Description | Example |
|---|---|---|
| fasta-to-fastq | FASTA → FASTQ with default quality (I) | gfc fasta-to-fastq --input-dir ./fasta --output-dir ./fastq |
| fastq-to-fasta | FASTQ → FASTA (drop qualities) | gfc fastq-to-fasta --input-dir ./fastq --output-dir ./fasta |
| fasta-qual-to-fastq | Combine FASTA + QUAL into FASTQ | gfc fasta-qual-to-fastq --input-dir ./data --output-dir ./fastq |
| fastq-to-fasta-qual | Split FASTQ into FASTA and QUAL | gfc fastq-to-fasta-qual --input-dir ./fastq --output-dir ./split |
| fasta-to-table | FASTA → two‑column TSV (id, sequence) | gfc fasta-to-table --input-dir ./fasta --output-dir ./tables |
| convert-alignment | Convert alignment formats (fasta, phylip, nexus, clustal) | gfc convert-alignment --input-dir ./aln --output-dir ./phylip --in-format fasta --out-format phylip |
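Since FASTA carries no quality information, a FASTA to FASTQ conversion has to invent a quality string. The table above mentions a default quality of I, which corresponds to Phred score 40 in the common Phred+33 encoding (ASCII 73 minus 33). The following sketch shows the idea with a small pure-Python parser; read_fasta and fasta_to_fastq are hypothetical names, and the package itself may rely on a library instead.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text, joining wrapped lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_fastq(text: str, quality_char: str = "I") -> str:
    """Return FASTQ text with one constant quality character per base."""
    records = []
    for header, seq in read_fasta(text):
        # Four lines per record: @header, sequence, "+", quality string.
        records.append(f"@{header}\n{seq}\n+\n{quality_char * len(seq)}\n")
    return "".join(records)
```

The resulting qualities are placeholders, so downstream tools that weight bases by quality will treat every base as equally reliable.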
Alignment / Mapping Results
| Subcommand | Description | Example |
|---|---|---|
| bam-to-bed | Convert BAM/SAM to BED6 | gfc bam-to-bed --input-dir ./bam_files --output-dir ./bed |
| blast-to-links | Convert BLAST tabular (outfmt 6) to link TSV | gfc blast-to-links --input-dir ./blast_results --output-dir ./links --min-length 100 --min-identity 30 |
| delta-to-tab | Convert MUMmer .delta to tabular coordinates | gfc delta-to-tab --input-dir ./delta_files --output-dir ./tables |
| maf-to-xmfa | Convert MAF to XMFA (progressiveMauve format) | gfc maf-to-xmfa --input-dir ./maf_files --output-dir ./xmfa |
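The --min-length and --min-identity options of blast-to-links suggest a filter on the alignment length and percent identity columns of BLAST tabular output. Here is a rough sketch of such a filter, assuming the default 12-column outfmt 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name and the column order of the link output are assumptions; the real link TSV written by the tool may differ.

```python
from typing import Iterable, Iterator


def blast_to_links(
    lines: Iterable[str], min_length: int = 100, min_identity: float = 30.0
) -> Iterator[str]:
    """Yield link lines for BLAST hits passing both thresholds (sketch).

    Assumed output columns: query, qstart, qend, subject, sstart, send.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a default outfmt 6 record
        qseqid, sseqid = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        qstart, qend, sstart, send = fields[6:10]
        # Keep only hits that are long enough and similar enough.
        if length < min_length or pident < min_identity:
            continue
        yield "\t".join([qseqid, qstart, qend, sseqid, sstart, send])
```

Raising either threshold reduces the number of links, which is usually what you want before plotting synteny between genomes.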
Variant Formats (VCF)
| Subcommand | Description | Example |
|---|---|---|
| vcf-to-bed | Convert VCF to BED intervals | gfc vcf-to-bed --input-dir ./vcf_files --output-dir ./bed |
| vcf-to-table | Convert VCF to tab‑separated table | gfc vcf-to-table --input-dir ./vcf_files --output-dir ./tables |
| vcf-to-consensus | Create consensus FASTA from VCF + reference | gfc vcf-to-consensus --input-dir ./data --output-dir ./consensus |
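As with GFF3, converting VCF records to BED intervals involves a coordinate shift: VCF positions are 1-based, and the interval covered by a variant is determined by the length of its REF allele. The sketch below illustrates this for plain text VCF lines. It is an assumption-laden example rather than the tool's code (which, per the installation note, works with pysam): vcf_line_to_bed is a hypothetical name, the output has four columns, and symbolic alleles or END tags are not handled.

```python
from typing import Optional


def vcf_line_to_bed(line: str) -> Optional[str]:
    """Convert one VCF data line to a 4-column BED line (illustrative sketch)."""
    # Skip blank lines, meta-information lines (##) and the header line (#CHROM).
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return None
    chrom, pos, variant_id, ref = fields[0], int(fields[1]), fields[2], fields[3]

    start = pos - 1          # 1-based VCF position -> 0-based BED start
    end = start + len(ref)   # interval spans the reference allele
    # Assumption: fall back to "chrom:pos" when the ID column is missing.
    name = variant_id if variant_id != "." else f"{chrom}:{pos}"
    return "\t".join([chrom, str(start), str(end), name])
```

A SNP at position 100 becomes the interval 99 to 100, while a deletion with a three-base REF allele at position 5 becomes 4 to 7.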
Phylogenetic Tree Formats
| Subcommand | Description | Example |
|---|---|---|
| tree-convert | Convert tree formats (newick, nexus, phyloxml) | gfc tree-convert --input-dir ./trees --output-dir ./converted --in-format newick --out-format nexus |
| annotate-tree | Add alignment sequences to tree (output NEXUS) | gfc annotate-tree --tree tree.nwk --aln alignment.fasta --output annotated.nex |
Testing
All scripts have been tested on small example datasets located in the tests/test_data/ directory. These test files cover the basic functionality of each converter. To run the tests yourself, install the package in development mode (pip install -e .) and execute the example commands from the Command Reference using the provided test data.
License
This project is licensed under the MIT License – see the LICENSE file for details.
Contributing
Contributions are welcome! If you have a new converter or an improvement, please open an issue or submit a pull request.