genome-format-converters

A collection of Python scripts for converting common bioinformatics file formats

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Genome Format Converters

Author: Benjamin Narh-Madey

Affiliation: Hittinger Lab, Laboratory of Genetics, University of Wisconsin-Madison

Requires Python 3.6+. License: MIT.

A collection of Python scripts for converting common bioinformatics file formats. Each script follows a simple, uniform interface: you point it to an input directory, and it writes converted files to an output directory.

Features
Installation
Usage
Command Reference
Scripts Overview
License
Contributing

Features

- Uniform interface: scripts accept --input-dir and --output-dir arguments (annotate-tree instead takes --tree, --aln, and --output, as shown in the Command Reference).
- Batch processing: convert all files of a given type in a directory at once.
- Lightweight: requires only a few well-maintained Python libraries.
- Tested: each script has been tested on small example datasets.
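
The batch-processing pattern described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the batch_convert helper, its parameters, and the "same stem, new extension" output naming are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def batch_convert(
    input_dir: str,
    output_dir: str,
    extensions: Iterable[str],
    out_suffix: str,
    convert_text: Callable[[str], str],
) -> List[Path]:
    """Apply convert_text to every matching file in input_dir (hypothetical helper)."""
    in_path = Path(input_dir)
    out_path = Path(output_dir)
    # Create the output directory if it does not exist yet.
    out_path.mkdir(parents=True, exist_ok=True)

    wanted = {ext.lower() for ext in extensions}
    written = []
    for source in sorted(in_path.iterdir()):
        # Skip subdirectories and files with unrecognised extensions.
        if not source.is_file() or source.suffix.lower() not in wanted:
            continue
        # Assumed naming rule: keep the file stem, swap the extension.
        target = out_path / (source.stem + out_suffix)
        target.write_text(convert_text(source.read_text()))
        written.append(target)
    return written
```

A converter then only has to supply the text-to-text function; directory handling and extension filtering stay in one place.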

Installation

Clone the repository:

    git clone https://github.com/K-nie/genome-format-converters.git
    cd genome-format-converters

Install the required dependencies:

    pip install -r requirements.txt

Note: Scripts that work with BAM or VCF files also need pysam (included in requirements.txt). BLAST tabular conversion does not need BLAST+ itself; install BLAST+ separately only if you want to generate the input files.

Usage

All scripts are used in the same way. After installing the package (pip install genome-format-converters), run the tool from the command line using the gfc command followed by a subcommand. The general syntax is:

    gfc SUBCOMMAND --input-dir INPUT_DIR --output-dir OUTPUT_DIR [options]

The input directory should contain the files you want to convert. The output directory will be created if it doesn't exist. Each subcommand processes all files with recognised extensions in the input directory.

Getting help

Run gfc --help to see all available subcommands, or gfc SUBCOMMAND --help for detailed options. For example, to get help for the gff3-to-gtf subcommand:

    gfc gff3-to-gtf --help
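
For readers curious how a command with subcommands and shared --input-dir/--output-dir options can be wired up, here is a minimal argparse sketch. It is not the package's implementation: build_parser, the help strings, and the option defaults are assumptions, and only two of the subcommands are registered for brevity.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a gfc-like parser (illustrative; not the package's real parser)."""
    parser = argparse.ArgumentParser(prog="gfc", description="Genome format converters")
    subparsers = parser.add_subparsers(dest="subcommand")
    # Setting the attribute (rather than passing required=True) also works on Python 3.6.
    subparsers.required = True

    # Options shared by every subcommand live on a parent parser.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--input-dir", required=True, help="directory with input files")
    common.add_argument("--output-dir", required=True, help="directory for converted files")

    subparsers.add_parser("gff3-to-gtf", parents=[common], help="Convert GFF3 to GTF")

    blast = subparsers.add_parser(
        "blast-to-links", parents=[common], help="Convert BLAST tabular to link TSV"
    )
    # Subcommand-specific options; the defaults here are assumptions.
    blast.add_argument("--min-length", type=int, default=0)
    blast.add_argument("--min-identity", type=float, default=0.0)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["gff3-to-gtf", "--input-dir", "./gff_files", "--output-dir", "./gtf_output"]
)
print(args.subcommand, args.input_dir, args.output_dir)
```

With this layout, argparse generates both the top-level --help and the per-subcommand --help automatically.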

Command Reference

Annotation Format Conversions

| Subcommand | Description | Example |
| --- | --- | --- |
| gff3-to-gtf | Convert GFF3 to GTF | gfc gff3-to-gtf --input-dir ./gff_files --output-dir ./gtf_output |
| gff3-to-bed | Convert GFF3 to 6-column BED | gfc gff3-to-bed --input-dir ./gff_files --output-dir ./bed_output |
| genbank-to-gff3 | Convert GenBank to GFF3 | gfc genbank-to-gff3 --input-dir ./gbk_files --output-dir ./gff3_output |
| gff3-to-table | Convert GFF3 to tab-separated feature table | gfc gff3-to-table --input-dir ./gff_files --output-dir ./table_output |
| gff3-to-protein | Extract protein sequences from GFF3 + FASTA | gfc gff3-to-protein --input-dir ./data --output-dir ./proteins |
| fasta-gff-to-gbk | Convert paired FASTA and GFF3 files to GenBank | gfc fasta-gff-to-gbk --input-dir ./data --output-dir ./gbk_output |
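
The key step in any GFF3 to BED conversion is the coordinate shift: GFF3 intervals are 1-based and inclusive, BED intervals are 0-based and half-open. The sketch below shows that step for a single line. The function name is hypothetical, and the choice of the ID attribute for the name column and 0 for the score column is an assumption, since the columns gff3-to-bed actually writes are not documented here.

```python
from typing import Optional
from urllib.parse import unquote


def gff3_line_to_bed6(line: str) -> Optional[str]:
    """Convert one GFF3 feature line to a BED6 line; return None for non-feature lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None
    seqid, _source, ftype, start, end, _score, strand, _phase, attributes = fields

    # Parse "key=value;key=value" attributes and use ID as the BED name if present.
    attrs = dict(item.split("=", 1) for item in attributes.split(";") if "=" in item)
    name = unquote(attrs.get("ID", ftype))

    # GFF3 is 1-based inclusive; BED is 0-based half-open, so only the start shifts.
    bed_start = int(start) - 1
    bed_end = int(end)
    bed_strand = strand if strand in {"+", "-"} else "."
    # Score is written as 0 here (an assumption for this sketch).
    return "\t".join([seqid, str(bed_start), str(bed_end), name, "0", bed_strand])
```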

Sequence Format Conversions

| Subcommand | Description | Example |
| --- | --- | --- |
| fasta-to-fastq | FASTA → FASTQ with default quality (I) | gfc fasta-to-fastq --input-dir ./fasta --output-dir ./fastq |
| fastq-to-fasta | FASTQ → FASTA (drop qualities) | gfc fastq-to-fasta --input-dir ./fastq --output-dir ./fasta |
| fasta-qual-to-fastq | Combine FASTA + QUAL into FASTQ | gfc fasta-qual-to-fastq --input-dir ./data --output-dir ./fastq |
| fastq-to-fasta-qual | Split FASTQ into FASTA and QUAL | gfc fastq-to-fasta-qual --input-dir ./fastq --output-dir ./split |
| fasta-to-table | FASTA → two-column TSV (id, sequence) | gfc fasta-to-table --input-dir ./fasta --output-dir ./tables |
| convert-alignment | Convert alignment formats (fasta, phylip, nexus, clustal) | gfc convert-alignment --input-dir ./aln --output-dir ./phylip --in-format fasta --out-format phylip |
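
To make the "default quality (I)" entry concrete, here is a small pure-Python sketch of a FASTA to FASTQ conversion that assigns the same quality character to every base. The function names are hypothetical and this is not the package's code; under the common Phred+33 encoding, the character I corresponds to a quality score of 40.

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence may be wrapped over several lines; collect the pieces.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_fastq(handle: TextIO, quality_char: str = "I") -> str:
    """Return FASTQ text in which every base gets the same quality character."""
    records = []
    for header, seq in read_fasta(handle):
        # A FASTQ record is four lines: @header, sequence, '+', quality string.
        records.append(f"@{header}\n{seq}\n+\n{quality_char * len(seq)}\n")
    return "".join(records)
```

Because FASTA carries no quality information, the qualities produced this way are placeholders and should not be interpreted as real base-call confidence.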

Alignment / Mapping Results

| Subcommand | Description | Example |
| --- | --- | --- |
| bam-to-bed | Convert BAM/SAM to BED6 | gfc bam-to-bed --input-dir ./bam_files --output-dir ./bed |
| blast-to-links | Convert BLAST tabular (outfmt 6) to link TSV | gfc blast-to-links --input-dir ./blast_results --output-dir ./links --min-length 100 --min-identity 30 |
| delta-to-tab | Convert MUMmer .delta to tabular coordinates | gfc delta-to-tab --input-dir ./delta_files --output-dir ./tables |
| maf-to-xmfa | Convert MAF to XMFA (progressiveMauve format) | gfc maf-to-xmfa --input-dir ./maf_files --output-dir ./xmfa |
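
The --min-length and --min-identity options suggest a filter over the alignment length and percent identity columns of BLAST tabular output. The sketch below illustrates such a filter, assuming the default 12-column outfmt 6 layout. The function name and the six-field link tuple are assumptions; the exact columns of the link TSV written by blast-to-links are not specified above.

```python
from typing import Iterable, Iterator, Tuple

# Default 12 columns of BLAST+ -outfmt 6, in order.
OUTFMT6_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)

Link = Tuple[str, int, int, str, int, int]


def blast_to_links(
    lines: Iterable[str], min_length: int = 0, min_identity: float = 0.0
) -> Iterator[Link]:
    """Yield (query, qstart, qend, subject, sstart, send) for hits passing both filters."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_COLUMNS, line.rstrip("\n").split("\t")))
        # Drop hits that are too short or too dissimilar.
        if int(hit["length"]) < min_length or float(hit["pident"]) < min_identity:
            continue
        yield (
            hit["qseqid"], int(hit["qstart"]), int(hit["qend"]),
            hit["sseqid"], int(hit["sstart"]), int(hit["send"]),
        )
```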

Variant Formats (VCF)

| Subcommand | Description | Example |
| --- | --- | --- |
| vcf-to-bed | Convert VCF to BED intervals | gfc vcf-to-bed --input-dir ./vcf_files --output-dir ./bed |
| vcf-to-table | Convert VCF to tab-separated table | gfc vcf-to-table --input-dir ./vcf_files --output-dir ./tables |
| vcf-to-consensus | Create consensus FASTA from VCF + reference | gfc vcf-to-consensus --input-dir ./data --output-dir ./consensus |
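
As with GFF3, converting VCF positions to BED means moving from 1-based positions to 0-based half-open intervals. The following sketch handles uncompressed VCF text only and emits one 1-bp interval per record. It is an illustration under assumptions: the function name is hypothetical, and the fourth (name) column is an arbitrary choice for the example. Compressed VCF and BCF input would need a library such as pysam.

```python
from typing import Iterable, Iterator


def vcf_to_bed_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 1-bp BED line (chrom, start, end, name) per VCF data line."""
    for line in lines:
        # Lines starting with '#' are meta-information or the column header.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, var_id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # VCF POS is 1-based; BED start is 0-based and end is exclusive.
        start = int(pos) - 1
        end = start + 1
        # Name column is an illustrative choice: the ID if set, else REF>ALT.
        name = var_id if var_id != "." else f"{ref}>{alt}"
        yield f"{chrom}\t{start}\t{end}\t{name}"
```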

Phylogenetic Tree Formats

| Subcommand | Description | Example |
| --- | --- | --- |
| tree-convert | Convert tree formats (newick, nexus, phyloxml) | gfc tree-convert --input-dir ./trees --output-dir ./converted --in-format newick --out-format nexus |
| annotate-tree | Add alignment sequences to tree (output NEXUS) | gfc annotate-tree --tree tree.nwk --aln alignment.fasta --output annotated.nex |

Scripts Overview

The underlying Python scripts are located in src/genome_format_converters/converters/. Each script can also be run independently (though the gfc interface is recommended). Below is a quick reference of the scripts and their input/output formats.

| Script | Description | Input extensions | Output extension |
| --- | --- | --- | --- |
| gff3_to_gtf.py | GFF3 → GTF | .gff3, .gff | .gtf |
| gff3_to_bed.py | GFF3 → 6-column BED | .gff3, .gff | .bed |
| genbank_to_gff3.py | GenBank → GFF3 | .gbk, .gb | .gff3 |
| gff3_to_table.py | GFF3 → tab-separated feature table | .gff3, .gff | .tsv |
| gff3_to_protein.py | GFF3 + FASTA → protein FASTA | .gff3/.gff + .fasta/.fa | .faa |
| fasta_to_fastq.py | FASTA → FASTQ (with default quality) | .fasta, .fa, .fna, .fas | .fastq |
| fastq_to_fasta.py | FASTQ → FASTA | .fastq, .fq | .fasta |
| fasta_qual_to_fastq.py | Combine FASTA + QUAL → FASTQ | .fasta/.fa + .qual | .fastq |
| fastq_to_fasta_qual.py | Split FASTQ → FASTA + QUAL | .fastq, .fq | .fasta, .qual |
| convert_alignment.py | Alignment format converter (FASTA, PHYLIP, NEXUS, CLUSTAL) | any alignment file | user-specified |
| fasta_to_table.py | FASTA → two-column TSV (ID, sequence) | .fasta, .fa, .fna, .fas | .tsv |
| bam_to_bed.py | BAM/SAM → BED6 | .bam, .sam | .bed |
| blast_tab_to_links.py | BLAST tabular (outfmt 6) → simplified link TSV | .tab | .links.tsv |
| delta_to_tab.py | MUMmer .delta → tabular alignment coordinates | .delta | .tsv |
| maf_to_xmfa.py | MAF → XMFA (progressiveMauve format) | .maf | .xmfa |
| vcf_to_bed.py | VCF/BCF → 1-bp BED intervals | .vcf, .vcf.gz, .bcf | .bed |
| vcf_to_table.py | VCF/BCF → tab-separated table (TSV) | .vcf, .vcf.gz, .bcf | .tsv |
| vcf_to_consensus.py | VCF + reference FASTA → consensus FASTA per sample | .vcf/.vcf.gz + .fasta | .fa |
| tree_convert.py | Newick ↔ NEXUS ↔ PhyloXML | .nwk, .nex, .xml | user-specified |
| annotate_tree.py | Add alignment sequences to tree (NEXUS output) | .nwk + .fasta (aligned) | .nex |
| convert_all_gff_fasta_to_gbk.py | FASTA + GFF → GenBank | .fasta/.fa + .gff3/.gff | .gbk |

Testing

All scripts have been tested on small example datasets located in the tests/test_data/ directory. These test files cover the basic functionality of each converter. To run the tests yourself, install the package in development mode (pip install -e .) and execute the example commands from the Command Reference, pointing --input-dir at the provided test data.
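
After running a converter on a directory, one simple sanity check is to confirm that every recognised input file produced an output file. The helper below is a hypothetical sketch of such a check and is not part of the package. It assumes outputs keep the input file's stem and only change the extension, which may not hold for every converter.

```python
from pathlib import Path
from typing import Iterable, List


def missing_outputs(
    input_dir: str, output_dir: str, in_exts: Iterable[str], out_ext: str
) -> List[str]:
    """Return names of expected output files that are absent from output_dir."""
    wanted = {ext.lower() for ext in in_exts}
    out_path = Path(output_dir)
    missing = []
    for source in sorted(Path(input_dir).iterdir()):
        if source.is_file() and source.suffix.lower() in wanted:
            # Assumed naming rule: same stem, new extension.
            expected = out_path / (source.stem + out_ext)
            if not expected.is_file():
                missing.append(expected.name)
    return missing
```

An empty list means every matching input has a counterpart; it says nothing about whether the converted content is correct.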

License

This project is licensed under the MIT License – see the LICENSE file for details.

Contributing

Contributions are welcome! If you have a new converter or an improvement, please open an issue or submit a pull request.

Release history

- 0.1.2: Feb 27, 2026
- 0.1.1: Feb 26, 2026 (this version)
- 0.1.0: Feb 26, 2026

Download files

Source Distribution

genome_format_converters-0.1.1.tar.gz (19.9 kB)

Uploaded Feb 26, 2026 Source

Built Distribution

genome_format_converters-0.1.1-py3-none-any.whl (28.6 kB)

Uploaded Feb 26, 2026 Python 3

File details

Details for the file genome_format_converters-0.1.1.tar.gz.

File metadata

Download URL: genome_format_converters-0.1.1.tar.gz
Upload date: Feb 26, 2026
Size: 19.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for genome_format_converters-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`52cd637fb65336bbd2792bbefb2160ef98f8234b8c3aaa8061a8e2f0d817a4ef`
MD5	`a5bfa4d43daf201e3d9fc42a9e376d99`
BLAKE2b-256	`06edb8f48e3f66aff88c1c7ce0405df46018b33a34a8479a6af5c0bd13933efc`

File details

Details for the file genome_format_converters-0.1.1-py3-none-any.whl.

File metadata

Download URL: genome_format_converters-0.1.1-py3-none-any.whl
Upload date: Feb 26, 2026
Size: 28.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for genome_format_converters-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d6f74721d9927f2da9756c1608c577c60bc23093abfcac0cbb6407ed26ad9b5`
MD5	`7f2ee251358372e7f91f93a56c672ae0`
BLAKE2b-256	`5a1519f1c724483f0b904af3f86c4d6e9da10bfc40fa17e66563976fe9db30db`
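
A downloaded file can be checked against the SHA256 digests listed above with Python's standard hashlib module. The helper name below is hypothetical; the sketch simply streams the file and compares hex digests.

```python
import hashlib


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's SHA256 digest equals expected_hex (case-insensitive)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, sha256_matches("genome_format_converters-0.1.1.tar.gz", "52cd637fb65336bbd2792bbefb2160ef98f8234b8c3aaa8061a8e2f0d817a4ef") should return True for an intact copy of the source distribution.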
