Read, write, and manipulate SnapGene .dna files

SnapGene File Format Parser

SnapGene File Format Parser (SGFFP for short) is a reverse-engineered parser for SnapGene DNA, RNA, and protein file formats.

[!Important] Hey! I have tried to decode as many SnapGene block types as I can, but some are surely still missing. Please check your SnapGene file(s) with uv run sff check <your_snapgene_file> to see which blocks they contain. Any new, unknown block type is reported with a [NEW] flag. If you see one, please open an issue and, if possible, either attach your file or dump the output of the block with the --examine/-e flag, i.e. uv run sff check <your_snapgene_file> -e 1> block.dump. Let's make parsing SnapGene files better together!

The parser reads SnapGene files into Python objects and exports to JSON, with a writer for creating new SnapGene files.

The project aims to be a minimalistic, fast, and useful tool for molecular biologists who need to parse large libraries of SnapGene files, or for developers building SnapGene-compatible applications.

Architecture

flowchart LR
    subgraph Input
        DNA[".dna file"]
        Bytes["bytes/stream"]
    end

    subgraph SGFFP
        Reader["SgffReader"]
        Object["SgffObject"]
        Writer["SgffWriter"]
    end

    subgraph Output
        JSON["JSON"]
        File[".dna file"]
    end

    DNA --> Reader
    Bytes --> Reader
    Reader --> Object
    Object --> Writer
    Object --> JSON
    Writer --> File

Installation

git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync

Quick Start

from sgffp import SgffReader, SgffWriter

# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")

# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)

# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")

CLI Tool

uv run sff check plasmid.dna    # Inspect file blocks
uv run sff parse plasmid.dna    # Export to JSON
uv run sff info plasmid.dna     # Show file information

File Format

SnapGene uses a Type-Length-Value (TLV) binary format where each block contains:

Field	Size	Description
Type	1 byte	Block type identifier
Length	4 bytes	Payload size (big-endian)
Data	N bytes	Block payload
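
The layout in the table above can be walked with a few lines of Python. This is a minimal sketch that assumes only the 1-byte type and 4-byte big-endian length described here; it is not the SgffReader implementation. The helper names iter_blocks and make_block are hypothetical, and the sample payloads are made-up bytes rather than real SnapGene block contents.

```python
import struct
from typing import Iterator, Tuple


def iter_blocks(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_type, payload) pairs from a TLV byte string."""
    offset = 0
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError("truncated block header")
        # 1-byte type followed by a 4-byte big-endian payload length
        block_type, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated block payload")
        offset += length
        yield block_type, payload


def make_block(block_type: int, payload: bytes) -> bytes:
    """Serialize one block: type byte, big-endian length, then payload."""
    return struct.pack(">BI", block_type, len(payload)) + payload


# Illustrative payloads only (hypothetical, not real block contents)
sample = make_block(0, b"GATTACA") + make_block(6, b"<Notes/>")
blocks = list(iter_blocks(sample))
for block_type, payload in blocks:
    print(block_type, len(payload))
```

Because every block carries its own length, a reader can skip block types it does not understand without decoding their payloads.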

Data encoding varies by block type: UTF-8 for sequences, XML for annotations, 2-bit encoding for compressed DNA (GATC → 00/01/10/11), and LZMA compression for history blocks.
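
To illustrate the 2-bit mapping (G=00, A=01, T=10, C=11), here is a small sketch that packs and unpacks bases. It assumes four bases per byte with the first base in the most significant bits, and it ignores any header fields a real Compressed DNA block may carry; both points are assumptions made for illustration, not something the format description above confirms. The function names are hypothetical.

```python
BASES = "GATC"  # position in this string is the 2-bit code: G=00, A=01, T=10, C=11


def pack_2bit(seq: str) -> bytes:
    """Pack a GATC string into bytes, four bases per byte (assumed MSB first)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASES.index(base)
        # Left-align a final partial chunk so bases always start at the high bits
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Unpack `length` bases from 2-bit packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    # The last byte may contain padding, so trim to the known sequence length
    return "".join(bases[:length])


packed = pack_2bit("GATTACA")
restored = unpack_2bit(packed, 7)
```

The sequence length has to be stored separately, because the padding bits in the last byte are indistinguishable from G (00).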

Block Types

All known SnapGene block types and their encoding formats:

ID	Block Type	Format
0	DNA Sequence	UTF-8
1	Compressed DNA	2-bit GATC
5	Primers	XML
6	Notes	XML
7	History Tree	LZMA + XML
8	Sequence Properties	XML
10	Features	XML
11	History Nodes	Binary + TLV
14	Custom Enzymes	XML
17	Alignable Sequences	XML
18	Sequence Trace	ZTR
21	Protein Sequence	UTF-8
28	Enzyme Visualization	XML
29	History Modifier	LZMA + XML
30	History Content	LZMA + TLV
32	RNA Sequence	UTF-8

Blocks not listed (2-4, 9, 12-13, 15-16, 19-20, 22-27, 31) are either unknown or internal SnapGene data.
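
When inspecting raw blocks, the table can be carried as a simple lookup. The sketch below copies the IDs and names listed above into a dictionary and derives the unlisted IDs in the 0 to 32 range; KNOWN_BLOCKS and describe are hypothetical names for this example, not part of the sgffp API.

```python
# Block IDs and names copied from the table above
KNOWN_BLOCKS = {
    0: "DNA Sequence",
    1: "Compressed DNA",
    5: "Primers",
    6: "Notes",
    7: "History Tree",
    8: "Sequence Properties",
    10: "Features",
    11: "History Nodes",
    14: "Custom Enzymes",
    17: "Alignable Sequences",
    18: "Sequence Trace",
    21: "Protein Sequence",
    28: "Enzyme Visualization",
    29: "History Modifier",
    30: "History Content",
    32: "RNA Sequence",
}


def describe(block_type: int) -> str:
    """Return a readable name for a block ID, flagging unlisted ones."""
    return KNOWN_BLOCKS.get(block_type, f"unknown block {block_type}")


# IDs in the 0..32 range that the table does not list
unlisted = [i for i in range(33) if i not in KNOWN_BLOCKS]
print(unlisted)
```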

Supported Block Types

The table below shows which block types can be read from and written to SnapGene files. Blocks with a + in the Model column have typed Python classes for convenient access (e.g., sgff.sequence, sgff.features, sgff.history).

ID	Block Type	Read	Write	Model
0	DNA Sequence	+	+	+
1	Compressed DNA	+	+	+
5	Primers (XML)	+	+	+
6	Notes (XML)	+	+	+
7	History Tree (XML)	+	+	+
8	Sequence Properties (XML)	+	+	+
10	Features (XML)	+	+	+
11	History Nodes	+	+	+
14	Custom Enzymes (XML)	+	+	-
17	Alignable Sequences (XML)	+	+	+
21	Protein Sequence	+	+	+
28	Enzyme Visualization (XML)	+	+	-
29	History Modifier (XML)	+	+	+
30	History Content (Nested)	+	+	+
32	RNA Sequence	+	+	+

Roadmap

Improve SGFF parsing, unify TLV strategy
Understand whole file structure
Correctly parse into readable format from all common blocks
Create writer for supported block types
Add comprehensive test suite (199 tests)
Parse XML into pure JSON format
Add write support for history blocks (LZMA compression)
Add typed model classes for easy data access
Documentation improvements

Acknowledgments

This project would not have been possible without the previous work of:

Damien Goutte-Gattat, see his PDF on SGFF structure: https://incenp.org/dvlpt/docs/binary-sequence-formats/binary-sequence-formats.pdf
Isaac Luo, for his version of SnapGene reader: https://github.com/IsaacLuo/SnapGeneFileReader

License

Distributed under the MIT license; see LICENSE for details.

Release history

Version	Release date
0.17.1	May 21, 2026
0.17.0	May 20, 2026
0.16.0	May 19, 2026
0.15.4	Mar 26, 2026
0.15.3	Mar 26, 2026
0.15.2	Mar 25, 2026
0.15.1	Mar 19, 2026
0.15.0	Mar 9, 2026
0.14.0	Feb 16, 2026
0.13.1	Feb 2, 2026
0.13.0	Feb 1, 2026
0.11.0	Feb 1, 2026
0.10.0	Jan 26, 2026
0.9.0 (this version)	Jan 26, 2026

Download files

Source Distribution

sgffp-0.9.0.tar.gz (112.1 kB), uploaded Jan 26, 2026

Built Distribution

sgffp-0.9.0-py3-none-any.whl (25.0 kB), uploaded Jan 26, 2026, Python 3

File details

Details for the file sgffp-0.9.0.tar.gz.

File metadata

Download URL: sgffp-0.9.0.tar.gz
Upload date: Jan 26, 2026
Size: 112.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for sgffp-0.9.0.tar.gz
Algorithm	Hash digest
SHA256	`4544bee2acae1d65c172fef93c33515176567a4800c6ebab4da8822748ada278`
MD5	`cb70c156b339c7fbf057136ced18a537`
BLAKE2b-256	`5c1619c9b09fc3fa6c04b1a4b56eae0ee1ba7b1da04fcd2bf0d12356ce90fe31`

File details

Details for the file sgffp-0.9.0-py3-none-any.whl.

File metadata

Download URL: sgffp-0.9.0-py3-none-any.whl
Upload date: Jan 26, 2026
Size: 25.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for sgffp-0.9.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`90122a8ba9cca2a194199d033a0c2ee613ed145a5011f430685281e102fe8961`
MD5	`5406c2b0a9a213a8af9f314dcc044626`
BLAKE2b-256	`67993715b3ef8c8d67f43fc3f55c9d9e562483efae34b06d936d0028e3459e3a`
