Read, write, and manipulate SnapGene files (.dna, .rna, .prot)

SnapGene File Format Parser

SnapGene File Format Parser (SGFFP for short) is a reverse-engineered parser for SnapGene DNA, RNA, and protein file formats.

[!Important] Found an unknown block type? Run sff check your_file.dna -l and look for [NEW] markers. Please report them in #1 with a dump (sff check your_file.dna -d). Help us decode more blocks!

The parser reads SnapGene files into Python objects and exports to JSON, with a writer for creating new SnapGene files.

The project aims to be a minimalistic, fast, and useful tool for molecular biologists who need to parse large libraries of SnapGene files, or for developers building SnapGene-compatible applications.

Architecture

flowchart LR
    subgraph Input
        DNA[".dna file"]
        Bytes["bytes/stream"]
    end

    subgraph SGFFP
        Reader["SgffReader"]
        Object["SgffObject"]
        Writer["SgffWriter"]
    end

    subgraph Output
        JSON["JSON"]
        File[".dna file"]
    end

    DNA --> Reader
    Bytes --> Reader
    Reader --> Object
    Object --> Writer
    Object --> JSON
    Writer --> File

Installation

pip install sgffp

Or with uv:

uv add sgffp

For development:

git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync --all-extras

Quick Start

from sgffp import SgffReader, SgffWriter

# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")

# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)

# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")

CLI Tool

uv run sff check plasmid.dna    # Inspect file blocks
uv run sff parse plasmid.dna    # Export to JSON
uv run sff info plasmid.dna     # Show file information

File Format

SnapGene uses a Type-Length-Value (TLV) binary format where each block contains:

| Field | Size | Description |
|-------|------|-------------|
| Type | 1 byte | Block type identifier |
| Length | 4 bytes | Payload size (big-endian) |
| Data | N bytes | Block payload |
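
The block layout above can be walked with a few lines of standard-library Python. This is a minimal sketch of generic TLV iteration, not SGFFP's actual reader: `iter_tlv_blocks` and `make_block` are hypothetical helper names, and the demo payloads are made up rather than taken from a real SnapGene file.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_tlv_blocks(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_type, payload) pairs from a TLV byte stream."""
    while True:
        header = stream.read(5)  # 1-byte type + 4-byte big-endian length
        if not header:
            return  # clean end of stream
        if len(header) < 5:
            raise ValueError("truncated block header")
        block_type, length = struct.unpack(">BI", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated block payload")
        yield block_type, payload


def make_block(block_type: int, payload: bytes) -> bytes:
    """Build one TLV block (hypothetical helper, used only for the demo)."""
    return struct.pack(">BI", block_type, len(payload)) + payload


# Synthetic stream: two blocks with illustrative payloads.
demo = make_block(0, b"GATTACA") + make_block(6, b"<Notes/>")
blocks = list(iter_tlv_blocks(io.BytesIO(demo)))
for block_type, payload in blocks:
    print(block_type, len(payload))
```

Reading the header as a fixed 5-byte unit keeps the loop simple and makes truncated input easy to detect before the payload is touched.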

Data encoding varies by block type: UTF-8 for sequences, XML for annotations, 2-bit encoding for compressed DNA (GATC → 00/01/10/11), and LZMA compression for history blocks.
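
To illustrate the 2-bit mapping, here is a rough sketch of packing and unpacking bases. Only the G/A/T/C to 00/01/10/11 mapping comes from the text above. The bit order within each byte (most significant pair first), the zero padding of the last byte, and passing the base count separately are all assumptions, and `encode_2bit` / `decode_2bit` are hypothetical names, not part of SGFFP.

```python
BASES = "GATC"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def encode_2bit(seq: str) -> bytes:
    """Pack a G/A/T/C string into bytes, four bases per byte.

    Assumption: the first base occupies the two most significant bits.
    """
    buf = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        code = BASES.index(base)  # raises ValueError for non-GATC symbols
        buf[i // 4] |= code << (6 - 2 * (i % 4))
    return bytes(buf)


def decode_2bit(data: bytes, n_bases: int) -> str:
    """Unpack n_bases bases from 2-bit packed data (same assumed bit order)."""
    if n_bases > len(data) * 4:
        raise ValueError("not enough data for the requested number of bases")
    out = []
    for i in range(n_bases):
        shift = 6 - 2 * (i % 4)
        out.append(BASES[(data[i // 4] >> shift) & 0b11])
    return "".join(out)


packed = encode_2bit("GATTACA")
print(packed.hex(), decode_2bit(packed, 7))
```

Because the final byte may be padded, the decoder needs the base count from elsewhere; where a real file stores that count is not covered by this sketch.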

Block Types

All known SnapGene block types and their encoding formats:

| ID | Block Type | Format |
|----|------------|--------|
| 0 | DNA Sequence | UTF-8 |
| 1 | Compressed DNA | 2-bit GATC |
| 5 | Primers | XML |
| 6 | Notes | XML |
| 7 | History Tree | LZMA + XML |
| 8 | Sequence Properties | XML |
| 10 | Features | XML |
| 11 | History Nodes | Binary + TLV |
| 16 | Trace Container | Binary + TLV |
| 17 | Alignable Sequences | XML |
| 18 | Sequence Trace | ZTR |
| 21 | Protein Sequence | UTF-8 |
| 29 | History Modifier | LZMA + XML |
| 30 | History Content | LZMA + TLV |
| 32 | RNA Sequence | UTF-8 |

Block 18 (ZTR trace) only appears inside block 16 containers, never as a standalone top-level block. For a complete binary format reference, see SNAPGENE_FORMAT_SPEC.md.
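
When inspecting raw blocks by hand, the ID list and the nesting rule for block 18 can be captured in a small lookup. This is an illustrative sketch only: `BLOCK_TYPES`, `NESTED_ONLY`, and `describe_block` are hypothetical names and do not reflect how SGFFP organizes this information internally.

```python
# Block IDs, names, and formats as listed in this README.
BLOCK_TYPES = {
    0: ("DNA Sequence", "UTF-8"),
    1: ("Compressed DNA", "2-bit GATC"),
    5: ("Primers", "XML"),
    6: ("Notes", "XML"),
    7: ("History Tree", "LZMA + XML"),
    8: ("Sequence Properties", "XML"),
    10: ("Features", "XML"),
    11: ("History Nodes", "Binary + TLV"),
    16: ("Trace Container", "Binary + TLV"),
    17: ("Alignable Sequences", "XML"),
    18: ("Sequence Trace", "ZTR"),
    21: ("Protein Sequence", "UTF-8"),
    29: ("History Modifier", "LZMA + XML"),
    30: ("History Content", "LZMA + TLV"),
    32: ("RNA Sequence", "UTF-8"),
}

# Block 18 is only expected inside a block 16 container.
NESTED_ONLY = {18}


def describe_block(block_id: int, top_level: bool = True) -> str:
    """Return a human-readable label for a block ID."""
    if block_id not in BLOCK_TYPES:
        return f"unknown block type {block_id}"
    name, fmt = BLOCK_TYPES[block_id]
    label = f"{block_id}: {name} ({fmt})"
    if top_level and block_id in NESTED_ONLY:
        label += " [unexpected at top level]"
    return label


print(describe_block(10))
print(describe_block(18, top_level=False))
```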

Supported Block Types

The table below shows which block types can be read from and written to SnapGene files. Blocks with a + in the Model column have typed Python classes for convenient access (e.g., sgff.sequence, sgff.features, sgff.history).

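| ID | Block Type | Read | Write | Model |
|----|------------|------|-------|-------|
| 0 | DNA Sequence | + | + | + |
| 1 | Compressed DNA | + | + | + |
| 5 | Primers (XML) | + | + | + |
| 6 | Notes (XML) | + | + | + |
| 7 | History Tree (XML) | + | + | + |
| 8 | Sequence Properties (XML) | + | + | + |
| 10 | Features (XML) | + | + | + |
| 11 | History Nodes | + | + | + |
| 16 | Trace Container | + | + | + |
| 17 | Alignable Sequences (XML) | + | + | + |
| 18 | ZTR Trace (in block 16) | + | + | + |
| 21 | Protein Sequence | + | + | + |
| 29 | History Modifier (XML) | + | + | + |
| 30 | History Content (Nested) | + | + | + |
| 32 | RNA Sequence | + | + | + |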

Roadmap

  • Improve SGFF parsing, unify TLV strategy
  • Understand whole file structure
  • Correctly parse into readable format from all common blocks
  • Create writer for supported block types
  • Add comprehensive test suite (240 tests)
  • Parse XML into pure JSON format
  • Add write support for history blocks (LZMA compression)
  • Add typed model classes for easy data access
  • Documentation improvements

Acknowledgments

This project would not have been possible without previous work done by

License

Distributed under the MIT license; see LICENSE for details.
