Read, write, and manipulate SnapGene files (.dna, .rna, .prot)

SnapGene File Format Parser

SnapGene File Format Parser (SGFFP for short) is a reverse-engineered parser for SnapGene DNA, RNA, and protein file formats.

Important: Found an unknown block type? Run sff check your_file.dna -l and look for [NEW] markers. Please report them in issue #1 and attach a dump (sff check your_file.dna -d). Help us decode more blocks!

The parser reads SnapGene files into Python objects and exports to JSON, with a writer for creating new SnapGene files.

The project aims to be a minimalistic, fast, and useful tool for molecular biologists who need to parse large libraries of SnapGene files, or for developers building SnapGene-compatible applications.

Architecture

flowchart LR
    subgraph Input
        DNA[".dna file"]
        Bytes["bytes/stream"]
    end

    subgraph SGFFP
        Reader["SgffReader"]
        Object["SgffObject"]
        Writer["SgffWriter"]
    end

    subgraph Output
        JSON["JSON"]
        File[".dna file"]
    end

    DNA --> Reader
    Bytes --> Reader
    Reader --> Object
    Object --> Writer
    Object --> JSON
    Writer --> File

Installation

pip install sgffp

Or with uv:

uv add sgffp

For development:

git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync --all-extras

Quick Start

from sgffp import SgffReader, SgffWriter

# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")

# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)

# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")

CLI Tool

uv run sff check plasmid.dna    # Inspect file blocks
uv run sff parse plasmid.dna    # Export to JSON
uv run sff info plasmid.dna     # Show file information

File Format

SnapGene uses a Type-Length-Value (TLV) binary format where each block contains:

| Field  | Size    | Description               |
|--------|---------|---------------------------|
| Type   | 1 byte  | Block type identifier     |
| Length | 4 bytes | Payload size (big-endian) |
| Data   | N bytes | Block payload             |
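
The block layout in the table can be walked with a few lines of standard-library Python. This is a minimal sketch, not SGFFP's own reader: the function name iter_blocks is hypothetical, the sample bytes are synthetic, and the sketch assumes the input is nothing but back-to-back blocks, with no knowledge of payload contents.

```python
import struct
from typing import Iterator, Tuple


def iter_blocks(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_type, payload) pairs from a TLV byte string.

    Hypothetical helper for illustration; assumes the input consists
    only of consecutive Type-Length-Value blocks.
    """
    offset = 0
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError(f"truncated block header at offset {offset}")
        # 1-byte type followed by a 4-byte big-endian length (5 bytes total)
        block_type, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError(f"truncated payload for block type {block_type}")
        offset += length
        yield block_type, payload


# Synthetic input: two blocks with made-up payloads (not real SnapGene content)
sample = (
    b"\x00" + struct.pack(">I", 4) + b"ACGT"
    + b"\x06" + struct.pack(">I", 2) + b"<>"
)
blocks = list(iter_blocks(sample))
```

Reading the length before slicing the payload lets a reader skip block types it does not understand, which is what makes a TLV format tolerant of unknown blocks.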

Data encoding varies by block type: UTF-8 for sequences, XML for annotations, 2-bit encoding for compressed DNA (GATC → 00/01/10/11), and LZMA compression for history blocks.
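
To illustrate the 2-bit idea, the sketch below packs and unpacks bases using the G, A, T, C → 00, 01, 10, 11 mapping stated above. Everything else is an assumption made for the example: the function names are hypothetical, bases are packed most-significant pair first, and the base count is passed in separately. The actual layout of SnapGene's compressed DNA block may differ.

```python
BASES = "GATC"  # index 0..3 corresponds to bit patterns 00, 01, 10, 11


def encode_2bit(seq: str) -> bytes:
    """Pack a G/A/T/C string into bytes, four bases per byte.

    Assumption: the first base occupies the two most significant bits.
    """
    buf = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # BASES.index raises ValueError for any symbol other than G, A, T, C
        buf[i // 4] |= BASES.index(base) << (6 - 2 * (i % 4))
    return bytes(buf)


def decode_2bit(payload: bytes, n_bases: int) -> str:
    """Unpack n_bases bases from a 2-bit packed payload (same assumptions)."""
    if n_bases > len(payload) * 4:
        raise ValueError("payload too short for requested base count")
    out = []
    for i in range(n_bases):
        shift = 6 - 2 * (i % 4)
        out.append(BASES[(payload[i // 4] >> shift) & 0b11])
    return "".join(out)


packed = encode_2bit("GATTACA")
restored = decode_2bit(packed, 7)
```

The base count has to travel alongside the packed bytes because the last byte may be only partly filled; padding bits are indistinguishable from G (00) otherwise.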

Block Types

All known SnapGene block types and their encoding formats:

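| ID | Block Type          | Format       | ID | Block Type           | Format     |
|----|---------------------|--------------|----|----------------------|------------|
| 0  | DNA Sequence        | UTF-8        | 17 | Alignable Sequences  | XML        |
| 1  | Compressed DNA      | 2-bit GATC   | 18 | Sequence Trace       | ZTR        |
| 5  | Primers             | XML          | 21 | Protein Sequence     | UTF-8      |
| 6  | Notes               | XML          | 28 | Enzyme Visualization | XML        |
| 7  | History Tree        | LZMA + XML   | 29 | History Modifier     | LZMA + XML |
| 8  | Sequence Properties | XML          | 30 | History Content      | LZMA + TLV |
| 10 | Features            | XML          | 32 | RNA Sequence         | UTF-8      |
| 11 | History Nodes       | Binary + TLV | 14 | Custom Enzymes       | XML        |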

Blocks not listed (2-4, 9, 12-13, 15-16, 19-20, 22-27, 31) are either unknown or internal SnapGene data.
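
For scripts that inspect block IDs, the table can be transcribed into a lookup. This is a sketch only: KNOWN_BLOCKS and describe_block are hypothetical names, not part of the sgffp API, and the 0 to 32 range is an assumption taken from the IDs shown here.

```python
# Block ID -> (name, format), transcribed from the table above
KNOWN_BLOCKS = {
    0: ("DNA Sequence", "UTF-8"),
    1: ("Compressed DNA", "2-bit GATC"),
    5: ("Primers", "XML"),
    6: ("Notes", "XML"),
    7: ("History Tree", "LZMA + XML"),
    8: ("Sequence Properties", "XML"),
    10: ("Features", "XML"),
    11: ("History Nodes", "Binary + TLV"),
    14: ("Custom Enzymes", "XML"),
    17: ("Alignable Sequences", "XML"),
    18: ("Sequence Trace", "ZTR"),
    21: ("Protein Sequence", "UTF-8"),
    28: ("Enzyme Visualization", "XML"),
    29: ("History Modifier", "LZMA + XML"),
    30: ("History Content", "LZMA + TLV"),
    32: ("RNA Sequence", "UTF-8"),
}


def describe_block(block_id: int) -> str:
    """Return a one-line label for a block ID; unlisted IDs are 'Unknown'."""
    name, fmt = KNOWN_BLOCKS.get(block_id, ("Unknown", "undocumented"))
    return f"{block_id}: {name} ({fmt})"


# IDs in the assumed 0..32 range that the table does not list
unlisted = [i for i in range(33) if i not in KNOWN_BLOCKS]
```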

Supported Block Types

The table below shows which block types can be read from and written to SnapGene files. Blocks with a + in the Model column have typed Python classes for convenient access (e.g., sgff.sequence, sgff.features, sgff.history).

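| ID | Block Type                 | Read | Write | Model |
|----|----------------------------|------|-------|-------|
| 0  | DNA Sequence               | +    | +     | +     |
| 1  | Compressed DNA             | +    | +     | +     |
| 5  | Primers (XML)              | +    | +     | +     |
| 6  | Notes (XML)                | +    | +     | +     |
| 7  | History Tree (XML)         | +    | +     | +     |
| 8  | Sequence Properties (XML)  | +    | +     | +     |
| 10 | Features (XML)             | +    | +     | +     |
| 11 | History Nodes              | +    | +     | +     |
| 14 | Custom Enzymes (XML)       | +    | +     | -     |
| 17 | Alignable Sequences (XML)  | +    | +     | +     |
| 21 | Protein Sequence           | +    | +     | +     |
| 28 | Enzyme Visualization (XML) | +    | +     | -     |
| 29 | History Modifier (XML)     | +    | +     | +     |
| 30 | History Content (Nested)   | +    | +     | +     |
| 32 | RNA Sequence               | +    | +     | +     |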

Roadmap

  • Improve SGFF parsing, unify TLV strategy
  • Understand whole file structure
  • Correctly parse into readable format from all common blocks
  • Create writer for supported block types
  • Add comprehensive test suite (199 tests)
  • Parse XML into pure JSON format
  • Add write support for history blocks (LZMA compression)
  • Add typed model classes for easy data access
  • Documentation improvements

Acknowledgments

This project would not have been possible without previous work done by

License

Distributed under the MIT License; see LICENSE for details.
