
Read, write, and manipulate SnapGene files (.dna, .rna, .prot)


SnapGene File Format Parser

SnapGene File Format Parser (SGFFP for short) is a reverse-engineered parser for SnapGene DNA, RNA, and protein file formats.

[!Important] Hey! I have tried to decode as many SnapGene block types as I can, but some are surely still missing. Please check your SnapGene file(s) with uv run sff check <your_snapgene_file> to see which blocks they contain. If a file has a new, unknown block type, the tool marks it with a [NEW] flag. In that case, please open an issue and, if possible, either attach your file or dump the output of the block with the --examine/-e flag, e.g. uv run sff check <your_snapgene_file> -e 1> block.dump. Let's make parsing SnapGene files better together!

The parser reads SnapGene files into Python objects and exports to JSON, with a writer for creating new SnapGene files.

The project aims to be a minimalistic, fast, and useful tool for molecular biologists who need to parse large libraries of SnapGene files, or for developers building SnapGene-compatible applications.

Architecture

flowchart LR
    subgraph Input
        DNA[".dna file"]
        Bytes["bytes/stream"]
    end

    subgraph SGFFP
        Reader["SgffReader"]
        Object["SgffObject"]
        Writer["SgffWriter"]
    end

    subgraph Output
        JSON["JSON"]
        File[".dna file"]
    end

    DNA --> Reader
    Bytes --> Reader
    Reader --> Object
    Object --> Writer
    Object --> JSON
    Writer --> File

Installation

pip install sgffp

Or with uv:

uv add sgffp

For development:

git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync --all-extras

Quick Start

from sgffp import SgffReader, SgffWriter

# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")

# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)

# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")

CLI Tool

uv run sff check plasmid.dna    # Inspect file blocks
uv run sff parse plasmid.dna    # Export to JSON
uv run sff info plasmid.dna     # Show file information

File Format

SnapGene uses a Type-Length-Value (TLV) binary format where each block contains:

| Field  | Size    | Description                |
|--------|---------|----------------------------|
| Type   | 1 byte  | Block type identifier      |
| Length | 4 bytes | Payload size (big-endian)  |
| Data   | N bytes | Block payload              |
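
The layout above can be walked with a few lines of standard-library Python. This is a minimal sketch of a generic TLV reader, not SGFFP's own implementation: the function name `iter_tlv_blocks` is hypothetical, and the sample bytes are synthetic rather than taken from a real SnapGene file.

```python
import struct
from typing import Iterator, Tuple


def iter_tlv_blocks(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_type, payload) pairs from a TLV-encoded byte string.

    Hypothetical helper for illustration; not part of the sgffp API.
    """
    offset = 0
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError(f"truncated block header at offset {offset}")
        # ">BI": 1-byte unsigned type, then 4-byte big-endian unsigned length
        block_type, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError(f"truncated payload for block type {block_type}")
        yield block_type, payload
        offset += length


# Synthetic two-block stream (made up for this example)
sample = struct.pack(">BI", 0, 4) + b"GATC" + struct.pack(">BI", 6, 2) + b"<>"
blocks = list(iter_tlv_blocks(sample))
print(blocks)  # [(0, b'GATC'), (6, b'<>')]
```

Using a generator keeps memory use low when scanning many files, since each payload is sliced out only when the caller asks for the next block.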

Data encoding varies by block type: UTF-8 for sequences, XML for annotations, 2-bit encoding for compressed DNA (GATC → 00/01/10/11), and LZMA compression for history blocks.
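
To make the 2-bit and LZMA encodings concrete, here is a small sketch under stated assumptions. The base-to-bits mapping (G=00, A=01, T=10, C=11) comes from the text above; the bit order within each byte (most significant pair first), the padding of the last byte, and the LZMA container format are assumptions, and `encode_2bit`/`decode_2bit` are hypothetical names rather than SGFFP functions.

```python
import lzma

# Mapping from the text above: G=00, A=01, T=10, C=11
BASES = "GATC"


def encode_2bit(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base (4 bases per byte)."""
    packed = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASES.index(base)
        # Assumption: a final partial chunk is left-aligned and zero-padded
        byte <<= 2 * (4 - len(chunk))
        packed.append(byte)
    return bytes(packed)


def decode_2bit(packed: bytes, n_bases: int) -> str:
    """Unpack n_bases bases from 2-bit packed data."""
    out = []
    for byte in packed:
        # Assumption: the most significant bit pair holds the first base
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out[:n_bases])


packed = encode_2bit("GATCGA")
restored = decode_2bit(packed, 6)

# LZMA round trip with the standard library. The default format detection
# in lzma.decompress is an assumption about what history blocks contain.
xml_payload = b"<example/>"
roundtrip = lzma.decompress(lzma.compress(xml_payload))
```

The base count has to be stored separately from the packed bytes, because a padded final byte is otherwise indistinguishable from trailing G bases.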

Block Types

All known SnapGene block types and their encoding formats:

| ID | Block Type           | Format       |
|----|----------------------|--------------|
| 0  | DNA Sequence         | UTF-8        |
| 1  | Compressed DNA       | 2-bit GATC   |
| 5  | Primers              | XML          |
| 6  | Notes                | XML          |
| 7  | History Tree         | LZMA + XML   |
| 8  | Sequence Properties  | XML          |
| 10 | Features             | XML          |
| 11 | History Nodes        | Binary + TLV |
| 14 | Custom Enzymes       | XML          |
| 17 | Alignable Sequences  | XML          |
| 18 | Sequence Trace       | ZTR          |
| 21 | Protein Sequence     | UTF-8        |
| 28 | Enzyme Visualization | XML          |
| 29 | History Modifier     | LZMA + XML   |
| 30 | History Content      | LZMA + TLV   |
| 32 | RNA Sequence         | UTF-8        |

Blocks not listed (2-4, 9, 12-13, 15-16, 19-20, 22-27, 31) are either unknown or internal SnapGene data.
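
As an illustration of how the known IDs might be used when scanning a file, the sketch below keeps the block types listed in this section in a dictionary and reports anything else as unknown. The names `KNOWN_BLOCKS` and `describe_block` are hypothetical and do not describe how SGFFP or the sff check command is implemented.

```python
# Block IDs and names copied from the block types listed in this section
KNOWN_BLOCKS = {
    0: "DNA Sequence",
    1: "Compressed DNA",
    5: "Primers",
    6: "Notes",
    7: "History Tree",
    8: "Sequence Properties",
    10: "Features",
    11: "History Nodes",
    14: "Custom Enzymes",
    17: "Alignable Sequences",
    18: "Sequence Trace",
    21: "Protein Sequence",
    28: "Enzyme Visualization",
    29: "History Modifier",
    30: "History Content",
    32: "RNA Sequence",
}


def describe_block(block_id: int) -> str:
    """Return a readable name for a block ID, or mark it as unknown."""
    name = KNOWN_BLOCKS.get(block_id)
    if name is None:
        return f"unknown block type {block_id}"
    return name


# IDs in the 0-32 range that have no entry above
unlisted = [i for i in range(33) if i not in KNOWN_BLOCKS]
print(describe_block(10))
print(describe_block(9))
```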

Supported Block Types

The table below shows which block types can be read from and written to SnapGene files. Blocks with a + in the Model column have typed Python classes for convenient access (e.g., sgff.sequence, sgff.features, sgff.history).

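| ID | Block Type                 | Read | Write | Model |
|----|----------------------------|------|-------|-------|
| 0  | DNA Sequence               | +    | +     | +     |
| 1  | Compressed DNA             | +    | +     | +     |
| 5  | Primers (XML)              | +    | +     | +     |
| 6  | Notes (XML)                | +    | +     | +     |
| 7  | History Tree (XML)         | +    | +     | +     |
| 8  | Sequence Properties (XML)  | +    | +     | +     |
| 10 | Features (XML)             | +    | +     | +     |
| 11 | History Nodes              | +    | +     | +     |
| 14 | Custom Enzymes (XML)       | +    | +     | -     |
| 17 | Alignable Sequences (XML)  | +    | +     | +     |
| 21 | Protein Sequence           | +    | +     | +     |
| 28 | Enzyme Visualization (XML) | +    | +     | -     |
| 29 | History Modifier (XML)     | +    | +     | +     |
| 30 | History Content (Nested)   | +    | +     | +     |
| 32 | RNA Sequence               | +    | +     | +     |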

Roadmap

  • Improve SGFF parsing, unify TLV strategy
  • Understand whole file structure
  • Correctly parse into readable format from all common blocks
  • Create writer for supported block types
  • Add comprehensive test suite (199 tests)
  • Parse XML into pure JSON format
  • Add write support for history blocks (LZMA compression)
  • Add typed model classes for easy data access
  • Documentation improvements

Acknowledgments

This project would not have been possible without previous work done by

License

Distributed under the MIT License; see LICENSE for details.

