Read, write, and manipulate SnapGene files (.dna, .rna, .prot)

SnapGene File Format Parser

A reverse-engineered parser and writer for SnapGene files (.dna, .rna, .prot), covering DNA, RNA, and protein sequences. Supports all 17 known block types with typed Python models, a chainable builder pattern, and a history operations API.

Important: Found an unknown block type? Run sff check your_file.dna -l. Blocks marked [NEW] are genuinely unknown; blocks marked [*] are known but undecoded. Please report [NEW] blocks in #1 with a dump (sff check your_file.dna -d).

Installation

pip install sgffp

Requires Python 3.10+.

Quick Start

from sgffp import SgffReader, SgffWriter, SgffObject

# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")

# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)

# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")

# Create a new file from scratch
sgff = (
    SgffObject.new("ATGCATGCATGC", topology="circular")
    .add_feature("GFP", "CDS", 0, 8)
    .add_primer("fwd", "ATGC", bind_position=0)
)
SgffWriter.to_file(sgff, "new_plasmid.dna")

History Operations

Record cloning operations with automatic history tracking:

from sgffp import SgffReader

# sgff is an SgffObject, such as the one from Quick Start
sgff.ops.insert_fragment("ATCGATCG")
sgff.ops.digest("GGCC", InputSummary={"manipulation": "digest"})

# Or build an entire tree from multiple source files
vector = SgffReader.from_file("vector.dna")
insert = SgffReader.from_file("insert.dna")

sgff.ops.build_from_spec(
    [
        {"id": 1, "operation": "insertFragment", "sequence": "...",
         "name": "Final", "children": [2, 3]},
        {"id": 2, "source": vector},
        {"id": 3, "source": insert},
    ],
    final_sequence="...",
)

How It Works

SnapGene files use a TLV (Type-Length-Value) binary format after a 19-byte header. Each block has a 1-byte type ID and a 4-byte length, with encoding varying by type: UTF-8 for sequences, XML for annotations, 2-bit GATC encoding for compressed DNA, LZMA for history, and ZTR for chromatogram traces.
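The block layout described above can be sketched as a small generator. This is a minimal illustration, not sgffp's actual reader: the name iter_blocks is hypothetical, the length field is assumed to be big-endian, and the demo bytes are synthetic rather than taken from a real SnapGene file.

```python
import struct
from typing import Iterator, Tuple

HEADER_SIZE = 19  # header length stated in the text above


def iter_blocks(data: bytes, offset: int = HEADER_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (type_id, payload) pairs from a TLV byte string."""
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError("truncated block header")
        # Assumption: 1-byte type ID followed by a big-endian 4-byte length.
        type_id, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated block payload")
        offset += length
        yield type_id, payload


# Synthetic input: 19 placeholder header bytes, then two blocks.
demo = (
    bytes(HEADER_SIZE)
    + struct.pack(">BI", 0, 4) + b"ATGC"
    + struct.pack(">BI", 10, 11) + b"<Features/>"
)
blocks = list(iter_blocks(demo))
```

Each payload still needs type-specific decoding (UTF-8, XML, LZMA, and so on); the loop only separates the blocks.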

SgffReader parses blocks via the SCHEME dispatch table and stores them in SgffObject.blocks (a Dict[int, List]). Typed model properties (sgff.sequence, sgff.features, sgff.history, etc.) are lazily loaded from the blocks dict and sync changes back automatically. SgffWriter serializes blocks back to binary in sorted order.
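To make the storage and write order concrete, here is a rough sketch of grouping blocks into a Dict[int, List] and serializing them by ascending type ID. The helpers group_blocks and serialize_blocks are hypothetical names, raw bytes stand in for the parsed values the library actually stores, and the big-endian length is an assumption.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_blocks(pairs: Iterable[Tuple[int, bytes]]) -> Dict[int, List[bytes]]:
    """Collect payloads per type ID, preserving order within each type."""
    blocks: Dict[int, List[bytes]] = defaultdict(list)
    for type_id, payload in pairs:
        blocks[type_id].append(payload)
    return dict(blocks)


def serialize_blocks(blocks: Dict[int, List[bytes]]) -> bytes:
    """Write blocks back as TLV records, sorted by type ID."""
    out = bytearray()
    for type_id in sorted(blocks):
        for payload in blocks[type_id]:
            # Assumption: 1-byte type ID, big-endian 4-byte length.
            out += struct.pack(">BI", type_id, len(payload))
            out += payload
    return bytes(out)


grouped = group_blocks([(10, b"<Features/>"), (0, b"ATGC"), (10, b"<More/>")])
encoded = serialize_blocks(grouped)
```

A list per type ID allows a block type to occur more than once in a file, and sorting on write keeps the output order independent of insertion order.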

Supported Block Types

| ID | Block Type | Format | Model |
|----|------------|--------|-------|
| 0 | DNA Sequence | UTF-8 | SgffSequence |
| 1 | Compressed DNA | 2-bit GATC | SgffSequence |
| 5 | Primers | XML | SgffPrimerList |
| 6 | Notes | XML | SgffNotes |
| 7 | History Tree | LZMA + XML | SgffHistory |
| 8 | Sequence Properties | XML | SgffProperties |
| 10 | Features | XML | SgffFeatureList |
| 11 | History Nodes | Binary + TLV | SgffHistory |
| 14 | Custom Enzyme Sets | XML | |
| 16 | Trace Container | Binary + TLV | SgffTraceList |
| 17 | Alignable Sequences | XML | SgffAlignmentList |
| 18 | ZTR Trace (in 16) | ZTR | SgffTrace |
| 20 | Strand Colors | XML | |
| 21 | Protein Sequence | UTF-8 | SgffSequence |
| 23 | File Attachments | Binary + zlib XML | SgffAttachmentList |
| 27 | Trace Alignment | BGZF + BAM | SgffTraceAlignment |
| 28 | Enzyme Visibilities | XML | |
| 29 | History Modifier | LZMA + XML | SgffHistory |
| 30 | History Content | LZMA + TLV | SgffHistory |
| 32 | RNA Sequence | UTF-8 | SgffSequence |
| 34 | RNA Structure | LZMA + JSON | |

Blocks 2, 3, 13, and 35 are auto-generated by SnapGene and intentionally skipped.
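For readers unfamiliar with the "2-bit GATC" format listed for block 1, the general idea of 2-bit base packing can be sketched as follows. This shows the technique only, under loud assumptions: the code order G=0, A=1, T=2, C=3 is inferred from the name, the packing is most-significant bits first, and any header fields the real block carries are ignored. The function names are hypothetical and are not part of sgffp.

```python
BASES = "GATC"  # Assumption: codes 0=G, 1=A, 2=T, 3=C, following the name "GATC"


def pack_2bit(seq: str) -> bytes:
    """Pack an uppercase G/A/T/C string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            # str.index raises ValueError for anything outside G/A/T/C.
            byte = (byte << 2) | BASES.index(base)
        # Left-align a short final chunk so the unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Recover `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


packed = pack_2bit("GATTACA")
restored = unpack_2bit(packed, 7)
```

The sequence length has to be stored separately, because the padding bits in the last byte are indistinguishable from real bases.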

CLI

sff parse plasmid.dna           # Export to JSON
sff info plasmid.dna -v         # Show detailed file info
sff tree plasmid.dna            # Display history timeline
sff check plasmid.dna -l        # List block types
sff filter plasmid.dna -k 0,10 -o minimal.dna

All read commands accept stdin (cat file.dna | sff info).
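A tool can support both a file argument and piped input along these lines. This is a generic sketch, not sgffp's implementation: read_input is a hypothetical helper, and treating "-" as stdin is an assumed convention. The point to note is that binary data must come from sys.stdin.buffer rather than the text-mode sys.stdin.

```python
import io
import sys
from typing import BinaryIO, Optional


def read_input(path: Optional[str], stdin: Optional[BinaryIO] = None) -> bytes:
    """Return raw bytes from a file path, or from stdin when no path is given."""
    if path is None or path == "-":
        # sys.stdin.buffer is the binary stream underneath sys.stdin.
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read()
    with open(path, "rb") as handle:
        return handle.read()


# Simulate piped input with an in-memory stream.
piped = read_input(None, stdin=io.BytesIO(b"demo bytes"))
```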

Development

git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Docs (VitePress)
cd docs && bun install && bun run docs:dev

Documentation

Full guides, API reference, CLI reference, and binary format specification:

merv1n34k.github.io/sgffp

Acknowledgments

This project would not have been possible without previous work done by

Contributions

Thanks also to the people who have helped the project:

  • Manuel Lera-Ramirez (@manulera) for his PRs and suggestions
  • Cory Tobin (@cory-mozza) for reviewing new blocks

License

Distributed under the MIT License; see LICENSE for details.
