Read, write, and manipulate SnapGene files (.dna, .rna, .prot)
SnapGene File Format Parser
A reverse-engineered parser and writer for SnapGene .dna files (DNA, RNA, protein). Supports all 17 known block types with typed Python models, a chainable builder pattern, and a history operations API.
[!Important] Found an unknown block type? Run `sff check your_file.dna -l`. Blocks marked [NEW] are genuinely unknown, while blocks marked [*] are known but not yet decoded. Please report [NEW] blocks in #1 with a dump (`sff check your_file.dna -d`).
Installation
```bash
pip install sgffp
```
Requires Python 3.10+.
Quick Start
```python
from sgffp import SgffReader, SgffWriter, SgffObject

# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")

# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)

# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")

# Create a new file from scratch
sgff = (
    SgffObject.new("ATGCATGCATGC", topology="circular")
    .add_feature("GFP", "CDS", 0, 8)
    .add_primer("fwd", "ATGC", bind_position=0)
)
SgffWriter.to_file(sgff, "new_plasmid.dna")
```
History Operations
Record cloning operations with automatic history tracking:
```python
# Record individual operations on an SgffObject (e.g. `sgff` from Quick Start)
sgff.ops.insert_fragment("ATCGATCG")
sgff.ops.digest("GGCC", InputSummary={"manipulation": "digest"})

# Or build an entire tree from multiple source files
vector = SgffReader.from_file("vector.dna")
insert = SgffReader.from_file("insert.dna")
sgff.ops.build_from_spec(
    [
        {"id": 1, "operation": "insertFragment", "sequence": "...",
         "name": "Final", "children": [2, 3]},
        {"id": 2, "source": vector},
        {"id": 3, "source": insert},
    ],
    final_sequence="...",
)
```
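The spec passed to `build_from_spec` is a flat list of nodes that together form a tree: inner nodes name an operation and list child ids, and leaves point at a source object. The helper below is a hypothetical sketch (not part of sgffp) that checks this shape before the call. The required keys (`id`, `children`, `source`) are assumptions read off the example above, not a documented schema.

```python
from typing import Any


def check_spec(spec: list[dict[str, Any]]) -> int:
    """Return the root id if `spec` forms a single tree, else raise ValueError.

    Hypothetical helper: assumes every node has an integer "id", inner nodes
    list child ids under "children", and leaf nodes carry a "source".
    """
    children_of = {node["id"]: list(node.get("children", [])) for node in spec}
    if len(children_of) != len(spec):
        raise ValueError("duplicate node ids")
    for node in spec:
        if not children_of[node["id"]] and "source" not in node:
            raise ValueError(f"leaf node {node['id']} has no source")
        for child in children_of[node["id"]]:
            if child not in children_of:
                raise ValueError(f"node {node['id']} references unknown child {child}")
    # The root is the only id that never appears as someone's child.
    roots = set(children_of) - {c for kids in children_of.values() for c in kids}
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    root = roots.pop()
    # Depth-first walk: each node must be reached exactly once from the root.
    seen: set[int] = set()
    stack = [root]
    while stack:
        current = stack.pop()
        if current in seen:
            raise ValueError(f"node {current} is reached twice")
        seen.add(current)
        stack.extend(children_of[current])
    if seen != set(children_of):
        raise ValueError(f"unreachable nodes: {sorted(set(children_of) - seen)}")
    return root
```

Called on the spec from the example above, this sketch returns `1`, the id of the `insertFragment` node; only key presence is checked, so the `vector` and `insert` objects are never inspected.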
How It Works
SnapGene files use a TLV (Type-Length-Value) binary format after a 19-byte header. Each block has a 1-byte type ID and a 4-byte length, with encoding varying by type: UTF-8 for sequences, XML for annotations, 2-bit GATC encoding for compressed DNA, LZMA for history, and ZTR for chromatogram traces.
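To make the block layout concrete, here is a minimal standard-library sketch of walking the TLV stream and writing it back. It assumes big-endian lengths and that the 19-byte header is itself a type-9 block carrying the `b"SnapGene"` magic followed by three 16-bit fields, as described in Damien Goutte-Gattat's format notes. The function names and the placeholder header values in `build_file` are hypothetical and are not sgffp's API.

```python
import struct
from typing import Iterator

# Assumed header: 1-byte type + 4-byte length + 8-byte b"SnapGene" + three 2-byte fields
HEADER = struct.Struct(">BI8sHHH")
BLOCK_HEAD = struct.Struct(">BI")  # 1-byte type ID, 4-byte big-endian length


def iter_blocks(data: bytes) -> Iterator[tuple[int, bytes]]:
    """Yield (type_id, payload) for every TLV block after the 19-byte header."""
    if len(data) < HEADER.size:
        raise ValueError("file shorter than the 19-byte header")
    cookie_type, _, magic, *_ = HEADER.unpack_from(data, 0)
    if cookie_type != 0x09 or magic != b"SnapGene":
        raise ValueError("missing SnapGene cookie")
    offset = HEADER.size
    while offset < len(data):
        if offset + BLOCK_HEAD.size > len(data):
            raise ValueError(f"truncated block header at offset {offset}")
        type_id, length = BLOCK_HEAD.unpack_from(data, offset)
        offset += BLOCK_HEAD.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError(f"block {type_id} is truncated")
        yield type_id, payload
        offset += length


def build_file(blocks: list[tuple[int, bytes]]) -> bytes:
    """Inverse of iter_blocks; file type and version fields are placeholders."""
    parts = [HEADER.pack(0x09, 14, b"SnapGene", 1, 1, 1)]
    for type_id, payload in blocks:
        parts.append(BLOCK_HEAD.pack(type_id, len(payload)) + payload)
    return b"".join(parts)
```

Round-tripping `build_file([(0, b"ATGC")])` through `iter_blocks` yields `[(0, b"ATGC")]`; decoding each payload according to its type (UTF-8, XML, LZMA, ...) is a separate step layered on top of this walk.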
SgffReader parses blocks via the SCHEME dispatch table and stores them in SgffObject.blocks (a Dict[int, List]). Typed model properties (sgff.sequence, sgff.features, sgff.history, etc.) are lazily loaded from the blocks dict and sync changes back automatically. SgffWriter serializes blocks back to binary in sorted order.
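The reader/model/writer split described above can be illustrated with a toy version. This is a hypothetical sketch, not sgffp's code: `DISPATCH`, `MiniObject` and `SequenceView` are invented names, only two UTF-8 text blocks are modelled, and the real typed models carry far more structure. It shows the three ideas in the paragraph: a dispatch table keyed by block ID, a lazily built typed view that writes changes back into the blocks dict, and serialization in sorted block order.

```python
from typing import Callable, Optional

# Hypothetical (decode, encode) pairs keyed by block ID, in the spirit of a
# dispatch table; only two plain-text blocks are modelled here.
Codec = tuple[Callable[[bytes], str], Callable[[str], bytes]]
DISPATCH: dict[int, Codec] = {
    0: (lambda raw: raw.decode("utf-8"), lambda value: value.encode("utf-8")),
    10: (lambda raw: raw.decode("utf-8"), lambda value: value.encode("utf-8")),
}


class SequenceView:
    """Typed wrapper whose setter writes straight back into the blocks dict."""

    def __init__(self, blocks: dict[int, list[str]]) -> None:
        self._blocks = blocks  # shared reference, not a copy

    @property
    def value(self) -> str:
        return self._blocks[0][0]

    @value.setter
    def value(self, new: str) -> None:
        self._blocks[0] = [new]


class MiniObject:
    """Toy stand-in for an object that stores decoded blocks as lists per ID."""

    def __init__(self, raw_blocks: list[tuple[int, bytes]]) -> None:
        self.blocks: dict[int, list[str]] = {}
        for block_id, payload in raw_blocks:
            codec = DISPATCH.get(block_id)
            if codec is None:
                continue  # unknown or intentionally skipped block
            self.blocks.setdefault(block_id, []).append(codec[0](payload))
        self._sequence: Optional[SequenceView] = None

    @property
    def sequence(self) -> SequenceView:
        # Build the typed view only on first access, then reuse it.
        if self._sequence is None:
            self._sequence = SequenceView(self.blocks)
        return self._sequence

    def serialize(self) -> list[tuple[int, bytes]]:
        # Emit blocks in ascending ID order.
        out: list[tuple[int, bytes]] = []
        for block_id in sorted(self.blocks):
            encode = DISPATCH[block_id][1]
            out.extend((block_id, encode(v)) for v in self.blocks[block_id])
        return out
```

Because the view holds a reference to the same dict rather than a copy, setting `obj.sequence.value` is immediately visible to `serialize()` without any explicit sync step.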
Supported Block Types
| ID | Block Type | Format | Model |
|---|---|---|---|
| 0 | DNA Sequence | UTF-8 | SgffSequence |
| 1 | Compressed DNA | 2-bit GATC | SgffSequence |
| 5 | Primers | XML | SgffPrimerList |
| 6 | Notes | XML | SgffNotes |
| 7 | History Tree | LZMA + XML | SgffHistory |
| 8 | Sequence Properties | XML | SgffProperties |
| 10 | Features | XML | SgffFeatureList |
| 11 | History Nodes | Binary + TLV | SgffHistory |
| 14 | Custom Enzyme Sets | XML | |
| 16 | Trace Container | Binary + TLV | SgffTraceList |
| 17 | Alignable Sequences | XML | SgffAlignmentList |
| 18 | ZTR Trace (in 16) | ZTR | SgffTrace |
| 20 | Strand Colors | XML | |
| 21 | Protein Sequence | UTF-8 | SgffSequence |
| 23 | File Attachments | Binary + zlib XML | SgffAttachmentList |
| 27 | Trace Alignment | BGZF + BAM | SgffTraceAlignment |
| 28 | Enzyme Visibilities | XML | |
| 29 | History Modifier | LZMA + XML | SgffHistory |
| 30 | History Content | LZMA + TLV | SgffHistory |
| 32 | RNA Sequence | UTF-8 | SgffSequence |
| 34 | RNA Structure | LZMA + JSON | |
Blocks 2, 3, 13, 35 are auto-generated by SnapGene and intentionally skipped.
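Several history blocks in the table (7 and 29) are LZMA-compressed XML. Python's standard `lzma` module can unwrap such a payload; the sketch below assumes each payload is a complete `.xz` or legacy `.lzma` stream, both of which `FORMAT_AUTO` detects. Real blocks may need extra framing handled first, and the element names in the test data are made up for illustration.

```python
import lzma
import xml.etree.ElementTree as ET


def decode_lzma_xml(payload: bytes) -> ET.Element:
    """Decompress an LZMA-wrapped payload and parse the result as XML.

    Assumes `payload` is a whole .xz or .lzma ("alone") stream with no
    extra SnapGene-specific prefix.
    """
    xml_bytes = lzma.decompress(payload, format=lzma.FORMAT_AUTO)
    return ET.fromstring(xml_bytes)
```

For example, `decode_lzma_xml(lzma.compress(b"<a/>", format=lzma.FORMAT_ALONE)).tag` evaluates to `"a"`.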
CLI
```bash
sff parse plasmid.dna         # Export to JSON
sff info plasmid.dna -v       # Show detailed file info
sff tree plasmid.dna          # Display history timeline
sff check plasmid.dna -l      # List block types
sff filter plasmid.dna -k 0,10 -o minimal.dna
```
All read commands accept stdin (cat file.dna | sff info).
Development
```bash
git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Docs (VitePress)
cd docs && bun install && bun run docs:dev
```
Documentation
Full guides, API reference, CLI reference, and binary format specification:
Acknowledgments
This project would not have been possible without the previous work of:
- Damien Goutte-Gattat, for his PDF on the SGFF structure: https://incenp.org/dvlpt/docs/binary-sequence-formats/binary-sequence-formats.pdf
- Isaac Luo, for his SnapGene file reader: https://github.com/IsaacLuo/SnapGeneFileReader
- Kale Kundert, for autosnapgene, a SnapGene automation tool: https://github.com/kalekundert/autosnapgene
Contributions
Thanks also to the people who helped the project:
- Manuel Lera-Ramirez (@manulera) for his PRs and suggestions
- Cory Tobin (@cory-mozza) for reviewing new blocks
License
Distributed under the MIT License. See LICENSE for more information.
Download files
File details
Details for the file sgffp-0.17.1.tar.gz.
File metadata
- Download URL: sgffp-0.17.1.tar.gz
- Size: 223.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 912e7234bdfa0100e89ac5123f6bc9a6c1f0003c03e0a9ebb294a6c56072fedf |
| MD5 | eb2c3e55e21365d3522fb88a24a10571 |
| BLAKE2b-256 | 5a8bdf05991c5a08cc3d77ead8c89b0de9a099fc6449cc7d060fa895833623c5 |
Provenance
The following attestation bundles were made for sgffp-0.17.1.tar.gz:
Publisher: publish.yml on merv1n34k/sgffp
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sgffp-0.17.1.tar.gz
- Subject digest: 912e7234bdfa0100e89ac5123f6bc9a6c1f0003c03e0a9ebb294a6c56072fedf
- Sigstore transparency entry: 1591189361
- Permalink: merv1n34k/sgffp@261d348f4c260e0c32321e9bd9282f8a3986fa50
- Branch / Tag: refs/tags/v0.17.1
- Owner: https://github.com/merv1n34k
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@261d348f4c260e0c32321e9bd9282f8a3986fa50
- Trigger Event: release
File details
Details for the file sgffp-0.17.1-py3-none-any.whl.
File metadata
- Download URL: sgffp-0.17.1-py3-none-any.whl
- Size: 51.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d866bf043488c67d2bfb8aacacdf316f0a0ee9b9a1958a50125922fb735a2b9e |
| MD5 | 99965bb572d617c2c18dc4883d4d07c9 |
| BLAKE2b-256 | d6a69eeab4dd07025665817d2ac4e2bd1e950a3a00e590ae1af08609e43039f3 |
Provenance
The following attestation bundles were made for sgffp-0.17.1-py3-none-any.whl:
Publisher: publish.yml on merv1n34k/sgffp
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sgffp-0.17.1-py3-none-any.whl
- Subject digest: d866bf043488c67d2bfb8aacacdf316f0a0ee9b9a1958a50125922fb735a2b9e
- Sigstore transparency entry: 1591189391
- Permalink: merv1n34k/sgffp@261d348f4c260e0c32321e9bd9282f8a3986fa50
- Branch / Tag: refs/tags/v0.17.1
- Owner: https://github.com/merv1n34k
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@261d348f4c260e0c32321e9bd9282f8a3986fa50
- Trigger Event: release