Read, write, and manipulate SnapGene files (.dna, .rna, .prot)
SnapGene File Format Parser
SnapGene File Format Parser (SGFFP for short) is a reverse-engineered parser for SnapGene DNA, RNA, and protein file formats.
[!Important] Found an unknown block type? Run sff check your_file.dna -l and look for [NEW] markers. Please report them in #1 with a dump (sff check your_file.dna -d). Help us decode more blocks!
The parser reads SnapGene files into Python objects and exports to JSON, with a writer for creating new SnapGene files.
The project aims to be a minimalistic, fast, and useful tool for molecular biologists who need to parse large libraries of SnapGene files, or for developers building SnapGene-compatible applications.
Architecture
flowchart LR
subgraph Input
DNA[".dna file"]
Bytes["bytes/stream"]
end
subgraph SGFFP
Reader["SgffReader"]
Object["SgffObject"]
Writer["SgffWriter"]
end
subgraph Output
JSON["JSON"]
File[".dna file"]
end
DNA --> Reader
Bytes --> Reader
Reader --> Object
Object --> Writer
Object --> JSON
Writer --> File
Installation
pip install sgffp
Or with uv:
uv add sgffp
For development:
git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync --all-extras
Quick Start
from sgffp import SgffReader, SgffWriter
# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")
# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)
# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")
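For a library of many files, the same calls can be looped over a folder. The sketch below is illustrative only: `sequence_lengths` and its `read` parameter are hypothetical names, and it assumes that `SgffReader.from_file` accepts a path string and that `sgff.sequence.value` is a string, as the Quick Start suggests.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def sequence_lengths(folder: str, read: Optional[Callable] = None) -> Dict[str, int]:
    """Map each .dna file name in `folder` to the length of its sequence.

    `read` is a hypothetical hook for swapping in another loader;
    by default it falls back to SgffReader.from_file from the Quick Start.
    """
    if read is None:
        # Imported lazily so the helper can be exercised without sgffp installed.
        from sgffp import SgffReader
        read = SgffReader.from_file

    lengths: Dict[str, int] = {}
    for path in sorted(Path(folder).glob("*.dna")):
        sgff = read(str(path))
        # Assumption: sequence.value holds the sequence as a string.
        lengths[path.name] = len(sgff.sequence.value)
    return lengths
```

Calling `sequence_lengths("plasmids/")` on a folder of your own would return a dictionary keyed by file name; error handling for unreadable files is left out for brevity.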
CLI Tool
uv run sff check plasmid.dna # Inspect file blocks
uv run sff parse plasmid.dna # Export to JSON
uv run sff info plasmid.dna # Show file information
File Format
SnapGene uses a Type-Length-Value (TLV) binary format where each block contains:
| Field | Size | Description |
|---|---|---|
| Type | 1 byte | Block type identifier |
| Length | 4 bytes | Payload size (big-endian) |
| Data | N bytes | Block payload |
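The layout in the table can be walked with a few lines of standard-library Python. This is a minimal sketch of a generic TLV loop, not the SGFFP implementation; `iter_blocks` and `make_block` are hypothetical helper names, and the demo payloads are synthetic rather than taken from a real file.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

HEADER = struct.Struct(">BI")  # 1-byte type, 4-byte big-endian length


def make_block(block_type: int, payload: bytes) -> bytes:
    """Build one TLV block (used here only to create demo input)."""
    return HEADER.pack(block_type, len(payload)) + payload


def iter_blocks(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, payload) pairs until the stream is exhausted."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated block header")
        block_type, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError(f"block {block_type}: expected {length} bytes")
        yield block_type, payload


if __name__ == "__main__":
    demo = make_block(0, b"ACGT") + make_block(6, b"<Notes/>")
    for block_type, payload in iter_blocks(io.BytesIO(demo)):
        print(block_type, len(payload))
```

Reading the header and payload separately keeps memory use proportional to the largest block rather than to the whole file.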
Data encoding varies by block type: UTF-8 for sequences, XML for annotations, 2-bit encoding for compressed DNA (GATC → 00/01/10/11), and LZMA compression for history blocks.
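To make the 2-bit mapping concrete, here is a small sketch that packs and unpacks bases with G=00, A=01, T=10, C=11. Several details are assumptions not stated above: four bases per byte, most significant bits first, zero padding in the last byte, and the base count supplied separately. `encode_2bit` and `decode_2bit` are hypothetical names, and the actual compressed DNA block may carry additional framing.

```python
BASES = "GATC"  # index in this string is the 2-bit code: G=0, A=1, T=2, C=3


def encode_2bit(seq: str) -> bytes:
    """Pack a G/A/T/C string into bytes, four bases per byte, MSB first."""
    buf = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        code = BASES.index(base)  # raises ValueError for other characters
        buf[i // 4] |= code << (6 - 2 * (i % 4))
    return bytes(buf)


def decode_2bit(data: bytes, n_bases: int) -> str:
    """Unpack n_bases bases from data produced by encode_2bit."""
    if n_bases > len(data) * 4:
        raise ValueError("not enough data for the requested number of bases")
    out = []
    for i in range(n_bases):
        shift = 6 - 2 * (i % 4)
        out.append(BASES[(data[i // 4] >> shift) & 0b11])
    return "".join(out)
```

Under these assumptions the four bases "GATC" pack into the single byte 0b00011011, and the base count is needed on decode because padding bits are indistinguishable from G.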
Block Types
All known SnapGene block types and their encoding formats:
| ID | Block Type | Format | ID | Block Type | Format |
|---|---|---|---|---|---|
| 0 | DNA Sequence | UTF-8 | 17 | Alignable Sequences | XML |
| 1 | Compressed DNA | 2-bit GATC | 18 | Sequence Trace | ZTR |
| 5 | Primers | XML | 21 | Protein Sequence | UTF-8 |
| 6 | Notes | XML | 28 | Enzyme Visualization | XML |
| 7 | History Tree | LZMA + XML | 29 | History Modifier | LZMA + XML |
| 8 | Sequence Properties | XML | 30 | History Content | LZMA + TLV |
| 10 | Features | XML | 32 | RNA Sequence | UTF-8 |
| 11 | History Nodes | Binary + TLV | 14 | Custom Enzymes | XML |
Blocks not listed (2-4, 9, 12-13, 15-16, 19-20, 22-27, 31) are either unknown or internal SnapGene data.
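For the blocks listed as LZMA + XML, decoding is conceptually a decompress step followed by an XML parse. The sketch below uses only the standard library and is an illustration, not the SGFFP code path: `decode_lzma_xml` and `encode_lzma_xml` are hypothetical names, the element names in the usage note are invented, and the exact LZMA container SnapGene uses is an assumption (the decoder relies on `lzma` auto-detection, the encoder uses the module default).

```python
import lzma
import xml.etree.ElementTree as ET


def decode_lzma_xml(payload: bytes) -> ET.Element:
    """Decompress an LZMA payload and parse the result as XML."""
    # lzma.decompress defaults to FORMAT_AUTO, which accepts both
    # .xz and legacy .lzma containers.
    xml_bytes = lzma.decompress(payload)
    return ET.fromstring(xml_bytes)


def encode_lzma_xml(root: ET.Element) -> bytes:
    """Serialize an XML element and compress it (container is an assumption)."""
    xml_bytes = ET.tostring(root, encoding="utf-8")
    return lzma.compress(xml_bytes)
```

A round trip on made-up data looks like `decode_lzma_xml(encode_lzma_xml(ET.Element("HistoryTree")))`, which returns an element with the same tag.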
Supported Block Types
The table below shows which block types can be read from and written to SnapGene files. Blocks marked with a Model have typed Python classes for convenient access (e.g., sgff.sequence, sgff.features, sgff.history).
| ID | Block Type | Read | Write | Model |
|---|---|---|---|---|
| 0 | DNA Sequence | + | + | + |
| 1 | Compressed DNA | + | + | + |
| 5 | Primers (XML) | + | + | + |
| 6 | Notes (XML) | + | + | + |
| 7 | History Tree (XML) | + | + | + |
| 8 | Sequence Properties (XML) | + | + | + |
| 10 | Features (XML) | + | + | + |
| 11 | History Nodes | + | + | + |
| 14 | Custom Enzymes (XML) | + | + | - |
| 17 | Alignable Sequences (XML) | + | + | + |
| 21 | Protein Sequence | + | + | + |
| 28 | Enzyme Visualization (XML) | + | + | - |
| 29 | History Modifier (XML) | + | + | + |
| 30 | History Content (Nested) | + | + | + |
| 32 | RNA Sequence | + | + | + |
Roadmap
- Improve SGFF parsing, unify TLV strategy
- Understand whole file structure
- Correctly parse into readable format from all common blocks
- Create writer for supported block types
- Add comprehensive test suite (199 tests)
- Parse XML into pure JSON format
- Add write support for history blocks (LZMA compression)
- Add typed model classes for easy data access
- Documentation improvements
Acknowledgments
This project would not have been possible without previous work by:
- Damien Goutte-Gattat, see his PDF on SGFF structure: https://incenp.org/dvlpt/docs/binary-sequence-formats/binary-sequence-formats.pdf
- Isaac Luo, for his version of SnapGene reader: https://github.com/IsaacLuo/SnapGeneFileReader
License
Distributed under the MIT License; see LICENSE for details.
File details
Details for the file sgffp-0.13.1.tar.gz.
File metadata
- Download URL: sgffp-0.13.1.tar.gz
- Upload date:
- Size: 107.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c05ce4f81de04d90e4f3de6cdcf6712d2eef33afb05b380025c377a0b1c5b9a0 |
| MD5 | e332e5382a554a1b6aa82de7167bcf38 |
| BLAKE2b-256 | 16081cb17d3aff1bb1b02afb668f8fab9ab8ec824e301c32852104d9291b8c33 |
Provenance
The following attestation bundles were made for sgffp-0.13.1.tar.gz:
Publisher: publish.yml on merv1n34k/sgffp
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sgffp-0.13.1.tar.gz
- Subject digest: c05ce4f81de04d90e4f3de6cdcf6712d2eef33afb05b380025c377a0b1c5b9a0
- Sigstore transparency entry: 898521010
- Sigstore integration time:
- Permalink: merv1n34k/sgffp@a1e05a9e4adf4f7f747bfaf6222e74545e62621d
- Branch / Tag: refs/tags/v0.13.1
- Owner: https://github.com/merv1n34k
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1e05a9e4adf4f7f747bfaf6222e74545e62621d
- Trigger Event: release
File details
Details for the file sgffp-0.13.1-py3-none-any.whl.
File metadata
- Download URL: sgffp-0.13.1-py3-none-any.whl
- Upload date:
- Size: 33.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f05579d69bde040e71107ae4e675b96ee2fb00258ead48c14f20e738145730fb |
| MD5 | ed37dd68174233a09b2a975fb7323de8 |
| BLAKE2b-256 | 8b6d9265d4ff9292c74a34efa48761b52b8f0410c97171c9e47bee546b13ab6d |
Provenance
The following attestation bundles were made for sgffp-0.13.1-py3-none-any.whl:
Publisher: publish.yml on merv1n34k/sgffp
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sgffp-0.13.1-py3-none-any.whl
- Subject digest: f05579d69bde040e71107ae4e675b96ee2fb00258ead48c14f20e738145730fb
- Sigstore transparency entry: 898521099
- Sigstore integration time:
- Permalink: merv1n34k/sgffp@a1e05a9e4adf4f7f747bfaf6222e74545e62621d
- Branch / Tag: refs/tags/v0.13.1
- Owner: https://github.com/merv1n34k
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1e05a9e4adf4f7f747bfaf6222e74545e62621d
- Trigger Event: release