Deterministic, atomic, save-only-if-modified file writing for Python data pipelines.

Maintainers

zype77

📝 stablewrite

Deterministic, atomic, save-only-if-modified file writing for Python data pipelines.

stablewrite is for scripts that generate files repeatedly but should only touch the output when the underlying data actually changed.

If you use Snakemake, Make, Docker volumes, CI caches, notebooks, or report pipelines, you have probably seen this: a script re-runs, writes the same data again, updates the file modification time, and suddenly half the downstream workflow rebuilds for no real reason.

stablewrite fixes that by writing into an isolated temporary directory first, normalizing volatile metadata, comparing the result with the existing destination, and publishing only when the finalized output is meaningfully different.

from stable_write import save_if_changed

with save_if_changed("output/report.csv") as saver:
    saver.path.write_text("id,value\n1,100\n", encoding="utf-8")

print(saver)  # saved or skipped, with hashes and reason available on the object

If the generated bytes match the existing file, the destination is left untouched. Its mtime stays exactly as it was.
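
The core idea can be sketched with the standard library alone. This is a simplified illustration, not stablewrite's actual implementation: the helper name `write_if_changed` is hypothetical, it stages directly beside the destination instead of in an isolated temporary directory, and it skips finalizers entirely.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def write_if_changed(destination: Path, data: bytes) -> bool:
    """Publish data only if its bytes differ from the destination. True if published."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Stage beside the destination so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent, prefix=".staged-")
    tmp = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        if destination.exists():
            old = hashlib.blake2b(destination.read_bytes()).hexdigest()
            new = hashlib.blake2b(tmp.read_bytes()).hexdigest()
            if old == new:
                return False  # identical: destination and its mtime stay untouched
        os.replace(tmp, destination)  # atomic swap
        return True
    finally:
        tmp.unlink(missing_ok=True)  # no-op after a successful replace
```

Calling it twice with the same bytes publishes once and skips the second time, so the destination's mtime does not move.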

✨ Features

- Save only if changed: unchanged outputs are discarded, preserving destination mtime and avoiding unnecessary downstream rebuilds.
- Atomic publish step: files are staged away from the destination, then copied to a destination-side temp file and published with os.replace.
- Deterministic ZIP/OOXML profiles: built-in profiles for .zip, .xlsx, .docx, and .pptx normalize ZIP metadata and strip volatile OOXML core properties.
- Companion file support: publish bundles such as ESRI Shapefiles (.shp, .dbf, .shx, .prj, .cpg) together with the main file.
- Strict explicit companions: if you request companions=["foo.csv"], that file must be created, otherwise the save fails without publishing anything.
- Semantic comparison hook: use is_equal= for formats where byte stability is unrealistic but structural equality is easy to check.
- Zero core dependencies: the built-in profiles use only the Python standard library. Writer libraries such as pandas, openpyxl, and GeoPandas are only needed by your own code.
- Large ZIP friendly: ZIP entries are streamed during normalization, so embedded media in .pptx or .docx files do not need to be loaded fully into memory.

📦 Installation

Install the core package:

pip install stable-write

The core library has no runtime dependencies. Install the writer libraries you use in your own pipeline:

pip install pandas openpyxl      # if you generate Excel files
pip install geopandas            # if you write shapefiles or GeoPackages

The built-in xlsx, docx, and pptx profiles do not require openpyxl; they patch the OOXML ZIP structure directly with the standard library.

🚀 Quickstart

Basic Usage

Write to saver.path, not directly to the final destination. After the with block exits, stablewrite decides whether to publish.

from stable_write import save_if_changed

with save_if_changed("output/report.csv") as saver:
    saver.path.write_text("id,value\n1,100\n", encoding="utf-8")

if saver.saved:
    print(f"Updated {saver.destination} ({saver.new_hash})")
else:
    print(f"Skipped: {saver.reason}")

The Excel Timestamp Problem

pandas.DataFrame.to_excel() writes an OOXML workbook. The workbook can include dynamic metadata such as dcterms:modified, so two identical DataFrames saved one second apart can produce different file hashes.
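
To see why the timestamp matters, here is a small sketch that pins the volatile values in a core-properties XML string. It is an illustration only: `pin_core_timestamps` and `FIXED_TIMESTAMP` are hypothetical names, the regex approach is an assumption, and stablewrite's own OOXML finalizer may strip or rewrite these properties differently.

```python
import re

FIXED_TIMESTAMP = "2000-01-01T00:00:00Z"  # arbitrary constant chosen for this sketch

# Matches <dcterms:created ...>value</dcterms:created> and the "modified" twin.
_VOLATILE = re.compile(
    r"(<dcterms:(created|modified)\b[^>]*>)[^<]*(</dcterms:\2>)"
)


def pin_core_timestamps(core_xml: str) -> str:
    """Replace dcterms:created / dcterms:modified values with a fixed timestamp."""
    return _VOLATILE.sub(rf"\g<1>{FIXED_TIMESTAMP}\g<3>", core_xml)


first = (
    '<dcterms:modified xsi:type="dcterms:W3CDTF">'
    "2026-05-28T10:00:00Z</dcterms:modified>"
)
second = first.replace("10:00:00", "10:00:01")  # same data, saved one second later

# The raw strings differ, but the pinned versions are identical.
print(first == second, pin_core_timestamps(first) == pin_core_timestamps(second))
```

Once the timestamps are pinned, two workbooks built from the same DataFrame no longer differ in this part of the file.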

Use the xlsx profile, or the convenience wrapper, to normalize the workbook before comparison:

import pandas as pd
from stable_write import save_xlsx_if_changed

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

with save_xlsx_if_changed("results/data.xlsx") as saver:
    df.to_excel(saver.path, index=False)

if saver.saved:
    print("Excel report changed")

Under the hood the xlsx profile patches docProps/core.xml and then rewrites the ZIP container with deterministic entry ordering, timestamps, and extra fields.
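
The container rewrite can be approximated with `zipfile` from the standard library. The sketch below is an assumption about how such a normalization might look, not the code behind `normalize_zip_metadata`; the function name `normalize_zip` and the fixed date are hypothetical choices.

```python
import shutil
import zipfile
from pathlib import Path

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format can store


def normalize_zip(src: Path, dst: Path) -> None:
    """Rewrite src into dst with sorted entries, fixed timestamps, and no extra fields."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for name in sorted(zin.namelist()):
            old = zin.getinfo(name)
            # A fresh ZipInfo starts with empty extra data and comment.
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = old.compress_type
            info.external_attr = old.external_attr
            # Stream the entry in chunks instead of reading it fully into memory.
            with zin.open(old) as reader, zout.open(info, "w") as writer:
                shutil.copyfileobj(reader, writer)
```

Two archives with the same entries but different entry order or timestamps should come out byte-identical, provided both are normalized on the same platform and Python build.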

Companion Files

Some formats are bundles, not single files. ESRI Shapefiles are the classic example: writing spatial.shp usually also creates spatial.shx, spatial.dbf, spatial.prj, and spatial.cpg.

Use companions="auto" when the writer decides which companion files exist:

import geopandas as gpd
from stable_write import save_if_changed

gdf = gpd.read_file("raw_data.geojson")

with save_if_changed("processed/spatial.shp", companions="auto") as saver:
    gdf.to_file(saver.path)

if "spatial.dbf" in saver.changed_companions:
    print("Attribute table changed")

If any companion changes, the save is treated as changed and the bundle is published. Each file is replaced atomically on its own; the bundle as a whole is not transactional across multiple files.
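
The "atomic per file, not transactional per bundle" behavior can be illustrated with a short standard-library sketch. The helper `publish_bundle` is hypothetical, and republishing every member once anything changed is an assumption drawn from the sentence above, not a confirmed detail of stablewrite.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def _digest(path: Path) -> str:
    return hashlib.blake2b(path.read_bytes()).hexdigest()


def publish_bundle(staging: Path, dest_dir: Path, names: list[str]) -> list[str]:
    """Publish the bundle if any member changed; return the changed member names."""
    changed = [
        name
        for name in names
        if not (dest_dir / name).exists()
        or _digest(staging / name) != _digest(dest_dir / name)
    ]
    if changed:
        for name in names:
            # Copy beside the target first, then swap it in with os.replace.
            fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".publish-")
            os.close(fd)
            shutil.copyfile(staging / name, tmp)
            os.replace(tmp, dest_dir / name)  # atomic for this one file only
    return changed
```

If the process dies halfway through the loop, some members are new and some are old. That gap is exactly what "not transactional" means here.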

Use an explicit list when every companion is required:

with save_if_changed(
    "processed/spatial.shp",
    companions=["spatial.shx", "spatial.dbf", "spatial.prj"],
) as saver:
    gdf.to_file(saver.path)

If one of those listed files is missing from the temporary directory, stablewrite raises FileNotFoundError and leaves the destination untouched. That makes explicit companions a contract, while companions="auto" remains the optional/discovery mode.
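
The difference between the two modes can be sketched as follows. Both helpers are hypothetical, and the discovery rule (any sibling file sharing the main file's stem) is an assumption made for illustration; stablewrite's real "auto" rule may differ.

```python
from pathlib import Path


def discover_companions(staging_dir: Path, main_name: str) -> list[str]:
    """Assumed 'auto' rule: every other staged file that shares the main file's stem."""
    stem = Path(main_name).stem
    return sorted(
        p.name
        for p in staging_dir.iterdir()
        if p.is_file() and p.name != main_name and p.stem == stem
    )


def check_required_companions(staging_dir: Path, companions: list[str]) -> None:
    """Explicit mode: fail before publishing if a requested companion is missing."""
    missing = [name for name in companions if not (staging_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing companion file(s): {', '.join(missing)}")
```

Discovery never fails because it only reports what exists, while the explicit check turns a missing file into an error before anything reaches the destination.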

Custom Semantic Comparison

Some formats are not realistically byte-stable. SQLite-based formats such as GeoPackage (.gpkg) may include internal metadata, page ordering, or timestamps that make byte hashes noisy.

For those cases, provide is_equal=. The callable receives the newly generated temp file and the existing destination and returns whether they are equivalent.

from pathlib import Path

import geopandas as gpd
from stable_write import save_if_changed


def gpkg_is_equal(new: Path, existing: Path) -> bool:
    """Compare GeoPackages by data content, not raw bytes."""
    new_data = gpd.read_file(new)
    old_data = gpd.read_file(existing)
    return new_data.equals(old_data)


gdf = gpd.read_file("raw_data.geojson")  # same input as in the companion example

with save_if_changed("data/roads.gpkg", is_equal=gpkg_is_equal) as saver:
    gdf.to_file(saver.path, driver="GPKG")

old_hash and new_hash are still computed and stored. is_equal only replaces the equality decision for the main file. Companion files are still compared by hash.
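
If you would rather not load the data through GeoPandas, a comparator with the same `(new, existing) -> bool` shape can be built on `sqlite3`. This is a sketch for plain SQLite files, with hypothetical helper names. Real GeoPackages keep their own metadata, for example the last_change timestamp in the gpkg_contents table, so you would likely need to filter such rows before comparing.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def sqlite_dump(path: Path) -> list[str]:
    """Return the SQL statements that would recreate the database at path."""
    # Open read-only so a missing file raises instead of being created.
    uri = f"{path.resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return list(conn.iterdump())


def sqlite_is_equal(new: Path, existing: Path) -> bool:
    """Treat two SQLite files as equal when their logical dumps match."""
    return sqlite_dump(new) == sqlite_dump(existing)
```

Two databases with the same schema and rows compare equal here even when their bytes differ because of free pages or header counters.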

⚙️ API Overview

`save_if_changed(...)`

save_if_changed(
    path,
    *,
    profile=None,
    finalizers=None,
    save_strategy="overwrite",
    algo="blake2b",
    safe_copy=False,
    companions="auto",
    is_equal=None,
)

| Argument | Purpose |
| --- | --- |
| `path` | Final destination path. |
| `profile` | Named profile: `"zip"`, `"xlsx"`, `"docx"`, `"pptx"`, or any registered custom profile. |
| `finalizers` | Ordered list of custom `(Path) -> None` functions run before hashing. Overrides `profile`. |
| `save_strategy` | What to do when content changed: `"overwrite"`, `"raise"`, or `"skip"`. |
| `algo` | Hash algorithm used for byte comparison. Defaults to `"blake2b"`. |
| `safe_copy` | Use `shutil.copyfile` (file contents only) instead of `shutil.copy2` (contents plus metadata such as timestamps and permissions) for the publish copy. |
| `companions` | `"auto"`, `None`, `[]`, or an explicit list of companion filenames. |
| `is_equal` | Optional semantic comparator for the main file. |
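
For reference, hashing a file by algorithm name takes only a few lines with `hashlib`. This sketch assumes that `algo` is a name `hashlib.new` understands; whether stablewrite resolves the name the same way is not confirmed here, and `hash_file` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, algo: str = "blake2b", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks using any algorithm hashlib knows by name."""
    digest = hashlib.new(algo)
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat for large outputs, and an unknown algorithm name raises ValueError from `hashlib.new`.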

Registry

Profiles are stored in a global registry. The following functions manage it:

| Function | Purpose |
| --- | --- |
| `register_profile(name, finalizers, is_equal, force)` | Register a named profile for use with `profile=`. |
| `get_profile(name) → Profile` | Retrieve a registered profile; raises `ValueError` if absent. |
| `list_profiles() → list[str]` | Return a sorted list of all registered profile names. |

All three are importable directly from stable_write.
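
To make the registry semantics concrete, here is a toy re-implementation of the three functions. It mirrors the signatures in the table but is not the library's code: the `Profile` fields and the choice to raise ValueError on a duplicate registration without force=True are assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

Finalizer = Callable[[Path], None]
Comparator = Callable[[Path, Path], bool]


@dataclass(frozen=True)
class Profile:
    finalizers: tuple = ()
    is_equal: Optional[Comparator] = None


_REGISTRY: dict[str, Profile] = {}


def register_profile(name, finalizers=(), is_equal=None, force=False) -> None:
    """Store a profile under name; refuse to overwrite unless force=True (assumed)."""
    if name in _REGISTRY and not force:
        raise ValueError(f"profile {name!r} is already registered; pass force=True")
    _REGISTRY[name] = Profile(tuple(finalizers), is_equal)


def get_profile(name: str) -> Profile:
    """Return the profile registered under name, or raise ValueError if absent."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown profile {name!r}") from None


def list_profiles() -> list[str]:
    """Return all registered profile names in sorted order."""
    return sorted(_REGISTRY)
```

Because the registry is module-level state, a profile registered once at startup is visible to every later call in the same process.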

Built-In Profiles

| Profile | Finalizers | Use case |
| --- | --- | --- |
| `zip` | `normalize_zip_metadata` | Generic ZIP archives with volatile entry metadata. |
| `xlsx` | `strip_ooxml_metadata`, `normalize_zip_metadata` | Generated Excel workbooks, including pandas/openpyxl output. |
| `docx` | `strip_ooxml_metadata`, `normalize_zip_metadata` | Generated Word documents. |
| `pptx` | `strip_ooxml_metadata`, `normalize_zip_metadata` | Generated PowerPoint files, including files with large embedded media. |

Result Object

Inside the context manager you receive a Saver. After the context exits, it exposes:

| Attribute | Meaning |
| --- | --- |
| `saver.path` | Temporary path you should write to inside the `with` block. |
| `saver.destination` | Final destination path. |
| `saver.saved` | `True` if the destination was replaced. |
| `saver.changed` | `True` if the new output differed from the existing output. |
| `saver.reason` | Human-readable decision reason. |
| `saver.old_hash` | Hash of the existing destination, or `None` when missing. |
| `saver.new_hash` | Hash of the finalized temp file. |
| `saver.changed_companions` | Companion filenames whose bytes changed or appeared. |

🧭 Save Strategies

Use save_strategy to control what happens when content changed:

"overwrite" (default): publish the new output.
"raise": raise FileExistsError and leave the destination untouched.
"skip": do not publish, but populate changed, reason, and hashes on the saver.

"raise" is useful for strict notebook evaluation or audit workflows where a rerun must never mutate canonical outputs silently.

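The three strategies boil down to a small decision table. The function below is a hypothetical sketch of that table, not stablewrite's internals; the reason strings are invented for illustration, and how a missing destination interacts with "raise" is not covered.

```python
def decide(changed: bool, save_strategy: str = "overwrite") -> tuple[bool, str]:
    """Return (saved, reason) for a finished write, or raise for the strict strategy."""
    if not changed:
        return False, "unchanged"
    if save_strategy == "overwrite":
        return True, "content changed; destination replaced"
    if save_strategy == "skip":
        return False, "content changed; publish skipped by strategy"
    if save_strategy == "raise":
        raise FileExistsError("content changed and save_strategy='raise'")
    raise ValueError(f"unknown save_strategy: {save_strategy!r}")
```

Note that "skip" and an unchanged output both leave the destination alone; only the reported reason and the changed flag tell them apart.
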
🔌 Custom Profiles

You can package reusable finalizer chains as named profiles. A registered profile can be selected with profile= anywhere you call save_if_changed, including in third-party libraries built on top of stablewrite.

from pathlib import Path

from stable_write import register_profile, save_if_changed


def strip_my_app_header(path: Path) -> None:
    """Remove the generated-on comment from app-specific text exports."""
    lines = path.read_text(encoding="utf-8").splitlines()
    cleaned = [line for line in lines if not line.startswith("# Generated on")]
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")


# This finalizer reads the file as UTF-8 text, so register it for text exports,
# not for binary archives such as .zip.
register_profile("my_export", finalizers=[strip_my_app_header])

with save_if_changed("output/export.txt", profile="my_export") as saver:
    write_export(saver.path)  # your own writer function

You can also attach a default is_equal comparator to a profile. When save_if_changed resolves the profile, that comparator is used automatically unless the caller passes its own is_equal=.

register_profile("gpkg", is_equal=gpkg_is_equal)

Use force=True to replace an existing registration (for example, when testing or when upgrading a profile at startup).

🧹 Custom Finalizers

Finalizers are small functions that mutate the staged temporary file before hashing. They are the right tool when you want the file on disk to be canonical.

Common uses:

- Remove generated headers such as # Generated on 2026-05-28 from text exports.
- Re-serialize JSON/YAML with sorted keys and stable indentation.
- Strip image metadata from generated plots.
- Remove absolute local paths from generated reports.

Example: canonical JSON output.

import json
from pathlib import Path

from stable_write import save_if_changed


def canonical_json(path: Path) -> None:
    data = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(
        json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


with save_if_changed("config.json", finalizers=[canonical_json]) as saver:
    some_library.write_json(saver.path)

If the finalizer raises, nothing is published. The existing destination stays untouched.
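
Finalizers take only a path, so anything configurable has to be captured in a closure. As a sketch of the "absolute local paths" use case from the list above, the hypothetical factory below builds a `(Path) -> None` finalizer for one project root. It assumes a UTF-8 text report and a plain string replacement.

```python
from pathlib import Path
from typing import Callable


def make_relativize_paths(root: Path) -> Callable[[Path], None]:
    """Build a finalizer that replaces the absolute root prefix with '.' in a text file."""
    prefix = str(root)

    def finalizer(path: Path) -> None:
        text = path.read_text(encoding="utf-8")
        path.write_text(text.replace(prefix, "."), encoding="utf-8")

    return finalizer
```

The returned function could then be passed in a finalizers list, for example `finalizers=[make_relativize_paths(Path.cwd())]`, so that reports generated on different machines hash the same.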

🤔 Finalizers vs. `is_equal`

Both features help with formats that produce noisy bytes. They solve different problems.

Use a finalizer when you want to fix the generated file before it lands on disk:

- the stored file should have stable formatting;
- downstream tools rely on byte-level stability;
- Git diffs should be clean;
- hashes should represent the normalized artifact.

Use is_equal when you only need a smarter comparison:

- the file format is hard to rewrite safely;
- semantic equality is easy to compute in Python;
- you want to ignore fields during comparison without altering newly saved files;
- you need tolerance-based comparison, such as approximate floats.

Example: compare JSON semantically while ignoring a volatile nested key.

import json
from pathlib import Path

from stable_write import save_if_changed


def json_equal_ignoring_timestamp(new_path: Path, existing_path: Path) -> bool:
    new_data = json.loads(new_path.read_text(encoding="utf-8"))
    old_data = json.loads(existing_path.read_text(encoding="utf-8"))

    new_data.get("metadata", {}).pop("generated_at", None)
    old_data.get("metadata", {}).pop("generated_at", None)

    return new_data == old_data


with save_if_changed("config.json", is_equal=json_equal_ignoring_timestamp) as saver:
    some_library.write_json(saver.path)

If is_equal returns False, the raw generated temp file is published. If you also want to clean the file before publication, use a finalizer as well.
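
Tolerance-based comparison follows the same pattern. The sketch below compares two CSV files cell by cell, treating cells as numbers when both parse as floats. The function names and the default tolerance are assumptions chosen for illustration.

```python
import csv
import math
from pathlib import Path


def _cells_match(a: str, b: str, rel_tol: float) -> bool:
    """Compare two CSV cells numerically when both parse as floats, else as text."""
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        return a == b


def csv_is_close(new: Path, existing: Path, rel_tol: float = 1e-9) -> bool:
    """Return True when both CSV files have the same shape and matching cells."""
    with new.open(newline="", encoding="utf-8") as f_new, \
            existing.open(newline="", encoding="utf-8") as f_old:
        rows_new, rows_old = list(csv.reader(f_new)), list(csv.reader(f_old))
    if len(rows_new) != len(rows_old):
        return False
    return all(
        len(r_new) == len(r_old)
        and all(_cells_match(a, b, rel_tol) for a, b in zip(r_new, r_old))
        for r_new, r_old in zip(rows_new, rows_old)
    )
```

Since is_equal receives exactly two paths, a different tolerance can be bound with `functools.partial(csv_is_close, rel_tol=1e-6)` before passing it in.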

| Scenario | Prefer finalizer | Prefer `is_equal` |
| --- | --- | --- |
| Stable JSON key order on disk | Yes | Maybe not necessary |
| Ignore a nested timestamp only for comparison | Possible, but changes stored file | Yes |
| Clean Git diffs | Yes | No |
| Approximate float comparison | No | Yes |
| Non-Python downstream byte cache | Yes | No |
| Expensive or risky binary rewrite | No | Yes |

🧱 Guarantees and Boundaries

stablewrite is intentionally conservative:

- Finalizers run before hashing, so profiles can make noisy output deterministic.
- Finalizer failures leave the destination untouched.
- The final publish uses destination-side temporary files and os.replace.
- For companion bundles, each file is replaced atomically, but the bundle is not a transaction.
- is_equal affects only the main file; companions are still tracked by hash.
- Explicit companion lists are strict. Use companions="auto" when companion files are optional.

🧪 Why This Matters

A plain write updates mtime even when the content is identical:

Path("report.csv").write_text(render_report())

That is enough to wake up downstream jobs in mtime-driven tools such as Make and Snakemake, and in any CI cache or artifact step keyed on file timestamps.

stablewrite makes the write conditional on the finalized artifact:

with save_if_changed("report.csv") as saver:
    saver.path.write_text(render_report(), encoding="utf-8")

Same data means no replacement, no new mtime, and no accidental rebuild.

Release history

- 0.1.3 (this version): May 29, 2026
- 0.1.2: May 29, 2026
- 0.1.1: May 28, 2026
- 0.1.0: May 28, 2026

Download files

Source Distribution

stable_write-0.1.3.tar.gz (15.9 kB), uploaded May 29, 2026, Source

Built Distribution

stable_write-0.1.3-py3-none-any.whl (17.6 kB), uploaded May 29, 2026, Python 3

File details

Details for the file stable_write-0.1.3.tar.gz.

File metadata

Download URL: stable_write-0.1.3.tar.gz
Upload date: May 29, 2026
Size: 15.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stable_write-0.1.3.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `66f7b9e5b6e3088786c4351fdb6a2e37ad19349a806d52c19b3dc8acfe52e786` |
| MD5 | `5b887ff622c1bff5cf9d4a8531f55615` |
| BLAKE2b-256 | `2713f3b706e9a53965bcbf4e56c8b7721ac515f8a7a05e4e7c8a6b45c2ae5af9` |

Provenance

The following attestation bundles were made for stable_write-0.1.3.tar.gz:

Publisher: publish.yml on ews-ffarella/stablewrite

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stable_write-0.1.3.tar.gz
- Subject digest: 66f7b9e5b6e3088786c4351fdb6a2e37ad19349a806d52c19b3dc8acfe52e786
- Sigstore transparency entry: 1666469181
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: ews-ffarella/stablewrite@215cd1c64c26461a44c1f5aa315b78347baeb45f
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/ews-ffarella
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@215cd1c64c26461a44c1f5aa315b78347baeb45f
- Trigger Event: release

File details

Details for the file stable_write-0.1.3-py3-none-any.whl.

File metadata

Download URL: stable_write-0.1.3-py3-none-any.whl
Upload date: May 29, 2026
Size: 17.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stable_write-0.1.3-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `dff282c8207ac454e4cb5016365ee5c317c3cd3a9916699a98625441f7dfd197` |
| MD5 | `e0091758042597426cae360c7361854c` |
| BLAKE2b-256 | `e7dd59f72c20769e429d52c46fe47c20f4811198e9015e6469ed96a2d862f8e7` |

Provenance

The following attestation bundles were made for stable_write-0.1.3-py3-none-any.whl:

Publisher: publish.yml on ews-ffarella/stablewrite

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stable_write-0.1.3-py3-none-any.whl
- Subject digest: dff282c8207ac454e4cb5016365ee5c317c3cd3a9916699a98625441f7dfd197
- Sigstore transparency entry: 1666469269
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: ews-ffarella/stablewrite@215cd1c64c26461a44c1f5aa315b78347baeb45f
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/ews-ffarella
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@215cd1c64c26461a44c1f5aa315b78347baeb45f
- Trigger Event: release

stable-write 0.1.3
