diffgrab

Korean documentation

Web page change tracking with structured diffs. markgrab + snapgrab integration, MCP native.

import asyncio
from diffgrab import DiffTracker

async def main():
    tracker = DiffTracker()
    await tracker.track("https://example.com")
    changes = await tracker.check()
    for c in changes:
        if c.changed:
            print(c.summary)       # "3 lines added, 1 lines removed in sections: Introduction."
            print(c.unified_diff)  # Standard unified diff output
    await tracker.close()

asyncio.run(main())

Features

  • Change detection — track any URL, detect content changes via content hashing
  • Structured diffs — unified diff + section-level analysis (which headings changed)
  • Human-readable summaries — "5 lines added, 2 removed in sections: Intro, Methods"
  • Snapshot history — SQLite storage, browse past versions of any page
  • markgrab powered — HTML/YouTube/PDF/DOCX extraction via markgrab
  • Visual diff — optional screenshot comparison via snapgrab
  • MCP server — 5 tools for Claude Code / MCP clients
  • CLI included — diffgrab track, check, diff, history, untrack
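
The change-detection bullet above mentions content hashing. As a rough illustration of that idea (not diffgrab's actual implementation), the sketch below hashes extracted text with SHA-256 and compares digests. The helper names `content_hash` and `has_changed` are hypothetical, and the whitespace normalization is an assumption.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the text after light normalization."""
    # Assumption: trailing whitespace is stripped per line so that purely
    # cosmetic edits do not register as content changes.
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(old_text: str, new_text: str) -> bool:
    """Compare two snapshots by digest instead of by full text."""
    return content_hash(old_text) != content_hash(new_text)
```

Storing only the digest next to each snapshot makes the "did anything change?" question a cheap string comparison; the full diff is needed only when the digests differ.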

Install

pip install diffgrab

Optional extras:

pip install 'diffgrab[cli]'      # CLI with click + rich
pip install 'diffgrab[visual]'   # Visual diff with snapgrab
pip install 'diffgrab[mcp]'      # MCP server with fastmcp
pip install 'diffgrab[all]'      # Everything

Usage

Python API

import asyncio
from diffgrab import DiffTracker

async def main():
    tracker = DiffTracker()

    # Track a URL (takes initial snapshot)
    await tracker.track("https://example.com", interval_hours=12)

    # Check for changes
    changes = await tracker.check()
    for change in changes:
        if change.changed:
            print(change.summary)
            print(change.unified_diff)

    # Get diff between specific snapshots
    result = await tracker.diff("https://example.com", before_id=1, after_id=2)

    # Browse snapshot history
    history = await tracker.history("https://example.com", count=20)

    # Stop tracking
    await tracker.untrack("https://example.com")

    await tracker.close()

asyncio.run(main())
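
If any call in main() raises, the final close() above is skipped. One way to guard against that is try/finally. The sketch below uses a stand-in `FakeTracker` class (hypothetical, so the example runs without network access or diffgrab installed); with the real library you would pass a `DiffTracker()` instead, assuming its `check()` and `close()` behave as in the example above.

```python
import asyncio
from types import SimpleNamespace


class FakeTracker:
    """Stand-in for DiffTracker so the sketch runs offline (hypothetical)."""

    def __init__(self):
        self.closed = False

    async def check(self):
        # Mimic a list of results exposing `changed` and `summary`.
        return [
            SimpleNamespace(changed=True, summary="1 lines added"),
            SimpleNamespace(changed=False, summary=""),
        ]

    async def close(self):
        self.closed = True


async def changed_summaries(tracker) -> list:
    """Collect summaries of changed pages, always closing the tracker."""
    try:
        results = await tracker.check()
        return [r.summary for r in results if r.changed]
    finally:
        # Runs even if check() raises, so the tracker is not left open.
        await tracker.close()


tracker = FakeTracker()
summaries = asyncio.run(changed_summaries(tracker))
```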

Convenience Functions

import asyncio
from diffgrab import track, check, diff, history, untrack

async def main():
    await track("https://example.com")
    changes = await check()
    result = await diff("https://example.com")
    snaps = await history("https://example.com")
    await untrack("https://example.com")

asyncio.run(main())

CLI

# Track a URL
diffgrab track https://example.com --interval 12

# Check all tracked URLs for changes
diffgrab check

# Check a specific URL
diffgrab check https://example.com

# Show diff between snapshots
diffgrab diff https://example.com
diffgrab diff https://example.com --before 1 --after 3

# View snapshot history
diffgrab history https://example.com --count 20

# Stop tracking
diffgrab untrack https://example.com

MCP Server

Add to your Claude Code MCP config:

{
  "mcpServers": {
    "diffgrab": {
      "command": "diffgrab-mcp",
      "args": []
    }
  }
}

Or with uvx:

{
  "mcpServers": {
    "diffgrab": {
      "command": "uvx",
      "args": ["--from", "diffgrab[mcp]", "diffgrab-mcp"]
    }
  }
}
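
If other servers are already configured, the diffgrab entry has to be merged into the existing mcpServers object rather than replace it. A minimal sketch of that merge using only the standard library follows; the `add_diffgrab_server` helper and the existing "other" entry are made up for illustration, and where the config file lives depends on your client.

```python
import json


def add_diffgrab_server(config: dict) -> dict:
    """Return a copy of an MCP config dict with a diffgrab entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the first config shown above.
    servers["diffgrab"] = {"command": "diffgrab-mcp", "args": []}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-mcp", "args": []}}}
updated = add_diffgrab_server(existing)
print(json.dumps(updated, indent=2))
```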

MCP Tools:

Tool            Description
track_url       Register a URL for change tracking
check_changes   Check tracked URLs for changes
get_diff        Get structured diff between snapshots
get_history     Browse snapshot history
untrack_url     Stop tracking a URL

DiffResult

Every diff operation returns a DiffResult:

from dataclasses import dataclass

@dataclass
class DiffResult:
    url: str                           # The tracked URL
    changed: bool                      # Whether content changed
    added_lines: int                   # Lines added
    removed_lines: int                 # Lines removed
    changed_sections: list[str]        # Markdown headings with changes
    unified_diff: str                  # Standard unified diff
    summary: str                       # Human-readable summary
    before_snapshot_id: int | None     # DB ID of older snapshot
    after_snapshot_id: int | None      # DB ID of newer snapshot
    before_timestamp: str              # When older snapshot was taken
    after_timestamp: str               # When newer snapshot was taken
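
To make these fields concrete, here is a minimal sketch of how values like added_lines, removed_lines, changed_sections, and unified_diff could be derived from two markdown snapshots with the standard library's difflib. It illustrates the general technique and is not diffgrab's actual code: `summarize_diff` is a hypothetical helper, and attributing a change to the nearest preceding heading is a simplification.

```python
import difflib


def section_of(lines: list, index: int) -> str:
    """Return the nearest markdown heading at or above lines[index], or ''."""
    for line in reversed(lines[: index + 1]):
        if line.startswith("#"):
            return line.lstrip("#").strip()
    return ""


def summarize_diff(before: str, after: str) -> dict:
    """Compute diff statistics between two markdown snapshots."""
    before_lines = before.splitlines()
    after_lines = after.splitlines()

    unified = "\n".join(
        difflib.unified_diff(
            before_lines, after_lines, fromfile="before", tofile="after", lineterm=""
        )
    )

    added = removed = 0
    sections = []
    matcher = difflib.SequenceMatcher(a=before_lines, b=after_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        removed += i2 - i1
        added += j2 - j1
        # Attribute the change to the heading above it in the newer text,
        # falling back to the older text for pure deletions.
        if j2 > j1:
            name = section_of(after_lines, j1)
        else:
            name = section_of(before_lines, i1)
        if name and name not in sections:
            sections.append(name)

    return {
        "changed": added > 0 or removed > 0,
        "added_lines": added,
        "removed_lines": removed,
        "changed_sections": sections,
        "unified_diff": unified,
    }


old = "# Introduction\nHello\n\n# Methods\nStep one"
new = "# Introduction\nHello there\n\n# Methods\nStep one"
stats = summarize_diff(old, new)
```

Here one line under "Introduction" was replaced, so the sketch counts one added and one removed line and reports that single section.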

Storage

Snapshots are stored in a SQLite database at ~/.local/share/diffgrab/diffgrab.db, which is created automatically. To use a different location, pass db_path:

tracker = DiffTracker(db_path="/path/to/custom.db")
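
For readers curious what SQLite-backed snapshot history can look like, the sketch below stores snapshots in a single table and reads back the most recent ones. The schema, table name, and helper functions are assumptions made for illustration and do not describe diffgrab's actual database layout. An in-memory database is used so the example leaves no files behind.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per snapshot of a URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL,
    content TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    taken_at TEXT NOT NULL
)
"""


def save_snapshot(conn: sqlite3.Connection, url: str, content: str) -> int:
    """Insert a snapshot row and return its ID."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    taken_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO snapshots (url, content, content_hash, taken_at) "
        "VALUES (?, ?, ?, ?)",
        (url, content, digest, taken_at),
    )
    conn.commit()
    return cur.lastrowid


def recent_snapshots(conn: sqlite3.Connection, url: str, count: int = 20) -> list:
    """Return (id, content_hash, taken_at) rows for a URL, newest first."""
    return conn.execute(
        "SELECT id, content_hash, taken_at FROM snapshots "
        "WHERE url = ? ORDER BY id DESC LIMIT ?",
        (url, count),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first_id = save_snapshot(conn, "https://example.com", "# Title\nv1")
second_id = save_snapshot(conn, "https://example.com", "# Title\nv2")
rows = recent_snapshots(conn, "https://example.com", count=20)
conn.close()
```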

QuartzUnit Ecosystem

Package      Role                                 PyPI
markgrab     HTML/YouTube/PDF/DOCX to markdown    pip install markgrab
snapgrab     URL to screenshot + metadata         pip install snapgrab
docpick      OCR + LLM document extraction        pip install docpick
feedkit      RSS feed collection                  pip install feedkit
diffgrab     Web page change tracking             pip install diffgrab
browsegrab   Browser agent for LLMs               Coming soon

License

MIT
