diffgrab
Web page change tracking with structured diffs. markgrab + snapgrab integration, MCP native.
from diffgrab import DiffTracker
tracker = DiffTracker()
await tracker.track("https://example.com")
changes = await tracker.check()
for c in changes:
    if c.changed:
        print(c.summary)       # "3 lines added, 1 lines removed in sections: Introduction."
        print(c.unified_diff)  # Standard unified diff output
await tracker.close()
Features
- Change detection: track any URL and detect content changes via content hashing
- Structured diffs: unified diff plus section-level analysis (which headings changed)
- Human-readable summaries: "5 lines added, 2 removed in sections: Intro, Methods"
- Snapshot history: SQLite storage, browse past versions of any page
- markgrab powered: HTML/YouTube/PDF/DOCX extraction via markgrab
- Visual diff: optional screenshot comparison via snapgrab
- MCP server: 5 tools for Claude Code / MCP clients
- CLI included: diffgrab track, check, diff, history, untrack
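The change detection listed above relies on content hashing. As a rough illustration only, and not diffgrab's actual implementation, the idea can be sketched with the standard library. The SHA-256 choice, the whitespace normalisation, and the names `content_hash` and `has_changed` are all assumptions made for this example.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the page text (hash choice is an assumption)."""
    # Normalise line endings and trailing whitespace so purely cosmetic
    # differences do not register as content changes.
    normalised = "\n".join(line.rstrip() for line in text.splitlines())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(previous: str, current: str) -> bool:
    """Compare two snapshots by digest instead of by full text."""
    return content_hash(previous) != content_hash(current)


old = "# Introduction\nHello world\n"
new = "# Introduction\nHello, world\n"
print(has_changed(old, old))  # False
print(has_changed(old, new))  # True
```

Storing only the digest of the previous snapshot is enough to answer "did anything change?"; the full text is needed only when a diff has to be produced.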
Install
pip install diffgrab
Optional extras:
pip install 'diffgrab[cli]' # CLI with click + rich
pip install 'diffgrab[visual]' # Visual diff with snapgrab
pip install 'diffgrab[mcp]' # MCP server with fastmcp
pip install 'diffgrab[all]' # Everything
Usage
Python API
import asyncio
from diffgrab import DiffTracker

async def main():
    tracker = DiffTracker()

    # Track a URL (takes initial snapshot)
    await tracker.track("https://example.com", interval_hours=12)

    # Check for changes
    changes = await tracker.check()
    for change in changes:
        if change.changed:
            print(change.summary)
            print(change.unified_diff)

    # Get diff between specific snapshots
    result = await tracker.diff("https://example.com", before_id=1, after_id=2)

    # Browse snapshot history
    history = await tracker.history("https://example.com", count=20)

    # Stop tracking
    await tracker.untrack("https://example.com")
    await tracker.close()

asyncio.run(main())
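The example above checks once. For repeated checks, one possible pattern is a small polling loop around `check()`. The sketch below runs without diffgrab installed: `FakeTracker` and `FakeChange` are hypothetical stand-ins that model only the `check()` call and the `changed` and `summary` fields shown above, and `poll` is a helper name invented for this example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeChange:
    """Stand-in exposing the two fields this sketch reads from a result."""
    changed: bool
    summary: str


class FakeTracker:
    """Hypothetical stand-in for DiffTracker; only check() is modelled."""

    def __init__(self, rounds):
        self._rounds = iter(rounds)

    async def check(self):
        # Return the next prepared batch of results, or nothing once exhausted.
        return next(self._rounds, [])


async def poll(tracker, interval_seconds: float, rounds: int) -> list:
    """Call tracker.check() `rounds` times and collect summaries of changed pages."""
    summaries = []
    for i in range(rounds):
        for change in await tracker.check():
            if change.changed:
                summaries.append(change.summary)
        if i < rounds - 1:
            # Sleep between rounds, but not after the last one.
            await asyncio.sleep(interval_seconds)
    return summaries


fake = FakeTracker([
    [FakeChange(False, "no changes")],
    [FakeChange(True, "2 lines added, 0 removed")],
])
print(asyncio.run(poll(fake, interval_seconds=0.01, rounds=2)))
```

Because `poll` only depends on an awaitable `check()`, a real `DiffTracker` instance could presumably be passed in place of the fake one.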
Convenience Functions
from diffgrab import track, check, diff, history, untrack
await track("https://example.com")
changes = await check()
result = await diff("https://example.com")
snaps = await history("https://example.com")
await untrack("https://example.com")
CLI
# Track a URL
diffgrab track https://example.com --interval 12
# Check all tracked URLs for changes
diffgrab check
# Check a specific URL
diffgrab check https://example.com
# Show diff between snapshots
diffgrab diff https://example.com
diffgrab diff https://example.com --before 1 --after 3
# View snapshot history
diffgrab history https://example.com --count 20
# Stop tracking
diffgrab untrack https://example.com
MCP Server
Add to your Claude Code MCP config:
{
  "mcpServers": {
    "diffgrab": {
      "command": "diffgrab-mcp",
      "args": []
    }
  }
}
Or with uvx:
{
  "mcpServers": {
    "diffgrab": {
      "command": "uvx",
      "args": ["--from", "diffgrab[mcp]", "diffgrab-mcp"]
    }
  }
}
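If the config file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch under stated assumptions: `add_diffgrab_server` is a hypothetical helper, and the config file location is deliberately left to the caller because it depends on the MCP client. Only the two entries shown above are written.

```python
import json
import tempfile
from pathlib import Path


def add_diffgrab_server(config_path: Path, use_uvx: bool = False) -> dict:
    """Merge a diffgrab entry into an MCP config file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    if use_uvx:
        servers["diffgrab"] = {
            "command": "uvx",
            "args": ["--from", "diffgrab[mcp]", "diffgrab-mcp"],
        }
    else:
        servers["diffgrab"] = {"command": "diffgrab-mcp", "args": []}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demonstrate against a throwaway file; the real location depends on your client.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"
    existing = {"mcpServers": {"other": {"command": "other-mcp", "args": []}}}
    path.write_text(json.dumps(existing))
    merged = add_diffgrab_server(path, use_uvx=True)
    written = json.loads(path.read_text())

print(sorted(merged["mcpServers"]))  # ['diffgrab', 'other']
```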
MCP Tools:
| Tool | Description |
|---|---|
| track_url | Register a URL for change tracking |
| check_changes | Check tracked URLs for changes |
| get_diff | Get structured diff between snapshots |
| get_history | Browse snapshot history |
| untrack_url | Stop tracking a URL |
DiffResult
Every diff operation returns a DiffResult:
@dataclass
class DiffResult:
    url: str                        # The tracked URL
    changed: bool                   # Whether content changed
    added_lines: int                # Lines added
    removed_lines: int              # Lines removed
    changed_sections: list[str]     # Markdown headings with changes
    unified_diff: str               # Standard unified diff
    summary: str                    # Human-readable summary
    before_snapshot_id: int | None  # DB ID of older snapshot
    after_snapshot_id: int | None   # DB ID of newer snapshot
    before_timestamp: str           # When older snapshot was taken
    after_timestamp: str            # When newer snapshot was taken
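To make the fields above concrete, here is one way values like `added_lines`, `removed_lines`, `changed_sections`, `unified_diff`, and `summary` could be derived from two markdown snapshots with the standard library's `difflib`. This is an illustrative sketch, not diffgrab's own algorithm; `structured_diff` and `heading_before` are hypothetical names, and treating every line that starts with `#` as a heading is a simplification.

```python
import difflib


def heading_before(lines, index):
    """Return the nearest markdown heading at or above `index`, or None."""
    for line in reversed(lines[: index + 1]):
        if line.startswith("#"):
            return line.lstrip("#").strip()
    return None


def structured_diff(before: str, after: str) -> dict:
    a, b = before.splitlines(), after.splitlines()
    added = removed = 0
    sections = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            continue
        removed += i2 - i1
        added += j2 - j1
        # Attribute removed lines to headings in the old text and
        # added lines to headings in the new text.
        touched = [heading_before(a, i) for i in range(i1, i2)]
        touched += [heading_before(b, j) for j in range(j1, j2)]
        for name in touched:
            if name and name not in sections:
                sections.append(name)
    unified = "\n".join(
        difflib.unified_diff(a, b, fromfile="before", tofile="after", lineterm="")
    )
    summary = f"{added} lines added, {removed} removed"
    if sections:
        summary += " in sections: " + ", ".join(sections)
    return {
        "changed": bool(added or removed),
        "added_lines": added,
        "removed_lines": removed,
        "changed_sections": sections,
        "unified_diff": unified,
        "summary": summary,
    }


before = "# Intro\nHello\n# Methods\nStep one\n"
after = "# Intro\nHello there\n# Methods\nStep one\nStep two\n"
result = structured_diff(before, after)
print(result["summary"])  # 2 lines added, 1 removed in sections: Intro, Methods
```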
Storage
Snapshots are stored in SQLite at ~/.local/share/diffgrab/diffgrab.db, which is created automatically. To use a custom path:
tracker = DiffTracker(db_path="/path/to/custom.db")
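For readers curious what SQLite-backed snapshot history can look like, the sketch below stores and lists snapshots in an in-memory database. The table layout, column names, and the `save_snapshot` and `history` helpers are assumptions made for illustration; diffgrab's real schema is not documented here and may differ.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per captured version of a page.
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL,
    content TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    taken_at TEXT NOT NULL
)
"""


def save_snapshot(conn, url, content):
    """Insert one snapshot row and return its database ID."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    taken_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO snapshots (url, content, content_hash, taken_at) "
        "VALUES (?, ?, ?, ?)",
        (url, content, digest, taken_at),
    )
    conn.commit()
    return cur.lastrowid


def history(conn, url, count=10):
    """Return up to `count` snapshots for `url`, newest first."""
    rows = conn.execute(
        "SELECT id, taken_at, content_hash FROM snapshots "
        "WHERE url = ? ORDER BY id DESC LIMIT ?",
        (url, count),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = save_snapshot(conn, "https://example.com", "# Intro\nHello\n")
second = save_snapshot(conn, "https://example.com", "# Intro\nHello there\n")
print([row[0] for row in history(conn, "https://example.com")])  # [2, 1]
```

Integer snapshot IDs like these are the kind of value that `before_id` and `after_id` refer to in the diff call shown earlier.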
QuartzUnit Ecosystem
| Package | Role | PyPI |
|---|---|---|
| markgrab | HTML/YouTube/PDF/DOCX to markdown | pip install markgrab |
| snapgrab | URL to screenshot + metadata | pip install snapgrab |
| docpick | OCR + LLM document extraction | pip install docpick |
| feedkit | RSS feed collection | pip install feedkit |
| diffgrab | Web page change tracking | pip install diffgrab |
| browsegrab | Browser agent for LLMs | Coming soon |
License