Python client library for the Collaboration Tunnel Protocol (TCT)
Collaboration Tunnel Protocol - Python Client
A Python library for efficiently crawling websites that implement the Collaboration Tunnel Protocol (TCT), achieving up to 90% bandwidth savings through sitemap-first discovery and conditional requests.
Installation
pip install collab-tunnel
Quick Start
from collab_tunnel import CollabTunnelCrawler
# Initialize crawler
crawler = CollabTunnelCrawler(user_agent="MyBot/1.0")
# Fetch sitemap
sitemap = crawler.fetch_sitemap("https://example.com/llm-sitemap.json")
# Crawl items
for item in sitemap.items:
    if crawler.should_fetch(item):  # Zero-fetch optimization
        content = crawler.fetch_content(item['mUrl'], item['contentHash'])
        if content:
            print(f"Title: {content['title']}")
            print(f"Content: {content['content'][:200]}...")
# View stats
stats = crawler.get_stats()
print(f"Bandwidth saved: {stats['savings_percentage']}%")
print(f"Requests skipped: {stats['total_skips']}")
Features
- ✅ Sitemap-First Discovery: Skip unchanged URLs without fetching them (90%+ skip rate)
- ✅ Conditional Requests: 304 Not Modified support
- ✅ ETag Validation: Verify content integrity
- ✅ Bandwidth Tracking: Monitor savings vs traditional crawling
- ✅ Handshake Verification: Validate C-URL ↔ M-URL mapping
Advanced Usage
Crawl Entire Site
from collab_tunnel import crawl_site
results = crawl_site(
    "https://example.com/llm-sitemap.json",
    limit=100,
    user_agent="MyBot/1.0"
)
for result in results:
    print(result['title'], result['canonical_url'])
Filter by Date
from datetime import datetime, timedelta
from collab_tunnel import CollabTunnelCrawler
crawler = CollabTunnelCrawler()
sitemap = crawler.fetch_sitemap("https://example.com/llm-sitemap.json")
# Get items modified in last 7 days
recent_items = sitemap.filter_by_date(
    datetime.now() - timedelta(days=7)
)
for item in recent_items:
    content = crawler.fetch_content(item['mUrl'])
    # Process recent content...
Verify Protocol Compliance
from collab_tunnel import ContentValidator
validator = ContentValidator()
# Check headers
headers = {
    'Content-Type': 'application/json; charset=UTF-8',
    'ETag': 'W/"sha256-abc123..."',
    'Link': '<https://example.com/post/>; rel="canonical"',
    'Cache-Control': 'max-age=0, must-revalidate, stale-while-revalidate=60, stale-if-error=86400',
    'Vary': 'Accept-Encoding'
}
results = validator.check_headers(headers)
if results['compliant']:
    print("✅ Protocol compliant!")
else:
    print("❌ Errors:", results['errors'])
Validate Profile Field
from collab_tunnel import CollabTunnelCrawler
crawler = CollabTunnelCrawler()
# Fetch M-URL content
content = crawler.fetch_content("https://example.com/post/llm/")
# Check profile field
profile = content.get('profile')
if profile == 'tct-1':
    print("✅ Recognized protocol version: tct-1")
elif profile:
    print(f"⚠️ Unknown protocol version: {profile} (forward compatibility)")
    # Future versions - client can decide how to handle
else:
    print("⚠️ No profile field (legacy or non-compliant endpoint)")
# Validate sitemap profile
sitemap = crawler.fetch_sitemap("https://example.com/llm-sitemap.json")
sitemap_profile = sitemap.data.get('profile')
if sitemap_profile == 'tct-1':
    print("✅ Sitemap protocol version: tct-1")
Protocol Overview
The Collaboration Tunnel Protocol (TCT) enables efficient content delivery through:
- Bidirectional Handshake
  - C-URL (HTML page) → M-URL (JSON endpoint) via <link rel="alternate">
  - M-URL → C-URL via Link: <C-URL>; rel="canonical" header
- Template-Invariant Fingerprinting
  - Content normalized through 6-step pipeline: decode entities, NFKC, casefold, remove Cc (except TAB/LF/CR), collapse ASCII whitespace, trim; then SHA-256
  - Weak ETag format: W/"sha256-..."
  - Stable across theme changes
- Sitemap-First Verification
  - JSON sitemap lists (cUrl, mUrl, contentHash)
  - Skip fetch if hash unchanged (90%+ skip rate)
- Conditional Request Discipline
  - If-None-Match takes precedence
  - 304 Not Modified for unchanged content
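The bidirectional handshake described above can be sketched with the standard library alone. This is an illustrative sketch, not the library's verify_handshake implementation: the names AlternateLinkFinder, alternates_in_html, canonical_from_link_header and handshake_ok are hypothetical, the Link header parsing is deliberately simplified, and URLs are compared as exact strings (no relative-URL resolution and no check of the alternate link's type attribute).

```python
import re
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "alternate" in rels and attr_map.get("href"):
            self.alternates.append(attr_map["href"])


def alternates_in_html(html_text):
    """Return every alternate URL advertised by the C-URL's HTML."""
    finder = AlternateLinkFinder()
    finder.feed(html_text)
    return finder.alternates


# Simplified: finds the first <target> whose parameters include rel=canonical.
_CANONICAL_RE = re.compile(
    r'<([^>]*)>\s*;[^,]*?\brel\s*=\s*"?canonical"?', re.IGNORECASE
)


def canonical_from_link_header(value):
    """Extract the rel="canonical" target from an M-URL's Link header, or None."""
    match = _CANONICAL_RE.search(value)
    return match.group(1) if match else None


def handshake_ok(c_url, m_url, c_html, m_link_header):
    """True when C-URL points at M-URL and M-URL points back at C-URL."""
    forward = m_url in alternates_in_html(c_html)
    backward = canonical_from_link_header(m_link_header) == c_url
    return forward and backward
```

In this sketch the caller is expected to have already fetched the C-URL's HTML and the M-URL's response headers; only the two-way mapping check is shown.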
Response Format
M-URL JSON Payload
{
  "profile": "tct-1",
  "llm_url": "https://example.com/post/llm/",
  "canonical_url": "https://example.com/post/",
  "hash": "sha256-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "title": "Article Title",
  "content": "Article content...",
  "modified": "2025-10-23T18:00:00Z"
}
Profile Field: "profile": "tct-1" enables protocol versioning for future compatibility.
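The hash value in the payload follows the fingerprinting steps listed in the Protocol Overview (decode entities, NFKC, casefold, remove Cc except TAB/LF/CR, collapse ASCII whitespace, trim, then SHA-256). The following is a minimal sketch of that pipeline for diagnostic purposes. It is an approximation written from the step names alone, so details such as which characters count as ASCII whitespace are assumptions, and the function names normalize and fingerprint are hypothetical rather than the library's own normalize_minimal.

```python
import hashlib
import html
import re
import unicodedata

_KEEP_CONTROLS = {"\t", "\n", "\r"}


def normalize(text: str) -> str:
    """Apply the six normalization steps in the order the README lists them."""
    text = html.unescape(text)                    # 1. decode HTML entities
    text = unicodedata.normalize("NFKC", text)    # 2. NFKC normalization
    text = text.casefold()                        # 3. casefold
    text = "".join(                               # 4. drop Cc except TAB/LF/CR
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch in _KEEP_CONTROLS
    )
    text = re.sub(r"[ \t\n\r]+", " ", text)       # 5. collapse ASCII whitespace (assumed set)
    return text.strip(" ")                        # 6. trim


def fingerprint(text: str) -> str:
    """SHA-256 over the normalized text, in the sha256-<hex> form used above."""
    digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
    return f"sha256-{digest}"


if __name__ == "__main__":
    # Two renderings that differ only in case, entities and spacing
    # produce the same fingerprint under this sketch.
    print(fingerprint("Hello &amp; World") == fingerprint("  hello   &  WORLD\n"))
```

Note that the example hash shown in this README's payload is the SHA-256 of empty input, so fingerprint("") reproduces it.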
HTTP Headers
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Link: <https://example.com/post/>; rel="canonical"
ETag: W/"sha256-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
Cache-Control: max-age=0, must-revalidate, stale-while-revalidate=60, stale-if-error=86400
Vary: Accept-Encoding
Weak ETag Format: W/"sha256-..." signals semantic (not byte-for-byte) equivalence, per RFC 9110 Section 8.8.1.
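Because the ETag is weak, a client comparing it against a stored value should ignore the W/ prefix, which is how weak comparison works in RFC 9110. The sketch below shows that comparison plus a simple parity check between the sitemap's contentHash, the ETag, and the payload's hash field. The helper names (etag_hash, weak_match, parity_ok, conditional_headers) are hypothetical and are not the library's validate_parity implementation.

```python
def etag_hash(etag: str) -> str:
    """Strip the optional W/ prefix and surrounding quotes from an ETag value."""
    value = etag.strip()
    if value.startswith("W/"):
        value = value[2:]
    return value.strip('"')


def weak_match(etag_a: str, etag_b: str) -> bool:
    """Weak comparison: equal if the opaque tags match, regardless of W/."""
    return etag_hash(etag_a) == etag_hash(etag_b)


def parity_ok(sitemap_hash: str, etag: str, payload_hash: str) -> bool:
    """True when all three sources agree on the same sha256-... value."""
    return sitemap_hash == etag_hash(etag) == payload_hash


def conditional_headers(cached_etag: str) -> dict:
    """Request headers for revalidating a cached M-URL response."""
    return {"If-None-Match": cached_etag}
```

A server that still holds the same content would be expected to answer a request carrying these headers with 304 Not Modified and no body.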
Sitemap Format
{
  "version": 1,
  "profile": "tct-1",
  "items": [
    {
      "cUrl": "https://example.com/post/",
      "mUrl": "https://example.com/post/llm/",
      "modified": "2025-10-23T18:00:00Z",
      "contentHash": "sha256-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    }
  ]
}
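Given a sitemap in this shape, the zero-fetch decision reduces to comparing each item's contentHash with the hash recorded on the previous crawl. The following sketch illustrates that idea only: plan_fetches and the known_hashes mapping (mUrl to last seen hash) are hypothetical names, and the library's own should_fetch may apply additional rules.

```python
import json


def plan_fetches(sitemap: dict, known_hashes: dict) -> tuple:
    """Split sitemap items into M-URLs to fetch and M-URLs that can be skipped."""
    to_fetch, skipped = [], []
    for item in sitemap.get("items", []):
        if known_hashes.get(item["mUrl"]) == item["contentHash"]:
            skipped.append(item["mUrl"])    # hash unchanged: no request needed
        else:
            to_fetch.append(item["mUrl"])   # new or changed content
    return to_fetch, skipped


if __name__ == "__main__":
    sitemap = json.loads("""
    {"version": 1, "profile": "tct-1", "items": [
      {"cUrl": "https://example.com/a/", "mUrl": "https://example.com/a/llm/",
       "modified": "2025-10-23T18:00:00Z", "contentHash": "sha256-aaa"},
      {"cUrl": "https://example.com/b/", "mUrl": "https://example.com/b/llm/",
       "modified": "2025-10-23T18:00:00Z", "contentHash": "sha256-bbb"}
    ]}
    """)
    known = {"https://example.com/a/llm/": "sha256-aaa"}
    to_fetch, skipped = plan_fetches(sitemap, known)
    print("fetch:", to_fetch)
    print("skip:", skipped)
```

Persisting known_hashes between runs (for example in a small JSON file) is what lets a repeat crawl skip unchanged items entirely.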
API Reference
CollabTunnelCrawler
Methods:
- fetch_sitemap(sitemap_url) - Fetch and parse sitemap
- should_fetch(item) - Check if item needs fetching (zero-fetch logic)
- fetch_content(m_url, expected_hash) - Fetch M-URL with conditional request
- verify_handshake(c_url, m_url) - Verify bidirectional handshake
- get_stats() - Get bandwidth savings statistics
SitemapParser
Properties:
- items - List of sitemap items
- version - Sitemap version
- count - Total number of items
Methods:
- filter_by_date(since) - Filter items by modification date
- find_by_canonical(c_url) - Find item by canonical URL
- get_stats() - Get sitemap statistics
ContentValidator
Static Methods:
- validate_parity(sitemap_hash, etag, payload_hash) - Compliance: parity-only check
- validate_etag(etag, content) - Diagnostic: recompute hash from content
- normalize_minimal(text) - Normalization for diagnostics only (6-step TCT spec algorithm)
- check_headers(headers) - Check protocol compliance
- check_head_get_parity(get_headers, head_headers) - Ensure HEAD mirrors GET headers
- validate_sitemap_item(item) - Validate sitemap item structure
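To illustrate what a HEAD/GET parity check involves, here is a small standalone sketch. It is not the library's check_head_get_parity: the function name header_mismatches is hypothetical, and the set of compared headers is an assumption taken from the header example in this README.

```python
# Headers compared in this sketch (an assumption, based on the README's example response).
COMPARED_HEADERS = ("content-type", "etag", "link", "cache-control", "vary")


def header_mismatches(get_headers: dict, head_headers: dict,
                      names=COMPARED_HEADERS) -> dict:
    """Return {header: (get_value, head_value)} for every header that differs.

    Header names are matched case-insensitively; a header missing on one
    side is reported with None for that side.
    """
    get_lower = {k.lower(): v.strip() for k, v in get_headers.items()}
    head_lower = {k.lower(): v.strip() for k, v in head_headers.items()}
    return {
        name: (get_lower.get(name), head_lower.get(name))
        for name in names
        if get_lower.get(name) != head_lower.get(name)
    }


if __name__ == "__main__":
    get_h = {"ETag": 'W/"sha256-abc"', "Vary": "Accept-Encoding"}
    head_h = {"etag": 'W/"sha256-abc"'}
    # An empty dict would mean the HEAD response mirrors GET for the compared headers.
    print(header_mismatches(get_h, head_h))
```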
License
MIT License - See LICENSE file for details
Links
- Website: https://llmpages.org
- GitHub: https://github.com/antunjurkovic-collab/collab-tunnel-python
- PyPI: https://pypi.org/project/collab-tunnel/
- Documentation: https://llmpages.org/docs/python/
- Patent: US 63/895,763 (Provisional, filed October 2025)
Contributing
Contributions welcome! Please open an issue or submit a pull request.