Storage-agnostic, executor-agnostic feed capture framework
title: "FeedSpine README" type: readme status: active tags: [feed-spine, pipeline, data-model, deduplication, medallion] created: 2025-01-15 updated: 2026-04-12
feedspine
Storage-agnostic feed capture with automatic deduplication, sighting history, and medallion architecture.
Quick Start · Architecture · Adapters · Storage · API · Examples
What Is FeedSpine?
FeedSpine is a Python framework for collecting structured data from feeds — RSS, JSON APIs, CSV files, SEC EDGAR, financial data providers — and storing it with automatic deduplication, version tracking, and quality-layer promotion.
Every record is identified by a normalized natural key and a SHA-256 content hash. Collect the same feed a thousand times — each item is stored exactly once, with a full sighting history of when and where it was seen.
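To make the two identifiers concrete, here is a minimal sketch using only the standard library. The normalization rule (trim and lowercase) and the choice to hash a canonical JSON encoding are assumptions for illustration; FeedSpine's actual rules are not described here, and `normalize_key` and `content_hash` are hypothetical helper names.

```python
import hashlib
import json


def normalize_key(raw_key: str) -> str:
    """Hypothetical normalization: trim whitespace and lowercase."""
    return raw_key.strip().lower()


def content_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"title": "Hello", "link": "https://example.com/1"}
b = {"link": "https://example.com/1", "title": "Hello"}  # same content, different key order

# Two spellings of one key collapse to the same identity.
same_identity = normalize_key("  HTTPS://Example.com/1 ") == normalize_key("https://example.com/1")
# Two orderings of one payload collapse to the same hash.
same_content = content_hash(a) == content_hash(b)
print(same_identity, same_content)
```

The key answers "is this the same item?" and the hash answers "has its content changed?", which is why both are needed.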
When to use it
| Use case | FeedSpine? |
|---|---|
| Collect from RSS / JSON / CSV / file feeds with automatic dedup | ✅ |
| Promote records through quality layers (Bronze → Silver → Gold) | ✅ |
| Track sighting history — "when did each source last see this item?" | ✅ |
| Swap storage (Memory ↔ SQLite ↔ DuckDB ↔ Postgres) without code changes | ✅ |
| Enrich records with entity resolution, metadata, or custom logic | ✅ |
| Full web scraping or browser automation | ❌ |
Quick Start
Install
uv add feedspine # Core
uv add "feedspine[duckdb]" # + DuckDB analytical storage
uv add "feedspine[api]" # + FastAPI REST server
uv add "feedspine[elasticsearch]" # + Elasticsearch search
uv add "feedspine[entity]" # + Entity resolution
uv add "feedspine[all]" # Everything
Collect and deduplicate in 10 lines
import asyncio

from feedspine import create_feed_spine, MemoryStorage, RSSFeedAdapter


async def main():
    storage = MemoryStorage()
    app = create_feed_spine(storage)
    app.register_feed(RSSFeedAdapter(name="news", url="https://news.ycombinator.com/rss"))

    # First run: all items are new
    outcome = await app.collection_service.run_collection("news")
    print(f"New: {outcome.stats.new}, Duplicates: {outcome.stats.duplicates}")

    # Second run: duplicates detected automatically
    outcome = await app.collection_service.run_collection("news")
    print(f"New: {outcome.stats.new}, Duplicates: {outcome.stats.duplicates}")


asyncio.run(main())
Or from the CLI
uv run feedspine collect run --feed news
uv run feedspine feeds list-types # Show available adapters
uv run feedspine health summary # Feed health (RAG status)
Architecture
                      ┌──────────────────────────────────────────────┐
                      │              PRESENTATION LAYER              │
                      │    CLI (Typer) · REST API (FastAPI) · MCP    │
                      │     Thin wrappers: delegate to Ops layer     │
                      └──────────────────┬───────────────────────────┘
                                         │
                      ┌──────────────────▼───────────────────────────┐
                      │                  OPS LAYER                   │
                      │    OperationContext → OperationResult[T]     │
                      │   query · feed · enrich · schedules · runs   │
                      │  Pure business logic, no transport imports   │
                      └──────────────────┬───────────────────────────┘
                                         │
                      ┌──────────────────▼───────────────────────────┐
                      │                PIPELINE LAYER                │
                      │ RecordCandidate → dedup → Record + Sighting  │
                      │       stages · runner · stats · dedup        │
                      └──────────────────┬───────────────────────────┘
                                         │
┌─────────────┐       ┌──────────────────▼───────────────────────────┐
│   Sources   │──────▶│                STORAGE LAYER                 │
│ RSS · JSON  │       │   Protocols: StorageBackend, SearchBackend   │
│ CSV · File  │       │   Repository pattern + dialect abstraction   │
│  SEC EDGAR  │       ├──────────┬──────────┬──────────┬─────────────┤
│ Polygon.io  │       │  Memory  │  SQLite  │  DuckDB  │ PostgreSQL  │
└─────────────┘       └──────────┴──────────┴──────────┴─────────────┘
Core data flow
- FeedAdapter fetches raw data from a source and yields RecordCandidate objects
- Pipeline deduplicates each candidate (natural key + content hash), creating or updating a Record
- Sighting is logged for every observation, giving a full audit trail of when and where each item was seen
- Enricher can promote records through medallion layers and add metadata
- StorageBackend persists everything; swap backends without changing pipeline code
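The dedup step in this flow can be sketched in a few lines. This is an illustrative model, not FeedSpine's pipeline: `MiniPipeline` is a hypothetical class, the `Record` and `Sighting` dataclasses here carry only a few assumed fields, and treating a changed content hash as a version bump is an assumption about how "creating or updating" works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    natural_key: str
    content_hash: str
    version: int = 1


@dataclass
class Sighting:
    natural_key: str
    source: str
    seen_at: datetime


@dataclass
class MiniPipeline:
    """Hypothetical in-memory stand-in for candidate -> dedup -> record + sighting."""

    records: dict[str, Record] = field(default_factory=dict)
    sightings: list[Sighting] = field(default_factory=list)

    def process(self, natural_key: str, content_hash: str, source: str) -> str:
        """Return 'new', 'updated', or 'duplicate' for one candidate."""
        # Every observation is logged, whether or not the record changes.
        self.sightings.append(Sighting(natural_key, source, datetime.now(timezone.utc)))
        existing = self.records.get(natural_key)
        if existing is None:
            self.records[natural_key] = Record(natural_key, content_hash)
            return "new"
        if existing.content_hash != content_hash:
            # Same identity, different content: assumed to bump the version.
            existing.content_hash = content_hash
            existing.version += 1
            return "updated"
        return "duplicate"


pipeline = MiniPipeline()
results = [
    pipeline.process("item-1", "hash-a", source="news"),
    pipeline.process("item-1", "hash-a", source="news"),
    pipeline.process("item-1", "hash-b", source="mirror"),
]
print(results, len(pipeline.records), len(pipeline.sightings))
```

Three observations produce one stored record and three sightings, which is the property the list above describes.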
Key primitives
| Primitive | Purpose |
|---|---|
| RecordCandidate | Raw input from an adapter. Content hash computed automatically (SHA-256). |
| Record | Stored item with natural_key, content_hash, layer, version tracking, timestamps. |
| Sighting | Observation audit trail: every time a record is seen, from any source. |
| Layer | Quality tier: BRONZE (raw) → SILVER (validated) → GOLD (enriched). |
| Pipeline | Core processing engine: candidate → dedup → record + sighting. |
| FeedSpineApp | Application object created by create_feed_spine(); holds storage, services, feeds. |
| CollectionOutcome | Result of a collection run with stats (processed, new, duplicates, errors). |
| OperationContext | Context for ops-layer functions: storage, search, request_id, caller, dry_run. |
| OperationResult[T] | Typed success/failure envelope returned by all ops functions. |
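As a rough picture of what a typed success/failure envelope looks like, the sketch below defines a generic result type and a toy ops-style function. The field names (`success`, `data`, `error`), the `ok`/`fail` constructors, and `count_records` are all assumptions for illustration; the real `OperationResult[T]` may be shaped differently.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class OperationResult(Generic[T]):
    """Hypothetical envelope; field names are assumptions, not the library's."""

    success: bool
    data: Optional[T] = None
    error: Optional[str] = None

    @classmethod
    def ok(cls, data: T) -> "OperationResult[T]":
        return cls(success=True, data=data)

    @classmethod
    def fail(cls, error: str) -> "OperationResult[T]":
        return cls(success=False, error=error)


def count_records(keys: list[str]) -> OperationResult[int]:
    """Toy ops-style function: plain logic with no CLI or HTTP imports."""
    if not keys:
        return OperationResult.fail("no records supplied")
    return OperationResult.ok(len(set(keys)))


print(count_records(["a", "b", "a"]))
print(count_records([]))
```

Returning an envelope instead of raising lets each transport (CLI, REST, MCP) decide how to render a failure.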
Built-in Feed Adapters
| Adapter | Source | Natural Key |
|---|---|---|
| RSSFeedAdapter | RSS 2.0 and Atom feeds | Entry GUID or link |
| JSONFeedAdapter | JSON API endpoints with dot-notation path mapping | Configurable field |
| CSVFeedAdapter | Local or HTTP CSV/TSV with composite key support | Configurable column(s) |
| FileFeedAdapter | File-based feeds with content hash change detection | File path |
| SECEdgarFilingAdapter | SEC EDGAR filing submissions API | Accession number |
| PolygonEarningsAdapter | Polygon.io earnings calendar | Ticker + fiscal period |
All adapters implement the FeedAdapter protocol — a @runtime_checkable interface with fetch(), initialize(), and close() methods. Write your own adapter in ~30 lines.
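A custom adapter might look roughly like the sketch below. `FeedAdapterLike` is a local stand-in for the FeedAdapter protocol, written here only so the example runs on its own; the real method signatures and the shape of the yielded candidates are assumptions. `StaticListAdapter` and `collect` are hypothetical names.

```python
import asyncio
from typing import Any, AsyncIterator, Protocol, runtime_checkable


@runtime_checkable
class FeedAdapterLike(Protocol):
    """Stand-in for the FeedAdapter protocol; real signatures may differ."""

    async def initialize(self) -> None: ...
    def fetch(self) -> AsyncIterator[dict[str, Any]]: ...
    async def close(self) -> None: ...


class StaticListAdapter:
    """Hypothetical adapter that serves items from an in-memory list."""

    def __init__(self, name: str, items: list[dict[str, Any]]) -> None:
        self.name = name
        self._items = items

    async def initialize(self) -> None:
        pass  # open connections or files here

    async def fetch(self) -> AsyncIterator[dict[str, Any]]:
        for item in self._items:
            # Assumed candidate shape: a natural key plus the raw content.
            yield {"natural_key": item["id"], "content": item}

    async def close(self) -> None:
        pass  # release resources here


async def collect(adapter: FeedAdapterLike) -> list[dict[str, Any]]:
    await adapter.initialize()
    try:
        return [candidate async for candidate in adapter.fetch()]
    finally:
        await adapter.close()


adapter = StaticListAdapter("demo", [{"id": "a-1", "title": "First"}, {"id": "a-2", "title": "Second"}])
candidates = asyncio.run(collect(adapter))
print(isinstance(adapter, FeedAdapterLike), [c["natural_key"] for c in candidates])
```

Because the protocol is structural, `StaticListAdapter` never inherits from it; the `isinstance` check only verifies that the three methods are present.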
Storage Backends
| Backend | Best For | Install |
|---|---|---|
| MemoryStorage | Testing, development, prototyping | Included |
| SQLiteStorage | Single-user, local dev, small-to-medium datasets | feedspine[sqlalchemy] |
| DuckDBStorage | Analytical queries, time-series, Parquet export | feedspine[duckdb] |
| PostgresStorage | Multi-user production, large datasets, concurrent access | feedspine[postgres] |
All backends implement the StorageBackend protocol — CRUD, batch operations, natural-key lookup for dedup, sighting tracking, and query with filtering/pagination.
# Swap storage in one line; pipeline code stays the same
from feedspine import create_feed_spine
from feedspine.storage.backends.duckdb import DuckDBStorage
storage = DuckDBStorage("feeds.duckdb")
app = create_feed_spine(storage)
Enrichment
Enrichers transform records and promote them through medallion layers:
| Enricher | Purpose | Install |
|---|---|---|
| PassthroughEnricher | Layer promotion without data changes | Included |
| MetadataEnricher | Add custom fields to record metadata | Included |
| EntityEnricher | Entity resolution (CIK/ticker/name lookup) | feedspine[entity] |
Enrichment is orchestrated through FeedEnrichmentWorker with batch support.
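The two included enrichment styles can be illustrated with a small standalone model. Everything here is a sketch under assumptions: the integer values on `Layer`, the "never demote" rule, and the `promote`, `add_metadata`, and `enrich_batch` helpers are hypothetical and do not mirror the real enricher classes.

```python
from dataclasses import dataclass, field, replace
from enum import IntEnum
from typing import Any


class Layer(IntEnum):
    BRONZE = 1  # raw
    SILVER = 2  # validated
    GOLD = 3    # enriched


@dataclass(frozen=True)
class Record:
    natural_key: str
    layer: Layer = Layer.BRONZE
    metadata: dict[str, Any] = field(default_factory=dict)


def promote(record: Record, target: Layer) -> Record:
    """Passthrough-style promotion: change the layer, leave the data alone."""
    if target <= record.layer:
        return record  # assumed rule: promotion never demotes
    return replace(record, layer=target)


def add_metadata(record: Record, **fields: Any) -> Record:
    """Metadata-style enrichment: merge custom fields into a copy."""
    return replace(record, metadata={**record.metadata, **fields})


def enrich_batch(records: list[Record]) -> list[Record]:
    """Apply both steps to every record in a batch."""
    return [promote(add_metadata(r, reviewed=True), Layer.SILVER) for r in records]


batch = enrich_batch([Record("a"), Record("b", layer=Layer.GOLD)])
print([(r.natural_key, r.layer.name, r.metadata) for r in batch])
```

Records are treated as immutable here so that each enrichment step returns a new value, which keeps a batch easy to retry.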
Protocols
FeedSpine defines @runtime_checkable protocols for every extension point:
| Protocol | Module | Purpose |
|---|---|---|
| StorageBackend | protocols.storage | Record persistence (CRUD, query, batch, sightings) |
| RecordStore | protocols.storage | Record-specific storage operations |
| SightingStore | protocols.storage | Sighting tracking and queries |
| StorageLifecycle | protocols.storage | initialize() + close() lifecycle |
| FeedAdapter | protocols.feed | Feed source (fetch, initialize, close) |
| Enricher | protocols.enricher | Single-record enrichment |
| BatchEnricher | protocols.enricher | Batch enrichment |
| SearchBackend | protocols.search | Full-text search (index, search, delete) |
| RunLogStore | protocols.run_log | Pipeline run event logging |
| FetchContextStore | protocols.fetch_context | HTTP ETag/Last-Modified conditional fetching state |
| BlobStorage | protocols.blob | Binary file storage |
| Cache | protocols.cache | Async get/set/delete with TTL |
| ProgressReporter | protocols.progress | Operation monitoring with ETA |
| MessageQueue | protocols.queue | Pub/sub messaging |
| CollectionStrategy | protocols.strategy | Multi-source optimization |
Implement any protocol to extend FeedSpine — no subclassing, no registration boilerplate.
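As one example of the state such a protocol manages, the sketch below shows the kind of bookkeeping behind HTTP conditional fetching. The `If-None-Match` and `If-Modified-Since` request headers and the 304 status are standard HTTP; the `FetchContext` dataclass and both helper functions are hypothetical and are not the FetchContextStore interface. The response headers are assumed to arrive with canonical casing.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class FetchContext:
    """Hypothetical per-feed validators remembered between runs."""

    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(ctx: FetchContext) -> dict[str, str]:
    """Build request headers that let the server answer 304 Not Modified."""
    headers: dict[str, str] = {}
    if ctx.etag:
        headers["If-None-Match"] = ctx.etag
    if ctx.last_modified:
        headers["If-Modified-Since"] = ctx.last_modified
    return headers


def update_context(ctx: FetchContext, status: int, response_headers: Mapping[str, str]) -> bool:
    """Return True when the body changed and should be processed."""
    if status == 304:
        return False  # unchanged: keep the stored validators, skip processing
    # Fresh response: replace validators with whatever the server sent.
    ctx.etag = response_headers.get("ETag")
    ctx.last_modified = response_headers.get("Last-Modified")
    return True


ctx = FetchContext()
first = update_context(ctx, 200, {"ETag": '"abc"', "Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"})
second = update_context(ctx, 304, {})
print(first, second, conditional_headers(ctx))
```

Skipping unchanged feeds at the HTTP level avoids downloading and hashing content that deduplication would discard anyway.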
REST API / CLI / MCP
FeedSpine ships with three transport layers. All delegate to the same ops/ business logic.
FastAPI REST API
uv run feedspine api serve --port 11300
# → OpenAPI docs at http://localhost:11300/docs
16 route modules: records, feeds, sightings, search, enrichment, health, metrics, stats, timeline, export, schedules, syndication (RSS/OPML), observations, runs, storage, and collection.
Typer CLI
uv run feedspine collect run --feed sec-filings # Collect from a feed
uv run feedspine feeds list-types # List available adapters
uv run feedspine feeds list # List configured feeds
uv run feedspine health summary # Feed health (RAG: Red/Amber/Green)
uv run feedspine stats summary # Record counts, layer distribution
uv run feedspine query records --limit 10 # Query stored records
uv run feedspine export json output.json # Export to JSON/CSV/Parquet
uv run feedspine info # System info
MCP Server (Model Context Protocol)
13 tools for LLM integration — feed collection, enrichment, timeline queries, search, health, and storage stats:
uv run feedspine-mcp # Start MCP server (stdio)
Search
| Backend | Features | Install |
|---|---|---|
| MemorySearch | Keyword search (linear scan, dev/testing) | Included |
| ElasticsearchSearch | Distributed full-text, relevance scoring, highlighting, aggregations | feedspine[elasticsearch] |
Both implement the SearchBackend protocol with index(), search(), delete(), exists(), and initialize().
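To show what a linear-scan keyword search amounts to, here is a small synchronous model. `MiniMemorySearch` is a hypothetical class written for this example; the real backends are likely async, and their method signatures and matching rules are not shown here. Matching every query term as a case-insensitive substring is an assumption.

```python
class MiniMemorySearch:
    """Hypothetical linear-scan index; not the library's MemorySearch."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        # Store lowercased text so matching is case-insensitive.
        self._docs[doc_id] = text.lower()

    def exists(self, doc_id: str) -> bool:
        return doc_id in self._docs

    def delete(self, doc_id: str) -> bool:
        return self._docs.pop(doc_id, None) is not None

    def search(self, query: str, limit: int = 10) -> list[str]:
        terms = query.lower().split()
        if not terms:
            return []
        # Linear scan: every document is checked against every term.
        hits = [doc_id for doc_id, text in self._docs.items() if all(t in text for t in terms)]
        return hits[:limit]


search = MiniMemorySearch()
search.index("r1", "Quarterly earnings beat estimates")
search.index("r2", "New filing: annual report")
print(search.search("earnings"), search.search("ANNUAL filing"), search.search("missing"))
```

A scan like this costs time proportional to the number of indexed documents per query, which is why it suits development and testing rather than large datasets.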
Examples
25 runnable examples across 7 categories:
| Category | Examples | Highlights |
|---|---|---|
| Getting Started | 2 | Quickstart, multi-feed collection |
| Storage | 2 | DuckDB persistence, data types |
| Domain Feeds | 1 | SEC EDGAR filing monitor |
| Operations | 11 | Tracking, enrichment, scheduling, health, stats, export |
| Earnings | 7 | Calendar API, CLI, REST, WebSocket, full workflow |
| API | 3 | Unified timeline, RSS/Atom syndication, export formats |
| CLI | 1 | CLI command examples |
uv run python examples/01_getting_started/01_quickstart.py
uv run python examples/run_all.py # Run all 25
Development
uv sync --dev # Install all dependencies
uv run pytest # 1217 tests
uv run ruff check . # Lint
uv run ruff format . # Format
uv run mypy src # Type check
uv run mkdocs serve # Local docs site
Project stats
| Metric | Value |
|---|---|
| Source files | ~190 |
| Test files | ~96 |
| Source LOC | ~35,000 |
| Tests | 1,217 passed, 23 skipped |
Stability
| Aspect | Status |
|---|---|
| Version | 0.3.0 |
| Python | ≥ 3.12 |
| API stability | v0.x — API may change between minor versions |
| License | MIT |
Contributing
See CONTRIBUTING.md.
License
MIT — see LICENSE.