Storage-agnostic, executor-agnostic feed capture framework

title: "FeedSpine README"
type: readme
status: active
tags: [feed-spine, pipeline, data-model, deduplication, medallion]
created: 2025-01-15
updated: 2026-04-12

feedspine

Storage-agnostic feed capture with automatic deduplication, sighting history, and medallion architecture.

Quick Start · Architecture · Adapters · Storage · API · Examples


What Is FeedSpine?

FeedSpine is a Python framework for collecting structured data from feeds — RSS, JSON APIs, CSV files, SEC EDGAR, financial data providers — and storing it with automatic deduplication, version tracking, and quality-layer promotion.

Every record is identified by a normalized natural key and a SHA-256 content hash. Collect the same feed a thousand times — each item is stored exactly once, with a full sighting history of when and where it was seen.
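As a rough illustration of the identity scheme described above, the sketch below derives a normalized key and a SHA-256 content hash using only the standard library. The helper names `normalize_key` and `content_hash`, the lowercase/trim normalization, and the canonical JSON encoding are assumptions made for this example; FeedSpine's own normalization rules may differ.

```python
import hashlib
import json


def normalize_key(raw: str) -> str:
    """Hypothetical normalization: trim whitespace and lowercase."""
    return raw.strip().lower()


def content_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content with a different key order hashes identically.
a = content_hash({"title": "Hello", "link": "https://example.com/1"})
b = content_hash({"link": "https://example.com/1", "title": "Hello"})

# Any change to the content produces a different digest.
c = content_hash({"title": "Hello!", "link": "https://example.com/1"})
```

Canonicalizing before hashing is what makes the digest stable across fetches that return the same fields in a different order.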

When to use it

| Use case | FeedSpine? |
|---|---|
| Collect from RSS / JSON / CSV / file feeds with automatic dedup | Yes |
| Promote records through quality layers (Bronze → Silver → Gold) | Yes |
| Track sighting history: "when did each source last see this item?" | Yes |
| Swap storage (Memory ↔ SQLite ↔ DuckDB ↔ Postgres) without code changes | Yes |
| Enrich records with entity resolution, metadata, or custom logic | Yes |
| Full web scraping or browser automation | No |

Quick Start

Install

uv add feedspine                      # Core
uv add "feedspine[duckdb]"            # + DuckDB analytical storage
uv add "feedspine[api]"               # + FastAPI REST server
uv add "feedspine[elasticsearch]"     # + Elasticsearch search
uv add "feedspine[entity]"            # + Entity resolution
uv add "feedspine[all]"               # Everything

Collect and deduplicate in 10 lines

import asyncio
from feedspine import create_feed_spine, MemoryStorage, RSSFeedAdapter

async def main():
    storage = MemoryStorage()
    app = create_feed_spine(storage)
    app.register_feed(RSSFeedAdapter(name="news", url="https://news.ycombinator.com/rss"))

    # First run — all items are new
    outcome = await app.collection_service.run_collection("news")
    print(f"New: {outcome.stats.new}, Duplicates: {outcome.stats.duplicates}")

    # Second run — duplicates detected automatically
    outcome = await app.collection_service.run_collection("news")
    print(f"New: {outcome.stats.new}, Duplicates: {outcome.stats.duplicates}")

asyncio.run(main())

Or from the CLI

uv run feedspine collect run --feed news
uv run feedspine feeds list-types        # Show available adapters
uv run feedspine health summary          # Feed health (RAG status)

Architecture

                    ┌──────────────────────────────────────────────┐
                    │              PRESENTATION LAYER              │
                    │  CLI (Typer) · REST API (FastAPI) · MCP      │
                    │  Thin wrappers — delegate to Ops layer       │
                    └──────────────────┬───────────────────────────┘
                                       │
                    ┌──────────────────▼───────────────────────────┐
                    │                OPS LAYER                     │
                    │  OperationContext → OperationResult[T]       │
                    │  query · feed · enrich · schedules · runs    │
                    │  Pure business logic — no transport imports  │
                    └──────────────────┬───────────────────────────┘
                                       │
                    ┌──────────────────▼───────────────────────────┐
                    │             PIPELINE LAYER                   │
                    │  RecordCandidate → dedup → Record + Sighting │
                    │  stages · runner · stats · dedup             │
                    └──────────────────┬───────────────────────────┘
                                       │
┌────────────┐      ┌──────────────────▼───────────────────────────┐
│  Sources   │─────▶│                STORAGE LAYER                 │
│ RSS · JSON │      │  Protocols: StorageBackend, SearchBackend    │
│ CSV · File │      │  Repository pattern + dialect abstraction    │
│ SEC EDGAR  │      ├─────────┬──────────┬──────────┬──────────────┤
│ Polygon.io │      │ Memory  │  SQLite  │  DuckDB  │  PostgreSQL  │
└────────────┘      └─────────┴──────────┴──────────┴──────────────┘

Core data flow

  1. FeedAdapter fetches raw data from a source and yields RecordCandidate objects
  2. Pipeline deduplicates each candidate (natural key + content hash), creating or updating a Record
  3. Sighting is logged for every observation — full audit trail of when and where each item was seen
  4. Enricher can promote records through medallion layers and add metadata
  5. StorageBackend persists everything — swap backends without changing pipeline code
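The steps above can be sketched with a plain dictionary standing in for the storage backend. This is a simplified model, not FeedSpine's pipeline: the `Candidate` and `StoredRecord` classes, the `process` function, and the rule that changed content bumps a version counter are all assumptions chosen to mirror the description.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Candidate:  # hypothetical stand-in for RecordCandidate
    natural_key: str
    payload: dict
    source: str


@dataclass
class StoredRecord:  # hypothetical stand-in for Record plus its sightings
    natural_key: str
    content_hash: str
    payload: dict
    version: int = 1
    sightings: list[tuple[str, datetime]] = field(default_factory=list)


def _hash(payload: dict) -> str:
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def process(store: dict[str, StoredRecord], cand: Candidate) -> str:
    """Return 'new', 'updated', or 'duplicate'; a sighting is logged in every case."""
    digest = _hash(cand.payload)
    record = store.get(cand.natural_key)
    if record is None:
        record = StoredRecord(cand.natural_key, digest, cand.payload)
        store[cand.natural_key] = record
        status = "new"
    elif record.content_hash != digest:
        # Assumed behavior: same key with different content is a new version.
        record.content_hash, record.payload = digest, cand.payload
        record.version += 1
        status = "updated"
    else:
        status = "duplicate"
    record.sightings.append((cand.source, datetime.now(timezone.utc)))
    return status


store: dict[str, StoredRecord] = {}
first = process(store, Candidate("item-1", {"title": "A"}, "news"))
second = process(store, Candidate("item-1", {"title": "A"}, "news"))
third = process(store, Candidate("item-1", {"title": "B"}, "mirror"))
```

The point of the sketch is that the record count stays at one while the sighting list grows with every observation, regardless of source.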

Key primitives

| Primitive | Purpose |
|---|---|
| RecordCandidate | Raw input from an adapter. Content hash computed automatically (SHA-256). |
| Record | Stored item with natural_key, content_hash, layer, version tracking, timestamps. |
| Sighting | Observation audit trail: every time a record is seen, from any source. |
| Layer | Quality tier: BRONZE (raw) → SILVER (validated) → GOLD (enriched). |
| Pipeline | Core processing engine: candidate → dedup → record + sighting. |
| FeedSpineApp | Application object created by create_feed_spine(); holds storage, services, feeds. |
| CollectionOutcome | Result of a collection run with stats (processed, new, duplicates, errors). |
| OperationContext | Context for ops-layer functions: storage, search, request_id, caller, dry_run. |
| OperationResult[T] | Typed success/failure envelope returned by all ops functions. |
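To make the "typed success/failure envelope" idea concrete, here is one common way to model such a type in Python. The class `OpResult`, its fields, and the `list_records` function are hypothetical names for illustration only; the actual `OperationResult[T]` fields are not documented here and may look different.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class OpResult(Generic[T]):  # hypothetical analogue of OperationResult[T]
    value: Optional[T] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None

    @classmethod
    def success(cls, value: T) -> "OpResult[T]":
        return cls(value=value)

    @classmethod
    def failure(cls, error: str) -> "OpResult[T]":
        return cls(error=error)


def list_records(records: list[dict], limit: int) -> OpResult[list[dict]]:
    """Transport-free business logic: returns an envelope instead of raising."""
    if limit <= 0:
        return OpResult.failure("limit must be positive")
    return OpResult.success(records[:limit])


good = list_records([{"id": 1}, {"id": 2}], limit=1)
bad = list_records([], limit=0)
```

Returning an envelope rather than raising lets a CLI, REST handler, or MCP tool each translate the same failure into its own error format.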

Built-in Feed Adapters

| Adapter | Source | Natural Key |
|---|---|---|
| RSSFeedAdapter | RSS 2.0 and Atom feeds | Entry GUID or link |
| JSONFeedAdapter | JSON API endpoints with dot-notation path mapping | Configurable field |
| CSVFeedAdapter | Local or HTTP CSV/TSV with composite key support | Configurable column(s) |
| FileFeedAdapter | File-based feeds with content hash change detection | File path |
| SECEdgarFilingAdapter | SEC EDGAR filing submissions API | Accession number |
| PolygonEarningsAdapter | Polygon.io earnings calendar | Ticker + fiscal period |

All adapters implement the FeedAdapter protocol — a @runtime_checkable interface with fetch(), initialize(), and close() methods. Write your own adapter in ~30 lines.
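A custom adapter might look roughly like the following. The protocol is re-declared locally as `AdapterLike` so the example runs without FeedSpine installed; the method signatures (in particular `fetch()` as an async generator yielding plain dicts) are assumptions, so check the real `FeedAdapter` protocol before copying this shape.

```python
from __future__ import annotations

import asyncio
from typing import Any, AsyncIterator, Protocol, runtime_checkable


@runtime_checkable
class AdapterLike(Protocol):
    """Hypothetical stand-in for the FeedAdapter protocol."""

    async def initialize(self) -> None: ...
    def fetch(self) -> AsyncIterator[dict[str, Any]]: ...
    async def close(self) -> None: ...


class StaticListAdapter:
    """Toy adapter that yields items from an in-memory list (no inheritance)."""

    def __init__(self, name: str, items: list[dict[str, Any]]) -> None:
        self.name = name
        self._items = items

    async def initialize(self) -> None:
        pass  # a real adapter might open an HTTP session here

    async def fetch(self) -> AsyncIterator[dict[str, Any]]:
        for item in self._items:
            yield item

    async def close(self) -> None:
        pass  # and release it here


async def drain(adapter: AdapterLike) -> list[dict[str, Any]]:
    """Run the initialize / fetch / close lifecycle and collect the items."""
    await adapter.initialize()
    try:
        return [item async for item in adapter.fetch()]
    finally:
        await adapter.close()


adapter = StaticListAdapter("demo", [{"id": 1}, {"id": 2}])
items = asyncio.run(drain(adapter))
```

Because the protocol is `@runtime_checkable`, `isinstance(adapter, AdapterLike)` succeeds purely on the presence of the three methods.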


Storage Backends

| Backend | Best For | Install |
|---|---|---|
| MemoryStorage | Testing, development, prototyping | Included |
| SQLiteStorage | Single-user, local dev, small-to-medium datasets | feedspine[sqlalchemy] |
| DuckDBStorage | Analytical queries, time-series, Parquet export | feedspine[duckdb] |
| PostgresStorage | Multi-user production, large datasets, concurrent access | feedspine[postgres] |

All backends implement the StorageBackend protocol — CRUD, batch operations, natural-key lookup for dedup, sighting tracking, and query with filtering/pagination.

# Swap storage in one line; pipeline code stays the same
from feedspine import create_feed_spine
from feedspine.storage.backends.duckdb import DuckDBStorage

storage = DuckDBStorage("feeds.duckdb")
app = create_feed_spine(storage)

Enrichment

Enrichers transform records and promote them through medallion layers:

| Enricher | Purpose | Install |
|---|---|---|
| PassthroughEnricher | Layer promotion without data changes | Included |
| MetadataEnricher | Add custom fields to record metadata | Included |
| EntityEnricher | Entity resolution (CIK/ticker/name lookup) | feedspine[entity] |

Enrichment is orchestrated through FeedEnrichmentWorker with batch support.
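Layer promotion can be pictured as producing a copy of a record at a higher tier, optionally with extra metadata. The sketch below is a standalone model: `Tier`, `Item`, `promote`, and `enrich_batch` are hypothetical names, and the no-demotion rule is an assumption rather than documented FeedSpine behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from enum import IntEnum


class Tier(IntEnum):  # hypothetical analogue of Layer
    BRONZE = 1
    SILVER = 2
    GOLD = 3


@dataclass(frozen=True)
class Item:
    natural_key: str
    payload: dict
    layer: Tier = Tier.BRONZE
    metadata: dict = field(default_factory=dict)


def promote(item: Item, target: Tier, **extra_metadata) -> Item:
    """Return a copy at `target` with merged metadata; refuse to demote."""
    if target < item.layer:
        raise ValueError(f"cannot demote {item.layer.name} to {target.name}")
    return replace(item, layer=target, metadata={**item.metadata, **extra_metadata})


def enrich_batch(items: list[Item], target: Tier, **extra_metadata) -> list[Item]:
    """Batch form: promote every item in one call."""
    return [promote(item, target, **extra_metadata) for item in items]


bronze = Item("filing-1", {"form": "10-K"})
silver = promote(bronze, Tier.SILVER, validated=True)
gold_batch = enrich_batch([silver], Tier.GOLD, entity="resolved")
```

Passing no extra metadata gives the passthrough case (promotion only); passing keyword arguments gives the metadata-enrichment case.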


Protocols

FeedSpine defines @runtime_checkable protocols for every extension point:

| Protocol | Module | Purpose |
|---|---|---|
| StorageBackend | protocols.storage | Record persistence (CRUD, query, batch, sightings) |
| RecordStore | protocols.storage | Record-specific storage operations |
| SightingStore | protocols.storage | Sighting tracking and queries |
| StorageLifecycle | protocols.storage | initialize() + close() lifecycle |
| FeedAdapter | protocols.feed | Feed source (fetch, initialize, close) |
| Enricher | protocols.enricher | Single-record enrichment |
| BatchEnricher | protocols.enricher | Batch enrichment |
| SearchBackend | protocols.search | Full-text search (index, search, delete) |
| RunLogStore | protocols.run_log | Pipeline run event logging |
| FetchContextStore | protocols.fetch_context | HTTP ETag/Last-Modified conditional fetching state |
| BlobStorage | protocols.blob | Binary file storage |
| Cache | protocols.cache | Async get/set/delete with TTL |
| ProgressReporter | protocols.progress | Operation monitoring with ETA |
| MessageQueue | protocols.queue | Pub/sub messaging |
| CollectionStrategy | protocols.strategy | Multi-source optimization |

Implement any protocol to extend FeedSpine — no subclassing, no registration boilerplate.
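As an example of the structural approach, the sketch below implements an async get/set/delete cache with TTL against a locally declared protocol. `CacheLike` and `DictCache` are hypothetical; the parameter names and the `ttl` unit (seconds) are assumptions, not the signatures of FeedSpine's `Cache` protocol.

```python
from __future__ import annotations

import asyncio
import time
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class CacheLike(Protocol):
    """Hypothetical stand-in for the Cache protocol."""

    async def get(self, key: str) -> Optional[Any]: ...
    async def set(self, key: str, value: Any, ttl: float) -> None: ...
    async def delete(self, key: str) -> None: ...


class DictCache:
    """In-memory cache; satisfies CacheLike without subclassing it."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[Any, float]] = {}

    async def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    async def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl)

    async def delete(self, key: str) -> None:
        self._data.pop(key, None)


async def _demo() -> tuple[Any, Any, Any]:
    cache = DictCache()
    await cache.set("etag", "abc", ttl=60)
    hit = await cache.get("etag")
    await cache.set("stale", 1, ttl=0)  # expires immediately
    miss = await cache.get("stale")
    await cache.delete("etag")
    return hit, miss, await cache.get("etag")


hit, miss, after_delete = asyncio.run(_demo())
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the methods exist, not that their signatures match, so a type checker such as mypy is still the stronger guarantee.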


REST API / CLI / MCP

FeedSpine ships with three transport layers. All delegate to the same ops/ business logic.

FastAPI REST API

uv run feedspine api serve --port 11300
# → OpenAPI docs at http://localhost:11300/docs

Route modules cover records, feeds, sightings, search, enrichment, health, metrics, stats, timeline, export, schedules, syndication (RSS/OPML), observations, runs, storage, and collection.

Typer CLI

uv run feedspine collect run --feed sec-filings     # Collect from a feed
uv run feedspine feeds list-types                   # List available adapters
uv run feedspine feeds list                         # List configured feeds
uv run feedspine health summary                     # Feed health (RAG: Red/Amber/Green)
uv run feedspine stats summary                      # Record counts, layer distribution
uv run feedspine query records --limit 10           # Query stored records
uv run feedspine export json output.json            # Export to JSON/CSV/Parquet
uv run feedspine info                               # System info

MCP Server (Model Context Protocol)

13 tools for LLM integration — feed collection, enrichment, timeline queries, search, health, and storage stats:

uv run feedspine-mcp                                # Start MCP server (stdio)

Search

| Backend | Features | Install |
|---|---|---|
| MemorySearch | Keyword search (linear scan, dev/testing) | Included |
| ElasticsearchSearch | Distributed full-text, relevance scoring, highlighting, aggregations | feedspine[elasticsearch] |

Both implement the SearchBackend protocol with index(), search(), delete(), exists(), and initialize().
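For intuition about what a linear-scan keyword backend does, here is a minimal synchronous sketch exposing the five method names listed above. The class `LinearScanSearch`, its argument types, and the all-terms-must-match rule are assumptions for illustration; the real backends may be asynchronous and accept richer documents and queries.

```python
from __future__ import annotations


class LinearScanSearch:
    """Toy keyword search: every query scans all indexed documents."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def initialize(self) -> None:
        self._docs.clear()

    def index(self, doc_id: str, text: str) -> None:
        # Store lowercased text so matching is case-insensitive.
        self._docs[doc_id] = text.lower()

    def exists(self, doc_id: str) -> bool:
        return doc_id in self._docs

    def delete(self, doc_id: str) -> bool:
        return self._docs.pop(doc_id, None) is not None

    def search(self, query: str) -> list[str]:
        """Return sorted ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        return sorted(
            doc_id
            for doc_id, text in self._docs.items()
            if all(term in text for term in terms)
        )


search = LinearScanSearch()
search.initialize()
search.index("a", "Apple files 10-K")
search.index("b", "Apple earnings call")
```

A scan like this costs time proportional to the number of documents per query, which is why it suits development and tests rather than large datasets.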


Examples

25 runnable examples across 7 categories:

| Category | Examples | Highlights |
|---|---|---|
| Getting Started | 2 | Quickstart, multi-feed collection |
| Storage | 2 | DuckDB persistence, data types |
| Domain Feeds | 1 | SEC EDGAR filing monitor |
| Operations | 11 | Tracking, enrichment, scheduling, health, stats, export |
| Earnings | 7 | Calendar API, CLI, REST, WebSocket, full workflow |
| API | 3 | Unified timeline, RSS/Atom syndication, export formats |
| CLI | 1 | CLI command examples |

uv run python examples/01_getting_started/01_quickstart.py
uv run python examples/run_all.py                          # Run all 25

Development

uv sync --dev                # Install all dependencies
uv run pytest                # 1217 tests
uv run ruff check .          # Lint
uv run ruff format .         # Format
uv run mypy src              # Type check
uv run mkdocs serve          # Local docs site

Project stats

| Metric | Value |
|---|---|
| Source files | ~190 |
| Test files | ~96 |
| Source LOC | ~35,000 |
| Tests | 1,217 passed, 23 skipped |

Stability

| Aspect | Status |
|---|---|
| Version | 0.3.0 |
| Python | ≥ 3.12 |
| API stability | v0.x; API may change between minor versions |
| License | MIT |

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.
