Storage-agnostic, executor-agnostic feed capture framework

title: "FeedSpine README" type: readme status: active tags: [feed-spine, pipeline, data-model, deduplication, medallion] created: 2025-01-15 updated: 2026-04-12

feedspine

Storage-agnostic feed capture with automatic deduplication, sighting history, and medallion architecture.

Quick Start · Architecture · Adapters · Storage · API · Examples

What Is FeedSpine?

FeedSpine is a Python framework for collecting structured data from feeds — RSS, JSON APIs, CSV files, SEC EDGAR, financial data providers — and storing it with automatic deduplication, version tracking, and quality-layer promotion.

Every record is identified by a normalized natural key and a SHA-256 content hash. Collect the same feed a thousand times — each item is stored exactly once, with a full sighting history of when and where it was seen.
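
To make the idea concrete, here is a minimal standard-library sketch of key-plus-hash deduplication. The helper names (`normalize_key`, `content_hash`, `is_duplicate`), the normalization rule (trim and lowercase), and the canonical-JSON encoding are assumptions chosen for illustration; FeedSpine's actual normalization and hashing may differ.

```python
import hashlib
import json


def normalize_key(raw: str) -> str:
    """Assumed normalization rule: trim whitespace and lowercase."""
    return raw.strip().lower()


def content_hash(content: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not affect the hash."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Fingerprints of every (natural key, content hash) pair seen so far.
seen: set[tuple[str, str]] = set()


def is_duplicate(raw_key: str, content: dict) -> bool:
    """Return True when this exact (key, content) pair was seen before."""
    fingerprint = (normalize_key(raw_key), content_hash(content))
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False


first = is_duplicate("  ITEM-1 ", {"title": "Hello", "score": 1})
again = is_duplicate("item-1", {"score": 1, "title": "Hello"})  # same item, reordered
edited = is_duplicate("item-1", {"title": "Hello", "score": 2})  # content changed
print(first, again, edited)
```

Hashing a canonical encoding rather than the raw payload is the design choice that matters here: two fetches of the same item compare equal even if the source reorders its fields.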

When to use it

Use case	FeedSpine?
Collect from RSS / JSON / CSV / file feeds with automatic dedup	✅
Promote records through quality layers (Bronze → Silver → Gold)	✅
Track sighting history — "when did each source last see this item?"	✅
Swap storage (Memory ↔ SQLite ↔ DuckDB ↔ Postgres) without code changes	✅
Enrich records with entity resolution, metadata, or custom logic	✅
Full web scraping or browser automation	❌

Quick Start

Install

uv add feedspine                      # Core
uv add "feedspine[sqlalchemy]"        # + SQLite storage
uv add "feedspine[duckdb]"            # + DuckDB analytical storage
uv add "feedspine[postgres]"          # + PostgreSQL storage
uv add "feedspine[api]"               # + FastAPI REST server
uv add "feedspine[elasticsearch]"     # + Elasticsearch search
uv add "feedspine[entity]"            # + Entity resolution
uv add "feedspine[all]"               # Everything

Collect and deduplicate in a few lines

import asyncio
from feedspine import create_feed_spine, MemoryStorage, RSSFeedAdapter

async def main():
    storage = MemoryStorage()
    app = create_feed_spine(storage)
    app.register_feed(RSSFeedAdapter(name="news", url="https://news.ycombinator.com/rss"))

    # First run — all items are new
    outcome = await app.collection_service.run_collection("news")
    print(f"New: {outcome.stats.new}, Duplicates: {outcome.stats.duplicates}")

    # Second run — duplicates detected automatically
    outcome = await app.collection_service.run_collection("news")
    print(f"New: {outcome.stats.new}, Duplicates: {outcome.stats.duplicates}")

asyncio.run(main())

Or from the CLI

uv run feedspine collect run --feed news
uv run feedspine feeds list-types        # Show available adapters
uv run feedspine health summary          # Feed health (RAG: Red/Amber/Green)

Architecture

                    ┌──────────────────────────────────────────────┐
                    │              PRESENTATION LAYER              │
                    │  CLI (Typer) · REST API (FastAPI) · MCP      │
                    │  Thin wrappers — delegate to Ops layer       │
                    └──────────────────┬───────────────────────────┘
                                       │
                    ┌──────────────────▼───────────────────────────┐
                    │                OPS LAYER                     │
                    │  OperationContext → OperationResult[T]       │
                    │  query · feed · enrich · schedules · runs    │
                    │  Pure business logic — no transport imports  │
                    └──────────────────┬───────────────────────────┘
                                       │
                    ┌──────────────────▼───────────────────────────┐
                    │             PIPELINE LAYER                   │
                    │  RecordCandidate → dedup → Record + Sighting │
                    │  stages · runner · stats · dedup             │
                    └──────────────────┬───────────────────────────┘
                                       │
┌────────────┐       ┌─────────────────▼───────────────────────────┐
│  Sources   │──────▶│            STORAGE LAYER                    │
│ RSS · JSON │       │  Protocols: StorageBackend, SearchBackend   │
│ CSV · File │       │  Repository pattern + dialect abstraction   │
│ SEC EDGAR  │       ├─────────┬──────────┬──────────┬─────────────┤
│ Polygon.io │       │ Memory  │  SQLite  │  DuckDB  │  PostgreSQL │
└────────────┘       └─────────┴──────────┴──────────┴─────────────┘

Core data flow

1. FeedAdapter fetches raw data from a source and yields RecordCandidate objects.
2. Pipeline deduplicates each candidate (natural key + content hash), creating or updating a Record.
3. A Sighting is logged for every observation: a full audit trail of when and where each item was seen.
4. Enricher can promote records through medallion layers and add metadata.
5. StorageBackend persists everything; swap backends without changing pipeline code.
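
The flow above can be sketched as a toy pipeline. The class names mirror the primitives named in this README, but the fields, the `ToyPipeline` class, and the "new / duplicate / updated" return values are assumptions for illustration, not FeedSpine's real data model.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RecordCandidate:
    natural_key: str
    content: dict

    @property
    def content_hash(self) -> str:
        payload = json.dumps(self.content, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


@dataclass
class Record:
    natural_key: str
    content: dict
    content_hash: str
    version: int = 1


@dataclass
class Sighting:
    natural_key: str
    source: str
    seen_at: datetime


@dataclass
class ToyPipeline:
    records: dict[str, Record] = field(default_factory=dict)
    sightings: list[Sighting] = field(default_factory=list)

    def process(self, candidate: RecordCandidate, source: str) -> str:
        # Every observation is logged, whether or not the record changes.
        self.sightings.append(
            Sighting(candidate.natural_key, source, datetime.now(timezone.utc))
        )
        existing = self.records.get(candidate.natural_key)
        if existing is None:
            self.records[candidate.natural_key] = Record(
                candidate.natural_key, candidate.content, candidate.content_hash
            )
            return "new"
        if existing.content_hash == candidate.content_hash:
            return "duplicate"
        # Same key, different content: update in place and bump the version.
        existing.content = candidate.content
        existing.content_hash = candidate.content_hash
        existing.version += 1
        return "updated"


pipeline = ToyPipeline()
results = [
    pipeline.process(RecordCandidate("a-1", {"title": "v1"}), "rss"),
    pipeline.process(RecordCandidate("a-1", {"title": "v1"}), "json"),
    pipeline.process(RecordCandidate("a-1", {"title": "v2"}), "rss"),
]
print(results, len(pipeline.records), len(pipeline.sightings))
```

Three observations produce one stored record at version 2 and three sightings, which is the "stored once, seen many times" behavior the data flow describes.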

Key primitives

Primitive	Purpose
`RecordCandidate`	Raw input from an adapter. Content hash computed automatically (SHA-256).
`Record`	Stored item with `natural_key`, `content_hash`, `layer`, version tracking, timestamps.
`Sighting`	Observation audit trail — every time a record is seen, from any source.
`Layer`	Quality tier: `BRONZE` (raw) → `SILVER` (validated) → `GOLD` (enriched).
`Pipeline`	Core processing engine: candidate → dedup → record + sighting.
`FeedSpineApp`	Application object created by `create_feed_spine()` — holds storage, services, feeds.
`CollectionOutcome`	Result of a collection run with stats (processed, new, duplicates, errors).
`OperationContext`	Context for ops-layer functions: storage, search, request_id, caller, dry_run.
`OperationResult[T]`	Typed success/failure envelope returned by all ops functions.
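
As an illustration of the typed success/failure envelope idea, the sketch below defines a small generic result type. The field names (`ok`, `value`, `error`), the `success`/`failure` constructors, and the `parse_limit` function are hypothetical; the real `OperationResult[T]` may expose a different shape.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ResultEnvelope(Generic[T]):
    """Hypothetical stand-in for an OperationResult[T]-style envelope."""

    ok: bool
    value: Optional[T] = None
    error: Optional[str] = None

    @classmethod
    def success(cls, value: T) -> "ResultEnvelope[T]":
        return cls(ok=True, value=value)

    @classmethod
    def failure(cls, error: str) -> "ResultEnvelope[T]":
        return cls(ok=False, error=error)


def parse_limit(raw: str) -> ResultEnvelope[int]:
    """Ops-style function: never raises, always returns an envelope."""
    try:
        limit = int(raw)
    except ValueError:
        return ResultEnvelope.failure(f"not an integer: {raw!r}")
    if limit <= 0:
        return ResultEnvelope.failure("limit must be positive")
    return ResultEnvelope.success(limit)


good = parse_limit("10")
bad = parse_limit("ten")
zero = parse_limit("0")
print(good, bad, zero)
```

Returning an envelope instead of raising lets each transport (CLI, REST, MCP) decide how to present a failure without catching transport-specific exceptions in business logic.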

Built-in Feed Adapters

Adapter	Source	Natural Key
`RSSFeedAdapter`	RSS 2.0 and Atom feeds	Entry GUID or link
`JSONFeedAdapter`	JSON API endpoints with dot-notation path mapping	Configurable field
`CSVFeedAdapter`	Local or HTTP CSV/TSV with composite key support	Configurable column(s)
`FileFeedAdapter`	File-based feeds with content hash change detection	File path
`SECEdgarFilingAdapter`	SEC EDGAR filing submissions API	Accession number
`PolygonEarningsAdapter`	Polygon.io earnings calendar	Ticker + fiscal period

All adapters implement the FeedAdapter protocol — a @runtime_checkable interface with fetch(), initialize(), and close() methods. Write your own adapter in ~30 lines.
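
A custom adapter might look roughly like the sketch below. To keep it self-contained, it defines its own stand-in protocol (`FeedAdapterLike`) instead of importing FeedSpine's; the method signatures, the async-generator form of `fetch()`, and the dictionary shape of each candidate are assumptions, so check the real `FeedAdapter` protocol before copying this.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Protocol, runtime_checkable


@runtime_checkable
class FeedAdapterLike(Protocol):
    """Stand-in for the FeedAdapter protocol (signatures assumed)."""

    async def initialize(self) -> None: ...
    def fetch(self) -> AsyncIterator[dict]: ...
    async def close(self) -> None: ...


class StaticListAdapter:
    """Hypothetical adapter that serves items from an in-memory list."""

    def __init__(self, name: str, items: list[dict]) -> None:
        self.name = name
        self._items = items
        self._ready = False

    async def initialize(self) -> None:
        self._ready = True

    async def fetch(self) -> AsyncIterator[dict]:
        if not self._ready:
            raise RuntimeError("call initialize() before fetch()")
        for item in self._items:
            # The natural key here is the item's "id" field.
            yield {"natural_key": item["id"], "content": item}

    async def close(self) -> None:
        self._ready = False


async def collect(adapter: FeedAdapterLike) -> list[dict]:
    """Run the initialize -> fetch -> close lifecycle once."""
    await adapter.initialize()
    try:
        return [candidate async for candidate in adapter.fetch()]
    finally:
        await adapter.close()


adapter = StaticListAdapter("demo", [{"id": "x-1", "title": "First"}])
candidates = asyncio.run(collect(adapter))
print(isinstance(adapter, FeedAdapterLike), candidates)
```

`StaticListAdapter` never inherits from the protocol; having the three methods is enough for the `isinstance` check to pass.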

Storage Backends

Backend	Best For	Install
`MemoryStorage`	Testing, development, prototyping	Included
`SQLiteStorage`	Single-user, local dev, small-to-medium datasets	`feedspine[sqlalchemy]`
`DuckDBStorage`	Analytical queries, time-series, Parquet export	`feedspine[duckdb]`
`PostgresStorage`	Multi-user production, large datasets, concurrent access	`feedspine[postgres]`

All backends implement the StorageBackend protocol — CRUD, batch operations, natural-key lookup for dedup, sighting tracking, and query with filtering/pagination.

# Swap storage in one line; pipeline code stays the same
from feedspine import create_feed_spine
from feedspine.storage.backends.duckdb import DuckDBStorage

storage = DuckDBStorage("feeds.duckdb")
app = create_feed_spine(storage)
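
Why swapping works can be shown with a small stand-alone sketch: logic written against a protocol runs unchanged on any backend that provides the same methods. Everything here is hypothetical (the `StorageLike` protocol, `DictStorage`, `SqliteStorageSketch`, and `store_once` are illustrative names), and the real `StorageBackend` protocol is considerably larger.

```python
import json
import sqlite3
from typing import Optional, Protocol


class StorageLike(Protocol):
    """Tiny stand-in for a storage protocol (not the real StorageBackend)."""

    def put(self, natural_key: str, content: dict) -> None: ...
    def get(self, natural_key: str) -> Optional[dict]: ...
    def count(self) -> int: ...


class DictStorage:
    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def put(self, natural_key: str, content: dict) -> None:
        self._rows[natural_key] = content

    def get(self, natural_key: str) -> Optional[dict]:
        return self._rows.get(natural_key)

    def count(self) -> int:
        return len(self._rows)


class SqliteStorageSketch:
    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(natural_key TEXT PRIMARY KEY, content TEXT)"
        )

    def put(self, natural_key: str, content: dict) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?)",
            (natural_key, json.dumps(content)),
        )

    def get(self, natural_key: str) -> Optional[dict]:
        row = self._db.execute(
            "SELECT content FROM records WHERE natural_key = ?", (natural_key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def count(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM records").fetchone()[0]


def store_once(storage: StorageLike, natural_key: str, content: dict) -> bool:
    """Backend-agnostic logic: insert only when the key is unseen."""
    if storage.get(natural_key) is not None:
        return False
    storage.put(natural_key, content)
    return True


outcomes = {}
for backend in (DictStorage(), SqliteStorageSketch()):
    first = store_once(backend, "k-1", {"title": "Hello"})
    second = store_once(backend, "k-1", {"title": "Hello"})
    outcomes[type(backend).__name__] = (first, second, backend.count())
print(outcomes)
```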

Enrichment

Enrichers transform records and promote them through medallion layers:

Enricher	Purpose	Install
`PassthroughEnricher`	Layer promotion without data changes	Included
`MetadataEnricher`	Add custom fields to record metadata	Included
`EntityEnricher`	Entity resolution (CIK/ticker/name lookup)	`feedspine[entity]`

Enrichment is orchestrated through FeedEnrichmentWorker with batch support.
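
Layer promotion and metadata enrichment can be pictured as small functions applied to records in a batch. This is a simplified sketch under assumptions: the `Layer` values, the `Record` fields, and the `promote`, `add_metadata`, and `enrich_batch` helpers are illustrative and are not the API of `PassthroughEnricher`, `MetadataEnricher`, or `FeedEnrichmentWorker`.

```python
from dataclasses import dataclass, field, replace
from enum import IntEnum
from typing import Callable


class Layer(IntEnum):
    BRONZE = 1  # raw
    SILVER = 2  # validated
    GOLD = 3    # enriched


@dataclass(frozen=True)
class Record:
    natural_key: str
    content: dict
    layer: Layer = Layer.BRONZE
    metadata: dict = field(default_factory=dict)


def promote(record: Record) -> Record:
    """Passthrough-style step: move up one layer, leave data untouched."""
    next_layer = Layer(min(record.layer + 1, Layer.GOLD))
    return replace(record, layer=next_layer)


def add_metadata(record: Record, **extra: object) -> Record:
    """Metadata-style step: merge extra fields into the record's metadata."""
    return replace(record, metadata={**record.metadata, **extra})


def enrich_batch(
    records: list[Record], steps: list[Callable[[Record], Record]]
) -> list[Record]:
    """Apply each step in order to every record in the batch."""
    out = []
    for record in records:
        for step in steps:
            record = step(record)
        out.append(record)
    return out


bronze = [Record("a", {"t": 1}), Record("b", {"t": 2})]
silver = enrich_batch(bronze, [promote, lambda r: add_metadata(r, validated=True)])
gold = enrich_batch(silver, [promote, promote])  # second promote is capped at GOLD
print([r.layer.name for r in silver], [r.layer.name for r in gold])
```

Because the records are frozen, each step returns a new object and the original Bronze batch stays intact.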

Protocols

FeedSpine defines @runtime_checkable protocols for every extension point:

Protocol	Module	Purpose
`StorageBackend`	`protocols.storage`	Record persistence (CRUD, query, batch, sightings)
`RecordStore`	`protocols.storage`	Record-specific storage operations
`SightingStore`	`protocols.storage`	Sighting tracking and queries
`StorageLifecycle`	`protocols.storage`	`initialize()` + `close()` lifecycle
`FeedAdapter`	`protocols.feed`	Feed source (fetch, initialize, close)
`Enricher`	`protocols.enricher`	Single-record enrichment
`BatchEnricher`	`protocols.enricher`	Batch enrichment
`SearchBackend`	`protocols.search`	Full-text search (index, search, delete)
`RunLogStore`	`protocols.run_log`	Pipeline run event logging
`FetchContextStore`	`protocols.fetch_context`	HTTP ETag/Last-Modified conditional fetching state
`BlobStorage`	`protocols.blob`	Binary file storage
`Cache`	`protocols.cache`	Async get/set/delete with TTL
`ProgressReporter`	`protocols.progress`	Operation monitoring with ETA
`MessageQueue`	`protocols.queue`	Pub/sub messaging
`CollectionStrategy`	`protocols.strategy`	Multi-source optimization

Implement any protocol to extend FeedSpine — no subclassing, no registration boilerplate.
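
The "no subclassing" point rests on Python's structural typing. The sketch below shows it with a cache that offers async get/set/delete with a TTL. `CacheLike` is a local stand-in with assumed signatures, not FeedSpine's `Cache` protocol, and `InMemoryTTLCache` is a hypothetical implementation.

```python
import asyncio
import time
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class CacheLike(Protocol):
    """Local stand-in for a cache protocol (signatures assumed)."""

    async def get(self, key: str) -> Optional[Any]: ...
    async def set(self, key: str, value: Any, ttl: float) -> None: ...
    async def delete(self, key: str) -> None: ...


class InMemoryTTLCache:
    """Does not inherit from CacheLike; matching methods are enough."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[float, Any]] = {}

    async def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._items[key]
            return None
        return value

    async def set(self, key: str, value: Any, ttl: float) -> None:
        self._items[key] = (time.monotonic() + ttl, value)

    async def delete(self, key: str) -> None:
        self._items.pop(key, None)


async def demo() -> tuple[Any, Any, Any]:
    cache = InMemoryTTLCache()
    await cache.set("etag:news", "abc123", ttl=60.0)
    hit = await cache.get("etag:news")
    await cache.set("short", 1, ttl=0.0)  # expires immediately
    expired = await cache.get("short")
    await cache.delete("etag:news")
    gone = await cache.get("etag:news")
    return hit, expired, gone


hit, expired, gone = asyncio.run(demo())
print(isinstance(InMemoryTTLCache(), CacheLike), hit, expired, gone)
```

Note that a `runtime_checkable` `isinstance` check only confirms the methods exist; it does not validate their signatures or whether they are async, so a type checker such as mypy is still the stronger guarantee.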

REST API / CLI / MCP

FeedSpine ships with three transport layers. All delegate to the same ops/ business logic.

FastAPI REST API

uv run feedspine api serve --port 11300
# → OpenAPI docs at http://localhost:11300/docs

16 route modules: records, feeds, sightings, search, enrichment, health, metrics, stats, timeline, export, schedules, syndication (RSS/OPML), observations, runs, storage, and collection.

Typer CLI

uv run feedspine collect run --feed sec-filings     # Collect from a feed
uv run feedspine feeds list-types                   # List available adapters
uv run feedspine feeds list                         # List configured feeds
uv run feedspine health summary                     # Feed health (RAG: Red/Amber/Green)
uv run feedspine stats summary                      # Record counts, layer distribution
uv run feedspine query records --limit 10           # Query stored records
uv run feedspine export json output.json            # Export to JSON/CSV/Parquet
uv run feedspine info                               # System info

MCP Server (Model Context Protocol)

13 tools for LLM integration — feed collection, enrichment, timeline queries, search, health, and storage stats:

uv run feedspine-mcp                                # Start MCP server (stdio)

Search

Backend	Features	Install
`MemorySearch`	Keyword search (linear scan, dev/testing)	Included
`ElasticsearchSearch`	Distributed full-text, relevance scoring, highlighting, aggregations	`feedspine[elasticsearch]`

Both implement the SearchBackend protocol with index(), search(), delete(), exists(), and initialize().
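
For a sense of what a linear-scan keyword backend involves, here is a toy version with the same method names. It is a sketch under assumptions: the synchronous signatures, the lowercase substring matching, and the `LinearScanSearch` class are illustrative, and `MemorySearch` itself may match and rank differently.

```python
class LinearScanSearch:
    """Toy keyword index: every query scans every stored document."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text.lower()

    def delete(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)

    def exists(self, doc_id: str) -> bool:
        return doc_id in self._docs

    def search(self, query: str, limit: int = 10) -> list[str]:
        terms = query.lower().split()
        if not terms:
            return []
        # O(documents x terms): fine for tests, too slow for large corpora.
        hits = [
            doc_id
            for doc_id, text in self._docs.items()
            if all(term in text for term in terms)
        ]
        return hits[:limit]


search = LinearScanSearch()
search.index("r1", "Apple files annual report")
search.index("r2", "Quarterly earnings call scheduled")
search.index("r3", "Apple quarterly earnings beat estimates")
both = search.search("apple earnings")
search.delete("r3")
after_delete = search.search("apple earnings")
print(both, after_delete)
```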

Examples

25 runnable examples across 7 categories:

Category	Examples	Highlights
Getting Started	2	Quickstart, multi-feed collection
Storage	2	DuckDB persistence, data types
Domain Feeds	1	SEC EDGAR filing monitor
Operations	11	Tracking, enrichment, scheduling, health, stats, export
Earnings	7	Calendar API, CLI, REST, WebSocket, full workflow
API	3	Unified timeline, RSS/Atom syndication, export formats
CLI	1	CLI command examples

uv run python examples/01_getting_started/01_quickstart.py
uv run python examples/run_all.py                          # Run all 25

Development

uv sync --dev                # Install all dependencies
uv run pytest                # 1217 tests
uv run ruff check .          # Lint
uv run ruff format .         # Format
uv run mypy src              # Type check
uv run mkdocs serve          # Local docs site

Project stats

Metric	Value
Source files	~190
Test files	~96
Source LOC	~35,000
Tests	1,217 passed, 23 skipped

Stability

Aspect	Status
Version	0.3.0
Python	≥ 3.12
API stability	v0.x — API may change between minor versions
License	MIT

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.

This version: 0.3.0 (Apr 13, 2026)
