
Connector components for the Sayou Data Platform


sayou-connector


The Universal Data Ingestion Engine for Sayou Fabric.

sayou-connector provides a unified interface to fetch data from diverse sources—Files, Cloud Drives, Databases, and SaaS APIs—normalizing everything into a standard format called SayouPacket.

It decouples the logic of Navigation (Generator) from Retrieval (Fetcher), enabling complex recursive crawling, pagination, and API traversal strategies out of the box.


1. Architecture & Role

The Connector Pipeline manages the Feedback Loop between discovery and retrieval. It yields a stream of SayouPacket objects ready for the next stage (Refinery).

graph LR
    Source[Source String] --> Pipeline[Connector Pipeline]
    
    subgraph Generators [Navigation]
        Dir[File Walker]
        Crawler[Web Frontier]
        APIPag[API Paginator]
    end
    
    subgraph Fetchers [Retrieval]
        Local[File Read]
        HTTP[Requests]
        SQL[DB Query]
    end
    
    Pipeline --> Generators
    Generators -->|Task| Fetchers
    Fetchers -->|Packet| Pipeline
    Pipeline -->|Feedback| Generators
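The feedback loop in the diagram can be illustrated with a small, self-contained sketch. It does not use sayou-connector itself: `Packet`, `fetch`, `discover`, `run_pipeline`, and the in-memory `PAGES` table are hypothetical stand-ins chosen for illustration, and the library's real classes may be organized differently.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Packet:
    """Hypothetical stand-in for SayouPacket; only uri and data are modelled."""
    uri: str
    data: str


# Toy in-memory "site": each page has some text and a list of outgoing links.
PAGES = {
    "a": ("page a", ["b", "c"]),
    "b": ("page b", ["c"]),
    "c": ("page c", []),
}


def fetch(uri: str) -> Packet:
    """Fetcher role: knows how to retrieve a single task."""
    text, _links = PAGES[uri]
    return Packet(uri=uri, data=text)


def discover(packet: Packet) -> list[str]:
    """Generator role: derives follow-up tasks from a fetched packet."""
    return PAGES[packet.uri][1]


def run_pipeline(seed: str) -> Iterator[Packet]:
    """Pipeline role: fetch a task, yield its packet, feed new tasks back."""
    frontier = deque([seed])
    seen = {seed}
    while frontier:
        task = frontier.popleft()
        packet = fetch(task)
        yield packet
        for next_task in discover(packet):  # the feedback edge
            if next_task not in seen:
                seen.add(next_task)
                frontier.append(next_task)


order = [p.uri for p in run_pipeline("a")]
print(order)
```

Keeping `fetch` and `discover` separate means the same traversal loop could serve a directory walk, a web crawl, or API pagination by swapping either function.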

1.1. Core Features

  • Generator/Fetcher Pattern: Separates "Where to go next" (Generator) from "How to get it" (Fetcher).
  • Unified Packet: Whether the source is a Notion Page or a PostgreSQL Row, the output is always a uniform SayouPacket.
  • Resilience: Built-in rate limiting, retries, and error handling for unstable network sources.
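As a rough illustration of the retry part of the Resilience feature, the sketch below wraps a fetch call in exponential backoff. `fetch_with_retry` and its parameters are hypothetical and not part of sayou-connector's documented API; rate limiting is not covered here.

```python
import time


def fetch_with_retry(fetch, task, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(task), retrying on ConnectionError with exponential backoff.

    The sleep function is injectable so the behaviour can be checked
    without actually waiting.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(task)
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo: a fetcher that fails twice before succeeding.
calls = {"n": 0}


def flaky(task):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return f"payload for {task}"


delays = []
result = fetch_with_retry(flaky, "https://example.com", sleep=delays.append)
print(result, delays)
```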

2. Supported Sources

sayou-connector ships source plugins in the four categories below, and the list keeps growing to cover more Enterprise SaaS products and Databases.

| Category | Key Sources | Description |
| --- | --- | --- |
| Local / File | file, obsidian | Local file systems, Markdown vaults. |
| Web / Media | web, youtube, wikipedia, rss | Web crawling (Trafilatura), YouTube transcripts, Wiki articles. |
| SaaS / Cloud | github, notion, google_drive, gmail | Repository code, Notion workspaces, G-Suite documents. |
| Database | postgres, mysql, mongodb, oracle | SQL/NoSQL databases with pagination support. |

3. Installation

pip install sayou-connector

4. Usage

The ConnectorPipeline acts as the entry point. It automatically detects the source type or accepts a specific strategy.
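How the pipeline detects the source type is not described here. Purely as an illustration of the idea, a source string could be mapped to a strategy name along these lines; `guess_strategy` and its rules are assumptions, not the library's actual logic.

```python
from urllib.parse import urlparse


def guess_strategy(source: str) -> str:
    """Hypothetical heuristic: map a source string to a strategy name."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if host == "github.com":
            return "github"
        if host.endswith("youtube.com"):
            return "youtube"
        return "web"
    # Anything without an http(s) scheme is treated as a local path.
    return "file"


print(guess_strategy("./my_docs"))
print(guess_strategy("https://github.com/sayouzone/sayou-fabric"))
print(guess_strategy("https://news.daum.net/tech"))
```

Passing `strategy` explicitly, as in the cases below, avoids depending on any such heuristic.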

Case A: Local & Web (Simple)

Fetching simple files or web pages.

from sayou.connector import ConnectorPipeline

# Read files from a local directory
packets = ConnectorPipeline.process(
    source="./my_docs",
    strategy="file"
)

# Crawl a web page
web_packets = ConnectorPipeline.process(
    source="https://news.daum.net/tech",
    strategy="web"
)

# Each packet exposes the source URI and the fetched payload
for packet in web_packets:
    print(f"[Fetched] {packet.uri} ({len(packet.data)} bytes)")

Case B: SaaS Integration (GitHub / Notion)

Fetching structured data from external APIs.

from sayou.connector import ConnectorPipeline

repo_packets = ConnectorPipeline.process(
    source="https://github.com/sayouzone/sayou-fabric",
    strategy="github"
)

# Materialize the stream to count the packets
print(f"Collected {len(list(repo_packets))} files from repo.")

Case C: Database Ingestion

Fetching rows from a database table.

from sayou.connector import ConnectorPipeline

# Placeholder credentials; load real ones from the environment or a secret store
db_config = {
    "host": "localhost",
    "user": "admin",
    "password": "password",
    "db": "sales_db"
}

# Fetch rows from the 'orders' table
db_packets = ConnectorPipeline.process(
    source="orders", 
    strategy="postgres",
    config=db_config
)

# Each packet contains a batch of rows
for packet in db_packets:
    print(f"Batch rows: {len(packet.data)}")
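To make "a batch of rows" concrete, here is a minimal sketch of batched retrieval using the standard-library `sqlite3` module as a stand-in for a real database. `iter_batches`, the batch size, and the table schema are illustrative assumptions; they do not show how sayou-connector paginates internally.

```python
import sqlite3


def iter_batches(conn, table, batch_size=2):
    """Yield lists of rows, at most batch_size rows per list.

    The table name is interpolated into the SQL text, so it must come
    from trusted code, never from user input.
    """
    cursor = conn.execute(f"SELECT id, amount FROM {table} ORDER BY id")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Demo against an in-memory database with five rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(10.0,), (20.0,), (30.0,), (40.0,), (50.0,)],
)

batch_sizes = [len(batch) for batch in iter_batches(conn, "orders")]
print(batch_sizes)
```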

5. Configuration Keys

The config dictionary carries authentication and connection settings, grouped under the following keys:

  • auth: API Keys (e.g., github_token, notion_token, google_creds).
  • db: Database credentials (host, port, user, password).
  • crawl: Web crawling settings (user_agent, depth_limit, domain_lock).
  • filter: File extensions to include/exclude (e.g., include=[".py", ".md"]).
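Put together, a config using all four keys might look like the sketch below. The nesting, the value types, and the environment variable names are assumptions made for illustration (Case C above passes a flat dictionary), so check the project documentation for the exact shape each strategy expects.

```python
import os

# Hypothetical layout: one nested dictionary per configuration key.
config = {
    "auth": {
        # Read secrets from the environment instead of hard-coding them.
        "github_token": os.environ.get("GITHUB_TOKEN", ""),
    },
    "db": {
        "host": "localhost",
        "port": 5432,
        "user": "admin",
        "password": os.environ.get("DB_PASSWORD", ""),
    },
    "crawl": {
        "user_agent": "my-crawler/0.1",  # illustrative value
        "depth_limit": 2,
        "domain_lock": True,
    },
    "filter": {
        "include": [".py", ".md"],
    },
}

print(sorted(config))
```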

6. License

Apache 2.0 License © 2026 Sayouzone
