
Connector components for the Sayou Data Platform


sayou-connector


The Universal Data Ingestion Engine

sayou-connector provides a unified interface for fetching data from diverse sources (local files, web URLs, and databases) and normalizes every result into a standard format called SayouPacket.

It separates the logic of Navigation (Generator) from Retrieval (Fetcher), enabling complex recursive crawling and pagination strategies out of the box.
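The split between navigation and retrieval can be illustrated with a small, self-contained sketch. It does not use sayou-connector itself: `Task`, `Result`, `Generator`, `Fetcher`, `run`, and the in-memory `FAKE_SITE` are hypothetical stand-ins, and the meaning given to `max_depth` here (the number of link hops followed from the start URI) is an assumption rather than the library's documented behavior.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

# Hypothetical in-memory "site": URI -> (content, outgoing links).
FAKE_SITE = {
    "page://root": ("root page", ["page://a", "page://b"]),
    "page://a": ("page a", ["page://c"]),
    "page://b": ("page b", []),
    "page://c": ("page c", []),
}


@dataclass
class Task:
    uri: str
    depth: int = 0


@dataclass
class Result:
    task: Task
    success: bool
    data: Optional[str] = None
    links: Tuple[str, ...] = ()
    error: Optional[str] = None


class Fetcher:
    """Retrieval only: turns one task into one result."""

    def fetch(self, task: Task) -> Result:
        if task.uri not in FAKE_SITE:
            return Result(task, success=False, error=f"not found: {task.uri}")
        content, links = FAKE_SITE[task.uri]
        return Result(task, success=True, data=content, links=tuple(links))


class Generator:
    """Navigation only: decides which tasks exist and in what order."""

    def __init__(self, start_uri: str, max_depth: int) -> None:
        self.max_depth = max_depth
        self.queue = deque([Task(start_uri)])
        self.seen = {start_uri}

    def next_task(self) -> Optional[Task]:
        return self.queue.popleft() if self.queue else None

    def feedback(self, result: Result) -> None:
        # Assumption: links are followed only while depth < max_depth.
        if not result.success or result.task.depth >= self.max_depth:
            return
        for link in result.links:
            if link not in self.seen:
                self.seen.add(link)
                self.queue.append(Task(link, result.task.depth + 1))


def run(generator: Generator, fetcher: Fetcher) -> Iterator[Result]:
    """Feedback loop: fetch a task, report the result back, stream it out."""
    while (task := generator.next_task()) is not None:
        result = fetcher.fetch(task)
        generator.feedback(result)
        yield result


visited = [r.task.uri for r in run(Generator("page://root", max_depth=1), Fetcher())]
```

Because the fetcher never decides what to visit next, the same retrieval code could in principle serve a crawler, a paginator, or a flat file walk by swapping only the generator.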

📦 Installation

pip install sayou-connector

⚡ Quick Start

The ConnectorPipeline manages the feedback loop between Generators and Fetchers.

from sayou.connector.pipeline import ConnectorPipeline

def run_demo():
    # 1. Initialize Pipeline
    pipeline = ConnectorPipeline()
    pipeline.initialize()

    # 2. Run (Example: Web Crawling)
    print("Starting Web Crawl...")
    
    # Returns an iterator of 'SayouPacket' objects
    packets = pipeline.run(
        source="https://news.daum.net/tech",
        strategy="web_crawl",
        link_pattern=r"https://v\.daum\.net/v/\d+",
        max_depth=1
    )

    # 3. Process Results (Stream)
    for packet in packets:
        if packet.success:
            print(f"[Fetched] {packet.task.uri}")
            # packet.data contains the extracted content (dict, bytes, etc.)
            print(f"   Data: {str(packet.data)[:50]}...")
        else:
            print(f"[Error] {packet.error}")

if __name__ == "__main__":
    run_demo()
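
Since the pipeline streams packets, it can be convenient to separate successes from failures in a single pass. The helper below is a sketch that relies only on the attributes used in the Quick Start (`packet.success`, `packet.task.uri`, `packet.data`, `packet.error`). `split_packets` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins so the example runs without network access or the library installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Tuple


def split_packets(packets: Iterable[Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Collect fetched data by URI and gather error messages separately."""
    fetched: Dict[str, Any] = {}
    errors: List[str] = []
    for packet in packets:
        if packet.success:
            fetched[packet.task.uri] = packet.data
        else:
            errors.append(str(packet.error))
    return fetched, errors


# Stand-in packets that mimic the attributes shown in the Quick Start.
demo_packets = [
    SimpleNamespace(
        success=True,
        task=SimpleNamespace(uri="https://example.com/1"),
        data={"title": "one"},
        error=None,
    ),
    SimpleNamespace(
        success=False,
        task=SimpleNamespace(uri="https://example.com/2"),
        data=None,
        error="timeout",
    ),
]

fetched, errors = split_packets(demo_packets)
```

In real use, the iterator returned by `pipeline.run(...)` would be passed in place of `demo_packets`.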

🔑 Key Concepts

  • Strategies: Switch execution modes by choosing a strategy (file, requests, sqlite).
  • SayouPacket: A standardized result container (success/failure status, data, metadata), so every source can be handled the same way downstream.
  • Feedback Loop: Generators can dynamically create new tasks based on Fetcher results (e.g., newly discovered links or the next database page).
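
The "next DB page" case of the feedback loop can be sketched with the standard-library `sqlite3` module. This is an illustration under stated assumptions, not sayou-connector's implementation: `PageTask`, `Packet`, `fetch_page`, `next_task`, `paginate`, and the `articles` table are all hypothetical, and `Packet` only mirrors the status/data/metadata shape described above.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass
class PageTask:
    uri: str
    offset: int
    limit: int


@dataclass
class Packet:
    # Hypothetical container mirroring the status/data/metadata idea.
    task: PageTask
    success: bool
    data: Any = None
    error: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def fetch_page(conn: sqlite3.Connection, task: PageTask) -> Packet:
    """Fetcher role: run one paged query and wrap the outcome in a packet."""
    try:
        rows = conn.execute(
            "SELECT id, title FROM articles ORDER BY id LIMIT ? OFFSET ?",
            (task.limit, task.offset),
        ).fetchall()
    except sqlite3.Error as exc:
        return Packet(task, success=False, error=str(exc))
    return Packet(task, success=True, data=rows, meta={"row_count": len(rows)})


def next_task(packet: Packet) -> Optional[PageTask]:
    """Generator role: a full page suggests more rows; a short page ends the loop."""
    if not packet.success or packet.meta["row_count"] < packet.task.limit:
        return None
    task = packet.task
    return PageTask(task.uri, task.offset + task.limit, task.limit)


def paginate(conn: sqlite3.Connection, limit: int) -> Iterator[Packet]:
    task: Optional[PageTask] = PageTask("sqlite://:memory:/articles", 0, limit)
    while task is not None:
        packet = fetch_page(conn, task)
        yield packet
        task = next_task(packet)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [(f"article {i}",) for i in range(1, 6)],
)
pages = [packet.data for packet in paginate(conn, limit=2)]
```

The stop condition is a design choice: ending on a short page needs no extra `COUNT(*)` query, at the cost of one additional empty fetch when the row count is an exact multiple of the page size.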

🤝 Contributing

We welcome contributions for new Fetchers (e.g., S3, Kafka) or Generators!

📜 License

Apache 2.0 License © 2025 Sayouzone
