
Extractor components for the Sayou Data Fabric

Project description

Sayou Refinery (sayou_refinery)

A pluggable framework for refining raw Data Atoms into a coherent Knowledge Graph (KG) for advanced LLM applications.


💡 Why Sayou Refinery?

sayou_refinery solves a core problem: organizing messy, disconnected data into a structured KG. This KG acts as a "map" for RAG pipelines, letting LLMs retrieve accurate, context-aware data, which helps reduce hallucinations and costs.

  • Pluggable Architecture: Bring your own data store (Neo4j, JSON) or refinement logic.
  • Ontology-Driven: Ensures all data conforms to your central schema.
  • Focused Responsibility: Does one job well: Refine & Link. No connectors, no embedding logic.
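To illustrate what "ontology-driven" means in practice, here is a minimal, self-contained sketch of checking nodes against a central schema. The schema format (a mapping from node type to required property names), the `Node` class, and the `validate` function are all hypothetical; they are not sayou_refinery's `OntologyManager` or `SchemaValidator` API.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical ontology format: node type -> required property names.
# sayou_refinery's real schema file format may differ.
ONTOLOGY: dict[str, set[str]] = {
    "Station": {"name", "line"},
    "Ridership": {"station", "hour", "count"},
}


@dataclass
class Node:
    node_type: str
    properties: dict[str, Any] = field(default_factory=dict)


def validate(node: Node, ontology: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means the node conforms."""
    if node.node_type not in ontology:
        return [f"unknown type: {node.node_type}"]
    missing = ontology[node.node_type] - set(node.properties)
    return [f"missing property: {name}" for name in sorted(missing)]


if __name__ == "__main__":
    ok = Node("Station", {"name": "S1", "line": "2"})
    bad = Node("Station", {"name": "S2"})
    print(validate(ok, ONTOLOGY))   # []
    print(validate(bad, ONTOLOGY))  # ['missing property: line']
```

Returning a list of problems instead of raising on the first one lets a caller report every schema violation in a batch at once.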

🚀 Quick Start (v0.0.1)

1. Installation

pip install sayou-refinery

2. Usage (Example)

sayou_refinery is a library that you import into your own project. The full example is in examples/subway_refinery/run.py.

# your_project/run.py
from sayou.refinery.pipeline import Pipeline
from sayou.refinery.schema.manager import OntologyManager
from sayou.refinery.schema.validator import SchemaValidator
from sayou.refinery.graph.builder import KnowledgeGraphBuilder
from sayou.refinery.linker.default_linker import DefaultLinker
from sayou.refinery.store.json_store import JsonStore

# 1. Import your custom domain logic
from your_project.my_refiner import MyDomainRefiner

# 2. Prepare components (Explicit Injection)
schema_manager = OntologyManager()
validator = SchemaValidator()
refiner = MyDomainRefiner() # Your logic
builder = KnowledgeGraphBuilder()
linker = DefaultLinker()
store = JsonStore()

# 3. Create and configure the pipeline
pipeline = Pipeline(
    schema_manager=schema_manager,
    validator=validator,
    refiner=refiner,
    builder=builder,
    linker=linker,
    store=store
)

pipeline.initialize(
    ontology_path="path/to/your_schema.json",
    filepath="output/my_kg.json" # Config for JsonStore
)

# 4. Load your data atoms
my_atoms = [...] # Load your DataAtom objects

# 5. Run
pipeline.run(my_atoms)
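The example imports `MyDomainRefiner` from your own project without showing it. The sketch below is one hypothetical shape for such a refiner, following the "averaging subway data" idea mentioned under Core Concepts. The `DataAtom` stand-in, its `payload` field, and the `refine(atoms) -> atoms` method are assumptions made for illustration; check `BaseRefiner` in the installed package for the actual interface before subclassing it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-in for sayou_refinery's DataAtom; the real class
# and its fields are not documented in this README.
@dataclass
class DataAtom:
    source: str
    payload: dict[str, Any]


class MyDomainRefiner:
    """Hypothetical refiner that averages passenger counts per station.

    The `refine` name and its list-in/list-out contract are assumptions,
    not the documented BaseRefiner interface.
    """

    def refine(self, atoms: list[DataAtom]) -> list[DataAtom]:
        totals: dict[str, float] = defaultdict(float)
        counts: dict[str, int] = defaultdict(int)
        for atom in atoms:
            station = atom.payload["station"]
            totals[station] += atom.payload["passengers"]
            counts[station] += 1
        # One aggregated atom per station, in first-seen order.
        return [
            DataAtom(
                source="refined",
                payload={"station": s, "avg_passengers": totals[s] / counts[s]},
            )
            for s in totals
        ]


if __name__ == "__main__":
    atoms = [
        DataAtom("raw", {"station": "S1", "passengers": 120}),
        DataAtom("raw", {"station": "S1", "passengers": 80}),
        DataAtom("raw", {"station": "S2", "passengers": 50}),
    ]
    for atom in MyDomainRefiner().refine(atoms):
        print(atom.payload)
```

Keeping the refiner free of storage and linking concerns matches the single-responsibility design described above: it only turns raw atoms into cleaner atoms.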

🏗️ Core Concepts

  • Data Atom: The standard input unit. (Schema/structure explanation)

  • Refiner (BaseRefiner): Cleans, aggregates, or transforms atoms. (e.g., averaging subway data)

  • Linker (BaseLinker): Establishes relationships between nodes.

  • Store (BaseStore): The output driver (JSON, Neo4j, etc.).
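The Linker and Store roles can be sketched without the library as follows. This is a hypothetical illustration, not the behavior of `DefaultLinker` or `JsonStore`: the linker connects nodes that share a property value, and the store serializes nodes and edges to a JSON string. `Node`, `Edge`, `link_by_property`, and `InMemoryJsonStore` are names invented for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Node:
    id: str
    node_type: str
    properties: dict[str, Any]


@dataclass
class Edge:
    source: str
    target: str
    relation: str


def link_by_property(nodes: list[Node], key: str, relation: str) -> list[Edge]:
    """Hypothetical linker: connect each pair of nodes sharing a value for `key`.

    Property values are used as dict keys, so they must be hashable.
    """
    groups: dict[Any, list[str]] = {}
    for node in nodes:
        if key in node.properties:
            groups.setdefault(node.properties[key], []).append(node.id)
    edges: list[Edge] = []
    for ids in groups.values():
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                edges.append(Edge(a, b, relation))
    return edges


class InMemoryJsonStore:
    """Hypothetical store: serializes the graph to a JSON string, not a file."""

    def save(self, nodes: list[Node], edges: list[Edge]) -> str:
        graph = {
            "nodes": [asdict(n) for n in nodes],
            "edges": [asdict(e) for e in edges],
        }
        return json.dumps(graph, indent=2)


if __name__ == "__main__":
    nodes = [
        Node("n1", "Station", {"line": "2"}),
        Node("n2", "Station", {"line": "2"}),
        Node("n3", "Station", {"line": "9"}),
    ]
    edges = link_by_property(nodes, key="line", relation="SAME_LINE")
    print(InMemoryJsonStore().save(nodes, edges))
```

Grouping nodes by property value first means pairs are only generated within each group, so nodes with different values are never compared against each other.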

🤝 Contributing

We welcome contributions! Please read our CONTRIBUTING.md (coming soon) for details on how to submit pull requests.

📜 License

This project is licensed under the MIT License.


Download files


Source Distribution

sayou_extractor-0.0.1.tar.gz (10.4 kB)

Built Distribution


sayou_extractor-0.0.1-py3-none-any.whl (15.0 kB)

File details

Details for the file sayou_extractor-0.0.1.tar.gz.

File metadata

  • Download URL: sayou_extractor-0.0.1.tar.gz
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sayou_extractor-0.0.1.tar.gz
Algorithm Hash digest
SHA256 86e90c4b8111ba1874235f91723f0cbe4bb871f9b40bda98814942185faa22c7
MD5 adafc2dd6f76400cef814af2d7e6d81a
BLAKE2b-256 41b67553da7327c947015b06d88737e5c6a0e9faf99eed72fb8aa01fc969ec18


File details

Details for the file sayou_extractor-0.0.1-py3-none-any.whl.

File hashes

Hashes for sayou_extractor-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1afcf399c586171d131cdb24754c775ff9ac639d4b1c197236068e7abdc52d44
MD5 8e551f3e4359c142f50b7a916157535e
BLAKE2b-256 4edb37a69325336d979076b651d9d3018636e9575dfee154e24ebafe517ca3a9
