
Khora

Ad-hoc Dagster pipelines for data fetching using AI/LLM prompts and agentic AI.

Overview

Khora is a Python package that enables the creation of dynamic data pipelines using Dagster, powered by AI agents built with LangGraph and LangChain. It can fetch data from various sources, including:

  • APIs (REST endpoints with full HTTP method support)
  • Websites (advanced web scraping with Playwright: handles JavaScript, captures screenshots, runs custom scripts)
  • Google Docs/Sheets (with service account authentication)

Features

  • 🤖 AI-powered data fetching using natural language prompts
  • 🔄 Dynamic pipeline generation based on descriptions
  • 🛠️ Support for multiple data sources:
    • APIs (REST endpoints)
    • Web scraping with Playwright (handles JavaScript-rendered content)
    • Google Docs and Sheets
  • 🎭 Advanced web scraping capabilities:
    • JavaScript execution
    • Screenshot capture
    • Custom selectors
    • Wait conditions
  • 📊 Integration with Dagster for orchestration
  • 🐳 Docker support for easy deployment
  • ✅ Comprehensive test coverage

Installation

Using uv (recommended)

uv pip install khora

Using pip

pip install khora

Development Installation

git clone https://github.com/yourusername/khora.git
cd khora
uv pip install -e ".[dev]"

Configuration

  1. Copy the environment template:
cp .env.example .env
  2. Edit .env and add your credentials:
  • OPENAI_API_KEY: Your OpenAI API key
  • GOOGLE_CREDENTIALS_PATH: Path to Google service account credentials (for Google Docs/Sheets)
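
These values are read from the environment at runtime. Projects like this typically load the .env file with python-dotenv; as a purely illustrative sketch (not khora's actual loading code), parsing simple KEY=VALUE lines by hand looks like this:

```python
import os


def load_dotenv_minimal(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file and export them."""
    loaded: dict[str, str] = {}
    try:
        with open(path) as fh:
            for raw in fh:
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue  # skip blanks, comments, and malformed lines
                key, _, value = line.partition("=")
                loaded[key.strip()] = value.strip().strip('"').strip("'")
    except FileNotFoundError:
        pass  # no .env file is fine; fall back to the real environment
    os.environ.update(loaded)
    return loaded
```

In practice, prefer a maintained library such as python-dotenv, which also handles quoting, multiline values, and variable expansion.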

Usage

Basic Example

import asyncio

from khora.agents import DataFetcherAgent, PipelineBuilderAgent
from khora.utils.data_models import DataRequest, DataSourceType

# Initialize agents
fetcher = DataFetcherAgent(openai_api_key="your-key")
builder = PipelineBuilderAgent(openai_api_key="your-key")

# Create a data request
request = DataRequest(
    source_type=DataSourceType.API,
    prompt="Fetch current weather data for San Francisco",
    source_config={
        "url": "https://api.weather.com/v1/current"
    }
)

# Fetch data (fetch_data is a coroutine, so run it in an event loop)
response = asyncio.run(fetcher.fetch_data(request))
print(response.data)

Creating Dynamic Pipelines

# Describe your pipeline in natural language
description = """
Create a pipeline that:
1. Fetches cryptocurrency prices from CoinGecko API
2. Scrapes latest crypto news from CoinDesk
3. Reads analysis from a Google Sheet
"""

# Generate pipeline configuration
config = builder.analyze_pipeline_request(description)

# Build the Dagster pipeline from the configuration
pipeline = builder.build_pipeline(config)
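
Khora uses an LLM to turn a free-form description like the one above into discrete pipeline steps. The core idea can be illustrated with a plain, hypothetical parser for numbered lists (not khora's actual implementation, which relies on the model rather than a regex):

```python
import re


def extract_steps(description: str) -> list[str]:
    """Pull numbered steps ("1. ...", "2. ...") out of a pipeline description."""
    steps = []
    for line in description.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            steps.append(match.group(1))
    return steps
```

Applied to the description above, this yields the three step strings, each of which would then be mapped to a data source (API, scraper, or Google Sheet).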

Running Dagster UI

dagster dev -f src/khora/pipelines/definitions.py

Then navigate to http://localhost:3000 to see the Dagster UI.

Docker Usage

Build the image

docker build -t khora:latest .

Run the container

docker run -p 3000:3000 \
  -e OPENAI_API_KEY=your-key \
  -v $(pwd)/.env:/app/.env \
  khora:latest

Testing

Run the test suite:

pytest tests/

With coverage:

pytest tests/ --cov=khora --cov-report=html

Requirements

  • Python 3.12 (required)
  • Playwright browsers (automatically installed during setup)

CI/CD

The project uses GitHub Actions for CI/CD with two main workflows:

Main CI Workflow (ci.yml)

  1. Runs tests on Python 3.12
  2. Checks code formatting with Black and Ruff
  3. Performs type checking with mypy
  4. Builds and tests the Docker image
  5. Uploads coverage reports to Codecov

Publish Workflow (publish.yml)

Automatically publishes to PyPI when version tags are pushed:

  • Triggered by pushing tags matching v* pattern (e.g., v0.0.2)
  • Runs full test suite and quality checks
  • Builds and publishes package to PyPI
  • Uses PYPI_API_TOKEN secret for authentication
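
The v* trigger is a glob, so any tag starting with "v" matches; the tags the release script actually produces follow the stricter vMAJOR.MINOR.PATCH shape. A small illustrative check (not part of the workflow itself) for validating a tag before pushing it:

```python
import re

# Tags like v0.0.2 that should trigger the publish workflow
SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def is_release_tag(tag: str) -> bool:
    """Return True for strict vMAJOR.MINOR.PATCH release tags."""
    return SEMVER_TAG.match(tag) is not None
```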

Project Structure

khora/
├── src/khora/
│   ├── agents/         # AI agents for data fetching and pipeline building
│   ├── pipelines/      # Dagster pipeline definitions
│   ├── tools/          # Tools for different data sources
│   └── utils/          # Utilities and data models
├── tests/              # Test suite
├── .github/workflows/  # CI/CD configuration
├── Dockerfile          # Container definition
└── pyproject.toml      # Project configuration

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and add tests
  4. Run tests and linting: pytest && black . && ruff check .
  5. Commit your changes: git commit -m "Add feature"
  6. Push to your fork: git push origin feature-name
  7. Create a pull request

License

MIT License - see LICENSE file for details.

Support

For issues and questions:

  • Open an issue on GitHub
  • Check the documentation
  • Review existing discussions

Roadmap

  • Add support for more data sources (databases, S3, etc.)
  • Implement data transformation capabilities
  • Add scheduling and monitoring features
  • Create a web UI for pipeline management
  • Support for more LLM providers

Releasing

Quick Release (Recommended)

Use the automated release script:

# Create and push a patch release (0.0.1 -> 0.0.2)
python scripts/create_release.py patch --push

# Create a minor release (0.0.1 -> 0.1.0)
python scripts/create_release.py minor

# Create a major release (0.0.1 -> 1.0.0)
python scripts/create_release.py major

# Preview what would happen
python scripts/create_release.py patch --dry-run
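
The patch/minor/major levels map onto semantic versioning in the usual way. A minimal sketch of that arithmetic (the real scripts/bump_version.py may differ in details such as pre-release handling):

```python
def bump(version: str, level: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string at the given level."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level}")
```

For example, bump("0.0.1", "patch") gives "0.0.2", while bump("0.0.1", "minor") resets the patch component and gives "0.1.0".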

Step-by-Step Release

  1. Bump version:

    python scripts/bump_version.py patch
    
  2. Create git tag and push:

    git add .
    git commit -m "Bump version to 0.0.2"
    git tag v0.0.2
    git push origin main --tags
    
  3. Automatic publishing: The publish workflow will automatically:

    • Run all tests and quality checks
    • Build the package
    • Publish to PyPI

Setup PyPI Token

To enable publishing, add your PyPI API token as a GitHub secret:

  1. Create an API token on PyPI
  2. Add it as PYPI_API_TOKEN in your repository secrets

Version

Current version: 0.0.1
