Python SDK for creating verse-based content sites with AI translations, multimedia (images, audio), semantic search, RAG-grounded Puranic context, and deployment
Sanatan Verse SDK - Python SDK for Spiritual Verse Collections
Complete toolkit for generating rich multimedia content for spiritual text collections (Hanuman Chalisa, Sundar Kaand, etc.)
Features
- 🔄 Complete Workflow: Generate media and embeddings from canonical sources - all in one command
- 📖 Canonical Sources: Local YAML files ensure text accuracy and quality
- 🎨 AI Images: Generate themed images with DALL-E 3
- 🎵 Audio Pronunciation: Full and slow-speed audio with ElevenLabs
- 🔍 Semantic Search: Vector embeddings for intelligent verse discovery
- 📚 Multi-Collection: Organized support for multiple verse collections
- 🎨 Theme System: Customizable visual styles (modern, traditional, kids-friendly, etc.)
Quick Start
Start here: End-to-End Workflow
Fastest Bootstrap
# Brand new project directory
mkdir my-verse-project
cd my-verse-project
# 1) Create and activate virtualenv
python3 -m venv .venv
source .venv/bin/activate
# 2) Install SDK
pip install sanatan-verse-sdk
# 3) Scaffold project
verse-init --collection hanuman-chalisa
# 4) Initialize git repo (after scaffolding, so verse-init does not prompt about a non-empty directory)
git init
See full command docs: verse-init
New Project Setup (Recommended)
# 1. Install
pip install sanatan-verse-sdk
# 2. Create project with collection templates
verse-init --project-name my-verse-project --collection hanuman-chalisa
cd my-verse-project
# 3. Configure API keys
cp .env.example .env
# Edit .env and add your API keys from:
# - OpenAI: https://platform.openai.com/api-keys
# - ElevenLabs: https://elevenlabs.io/app/settings/api-keys
# 4. Add canonical Devanagari text
# Edit data/verses/hanuman-chalisa.yaml with actual verse text
# 5. Validate setup
verse-validate
# 6. Generate multimedia content
verse-generate --collection hanuman-chalisa --verse 1
What you get: Verse file, AI-generated image, audio (full + slow speed), and search embeddings!
Existing Project
# Validate and fix structure
verse-validate --fix
# Generate content
verse-generate --collection hanuman-chalisa --verse 15
# Check status
verse-status --collection hanuman-chalisa
Advanced Usage
# Multiple collections at once
verse-init --collection hanuman-chalisa --collection sundar-kaand
# Custom number of sample verses
verse-init --collection my-collection --num-verses 10
# Generate specific components only
verse-generate --collection sundar-kaand --verse 3 --image
verse-generate --collection sundar-kaand --verse 3 --audio
# Skip embeddings update (faster)
verse-generate --collection hanuman-chalisa --verse 15 --no-update-embeddings
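To generate a range of verses, the commands above can be driven from a short script. The following is a minimal sketch, not part of the SDK: the helper names `build_generate_command` and `build_batch` are hypothetical, and only the flags shown above (`--collection`, `--verse`, `--no-update-embeddings`) are used. It assumes verses are addressed by plain integers, as in the examples above.

```python
import shlex
from typing import List


def build_generate_command(collection: str, verse: int,
                           update_embeddings: bool = True) -> List[str]:
    """Build the argument list for one verse-generate call (hypothetical helper)."""
    cmd = ["verse-generate", "--collection", collection, "--verse", str(verse)]
    if not update_embeddings:
        # Flag taken from the Advanced Usage examples above.
        cmd.append("--no-update-embeddings")
    return cmd


def build_batch(collection: str, first: int, last: int) -> List[List[str]]:
    """Build one command per verse in [first, last], skipping the embeddings update."""
    return [
        build_generate_command(collection, verse, update_embeddings=False)
        for verse in range(first, last + 1)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the batch can be reviewed first.
    for command in build_batch("hanuman-chalisa", 1, 3):
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` to execute it. Because this sketch skips the embeddings update on every call, embeddings would need to be refreshed once afterwards; see the verse-embeddings command docs for the exact options.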
What Gets Generated
Each verse generation creates:
- 🎨 Image: images/{collection}/{theme}/verse-01.png (DALL-E 3)
- 🎵 Audio (full): audio/{collection}/verse-01-full.mp3 (ElevenLabs)
- 🎵 Audio (slow): audio/{collection}/verse-01-slow.mp3 (0.75x speed)
- 🔍 Embeddings: data/embeddings/collections/{collection}.json + data/embeddings/collections/index.json (for semantic search)
Text Source: Canonical Devanagari text from data/verses/{collection}.yaml (Local Verses Guide)
Migration Note: Legacy combined embeddings (data/embeddings.json) are no longer written by default. Use verse-embeddings --legacy-output if you still need the combined file.
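The output layout listed above can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, the two-digit zero padding is inferred from the `verse-01` example, and verse IDs that are not plain numbers (such as `chaupai-06`) are not handled.

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(collection: str, theme: str, verse: int,
                     root: Path = Path(".")) -> Dict[str, Path]:
    """Return the paths one verse generation is expected to produce."""
    stem = f"verse-{verse:02d}"  # assumption: zero-padded to two digits
    embeddings_dir = root / "data" / "embeddings" / "collections"
    return {
        "image": root / "images" / collection / theme / f"{stem}.png",
        "audio_full": root / "audio" / collection / f"{stem}-full.mp3",
        "audio_slow": root / "audio" / collection / f"{stem}-slow.mp3",
        "embeddings": embeddings_dir / f"{collection}.json",
        "embeddings_index": embeddings_dir / "index.json",
    }


def missing_outputs(collection: str, theme: str, verse: int,
                    root: Path = Path(".")) -> List[str]:
    """List the names of expected outputs that do not exist on disk."""
    outputs = expected_outputs(collection, theme, verse, root)
    return [name for name, path in outputs.items() if not path.exists()]


if __name__ == "__main__":
    # "modern" is one of the theme names mentioned in the feature list.
    print(missing_outputs("hanuman-chalisa", "modern", 1))
```

For authoritative completion checks, verse-status remains the supported route; this sketch only mirrors the paths documented here.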
Puranic Context Generation
Enrich verse pages with grounded story references from indexed sacred texts. This is a two-stage workflow:
Stage 1 — Index a Source Text
verse-index-sources --file data/sources/ananda-ramayana.txt
This command:
- Splits the source text into ~4000-char chunks
- Parses each chunk into discrete named episodes (keywords, type, summary in English + Hindi)
- Generates embeddings for each episode
- Writes outputs:
  - data/puranic-index/{key}.yml: human-readable episode index with a _meta section
  - data/embeddings/puranic/{key}.json: embedding vectors for RAG retrieval
  - data/puranic-references.yml: registry of indexed sources
Only needs to run once per source, or when the source file changes.
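The chunking step can be pictured with a small sketch. This is not the SDK's implementation, whose exact splitting strategy is not documented here; it is one plausible approach that caps each chunk at a character budget and prefers to break at a blank line. The function name `split_into_chunks` is hypothetical.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 4000) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Prefers to end a chunk at the last blank line inside the window,
    and falls back to a hard cut when no blank line is available.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Look for the last paragraph break within the current window.
            cut = text.rfind("\n\n", start, end)
            if cut > start:
                end = cut + 2  # keep the blank line with the preceding chunk
        chunks.append(text[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    sample = ("para one. " * 50) + "\n\n" + ("para two. " * 50)
    pieces = split_into_chunks(sample, chunk_size=600)
    print([len(piece) for piece in pieces])
```

Joining the chunks reproduces the input exactly, so no source text is lost. A larger budget, comparable to passing a higher `--chunk-size`, yields fewer and longer chunks.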
# Use Bedrock Cohere for better Sanskrit/Hindi accuracy
verse-index-sources --file data/sources/shiv-puran.txt --provider bedrock-cohere
# If Bedrock input exceeds limits, use truncation policy
verse-embeddings --provider bedrock-cohere --truncate-policy chunk
# Larger chunk size for dense Puranic prose
verse-index-sources --file data/sources/valmiki-ramayana.pdf --chunk-size 6000
Stage 2 — Generate Puranic Context per Verse
verse-puranic-context --collection hanuman-chalisa --all
For each verse this command:
- Embeds the verse text using the same provider as the indexed source
- Runs cosine similarity search across all indexed sources to find the most relevant episodes
- Filters to episodes involving the collection's subject (configured in _data/collections.yml)
- Passes top episodes + verse text to GPT-4o with citation constraints
- Post-validates each entry: drops entries where the subject is not an active participant
- Writes a puranic_context: block into the verse's .md frontmatter
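The retrieval part of these steps (similarity search followed by the subject filter) can be sketched in plain Python. This is an illustration only: the episode fields `title`, `embedding`, and `participants` are hypothetical and may not match the SDK's stored format, and the subject check here is a simple name match rather than the SDK's actual filter.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_episodes(verse_vec: Sequence[float], episodes: List[Dict],
                 subject: str, k: int = 3) -> List[Tuple[float, Dict]]:
    """Rank all episodes by similarity, keep those involving the subject, return the top k."""
    scored = [(cosine_similarity(verse_vec, ep["embedding"]), ep) for ep in episodes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    wanted = subject.lower()
    matching = [
        (score, ep) for score, ep in scored
        if wanted in (name.lower() for name in ep.get("participants", []))
    ]
    return matching[:k]


if __name__ == "__main__":
    episodes = [
        {"title": "A", "embedding": [1.0, 0.0], "participants": ["Hanuman"]},
        {"title": "B", "embedding": [0.0, 1.0], "participants": ["Hanuman"]},
        {"title": "C", "embedding": [1.0, 0.1], "participants": ["Rama"]},
    ]
    for score, episode in top_episodes([1.0, 0.0], episodes, "Hanuman", k=2):
        print(episode["title"], round(score, 3))
```

The verse vector and the episode vectors must come from the same embedding provider, as noted in the first step, because similarity scores are not comparable across embedding models.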
# Skip verses that already have context (default)
verse-puranic-context --collection hanuman-chalisa --all
# Regenerate all existing entries
verse-puranic-context --collection hanuman-chalisa --all --regenerate
# Single verse
verse-puranic-context --collection hanuman-chalisa --verse chaupai-06
Collection Subject Configuration
The subject filter is resolved via a two-level hierarchy — no CLI flag needed:
Option A — Project-level default (single-subject projects): set once in _data/verse-config.yml, applies to all collections:
# _data/verse-config.yml
defaults:
  subject: Hanuman
  subject_type: deity
Option B — Collection-level override: set per collection in _data/collections.yml (takes priority over project default):
# _data/collections.yml
hanuman-chalisa:
  subject: Hanuman # overrides or supplements project default
  subject_type: deity
krishna-bhajans:
  subject: Krishna # different subject for this collection
  subject_type: deity
Resolution order: collection-level → project default → error if neither is set and indexed sources exist.
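The resolution order can be expressed as a small lookup. This is a sketch of the rule as described, not the SDK's code: `resolve_subject` is a hypothetical name, and the two config files are assumed to have already been loaded into dictionaries shaped like the YAML above.

```python
from typing import Dict, Optional


def resolve_subject(collection: str, collections_cfg: Dict, project_cfg: Dict,
                    has_indexed_sources: bool = True) -> Optional[Dict[str, Optional[str]]]:
    """Resolve the subject for a collection: collection-level first, then project default."""
    entry = collections_cfg.get(collection) or {}
    defaults = project_cfg.get("defaults") or {}
    subject = entry.get("subject") or defaults.get("subject")
    if not subject:
        if has_indexed_sources:
            # Mirrors the documented rule: an error only when indexed sources exist.
            raise ValueError(f"No subject configured for collection '{collection}'")
        return None
    subject_type = entry.get("subject_type") or defaults.get("subject_type")
    return {"subject": subject, "subject_type": subject_type}


if __name__ == "__main__":
    project = {"defaults": {"subject": "Hanuman", "subject_type": "deity"}}
    collections = {"krishna-bhajans": {"subject": "Krishna", "subject_type": "deity"}}
    print(resolve_subject("krishna-bhajans", collections, project))
    print(resolve_subject("hanuman-chalisa", collections, project))
```

In this example the first call uses the collection-level override (Krishna) and the second falls back to the project default (Hanuman).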
Multiple Sources
Multiple indexed sources are automatically combined in RAG retrieval:
_data/verse-config.yml ← set defaults.subject here
data/sources/
  shiv-puran-part1.txt
  ananda-ramayana.txt ← add new sources here
data/puranic-index/
  shiv-puran-part1.yml ← auto-generated episode index
  ananda-ramayana.yml
data/embeddings/
  puranic/
    shiv-puran-part1.json ← auto-generated embedding vectors
    ananda-ramayana.json
See verse-index-sources and verse-puranic-context for full documentation.
Migration Note: Puranic embeddings now live under data/embeddings/puranic/. If you have legacy files in data/embeddings/{source}.json, move them or re-run verse-index-sources to regenerate.
Installation
pip install sanatan-verse-sdk
Commands
Project Setup
- verse-init - Initialize new project with recommended structure
- verse-validate - Validate project structure and configuration
Content Generation
- verse-parse-source - Parse canonical source text into YAML
- verse-generate - Complete orchestrator for verse content (text fetching, multimedia generation, embeddings)
- verse-translate - Translate verses into multiple languages (Hindi, Spanish, French, etc.)
- verse-images - Generate images using DALL-E 3
- verse-audio - Generate audio pronunciations using ElevenLabs
- verse-embeddings - Generate vector embeddings for semantic search (multi-collection guide)
Puranic Context
- verse-index-sources - Index Puranic source texts (PDFs, TXTs) into episodes and embeddings for RAG retrieval
- verse-puranic-context - Generate Puranic context boxes for verses (RAG-grounded or GPT-4o free recall)
Project Management
- verse-add - Add new verse entries to collections (supports multi-chapter formats)
- verse-status - Check status, completion, and validate text against canonical source
- verse-sync - Sync verse text with canonical source (fix mismatches)
- verse-deploy - Deploy Cloudflare Worker for API proxy
Embeddings Config
- embeddings.yml - Shared defaults and precedence (CLI > config > env > defaults)
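The stated precedence (CLI > config > env > defaults) amounts to taking the first layer that provides a value. A minimal sketch, assuming each layer has already been read into a dictionary; the function name `resolve_setting` and the `"openai"` provider value are illustrative assumptions, while `bedrock-cohere` appears in the commands above.

```python
from typing import Any, Mapping, Optional


def resolve_setting(name: str, cli: Mapping[str, Any], config: Mapping[str, Any],
                    env: Mapping[str, Any], defaults: Mapping[str, Any]) -> Optional[Any]:
    """Return the value from the highest-priority layer that sets it."""
    # Order matters: CLI beats config, config beats env, env beats defaults.
    for layer in (cli, config, env, defaults):
        value = layer.get(name)
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    provider = resolve_setting(
        "provider",
        cli={},                                # nothing passed on the command line
        config={"provider": "bedrock-cohere"}, # value from embeddings.yml
        env={"provider": "openai"},            # hypothetical environment value
        defaults={"provider": "openai"},       # hypothetical built-in default
    )
    print(provider)
```

Note that in this ordering the config file outranks the environment, so an environment value only applies when embeddings.yml leaves the setting unset.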
Configuration
Copy the example environment file and add your API keys:
cp .env.example .env
# Edit .env and add your API keys
See the End-to-End Workflow for the full lifecycle, and the Usage Guide for advanced workflows and best practices.
Documentation
- End-to-End Workflow - Initialize, generate, index, and deploy (full lifecycle)
- Usage Guide - Advanced workflows, batch processing, and best practices
- Local Verses Guide - Using local YAML files for verse text
- Chapter-Based Formats - Multi-chapter collections (Bhagavad Gita, etc.)
- Command Reference - Detailed documentation for all commands
- Development Guide - Setup and contributing to verse-sdk
- Troubleshooting - Common issues and solutions
- Multi-Collection Guide - Working with multiple collections
- Publishing Guide - For maintainers
Example Project
Hanuman GPT - Multi-collection project with Hanuman Chalisa, Sundar Kaand, and Sankat Mochan Hanumanashtak
Requirements
- Python 3.8+
- OpenAI API key (for text/images/embeddings)
- ElevenLabs API key (for audio)
License
MIT License - See LICENSE file for details