Retrieval-Augmented Generation Driven by Offline Local LLMs: a fully-local RAG system for JIRA tickets, code, and PDFs, powered by Ollama.

🧶 Ragdoll

Retrieval-Augmented Generation Driven by Offline Local LLMs

A fully-local RAG system that ingests JIRA tickets, PDF documents, and Python source code, indexes them for semantic search, and connects to a local LLM via Ollama for interactive Q&A, summarization, and chat.

Privacy-first: All data stays on your machine; nothing is sent to external services.

Prerequisites

  • Python 3.12+
  • Ollama running locally with:
    • An embedding model (e.g. nomic-embed-text)
    • A chat model (e.g. gpt-oss:20b, deepseek-r1:32b)
  • pixi for environment management

Quick Start

# Enter the project directory (after cloning the repository)
cd ragdoll

# Install with pixi (creates isolated env + editable install)
pixi install

# Set up user-level configuration
mkdir -p ~/.ragdoll && chmod 700 ~/.ragdoll
cat > ~/.ragdoll/config.toml << 'EOF'
jira_url = "https://your-jira.example.com"
jira_user = "your.user"
jira_token = "YOUR_PAT_TOKEN"
jira_auth_method = "pat"  # "pat" for JIRA Data Center, "basic" for Cloud
EOF
chmod 600 ~/.ragdoll/config.toml

# Check everything is connected
pixi run ragdoll status

Usage

Ingest Data

# Ingest PDF files or directories
pixi run ragdoll ingest pdf ./docs/technical_handbook.pdf
pixi run ragdoll ingest pdf ./reports/

# Ingest JIRA issues via JQL
pixi run ragdoll ingest jira --jql "project = CAS AND updated >= -30d"
pixi run ragdoll ingest jira --jql "project = PIPE AND updated >= -60d" --max-results 100

# Ingest from a different JIRA instance (multi-site)
pixi run ragdoll ingest jira \
  --url https://other-jira.example.com \
  --token OTHER_PAT \
  --jql "project = EXT AND updated >= -30d"

# Ingest Python source code (AST-parsed per function/class)
pixi run ragdoll ingest code ./src/
pixi run ragdoll ingest code ./path/to/project/

Reingesting Data (LlamaIndex Update)

If you are upgrading from an older version of ragdoll to the LlamaIndex-backed version, your existing ChromaDB data remains backward compatible. Even so, wiping the old index and reingesting your data is recommended, because LlamaIndex's semantic chunking splits text on sentence boundaries rather than at fixed character limits.

To clear your database and start fresh:

# Delete the old ChromaDB collection
rm -rf ~/.ragdoll/data/chroma

# Re-run your ingestion commands
pixi run ragdoll ingest jira --jql "project = CAS AND updated >= -30d"
pixi run ragdoll ingest pdf ./docs/

Search

# Semantic search across all ingested data
pixi run ragdoll search "tclean performance regression"
pixi run ragdoll search "AsdmStMan lazy import" --source jira
pixi run ragdoll search "calibration pipeline" --source pdf -n 5
pixi run ragdoll search "embedding function" --source code

Summarize

# Summarize a topic from ingested data
pixi run ragdoll summarize "What are the known issues with AsdmStMan?"
pixi run ragdoll summarize "tclean parallelization" --source jira

Interactive Chat

# Start an interactive RAG chat session
pixi run ragdoll chat
pixi run ragdoll chat --source jira  # only use JIRA context
pixi run ragdoll chat --source code  # only use source code context

Chat features:

  • Persistent history: arrow-up recalls previous questions across sessions (stored in ~/.ragdoll/chat_history)
  • Line editing: full readline support (backspace, arrows, Home/End)
  • Multi-turn: context accumulates within a session
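
The persistent-history behaviour described above can be approximated with Python's standard readline module. This is a minimal sketch, not ragdoll's actual implementation: the function name enable_persistent_history and the max_entries limit are hypothetical, and it assumes a platform where readline is available (it is not part of the standard Windows build).

```python
import atexit
import readline
from pathlib import Path


def enable_persistent_history(history_file: Path, max_entries: int = 1000) -> None:
    """Load earlier prompts and arrange for them to be saved on exit."""
    history_file.parent.mkdir(parents=True, exist_ok=True)
    if history_file.exists():
        # Makes previous sessions reachable with arrow-up.
        readline.read_history_file(str(history_file))
    readline.set_history_length(max_entries)
    # Write the (possibly extended) history back when the interpreter exits.
    atexit.register(readline.write_history_file, str(history_file))
```

Calling enable_persistent_history(Path.home() / ".ragdoll" / "chat_history") before the first input() prompt would give a chat loop the cross-session recall listed above.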

Configuration

Ragdoll uses a 4-layer precedence configuration strategy:

| Priority | Source | Purpose |
|---|---|---|
| 1 (highest) | RAGDOLL_* environment variables | CI/ephemeral overrides |
| 2a | ./ragdoll.toml in the project directory | Project-level settings |
| 2b | ./.env in the project directory | Project-level secrets |
| 3 | ~/.ragdoll/config.toml | User-level defaults & credentials |
| 4 (lowest) | Package defaults | Hardcoded fallbacks |
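
To illustrate how layered precedence resolves a setting, here is a small sketch using collections.ChainMap. It is not how ragdoll itself is implemented (the Key Components section names Pydantic Settings for that); the layer contents are made-up values, and the mapping from RAGDOLL_TEMPERATURE to the temperature key is an assumption about the naming scheme.

```python
from collections import ChainMap

# Hypothetical contents of each layer, lowest priority first.
package_defaults = {"chat_model": "gpt-oss:20b", "top_k": 20, "temperature": 0.3}
user_config = {"chat_model": "deepseek-r1:32b"}   # ~/.ragdoll/config.toml
project_config = {"top_k": 5}                     # ./ragdoll.toml and ./.env
environment = {"RAGDOLL_TEMPERATURE": "0.0", "HOME": "/home/me"}


def env_layer(env: dict, prefix: str = "RAGDOLL_") -> dict:
    """Keep only prefixed variables and map RAGDOLL_TOP_K to top_k (assumed scheme)."""
    return {k[len(prefix):].lower(): v for k, v in env.items() if k.startswith(prefix)}


# ChainMap looks keys up left to right, so the first mapping wins.
settings = ChainMap(env_layer(environment), project_config, user_config, package_defaults)

print(settings["temperature"])  # taken from the environment layer (still a string here)
print(settings["top_k"])        # taken from the project layer
print(settings["chat_model"])   # taken from the user layer
```

Environment values arrive as strings; a settings library such as Pydantic would normally coerce them to the declared type, which this sketch leaves out.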

Settings Reference

| Variable / TOML key | Default | Description |
|---|---|---|
| jira_url | (none) | JIRA server URL |
| jira_user | (none) | JIRA username |
| jira_token | (none) | JIRA API token or PAT |
| jira_auth_method | pat | "pat" for Data Center, "basic" for Cloud |
| jira_batch_size | 50 | Issues per API request |
| ollama_host | http://localhost:11434 | Ollama API endpoint |
| embed_model | nomic-embed-text | Embedding model |
| chat_model | gpt-oss:20b | Chat / generation model |
| temperature | 0.3 | LLM sampling temperature |
| data_dir | ~/.ragdoll/data | ChromaDB storage directory |
| collection_name | ragdoll | ChromaDB collection name |
| chunk_size | 1000 | Characters per chunk |
| chunk_overlap | 200 | Overlap between consecutive chunks |
| top_k | 20 | Default retrieval count |
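
To make the interaction between chunk_size and chunk_overlap concrete, the following sketch splits text with a plain sliding window. It is an illustration of the two parameters only: the function chunk_text is hypothetical, and ragdoll's real chunker may split on sentence or separator boundaries rather than at raw character offsets.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With the defaults, each window advances by 800 characters, so consecutive chunks repeat 200 characters of context.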

Architecture

Source Data                Pipeline                        Storage
───────────                ────────                        ───────
PDF files     ─┐
JIRA tickets  ─┼─→  Ingestor  →  Chunker  →  Embedder  →  ChromaDB
Python code   ─┘              (AST-aware)   (Ollama)       (local)
                                                             ↑
Query Flow                                                   │
──────────                                                   │
CLI / Chat  →  Embed query  →  Retriever  ←──────────────────┘
                                 ↓
                           LLM (Ollama)  →  Streamed answer
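
The Embedder and "Embed query" steps in the diagram both call Ollama over HTTP. As a rough sketch of what such a call can look like, the code below builds a request for Ollama's /api/embed endpoint with the standard library. The helper names build_embed_request and embed are hypothetical, and ragdoll's own client may use a different endpoint or library.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # default ollama_host from the settings table


def build_embed_request(text: str, model: str = "nomic-embed-text",
                        host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Prepare a POST to Ollama's /api/embed endpoint without sending it."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, **kwargs) -> list:
    """Send the request and return the first embedding vector (needs a running Ollama)."""
    with urllib.request.urlopen(build_embed_request(text, **kwargs), timeout=60) as resp:
        return json.load(resp)["embeddings"][0]
```

Keeping request construction separate from the network call makes the payload easy to inspect or test without a running server.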

Data Sources

| Source | Module | Strategy |
|---|---|---|
| PDF | ragdoll.ingest.pdf | PyMuPDF text extraction → recursive character splitter |
| JIRA | ragdoll.ingest.jira | REST API with JQL → structured text per issue |
| Code | ragdoll.ingest.code | AST parsing → one Document per function/class/module docstring |
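
The AST-based strategy for code can be sketched with Python's built-in ast module. This is an illustrative approximation, not the contents of ragdoll.ingest.code: the function extract_units and its (kind, name, text) tuples are hypothetical stand-ins for whatever Document type the project uses.

```python
import ast


def extract_units(source: str) -> list:
    """Return (kind, name, text) for the module docstring and every function or class."""
    tree = ast.parse(source)
    units = []

    module_doc = ast.get_docstring(tree)
    if module_doc:
        units.append(("module", "<module>", module_doc))

    # ast.walk visits nested nodes too, so methods are emitted alongside their class.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        # Recover the exact source text covered by this definition.
        units.append((kind, node.name, ast.get_source_segment(source, node)))
    return units


sample = '"""Demo module."""\n\ndef add(a, b):\n    return a + b\n\nclass Greeter:\n    def hello(self):\n        return "hi"\n'
for kind, name, _ in extract_units(sample):
    print(kind, name)
```

Splitting at definition boundaries keeps each indexed unit self-contained, which tends to retrieve better than arbitrary character windows cutting through a function body.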

Key Components

  • Config (ragdoll.config): Pydantic Settings with 4-layer precedence
  • Chunker (ragdoll.ingest.chunker): Recursive character text splitter
  • Embedder (ragdoll.llm.ollama): Ollama HTTP client for embeddings and generation
  • Vector Store (ragdoll.store.vectordb): ChromaDB with cosine similarity
  • Retriever (ragdoll.query.retriever): Semantic search with source filtering
  • RAG Chain (ragdoll.query.rag): Context-augmented generation and chat
  • CLI (ragdoll.cli): Click-based interface with Rich formatting
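
To show what "cosine similarity" and "source filtering" mean for the vector store and retriever listed above, here is a dependency-free sketch. In ragdoll these steps are delegated to ChromaDB; the retrieve function, the record layout, and the toy two-dimensional vectors below are assumptions made purely for illustration.

```python
import math
from typing import Optional


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list, records: list, source: Optional[str] = None,
             top_k: int = 20) -> list:
    """Filter by source (if given), then return the top_k most similar records."""
    candidates = [r for r in records if source is None or r["source"] == source]
    ranked = sorted(candidates,
                    key=lambda r: cosine_similarity(query_vec, r["embedding"]),
                    reverse=True)
    return ranked[:top_k]


# Toy records; real embeddings have hundreds of dimensions.
records = [
    {"id": "jira-1", "source": "jira", "embedding": [1.0, 0.0]},
    {"id": "pdf-1", "source": "pdf", "embedding": [0.9, 0.1]},
    {"id": "code-1", "source": "code", "embedding": [0.0, 1.0]},
]
print([r["id"] for r in retrieve([1.0, 0.0], records, top_k=2)])
print([r["id"] for r in retrieve([1.0, 0.0], records, source="code")])
```

Filtering before ranking mirrors the --source option of the search, summarize, and chat commands: only records from the requested source compete for the top_k slots.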

Documentation

Full documentation is available under docs/ and can be built with Sphinx:

pixi run docs

License

MIT
