Retrieval-Augmented Generation Driven by Offline Local LLMs — a fully-local RAG system for JIRA tickets, code, and PDFs, powered by Ollama.

Project description

🧶 Ragdoll

Retrieval-Augmented Generation Driven by Offline Local LLMs

A fully-local RAG system that ingests JIRA tickets, PDF documents, and Python source code, indexes them for semantic search, and connects to a local LLM via Ollama for interactive Q&A, summarization, and chat.

Privacy-first: All data stays on your machine — nothing is sent to external services.

Prerequisites

Python 3.12+
Ollama running locally with:
- An embedding model (e.g. nomic-embed-text)
- A chat model (e.g. gpt-oss:20b, deepseek-r1:32b)
pixi for environment management

Quick Start

# Clone and enter the project
cd ragdoll

# Install with pixi (creates isolated env + editable install)
pixi install

# Set up user-level configuration
mkdir -p ~/.ragdoll && chmod 700 ~/.ragdoll
cat > ~/.ragdoll/config.toml << 'EOF'
jira_url = "https://your-jira.example.com"
jira_user = "your.user"
jira_token = "YOUR_PAT_TOKEN"
jira_auth_method = "pat"  # "pat" for JIRA Data Center, "basic" for Cloud
EOF
chmod 600 ~/.ragdoll/config.toml

# Check everything is connected
pixi run ragdoll status

Usage

Ingest Data

# Ingest PDF files or directories
pixi run ragdoll ingest pdf ./docs/technical_handbook.pdf
pixi run ragdoll ingest pdf ./reports/

# Ingest JIRA issues via JQL
pixi run ragdoll ingest jira --jql "project = CAS AND updated >= -30d"
pixi run ragdoll ingest jira --jql "project = PIPE AND updated >= -60d" --max-results 100

# Ingest from a different JIRA instance (multi-site)
pixi run ragdoll ingest jira \
  --url https://other-jira.example.com \
  --token OTHER_PAT \
  --jql "project = EXT AND updated >= -30d"

# Ingest Python source code (AST-parsed per function/class)
pixi run ragdoll ingest code ./src/
pixi run ragdoll ingest code ./path/to/project/

Reingesting Data (LlamaIndex Update)

If you are upgrading from an older version of ragdoll to the LlamaIndex-backed version, your existing ChromaDB data is fully backward compatible. However, it is highly recommended to wipe the old index and reingest your data to take advantage of LlamaIndex's superior semantic chunking (which splits by sentences instead of fixed character limits).

To clear your database and start fresh:

# Delete the old ChromaDB collection
rm -rf ~/.ragdoll/data/chroma

# Re-run your ingestion commands
pixi run ragdoll ingest jira --jql "project = CAS AND updated >= -30d"
pixi run ragdoll ingest pdf ./docs/

Search

# Semantic search across all ingested data
pixi run ragdoll search "tclean performance regression"
pixi run ragdoll search "AsdmStMan lazy import" --source jira
pixi run ragdoll search "calibration pipeline" --source pdf -n 5
pixi run ragdoll search "embedding function" --source code

Summarize

# Summarize a topic from ingested data
pixi run ragdoll summarize "What are the known issues with AsdmStMan?"
pixi run ragdoll summarize "tclean parallelization" --source jira

Interactive Chat

# Start an interactive RAG chat session
pixi run ragdoll chat
pixi run ragdoll chat --source jira  # only use JIRA context
pixi run ragdoll chat --source code  # only use source code context

Chat features:

Persistent history — arrow-up recalls previous questions across sessions (stored in ~/.ragdoll/chat_history)
Line editing — full readline support (backspace, arrows, Home/End)
Multi-turn — context accumulates within a session

Configuration

Ragdoll uses a 4-layer precedence configuration strategy:

Priority	Source	Purpose
1 (highest)	`RAGDOLL_*` environment variables	CI/ephemeral overrides
2a	`./ragdoll.toml` in the project directory	Project-level settings
2b	`./.env` in the project directory	Project-level secrets
3	`~/.ragdoll/config.toml`	User-level defaults & credentials
4 (lowest)	Package defaults	Hardcoded fallbacks

Settings Reference

Variable / TOML key	Default	Description
`jira_url`	—	JIRA server URL
`jira_user`	—	JIRA username
`jira_token`	—	JIRA API token or PAT
`jira_auth_method`	`pat`	`"pat"` for Data Center, `"basic"` for Cloud
`jira_batch_size`	`50`	Issues per API request
`ollama_host`	`http://localhost:11434`	Ollama API endpoint
`embed_model`	`nomic-embed-text`	Embedding model
`chat_model`	`gpt-oss:20b`	Chat / generation model
`temperature`	`0.3`	LLM sampling temperature
`data_dir`	`~/.ragdoll/data`	ChromaDB storage directory
`collection_name`	`ragdoll`	ChromaDB collection name
`chunk_size`	`1000`	Characters per chunk
`chunk_overlap`	`200`	Overlap between consecutive chunks
`top_k`	`20`	Default retrieval count

Architecture

Source Data                Pipeline                        Storage
───────────                ────────                        ───────
PDF files     ─┐
JIRA tickets  ─┼─→  Ingestor  →  Chunker  →  Embedder  →  ChromaDB
Python code   ─┘              (AST-aware)   (Ollama)       (local)
                                                             ↑
Query Flow                                                   │
──────────                                                   │
CLI / Chat  →  Embed query  →  Retriever  ←──────────────────┘
                                 ↓
                           LLM (Ollama)  →  Streamed answer

Data Sources

Source	Module	Strategy
PDF	`ragdoll.ingest.pdf`	PyMuPDF text extraction → recursive character splitter
JIRA	`ragdoll.ingest.jira`	REST API with JQL → structured text per issue
Code	`ragdoll.ingest.code`	AST parsing → one Document per function/class/module docstring

Key Components

Config (ragdoll.config) — Pydantic Settings with 4-layer precedence
Chunker (ragdoll.ingest.chunker) — Recursive character text splitter
Embedder (ragdoll.llm.ollama) — Ollama HTTP client for embeddings and generation
Vector Store (ragdoll.store.vectordb) — ChromaDB with cosine similarity
Retriever (ragdoll.query.retriever) — Semantic search with source filtering
RAG Chain (ragdoll.query.rag) — Context-augmented generation and chat
CLI (ragdoll.cli) — Click-based interface with Rich formatting

Documentation

Full documentation is available under docs/ and can be built with Sphinx:

pixi run docs

License

MIT

Project details

Release history Release notifications | RSS feed

This version

0.2.0

May 20, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ragdoll_ai-0.2.0.tar.gz (24.6 kB view details)

Uploaded May 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ragdoll_ai-0.2.0-py3-none-any.whl (25.7 kB view details)

Uploaded May 20, 2026 Python 3

File details

Details for the file ragdoll_ai-0.2.0.tar.gz.

File metadata

Download URL: ragdoll_ai-0.2.0.tar.gz
Upload date: May 20, 2026
Size: 24.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragdoll_ai-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`fd3405e5156314e9f26dade78e1459fb2a545986b50939a9c113a4f6eed86d48`
MD5	`8d7d3671713de4c292a30b50022d6b14`
BLAKE2b-256	`2db7980f602b6cc20a20ba2e0eff97e753e83d604cb30f7e208ea13885c5bfc0`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ragdoll_ai-0.2.0.tar.gz:

Publisher: publish.yml on r-xue/ragdoll

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ragdoll_ai-0.2.0.tar.gz
- Subject digest: fd3405e5156314e9f26dade78e1459fb2a545986b50939a9c113a4f6eed86d48
- Sigstore transparency entry: 1586375166
- Sigstore integration time: May 20, 2026
Source repository:
- Permalink: r-xue/ragdoll@7544a7fe3ae99c9713a43c18dfd45032885b3b86
- Branch / Tag: refs/heads/main
- Owner: https://github.com/r-xue
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7544a7fe3ae99c9713a43c18dfd45032885b3b86
- Trigger Event: workflow_dispatch

File details

Details for the file ragdoll_ai-0.2.0-py3-none-any.whl.

File metadata

Download URL: ragdoll_ai-0.2.0-py3-none-any.whl
Upload date: May 20, 2026
Size: 25.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragdoll_ai-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0eef85a1636eebbe15c40a73a089177414350e4b536b3f4e38fb079291d6db45`
MD5	`d765703ace4430f5e45f7fdfa9deb4bc`
BLAKE2b-256	`6645b1f75fce2ceaf17c2dc82ce40684385515638a829331952d17aaa21dd228`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ragdoll_ai-0.2.0-py3-none-any.whl:

Publisher: publish.yml on r-xue/ragdoll

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ragdoll_ai-0.2.0-py3-none-any.whl
- Subject digest: 0eef85a1636eebbe15c40a73a089177414350e4b536b3f4e38fb079291d6db45
- Sigstore transparency entry: 1586375203
- Sigstore integration time: May 20, 2026
Source repository:
- Permalink: r-xue/ragdoll@7544a7fe3ae99c9713a43c18dfd45032885b3b86
- Branch / Tag: refs/heads/main
- Owner: https://github.com/r-xue
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7544a7fe3ae99c9713a43c18dfd45032885b3b86
- Trigger Event: workflow_dispatch

ragdoll-ai 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

🧶 Ragdoll

Prerequisites

Quick Start

Usage

Ingest Data

Reingesting Data (LlamaIndex Update)

Search

Summarize

Interactive Chat

Configuration

Settings Reference

Architecture

Data Sources

Key Components

Documentation

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance