Retrieval-Augmented Generation Driven by Offline Local LLMs: a fully-local RAG system for JIRA tickets, code, and PDFs, powered by Ollama.
🧶 Ragdoll
Retrieval-Augmented Generation Driven by Offline Local LLMs
A fully-local RAG system that ingests JIRA tickets, PDF documents, and Python source code, indexes them for semantic search, and connects to a local LLM via Ollama for interactive Q&A, summarization, and chat.
Privacy-first: all data stays on your machine; nothing is sent to external services.
Prerequisites
- Python 3.12+
- Ollama running locally with:
  - An embedding model (e.g. nomic-embed-text)
  - A chat model (e.g. gpt-oss:20b, deepseek-r1:32b)
- pixi for environment management
Quick Start
# Clone and enter the project
git clone https://github.com/r-xue/ragdoll.git
cd ragdoll
# Install with pixi (creates isolated env + editable install)
pixi install
# Set up user-level configuration
mkdir -p ~/.ragdoll && chmod 700 ~/.ragdoll
cat > ~/.ragdoll/config.toml << 'EOF'
jira_url = "https://your-jira.example.com"
jira_user = "your.user"
jira_token = "YOUR_PAT_TOKEN"
jira_auth_method = "pat" # "pat" for JIRA Data Center, "basic" for Cloud
EOF
chmod 600 ~/.ragdoll/config.toml
# Check everything is connected
pixi run ragdoll status
Usage
Ingest Data
# Ingest PDF files or directories
pixi run ragdoll ingest pdf ./docs/technical_handbook.pdf
pixi run ragdoll ingest pdf ./reports/
# Ingest JIRA issues via JQL
pixi run ragdoll ingest jira --jql "project = CAS AND updated >= -30d"
pixi run ragdoll ingest jira --jql "project = PIPE AND updated >= -60d" --max-results 100
# Ingest from a different JIRA instance (multi-site)
pixi run ragdoll ingest jira \
--url https://other-jira.example.com \
--token OTHER_PAT \
--jql "project = EXT AND updated >= -30d"
# Ingest Python source code (AST-parsed per function/class)
pixi run ragdoll ingest code ./src/
pixi run ragdoll ingest code ./path/to/project/
Reingesting Data (LlamaIndex Update)
If you are upgrading from an older version of ragdoll to the LlamaIndex-backed version, your existing ChromaDB data is fully backward compatible. Even so, wiping the old index and reingesting your data is recommended: LlamaIndex's chunking splits text at sentence boundaries instead of at fixed character limits, so chunks no longer end mid-sentence.
To clear your database and start fresh:
# Delete the old ChromaDB collection
rm -rf ~/.ragdoll/data/chroma
# Re-run your ingestion commands
pixi run ragdoll ingest jira --jql "project = CAS AND updated >= -30d"
pixi run ragdoll ingest pdf ./docs/
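To see why reingesting changes the stored chunks, the sketch below contrasts the two strategies on a toy string. It is an illustration under stated assumptions: `fixed_chunks` and `sentence_chunks` are hypothetical names, and the regex-based sentence splitter is a simple stand-in, not LlamaIndex's actual splitter.

```python
import re


def fixed_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Slide a fixed-width window over the text, stepping by size - overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once a window already reaches the end of the text.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


def sentence_chunks(text: str, size: int) -> list[str]:
    """Pack whole sentences into chunks of at most `size` characters.

    A single sentence longer than `size` becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > size:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With the fixed window, a chunk can end in the middle of a word and the overlap repeats the tail of one chunk at the head of the next. The sentence-based version always cuts between sentences, which tends to give the embedding model more coherent input.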
Search
# Semantic search across all ingested data
pixi run ragdoll search "tclean performance regression"
pixi run ragdoll search "AsdmStMan lazy import" --source jira
pixi run ragdoll search "calibration pipeline" --source pdf -n 5
pixi run ragdoll search "embedding function" --source code
Summarize
# Summarize a topic from ingested data
pixi run ragdoll summarize "What are the known issues with AsdmStMan?"
pixi run ragdoll summarize "tclean parallelization" --source jira
Interactive Chat
# Start an interactive RAG chat session
pixi run ragdoll chat
pixi run ragdoll chat --source jira # only use JIRA context
pixi run ragdoll chat --source code # only use source code context
Chat features:
- Persistent history: arrow-up recalls previous questions across sessions (stored in ~/.ragdoll/chat_history)
- Line editing: full readline support (backspace, arrows, Home/End)
- Multi-turn: context accumulates within a session
Configuration
Ragdoll uses a 4-layer precedence configuration strategy:
| Priority | Source | Purpose |
|---|---|---|
| 1 (highest) | RAGDOLL_* environment variables | CI/ephemeral overrides |
| 2a | ./ragdoll.toml in the project directory | Project-level settings |
| 2b | ./.env in the project directory | Project-level secrets |
| 3 | ~/.ragdoll/config.toml | User-level defaults & credentials |
| 4 (lowest) | Package defaults | Hardcoded fallbacks |
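The precedence table can be read as a lookup chain in which the first layer that defines a key wins. The sketch below models that with `collections.ChainMap`. It is not ragdoll's Pydantic-based implementation: `resolve_settings` is a hypothetical name, the mapping from `RAGDOLL_CHAT_MODEL` to `chat_model` is an assumption, and the relative order of layers 2a and 2b is also assumed since the table does not specify it.

```python
from collections import ChainMap


def resolve_settings(env, project_toml, project_env, user_toml, defaults,
                     prefix="RAGDOLL_"):
    """Merge configuration layers; earlier layers override later ones."""
    # Assumption: RAGDOLL_TOP_K in the environment maps to the key "top_k".
    env_overrides = {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
    # ChainMap searches its maps left to right, so the leftmost map wins.
    # Assumption: ragdoll.toml (2a) is consulted before .env (2b).
    layers = ChainMap(env_overrides, project_toml, project_env,
                      user_toml, defaults)
    return dict(layers)
```

Note that environment values arrive as strings, so a real loader would also coerce types (for example `"10"` to `10`); that step is left out here.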
Settings Reference
| Variable / TOML key | Default | Description |
|---|---|---|
| jira_url | (none) | JIRA server URL |
| jira_user | (none) | JIRA username |
| jira_token | (none) | JIRA API token or PAT |
| jira_auth_method | pat | "pat" for Data Center, "basic" for Cloud |
| jira_batch_size | 50 | Issues per API request |
| ollama_host | http://localhost:11434 | Ollama API endpoint |
| embed_model | nomic-embed-text | Embedding model |
| chat_model | gpt-oss:20b | Chat / generation model |
| temperature | 0.3 | LLM sampling temperature |
| data_dir | ~/.ragdoll/data | ChromaDB storage directory |
| collection_name | ragdoll | ChromaDB collection name |
| chunk_size | 1000 | Characters per chunk |
| chunk_overlap | 200 | Overlap between consecutive chunks |
| top_k | 20 | Default retrieval count |
Architecture
Source Data        Pipeline                              Storage
───────────        ────────                              ───────
PDF files    ─┐
JIRA tickets ─┼──  Ingestor  →  Chunker  →  Embedder  →  ChromaDB
Python code  ─┘   (AST-aware)               (Ollama)     (local)
                                                            │
Query Flow                                                  │
──────────                                                  │
CLI / Chat → Embed query → Retriever ◄──────────────────────┘
                               │
                         LLM (Ollama) → Streamed answer
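The last hop of the query flow, from retrieved chunks to the LLM, comes down to assembling a prompt. The sketch below shows one common way to do that. Everything in it is an assumption for illustration: `build_prompt`, the chunk dict keys `source` and `text`, the instruction wording, and the character budget are not taken from ragdoll's code.

```python
def build_prompt(question: str, chunks: list[dict], max_chars: int = 4000) -> str:
    """Number retrieved chunks, tag each with its source, and append the question.

    Chunks are added in ranked order until the character budget is used up.
    """
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] ({chunk['source']}) {chunk['text']}"
        if used + len(block) > max_chars:
            break  # keep the prompt within the context budget
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the chunks gives the model something to cite, and labelling each with its source keeps JIRA, PDF, and code context distinguishable in the answer.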
Data Sources
| Source | Module | Strategy |
|---|---|---|
| PDF | ragdoll.ingest.pdf | PyMuPDF text extraction → recursive character splitter |
| JIRA | ragdoll.ingest.jira | REST API with JQL → structured text per issue |
| Code | ragdoll.ingest.code | AST parsing → one Document per function/class/module docstring |
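The code strategy in the table, one document per function, class, or module docstring, can be approximated with Python's standard `ast` module. This is a sketch of the general technique, not the contents of `ragdoll.ingest.code`: the function name `extract_units` and the dict layout are hypothetical.

```python
import ast


def extract_units(source: str) -> list[dict]:
    """Return one record per module docstring, class, and function in `source`."""
    tree = ast.parse(source)
    units = []
    module_doc = ast.get_docstring(tree)
    if module_doc:
        units.append({"kind": "module", "name": "<module>",
                      "lineno": 1, "text": module_doc})
    # ast.walk visits every node, including methods nested inside classes.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            units.append({
                "kind": kind,
                "name": node.name,
                "lineno": node.lineno,
                # Exact source text of the definition, without decorators.
                "text": ast.get_source_segment(source, node),
            })
    return units
```

In this sketch a method appears twice: once inside its class's text and once as its own unit. Whether to keep both is a design choice. Keeping both helps retrieval of small methods at the cost of some duplicated text in the index.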
Key Components
- Config (ragdoll.config): Pydantic Settings with 4-layer precedence
- Chunker (ragdoll.ingest.chunker): Recursive character text splitter
- Embedder (ragdoll.llm.ollama): Ollama HTTP client for embeddings and generation
- Vector Store (ragdoll.store.vectordb): ChromaDB with cosine similarity
- Retriever (ragdoll.query.retriever): Semantic search with source filtering
- RAG Chain (ragdoll.query.rag): Context-augmented generation and chat
- CLI (ragdoll.cli): Click-based interface with Rich formatting
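The Vector Store and Retriever entries above combine two ideas: ranking by cosine similarity and filtering by source. The brute-force sketch below shows both on toy vectors. It is an illustration only: ChromaDB uses an index rather than a full scan, and the names `cosine` and `retrieve` and the record layout are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, records, top_k=20, source=None):
    """Return the top_k records most similar to query_vec.

    If `source` is given (e.g. "jira"), only records from that source are ranked.
    """
    candidates = [r for r in records if source is None or r["source"] == source]
    ranked = sorted(candidates,
                    key=lambda r: cosine(query_vec, r["vector"]),
                    reverse=True)
    return ranked[:top_k]
```

Filtering before ranking corresponds to the `--source` flag shown under Search: the result count is still capped at `top_k`, but only within the chosen source.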
Documentation
Full documentation is available under docs/ and can be built with Sphinx:
pixi run docs
License
MIT