
Local-first RAG indexer for repos, docs, and PDFs


🧩🔎 fragmenter


Build powerful RAG (Retrieval-Augmented Generation) systems with multiple LLM providers and zero configuration hassle.


✨ Features

  • 🤖 Multiple LLM Providers: OpenAI, Anthropic, Ollama, and HuggingFace support out-of-the-box
  • 🔄 Smart Incremental Updates: Only processes changed files, so no wasted computation
  • 📄 Intelligent Parsing: Automatic file-type detection for Markdown, Code, PDF, and more
  • 🎨 Beautiful CLI: Rich formatting with colors and progress indicators
  • 🌐 Web Scraping: Built-in scraper to ingest content from websites
  • 💾 Vector Store Persistence: Save and reload indexes efficiently
  • 🔍 Code Extraction: Automatically extract code blocks from LLM responses
  • ⚙️ Environment-Based Config: Simple .env file configuration
  • 🚀 Zero-Code Usage: CLI tools for complete workflows without writing code
  • 📦 Library Mode: Full programmatic API for custom integrations

📦 Installation

Install as a CLI tool (recommended)

# Install globally as a tool
uv tool install 'fragmenter[openai]'

# Or run instantly without installing
uvx fragmenter init

Add as a project dependency

Install the core package plus the provider(s) you need:

# Pick one (or more) LLM provider extras:
uv add 'fragmenter[openai]'         # OpenAI (default provider)
uv add 'fragmenter[anthropic]'      # Anthropic
uv add 'fragmenter[ollama]'         # Ollama (local models)
uv add 'fragmenter[huggingface]'    # HuggingFace

# Or combine several:
uv add 'fragmenter[openai,ollama]'

# Or install everything:
uv add 'fragmenter[all-providers]'

Traditional pip install

pip install 'fragmenter[openai]'

[!NOTE] LLM provider packages are not included in the base install to keep downloads small. If you see an ImportError mentioning a missing extra, install the matching provider extra shown in the error message.


🚀 Quick Start

Prerequisites

Before you begin, ensure you have:

  • Python: 3.12 or higher ✅
  • API Keys: For your chosen LLM provider (OpenAI, Anthropic, etc.) 🔑

1. Initialize your project

# Create .env template
fragmenter init

Edit the generated .env file with your API credentials:

# .env
OPENAI_API_KEY=sk-your-actual-key-here
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
EMBED_PROVIDER=openai
EMBED_MODEL=text-embedding-3-small

[!NOTE] See the Configuration section for all available providers and models.
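
The library example further down loads these values with python-dotenv. Conceptually, a .env file is just KEY=VALUE lines with # comments; a minimal sketch of that parsing (not fragmenter's actual loader) looks like this:

```python
from pathlib import Path

def parse_env(path: str) -> dict[str, str]:
    """Minimal .env reader: KEY=VALUE lines; '#' comments and blanks are ignored."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values
```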

2. Prepare your data

# Create data directory
mkdir data

# Add your documents (markdown, code, PDFs, etc.)
cp /path/to/your/docs/* ./data/

3. Build the index

fragmenter rebuild-index \
    --data-dir ./data \
    --storage-dir ./vector_store

What happens next? 🎬

  1. 📁 Scans your data directory
  2. 🔍 Detects file types and applies appropriate parsers
  3. ✂️ Chunks documents intelligently
  4. 🧮 Generates embeddings
  5. 💾 Stores vectors for fast retrieval
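
fragmenter's chunking strategy is internal to its parsers, but step 3 can be illustrated with a basic fixed-size chunker with overlap (a sketch, not the actual implementation):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of `size` characters overlapping by `overlap` characters."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from both neighbouring chunks.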

4. Query your data

# Ask a question
fragmenter query \
    --storage-dir ./vector_store \
    --query "What is this data about?"

[!TIP] Save responses to files with --output and extract code with --code-only:

fragmenter query \
    -s ./vector_store \
    -q "Write a Python example" \
    -o output.py \
    --code-only \
    --language python

๐Ÿ› ๏ธ CLI Tools

init

Create a .env template file in your project.

fragmenter init

scrape

Scrape content from websites and save as markdown or HTML.

# Scrape as markdown (default)
fragmenter scrape \
    https://example.com \
    -o ./data

# Scrape as HTML
fragmenter scrape \
    https://example.com \
    -o ./data \
    --format html
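
fragmenter ships its own scraper; purely to illustrate the conversion step, here is a minimal HTML-to-text pass using only the standard library (a sketch, not the scraper fragmenter uses):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)
```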

rebuild-index

Build or update the RAG index with automatic incremental updates.

fragmenter rebuild-index \
    --data-dir ./data \
    --storage-dir ./vector_store

[!NOTE] Incremental updates mean only new or modified files are processed, saving time and compute resources.
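
fragmenter's change tracking is internal; the general idea behind incremental indexing is to keep a fingerprint per file and re-process only files whose fingerprint changed. A hedged sketch of that bookkeeping:

```python
import hashlib
import json
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Content hash of a file; changes whenever the file's bytes change."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(data_dir: str, manifest_path: str) -> list[Path]:
    """Return files that are new or modified since the manifest was last written."""
    manifest_file = Path(manifest_path)
    old = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    current = {str(p): fingerprint(p) for p in Path(data_dir).rglob("*") if p.is_file()}
    manifest_file.write_text(json.dumps(current))
    return [Path(p) for p, digest in current.items() if old.get(p) != digest]
```

Content hashes (rather than modification times) survive copies and clock skew, at the cost of reading every file on each run.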

query

Query the index with natural language.

# Basic query
fragmenter query \
    -s ./vector_store \
    -q "Your question here"

# Query from file
fragmenter query \
    -s ./vector_store \
    -f question.txt

# Save output
fragmenter query \
    -s ./vector_store \
    -q "Generate code" \
    -o output.cpp \
    --code-only \
    --language cpp

# Use different provider
fragmenter query \
    -s ./vector_store \
    -q "Explain this" \
    --llm-provider anthropic \
    --llm-model claude-3-5-sonnet-20241022

inspect-index

View index statistics and contents.

fragmenter inspect-index \
    -s ./vector_store

โš™๏ธ Configuration

All settings can be configured via environment variables. Create a .env file or set them in your shell.

LLM Providers

Provider      Extra            Configuration
OpenAI        [openai]         LLM_PROVIDER=openai, LLM_MODEL=gpt-4o-mini
Anthropic     [anthropic]      LLM_PROVIDER=anthropic, LLM_MODEL=claude-3-5-sonnet-20241022
Ollama        [ollama]         LLM_PROVIDER=ollama, LLM_MODEL=llama3.2
HuggingFace   [huggingface]    LLM_PROVIDER=huggingface, LLM_MODEL=meta-llama/Llama-3.2-3B-Instruct

Embedding Providers

Provider      Configuration
OpenAI        EMBED_PROVIDER=openai, EMBED_MODEL=text-embedding-3-small
HuggingFace   EMBED_PROVIDER=huggingface, EMBED_MODEL=BAAI/bge-small-en-v1.5
Ollama        EMBED_PROVIDER=ollama, EMBED_MODEL=nomic-embed-text

Complete .env Example

# LLM Configuration
OPENAI_API_KEY=sk-your-key-here
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini

# Embedding Configuration
EMBED_PROVIDER=openai
EMBED_MODEL=text-embedding-3-small

# Optional: Anthropic
ANTHROPIC_API_KEY=sk-ant-your-key-here

# Optional: HuggingFace
HUGGINGFACE_TOKEN=hf_your-token-here

[!CAUTION] Never commit your .env file to version control! Add it to .gitignore to protect your API keys.


💻 Using as a Library

If you need custom logic or want to integrate into your own application:

from dotenv import load_dotenv
from fragmenter.config import RAGSettings
from fragmenter.rag.ingestion import build_index
from fragmenter.rag.inference import load_index, query_index

# Load configuration
load_dotenv()
settings = RAGSettings()
settings.configure_llm_settings()

# Build index
build_index(input_dir="./data", persist_dir="./vector_store")

# Query
index = load_index("./vector_store")
response = query_index(index, "Your question")
print(response)
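
At query time, the vector store performs a nearest-neighbour search over the stored embeddings. fragmenter delegates this to its index; the core operation can be sketched as cosine-similarity top-k over chunk embeddings (an illustration, not fragmenter's internals):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k stored chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]
```

The retrieved chunks are then passed to the LLM as context for answering the question.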

🌱 Usage Examples

Example 1: Documentation RAG

Build a RAG system for your project documentation:

# 1. Scrape your docs site
fragmenter scrape \
    https://docs.example.com \
    -o ./data/docs

# 2. Build the index
fragmenter rebuild-index \
    -d ./data \
    -s ./vector_store

# 3. Query
fragmenter query \
    -s ./vector_store \
    -q "How do I configure authentication?"

Example 2: Code Analysis

Analyze a codebase and generate examples:

# 1. Copy code files to data directory
cp -r /path/to/project/src ./data/

# 2. Build index
fragmenter rebuild-index -d ./data -s ./vector_store

# 3. Generate code examples
fragmenter query \
    -s ./vector_store \
    -q "Show me how to use the authentication module" \
    -o example.py \
    --code-only \
    --language python

Example 3: Research Assistant

Build a research assistant for papers and articles:

# 1. Add PDFs and markdown files to data/
# 2. Build index
fragmenter rebuild-index -d ./data -s ./vector_store

# 3. Query with different providers
fragmenter query \
    -s ./vector_store \
    -q "Summarize the key findings about neural networks" \
    --llm-provider anthropic \
    --llm-model claude-3-5-sonnet-20241022

[!TIP] See examples/waywise for a complete real-world example with custom configuration.


🔧 Troubleshooting

📦 Missing Provider Errors

[!WARNING] If you see an ImportError like "…requires the 'openai' extra":

uv add 'fragmenter[openai]'   # install the provider you need

See the LLM Providers table for all available extras.

🔐 Authentication Errors

[!WARNING] If you encounter authentication errors:

  • ✅ Verify your API key is correct and not expired
  • ✅ Check that you've set the correct provider name (openai, not OpenAI)
  • ✅ Ensure API key environment variable names match your provider
  • ✅ Run fragmenter init to generate a fresh .env template
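
A quick way to confirm the key is actually visible to the process is a check like the one below; the variable names come from the .env examples above, but the provider-to-variable mapping itself is illustrative, not fragmenter's code:

```python
import os

# Key variables from the .env examples; Ollama runs locally and needs no key.
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "huggingface": "HUGGINGFACE_TOKEN",
}

def check_provider_key(provider: str, env=os.environ) -> None:
    """Raise early with a clear message if the provider's key variable is unset."""
    var = KEY_VARS.get(provider)
    if var and not env.get(var):
        raise RuntimeError(f"{var} is not set; did you load your .env file?")
```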

๐Ÿ“ File Parsing Issues

[!NOTE] If certain files aren't being indexed:

  • Check file extensions are supported (.md, .py, .pdf, .txt, etc.)
  • Verify files are in the --data-dir path
  • Use --log-level DEBUG to see detailed parsing information
  • Check file permissions (files must be readable)

💾 Vector Store Errors

[!TIP] If you see vector store errors:

  • Delete the ./vector_store directory and rebuild from scratch
  • Ensure you have write permissions in the storage directory
  • Check available disk space
  • Verify embedding model is properly configured

๐ŸŒ Provider-Specific Issues

Ollama:

# Ensure Ollama is running
ollama serve

# Pull the model first
ollama pull llama3.2

HuggingFace:

  • Set HUGGINGFACE_TOKEN for private models
  • Some models require acceptance of terms on HuggingFace website

๐Ÿ› ๏ธ Development

Setup

git clone https://github.com/RISE-Dependable-Transport-Systems/fragmenter.git
cd fragmenter
uv sync --all-groups

Common Tasks

just lint              # Run all linters via pre-commit
just fmt               # Auto-format code
just test              # Run unit tests
just test-cov          # Run tests with coverage
just build             # Build sdist and wheel
just check-all         # Lint + test
just all               # Full pipeline: clean → install → lint → test → build → verify → install-test

📖 Examples

  • Complete Real-World Example: See examples/waywise for a full setup with custom data, configuration, and evaluation scripts.
  • Developer Example: See examples/dev_examples/main.py for a programmatic usage demonstration of the RAG framework.

🙌 Contributing

Contributions welcome! Please ensure:

  • ✅ Code is formatted (just fmt)
  • ✅ All linters pass (just lint)
  • ✅ Tests pass (just test)
  • ✅ New features include tests and documentation
  • 🔒 No API keys or secrets in commits

📄 License

MIT License; see the LICENSE file for details.
