ragstruct - A Pseudo-Finetuning RAG Framework for structured JSON-based retrieval
Lightweight semantic retriever using structured JSON. Built with love by Joshikaran K. (Joshi Felix).
What is ragstruct?
ragstruct is a minimal, fast semantic retrieval library for anyone who wants to simulate fine-tuned behavior without ever training a model. You give it structured JSON memory, and it hands your LLM the entries most relevant to each query as context.
No vector DBs. No finetuning. No heavy dependencies.
Key Features
- Zero-database, JSON-based retriever
- Built on BGE embeddings (BAAI/bge-large-en-v1.5)
- Fast top-k semantic matches
- Memory tracking for context injection
- Works with any LLM (OpenAI, Mistral, local)
- Great for agents, personal AIs, digital twins
Installation
pip install ragstruct
Use Case Examples
- Digital Twin memory retrieval (e.g., Joshi AI)
- Resume bots and personal agent context
- Mental health / therapy state tracking
- LLM Study-buddy with syllabus JSON
- Retrieval-based storytelling agents
- Game character memory/NPCs
Why ragstruct Exists
I (Joshikaran) built ragstruct while creating Joshi AI, a digital twin that could talk like me, remember my projects, and reflect my mindset.
Every existing RAG pipeline felt like overkill. LangChain + vector DB + server just to search my own memory? Nah.
So I built this:
"I wanted a RAG system that was so simple it could run in a terminal, speak like me, and understand what part of me it's referring to."
When to Use ragstruct
Use ragstruct if:
- You have structured memory or JSON knowledge
- You want fast retrieval from text keys
- You want context-aware LLMs without training
- You care about token savings + control
- You're building personal AI or local agents
How it Works (Pseudo-Finetuning)
Instead of retraining the model, you remind the model what to say by:
- Embedding your JSON keys
- Matching input queries to relevant memory
- Injecting that into the LLM prompt
This creates the effect of fine-tuning, without touching weights.
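The three steps above can be sketched in a few lines. This is a minimal illustration, not ragstruct's actual API: `embed`, `retrieve`, and `build_prompt` are hypothetical names, the memory entries are made up, and the bag-of-words `embed` is a toy stand-in for a real embedding model such as BAAI/bge-large-en-v1.5 (it only captures word overlap, not meaning).

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(memory: dict, query: str, top_k: int = 2) -> list:
    # Steps 1 and 2: embed each JSON key, then rank keys against the query.
    q = embed(query)
    ranked = sorted(memory.items(),
                    key=lambda kv: cosine(embed(kv[0]), q),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(memory: dict, query: str, top_k: int = 2) -> str:
    # Step 3: inject only the matched entries into the LLM prompt.
    hits = retrieve(memory, query, top_k)
    context = "\n".join(f"- {key}: {value}" for key, value in hits)
    return f"Use this memory to answer.\n{context}\n\nQuestion: {query}"

# Hypothetical memory, for illustration only.
memory = {
    "favorite programming language": "Python",
    "current project": "A crypto forecasting agent",
    "weekend hobby": "Chess",
}
prompt = build_prompt(memory, "what project are you working on", top_k=1)
print(prompt)
```

Swapping the toy `embed` for a real sentence-embedding model changes the quality of the matches but not the shape of the loop: the model's weights stay untouched, and only the prompt changes per query.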
Comparison: Finetuning vs ragstruct
| Traditional Finetuning | ragstruct (Pseudo) |
|---|---|
| Requires large training data | Works off your real JSON |
| Needs GPUs, money, time | Just Python + CPU |
| Locked once trained | Dynamic memory updates |
| Expensive to iterate | Instant memory edits |
| One model only | Use any LLM (local/cloud) |
Smart Tips
Format Your JSON
Nested or list-heavy JSON? ragstruct flattens and formats it like this:
{
"name": "Felix AI",
"description": "A crypto forecasting agent.",
"tech_stack": ["Python", "XGBoost"]
}
...so your LLM sees clean chunks. Perfect for memory injection.
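To make the idea of flattening concrete, here is one way nested or list-heavy JSON could be turned into flat text chunks. It is a sketch under assumptions: `flatten` is a hypothetical helper, the dotted-path naming is my choice, and ragstruct's own formatter may behave differently.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into a {path: text} mapping."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, path))
    elif isinstance(obj, list):
        if all(not isinstance(v, (dict, list)) for v in obj):
            # A list of plain values reads best as one comma-separated chunk.
            flat[prefix] = ", ".join(str(v) for v in obj)
        else:
            # Lists that contain containers get one indexed path per item.
            for i, value in enumerate(obj):
                flat.update(flatten(value, f"{prefix}[{i}]"))
    else:
        flat[prefix] = str(obj)
    return flat

# The "links" entry is made up to show nesting.
doc = {
    "name": "Felix AI",
    "description": "A crypto forecasting agent.",
    "tech_stack": ["Python", "XGBoost"],
    "links": {"repo": "example/repo"},
}
chunks = flatten(doc)
for path, text in chunks.items():
    print(f"{path}: {text}")
```

Each resulting line is a short, self-describing chunk, which keeps the injected context compact.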
Compress Chat History
If injecting full chat is too heavy, summarize it:
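from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# chat_text: your conversation so far, as a single string
# truncation=True clips input longer than the model's input limit
summary = summarizer(chat_text, max_length=100, truncation=True)[0]["summary_text"]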
Structuring Text into JSON (Optional)
Use the Structurer module to convert long .txt docs into structured JSON using any LLM:
from ragstruct.structurer import Structurer
# your_llm is a placeholder for the LLM object you already use
structurer = Structurer(llm=your_llm)
structured = structurer.structure_document("raw text block")
Handles chunking, cleaning, and LLM-guided structuring.
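For intuition, the chunk, clean, and structure flow might look roughly like the sketch below. This is not the Structurer implementation: `clean`, `chunk`, `structure_text`, and `fake_llm` are hypothetical names, the prompt wording is invented, and `fake_llm` is a stub so the example runs without a model. The only assumption about the LLM is that it is a callable taking a prompt string and returning a JSON string.

```python
import json
import re

def clean(text: str) -> str:
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, max_chars: int = 200) -> list:
    # Greedily pack cleaned paragraphs into chunks of roughly max_chars.
    # A single paragraph longer than max_chars becomes its own chunk.
    chunks, current = [], ""
    for para in (clean(p) for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 1 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current} {para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def structure_text(text: str, llm, max_chars: int = 200) -> dict:
    # Ask the LLM for a JSON object per chunk and merge the results.
    merged = {}
    for piece in chunk(text, max_chars):
        reply = llm(f"Return a JSON object with the key facts in:\n{piece}")
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            continue  # skip chunks where the reply was not valid JSON
        if isinstance(data, dict):
            merged.update(data)
    return merged

def fake_llm(prompt: str) -> str:
    # Stub standing in for a real model: keys each chunk by its first two words.
    body = prompt.split("\n", 1)[1]
    key = "_".join(body.split()[:2]).lower()
    return json.dumps({key: body})

raw = "Felix AI forecasts crypto prices.\n\nIt   uses Python and XGBoost."
structured = structure_text(raw, fake_llm, max_chars=40)
print(structured)
```

Skipping replies that fail to parse keeps one bad chunk from breaking the whole document; a real pipeline might retry those chunks instead.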
What ragstruct Is Not
- Not a full generation pipeline: you supply the LLM
- Not multi-user scalable out of the box (but extendable)
- Not a replacement for real finetuning: it fakes it smartly
Summary
- ragstruct injects only what matters
- JSON-only, no infra needed
- Works with any LLM or chat agent
- Fast, clean, dev-focused retrieval
- Perfect for personal AI memory
"Don't train your model. Train your memory." (Joshi Felix)
Ready to build something with soul? Plug in your JSON, choose your LLM, and go.
Built with vim & vision by Joshi Felix.