
ragstruct - A Pseudo-Finetuning RAG Framework for structured JSON-based retrieval


🎵 ragstruct: A Pseudo-Finetuning RAG Framework

🔍 Lightweight semantic retriever using structured JSON
💡 Built with love by Joshikaran K. (Joshi Felix)


🧠 What is ragstruct?

ragstruct is a minimal, blazing-fast semantic retrieval library built for anyone who wants to simulate fine-tuned behavior without ever training a model. You give it structured JSON memory, and it gives your LLM meaningful context, fast.

No vector DBs. No finetuning. No heavy dependencies.


✨ Key Features

  • ✅ Zero-database JSON-based retriever
  • 🎯 Built on BGE embeddings (BAAI/bge-large-en-v1.5)
  • ⏳ Fast top-k semantic matches
  • 🔄 Memory tracking for context injection
  • 💪 Works with any LLM (OpenAI, Mistral, local)
  • 🙌 Great for agents, personal AIs, digital twins

🚀 Installation

pip install ragstruct

📊 Use Case Examples

  • 👤 Digital Twin memory retrieval (e.g., Joshi AI)
  • 🧑‍💼 Resume bots and personal agent context
  • 🧠 Mental health / therapy state tracking
  • 🎓 LLM Study-buddy with syllabus JSON
  • 📚 Retrieval-based storytelling agents
  • 🎮 Game character memory/NPCs

🎡 Why ragstruct Exists

I (Joshikaran) built ragstruct while creating Joshi AI, a digital twin that could talk like me, remember my projects, and reflect my mindset.

Every existing RAG pipeline felt like overkill. LangChain + vector DB + server just to search my own memory? Nah.

So I built this:

“I wanted a RAG system that was so simple it could run in a terminal, speak like me, and understand what part of me it's referring to.”


🕵️‍♂️ When to Use ragstruct

Use ragstruct if:

  • ✅ You have structured memory or JSON knowledge
  • ✅ You want fast retrieval from text keys
  • ✅ You want context-aware LLMs without training
  • ✅ You care about token savings + control
  • ✅ You're building personal AI or local agents

🪖 How it Works (Pseudo-Finetuning)

Instead of retraining the model, you remind the model what to say by:

  1. Embedding your JSON keys
  2. Matching input queries to relevant memory
  3. Injecting that into the LLM prompt

This creates the effect of fine-tuning, without touching weights.
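
The three steps above can be sketched in plain Python. This is a minimal illustration, not ragstruct's actual code: the function names (`embed`, `retrieve`, `build_prompt`) and the sample memory are hypothetical, and a simple bag-of-words vector stands in for the BGE embeddings so the sketch runs without downloading a model.

```python
import math
import re
from collections import Counter

# Stand-in embedding (assumption for illustration): a bag-of-words count vector.
# ragstruct is described as using BAAI/bge-large-en-v1.5 embeddings instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(memory: dict, query: str, top_k: int = 2) -> list:
    """Steps 1 and 2: embed each JSON key, then rank the keys against the query."""
    q = embed(query)
    scored = sorted(((cosine(embed(key), q), key) for key in memory), reverse=True)
    # Keep only keys that share at least one token with the query.
    return [(key, memory[key]) for score, key in scored[:top_k] if score > 0]

def build_prompt(memory: dict, query: str, top_k: int = 2) -> str:
    """Step 3: inject the matched memory into the prompt sent to the LLM."""
    matches = retrieve(memory, query, top_k)
    context = "\n".join(f"{key}: {value}" for key, value in matches)
    return f"Context:\n{context}\n\nUser: {query}"

# Hypothetical memory, loosely based on the JSON example in this README.
memory = {
    "project name": "Felix AI",
    "project description": "A crypto forecasting agent.",
    "project tech stack": "Python, XGBoost",
}

print(build_prompt(memory, "what tech stack does the project use?", top_k=1))
```

Swapping `embed` for a real sentence-embedding model would leave the rest of the flow unchanged, which is the point of the design: the LLM's weights are never touched, only its prompt.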


📊 Comparison: Finetuning vs ragstruct

Traditional Finetuning        | ragstruct (Pseudo)
Requires large training data  | Works off your real JSON
Needs GPUs, money, time       | Just Python + CPU
Locked once trained           | Dynamic memory updates
Expensive to iterate          | Instant memory edits
One model only                | Use any LLM (local/cloud)

🔄 Smart Tips

🔄 Format Your JSON

Nested or list-heavy JSON? ragstruct flattens and formats it like this:

{
  "name": "Felix AI",
  "description": "A crypto forecasting agent.",
  "tech_stack": ["Python", "XGBoost"]
}

...so your LLM sees clean chunks. Perfect for memory injection.
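
To make the flattening idea concrete, here is a small sketch of one way nested JSON could be turned into flat text chunks. The `flatten` and `to_chunks` helpers and the `path: value` output format are assumptions for illustration; ragstruct's own formatting may differ.

```python
def flatten(value, prefix=""):
    """Yield (path, text) pairs for every leaf in a nested JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list) and all(not isinstance(v, (dict, list)) for v in value):
        # A list of plain values is joined into one readable string.
        yield prefix, ", ".join(str(v) for v in value)
    elif isinstance(value, list):
        # A list holding nested structures is walked item by item.
        for i, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{i}]")
    else:
        yield prefix, str(value)

def to_chunks(entry):
    """Format each leaf as a 'path: value' line ready for prompt injection."""
    return [f"{path}: {text}" for path, text in flatten(entry)]

entry = {
    "name": "Felix AI",
    "description": "A crypto forecasting agent.",
    "tech_stack": ["Python", "XGBoost"],
}

for line in to_chunks(entry):
    print(line)
```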

๐Ÿง Compress Chat History

If injecting full chat is too heavy, summarize it:

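from transformers import pipeline

# chat_text is the conversation so far, joined into a single string
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# truncation=True keeps input longer than the model's limit from raising an error
summary = summarizer(chat_text, max_length=100, truncation=True)[0]["summary_text"]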

๐ŸŒ Structuring Text into JSON (Optional)

Use the Structurer module to convert long .txt docs into structured JSON using any LLM:

from ragstruct.structurer import Structurer

# your_llm is the LLM you supply; ragstruct does not ship one
struct = Structurer(llm=your_llm)
structured = struct.structure_document("raw text block")

Handles chunking, cleaning, and LLM-guided structuring.
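
For intuition, the chunk, clean, and structure flow could look roughly like the sketch below. This is not the Structurer implementation: `clean`, `chunk`, `structure_chunks`, and the `llm` callable (prompt string in, JSON string out) are all hypothetical names chosen for illustration.

```python
import json
import re

def clean(text: str) -> str:
    """Collapse runs of whitespace so chunks stay compact."""
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, max_chars: int = 500) -> list:
    """Group sentences into chunks of roughly max_chars.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", clean(text))
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def structure_chunks(chunks: list, llm) -> list:
    """Ask the LLM to turn each chunk into a JSON object.

    llm is a hypothetical callable: it takes a prompt string and returns a JSON string.
    """
    results = []
    for piece in chunks:
        reply = llm(f"Return a JSON object describing this text:\n{piece}")
        results.append(json.loads(reply))
    return results

# Stub LLM for demonstration only: it echoes the chunk back inside a JSON object.
def echo_llm(prompt: str) -> str:
    return json.dumps({"text": prompt.split("\n", 1)[1]})

pieces = chunk("One.  Two.\nThree.", max_chars=9)
print(structure_chunks(pieces, echo_llm))
```

In practice the stub would be replaced by a call to whichever LLM you use, and the reply would need validation, since a model is not guaranteed to return well-formed JSON.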


โš ๏ธ What ragstruct Is Not

  • Not a full generation pipeline โ€” you supply the LLM
  • Not multi-user scalable out of the box (but extendable)
  • Not a replacement for real finetuning โ€” it fakes it smartly

🔖 Summary

  • 🔄 ragstruct injects only what matters
  • ✅ JSON-only, no infra needed
  • Works with any LLM or chat agent
  • 🚀 Fast, clean, dev-focused retrieval
  • 🫠 Perfect for personal AI memory

“Don't train your model. Train your memory.” - Joshi Felix


Ready to build something with soul? Plug in your JSON, choose your LLM, and go.

Built with vim & vision by Joshi Felix.


Source Distribution

ragstruct-0.1.0.tar.gz (7.0 kB)

Built Distribution

ragstruct-0.1.0-py3-none-any.whl (7.8 kB)


Provenance

The following attestation bundles were made for ragstruct-0.1.0.tar.gz:

Publisher: publish.yml on Joshikarank/ragstruct
