Indox Retrieval Augmentation

inDox

Indox Retrieval Augmentation is an application for extracting information from a wide range of document types, including text files, PDF, HTML, Markdown, and LaTeX. It works with both structured and unstructured documents. One of its key features is clustering the primary chunks into larger, more coherent groups, which improves the quality and relevance of the extracted information. More features are planned for upcoming releases.

Roadmap

🤖 Model Support Implemented Description
Ollama (e.g. Llama3) Local Embedding and LLM Models powered by Ollama
HuggingFace Local Embedding and LLM Models powered by HuggingFace
Mistral Embedding and LLM Models by Mistral
Google (e.g. Gemini) Embedding and Generation Models by Google
OpenAI (e.g. GPT4) Embedding and Generation Models by OpenAI
Supported Models via Indox API Implemented Description
OpenAI Embedding and LLM OpenAI models from the Indox API
Mistral Embedding and LLM Mistral models from the Indox API
Anthropic Embedding and LLM Anthropic models from the Indox API
📁 Loader and Splitter Implemented Description
Simple PDF Import PDF
UnstructuredIO Import Data through Unstructured
Clustered Load And Split Load PDFs and text files, adding an extra clustering layer
✨ RAG Features Implemented Description
Hybrid Search Semantic Search combined with Keyword Search
Semantic Caching Results saved and retrieved based on semantic meaning
Clustered Prompt Retrieve smaller chunks and do clustering and summarization
Agentic RAG Generate more reliable answers, rank context, and run a web search if needed
Advanced Querying Task Delegation Based on LLM Evaluation
Reranking Rerank results based on context for improved results
Customizable Metadata Free control over Metadata
🆒 Cool Bonus Implemented Description
Docker Support Indox is deployable via Docker
Customizable Frontend Indox's frontend is fully customizable
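
The Hybrid Search row above combines semantic search with keyword search. The following is a minimal sketch of that idea as a weighted score fusion, not Indox's own implementation. The function names, the `alpha` weight, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that occur in the text (toy keyword match)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = Counter(text.lower().split())
    return sum(1 for term in terms if words[term] > 0) / len(terms)


def hybrid_rank(query, query_vec, chunks, alpha=0.5):
    """Rank (text, vector) pairs by alpha * semantic + (1 - alpha) * keyword."""
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Hypothetical chunks with made-up embeddings.
chunks = [
    ("the king gave orders for a festival", [0.2, 0.8]),
    ("the prince danced with cinderella", [0.9, 0.1]),
]
ranking = hybrid_rank("cinderella prince", [1.0, 0.0], chunks)
print(ranking[0])
```

Raising `alpha` favors the embedding similarity, and lowering it favors exact term overlap.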

Examples

☑️ Examples Run in Colab
Indox Api (OpenAi) Open In Colab
Mistral (Using Unstructured) Open In Colab
OpenAi (Using Clustered Split) Open In Colab
HuggingFace Models(Mistral) Open In Colab
Ollama Open In Colab
Evaluate with IndoxJudge Open In Colab

Indox Workflow

[Figure: inDox workflow diagram]

Getting Started

The following command installs the latest stable release of inDox:

pip install Indox

To install the latest development version, run:

pip install git+https://github.com/osllmai/inDox@master

Clone the repository and navigate to the directory:

git clone https://github.com/osllmai/inDox.git
cd inDox

Install the required Python packages:

pip install -r requirements.txt

Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named indox:

Windows

  1. Create the virtual environment:
  python -m venv indox
  2. Activate the virtual environment:
  indox\Scripts\activate

macOS/Linux

  1. Create the virtual environment:
    python3 -m venv indox
    
  2. Activate the virtual environment:
  source indox/bin/activate
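
To confirm that the activation step worked, you can ask the interpreter itself. This is a small sketch using only the standard library. The helper name `in_virtualenv` is hypothetical, and the check assumes an environment created with `python -m venv` as shown above.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints `False`, the packages installed next will go into the base Python installation rather than into `indox`.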

Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

  pip install -r requirements.txt

Preparing Your Data

  1. Define the File Path: Specify the path to your text or PDF file.
  2. Load the LLM and Embedding Models: Initialize your LLM and embedding model from the models Indox supports.
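
For the first step, it can help to validate the path before handing it to a loader. The sketch below is an illustration only. The helper `check_input_file` and the set of accepted suffixes are assumptions based on the "text or PDF file" wording above, not part of the Indox API.

```python
from pathlib import Path

# Assumed from the text above: plain-text and PDF inputs.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def check_input_file(path):
    """Return the path as a Path object, or raise if it is unusable."""
    candidate = Path(path)
    if candidate.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type: {candidate.suffix!r}")
    if not candidate.is_file():
        raise FileNotFoundError(f"no such file: {candidate}")
    return candidate


try:
    file_path = check_input_file("sample.txt")
    print("using", file_path)
except FileNotFoundError as err:
    print(err)
```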

Quick Start

Install the Required Packages

pip install indox
pip install openai
pip install chromadb
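
A quick way to verify the installation is to check that each package can be located by the interpreter. This sketch uses `importlib.util.find_spec` from the standard library. The helper name `missing_modules` is hypothetical, and the import names are assumed to match the package names installed above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names assumed to match the pip package names.
print("missing:", missing_modules(["indox", "openai", "chromadb"]))
```

An empty list means all three packages are importable from the active environment.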

Load Environment Variables

To start, you need to load your API keys from the environment.

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
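
Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. As an optional variation, the sketch below fails with a clearer message. The helper `require_env` is hypothetical and not part of Indox or python-dotenv.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell.")
    return value


# Usage, after load_dotenv() has run:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```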

Import Indox Package

Import the necessary classes from the Indox package.

from indox import IndoxRetrievalAugmentation

Importing LLM and Embedding Models

from indox.llms import OpenAi
from indox.embeddings import OpenAiEmbedding

Initialize Indox

Create an instance of IndoxRetrievalAugmentation.

Indox = IndoxRetrievalAugmentation()
openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")  # class name as imported above
openai_embeddings = OpenAiEmbedding(model="text-embedding-3-small",openai_api_key=OPENAI_API_KEY)
file_path = "sample.txt"

In this section, we take advantage of the unstructured library to load documents and split them into chunks by title. This method helps in organizing the document into manageable sections for further processing.

from indox.data_loader_splitter import UnstructuredLoadAndSplit
loader_splitter = UnstructuredLoadAndSplit(file_path=file_path)
docs = loader_splitter.load_and_chunk()
Starting processing...
End Chunking process.
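
To illustrate what chunking by title means, here is a toy sketch in plain Python. It is not how the unstructured library or `UnstructuredLoadAndSplit` works internally. The function `chunk_by_title` and the rule that a title is a line starting with `#` are assumptions chosen to keep the example short.

```python
def chunk_by_title(text, is_title=lambda line: line.startswith("#")):
    """Group lines into chunks, starting a new chunk at every title line."""
    chunks, current = [], []
    for line in text.splitlines():
        # A title closes the chunk collected so far and opens a new one.
        if is_title(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


sample = "# Intro\nOnce upon a time.\n# Ending\nHappily ever after."
for chunk in chunk_by_title(sample):
    print(repr(chunk))
```

Each chunk keeps its title as the first line, so the section heading travels with the text it describes.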

Storing document chunks in a vector store is crucial for enabling efficient retrieval and search operations. By converting text data into vector representations and storing them in a vector store, you can perform rapid similarity searches and other vector-based operations.

from indox.vector_stores import ChromaVectorStore
db = ChromaVectorStore(collection_name="sample", embedding=openai_embeddings)
Indox.connect_to_vectorstore(db)
Indox.store_in_vectorstore(docs)
2024-05-14 15:33:04,916 - INFO - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2024-05-14 15:33:12,587 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-05-14 15:33:13,574 - INFO - Document added successfully to the vector store.

Connection established successfully.

<Indox.vectorstore.ChromaVectorStore at 0x28cf9369af0>
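
The similarity search that a vector store performs can be sketched in a few lines. This is a conceptual illustration only. It uses an in-memory list and made-up three-dimensional vectors rather than Chroma or real OpenAI embeddings, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy store of (text, embedding) pairs.
store = [
    ("cinderella planted the branch", [0.9, 0.1, 0.0]),
    ("the king gave orders for a festival", [0.1, 0.9, 0.0]),
    ("the prince danced with her", [0.7, 0.3, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
print(results)
```

A real vector store replaces the linear scan with an index, but the ranking criterion is the same kind of vector similarity.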

Querying

query = "how cinderella reach her happy ending?"
retriever = Indox.QuestionAnswer(vector_database=db, llm=openai_qa, top_k=5)
retriever.invoke(query)
2024-05-14 15:34:55,380 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-05-14 15:35:01,917 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
'Cinderella reached her happy ending by enduring mistreatment from her step-family, finding solace and help from the hazel tree and the little white bird, attending the royal festival where the prince recognized her as the true bride, and ultimately fitting into the golden shoe that proved her identity. This led to her marrying the prince and living happily ever after.'
retriever.context
["from the hazel-bush. Cinderella thanked him, went to her mother's\n\ngrave and planted the branch on it, and wept so much that the tears\n\nfell down on it and watered it. And it grew and became a handsome\n\ntree. Thrice a day cinderella went and sat beneath it, and wept and\n\nprayed, and a little white bird always came on the tree, and if\n\ncinderella expressed a wish, the bird threw down to her what she\n\nhad wished for.\n\nIt happened, however, that the king gave orders for a festival",
 'worked till she was weary she had no bed to go to, but had to sleep\n\nby the hearth in the cinders. And as on that account she always\n\nlooked dusty and dirty, they called her cinderella.\n\nIt happened that the father was once going to the fair, and he\n\nasked his two step-daughters what he should bring back for them.\n\nBeautiful dresses, said one, pearls and jewels, said the second.\n\nAnd you, cinderella, said he, what will you have. Father',
 'face he recognized the beautiful maiden who had danced with\n\nhim and cried, that is the true bride. The step-mother and\n\nthe two sisters were horrified and became pale with rage, he,\n\nhowever, took cinderella on his horse and rode away with her. As\n\nthey passed by the hazel-tree, the two white doves cried -\n\nturn and peep, turn and peep,\n\nno blood is in the shoe,\n\nthe shoe is not too small for her,\n\nthe true bride rides with you,\n\nand when they had cried that, the two came flying down and',
 "to send her up to him, but the mother answered, oh, no, she is\n\nmuch too dirty, she cannot show herself. But he absolutely\n\ninsisted on it, and cinderella had to be called. She first\n\nwashed her hands and face clean, and then went and bowed down\n\nbefore the king's son, who gave her the golden shoe. Then she\n\nseated herself on a stool, drew her foot out of the heavy\n\nwooden shoe, and put it into the slipper, which fitted like a\n\nglove. And when she rose up and the king's son looked at her",
 'slippers embroidered with silk and silver. She put on the dress\n\nwith all speed, and went to the wedding. Her step-sisters and the\n\nstep-mother however did not know her, and thought she must be a\n\nforeign princess, for she looked so beautiful in the golden dress.\n\nThey never once thought of cinderella, and believed that she was\n\nsitting at home in the dirt, picking lentils out of the ashes. The\n\nprince approached her, took her by the hand and danced with her.']
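
The answer above is generated from the chunks returned in `retriever.context`. As a rough sketch of how retrieved chunks are commonly combined with a question, the snippet below assembles a single prompt string. The prompt wording, the `build_prompt` helper, and the `max_chars` limit are assumptions, not the prompt Indox sends to the model.

```python
def build_prompt(question, context_chunks, max_chars=2000):
    """Join retrieved chunks into one prompt, truncating overly long context."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)[:max_chars]
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "how cinderella reach her happy ending?",
    ["the prince danced with her", "the slipper fitted like a glove"],
)
print(prompt)
```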
  .----------------.  .-----------------. .----------------.  .----------------.  .----------------. 
| .--------------. || .--------------. || .--------------. || .--------------. || .--------------. |
| |     _____    | || | ____  _____  | || |  ________    | || |     ____     | || |  ____  ____  | |
| |    |_   _|   | || ||_   \|_   _| | || | |_   ___ `.  | || |   .'    `.   | || | |_  _||_  _| | |
| |      | |     | || |  |   \ | |   | || |   | |   `. \ | || |  /  .--.  \  | || |   \ \  / /   | |
| |      | |     | || |  | |\ \| |   | || |   | |    | | | || |  | |    | |  | || |    > `' <    | |
| |     _| |_    | || | _| |_\   |_  | || |  _| |___.' / | || |  \  `--'  /  | || |  _/ /'`\ \_  | |
| |    |_____|   | || ||_____|\____| | || | |________.'  | || |   `.____.'   | || | |____||____| | |
| |              | || |              | || |              | || |              | || |              | |
| '--------------' || '--------------' || '--------------' || '--------------' || '--------------' |
  '----------------'  '----------------'  '----------------'  '----------------'  '----------------' 
