A sample AI toolkit by Schogini Systems with Retrieval-Augmented Generation (RAG).
SchoginiAI
SchoginiAI is an AI toolkit developed by Schogini Systems that provides Retrieval-Augmented Generation (RAG) capabilities using LangChain and OpenAI. It leverages both FAISS and ChromaDB for efficient vector storage and retrieval, enabling advanced AI-driven solutions for small businesses and beyond.
Features
- Recursive Text Chunking: Efficiently splits large text corpora into manageable chunks.
- OpenAI Embeddings: Utilizes OpenAI's embedding models for high-quality vector representations.
- FAISS & ChromaDB Vector Stores: Offers flexibility to choose between FAISS and ChromaDB for vector storage and retrieval via configuration.
- Retrieval-Augmented Generation (RAG): Combines retrieval mechanisms with language models to generate informed responses.
- Dockerized Environment: Easily build and run in isolated Docker containers for consistency across environments.
- Environment Variable Management: Securely handles API keys and sensitive information using `.env` files.
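To illustrate what recursive text chunking does, here is a minimal, self-contained sketch in plain Python. It is a simplified stand-in for LangChain's `RecursiveCharacterTextSplitter`; the function name and parameters are illustrative, not SchoginiAI's actual API:

```python
# Simplified recursive character splitter, loosely modeled on LangChain's
# RecursiveCharacterTextSplitter. Illustrative only; SchoginiAI's actual
# chunking implementation may differ.
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters, preferring
    the coarsest separator that yields small-enough pieces."""
    if len(text) <= chunk_size:
        return [text]
    sep = separators[0]
    rest = separators[1:] or separators  # fall through to finer separators
    parts = text.split(sep) if sep else list(text)
    chunks, current = [], ""
    for part in parts:
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(part) > chunk_size:
                # Piece itself is too big: recurse with finer separators
                chunks.extend(recursive_split(part, chunk_size, rest))
                current = ""
            else:
                current = part
    if current:
        chunks.append(current)
    return chunks

corpus = (
    "Schogini Systems is a pioneer in AI Chatbots.\n\n"
    "We specialize in automation solutions for small businesses."
)
chunks = recursive_split(corpus, chunk_size=60)
```

Because paragraph breaks are tried first, the sample corpus splits cleanly into its two sentences, each under the 60-character limit.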
Installation
From PyPI
Install the latest version of SchoginiAI directly from PyPI:
pip install SchoginiAI
From Source
Clone the repository and install the package manually:
git clone https://github.com/yourusername/SchoginiAI.git
cd SchoginiAI
pip install .
Replace yourusername with your actual GitHub username.
Usage
Environment Setup
Create a .env file in the examples02/ directory to store your OpenAI API key and vector store type securely:
OPENAI_API_KEY=your_openai_api_key_here
VECTOR_STORE_TYPE=faiss # Options: 'faiss' or 'chroma'
Important: Do not commit the `.env` file to version control. Ensure it's listed in your `.gitignore`.
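The project reads these settings with python-dotenv's `load_dotenv()`. To show the effect of the two variables above, here is a minimal standard-library-only sketch of a `.env` loader and the kind of validation the scripts would perform (helper names here are illustrative, not SchoginiAI's API):

```python
# Minimal .env loader and VECTOR_STORE_TYPE validation using only the
# standard library. The real project uses python-dotenv's load_dotenv();
# this sketch just shows what the two settings drive.
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ.
    Existing environment variables are not overridden (like load_dotenv)."""
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

def get_vector_store_type():
    """Return the configured store, defaulting to 'faiss' as in the sample .env."""
    store = os.getenv("VECTOR_STORE_TYPE", "faiss").lower()
    if store not in ("faiss", "chroma"):
        raise ValueError(f"Unsupported VECTOR_STORE_TYPE: {store!r}")
    return store

# Usage (assuming the examples02/.env file shown above exists):
# load_env("examples02/.env")
# store = get_vector_store_type()
```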
Knowledge Creation
Build and save the vector store from your text corpus using the knowledge_creation.py script.
Python Script
cd examples02
python knowledge_creation.py
Using Docker
Build the Docker image and run the container with the create argument to generate the vector store:
- With FAISS:
docker build --no-cache -t schogini-examples .
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="faiss" schogini-examples create
- With ChromaDB:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="chroma" schogini-examples create
Expected Output:
- For FAISS:
Running knowledge_creation.py with VECTOR_STORE_TYPE='faiss'...
FAISS vector store created.
FAISS vector store saved to faiss_store
- For ChromaDB:
Running knowledge_creation.py with VECTOR_STORE_TYPE='chroma'...
ChromaDB vector store created at chroma_store.
ChromaDB vector store persisted at chroma_store
Querying the Knowledge Base
Load the pre-built vector store and perform a query using the usage_example02.py script.
Python Script
cd examples02
python usage_example02.py
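Conceptually, the query step retrieves the chunks most similar to the question and hands them to the language model as context. The toy sketch below runs without an API key: a bag-of-words vector and a linear scan stand in for OpenAI embeddings and the FAISS/ChromaDB store, so it illustrates the retrieve-then-generate flow rather than SchoginiAI's actual implementation:

```python
# Toy retrieve-then-generate flow. Real embeddings come from OpenAI and the
# store is FAISS/ChromaDB; bag-of-words cosine similarity stands in here so
# the example runs offline.
from collections import Counter
from math import sqrt

def embed(text):
    # Stand-in "embedding": token counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=1):
    # Linear scan in place of a FAISS/ChromaDB similarity search
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Schogini Systems is a pioneer in AI Chatbots.",
    "We specialize in automation solutions for small businesses.",
]
question = "Who is a pioneer in AI chatbots?"
context = retrieve(question, chunks)
# The retrieved chunk is then stuffed into the LLM prompt:
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
```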
Using Docker
Run the container with the query argument to perform the query:
- With FAISS:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="faiss" schogini-examples query
- With ChromaDB:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="chroma" schogini-examples query
Expected Output:
Running usage_example02.py with VECTOR_STORE_TYPE='faiss'...
Answer: Schogini Systems is a pioneer in AI Chatbots.
We specialize in automation solutions for small businesses.
or for ChromaDB:
Running usage_example02.py with VECTOR_STORE_TYPE='chroma'...
Answer: Schogini Systems is a pioneer in AI Chatbots.
We specialize in automation solutions for small businesses.
Docker Usage
Build the Docker Image
Navigate to the project root directory (where the Dockerfile is located) and build the Docker image:
docker build --no-cache -t schogini-examples .
Run the Docker Container
1. Create Vector Store
- With FAISS:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="faiss" schogini-examples create
- With ChromaDB:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="chroma" schogini-examples create
2. Query Vector Store
- With FAISS:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="faiss" schogini-examples query
- With ChromaDB:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="chroma" schogini-examples query
Note: Replace `"your_openai_api_key_here"` with your actual OpenAI API key.
Dependencies
SchoginiAI relies on the following Python packages:
- langchain>=0.0.200,<0.1.0
- langchain-community>=0.0.20,<0.1.0
- langchain-chroma>=0.1.0,<1.0.0
- openai>=0.28.1,<0.29.0
- tiktoken>=0.4.0,<0.5.0
- faiss-cpu>=1.7.6,<1.8.0
- python-dotenv>=0.21.0,<0.22.0
- chromadb>=0.3.22,<0.4.0
These dependencies are automatically installed when you install SchoginiAI via pip or using requirements.txt in Docker.
requirements.txt
langchain>=0.0.200,<0.1.0
langchain-community>=0.0.20,<0.1.0
langchain-chroma>=0.1.0,<1.0.0
openai>=0.28.1,<0.29.0
tiktoken>=0.4.0,<0.5.0
faiss-cpu>=1.7.6,<1.8.0
python-dotenv>=0.21.0,<0.22.0
chromadb>=0.3.22,<0.4.0
Docker Configuration
Dockerfile
# Use a lightweight Python base image
FROM python:3.11-slim
# Install bash and other dependencies (if needed)
RUN apt-get update && apt-get install -y bash && rm -rf /var/lib/apt/lists/*
# Set environment variables for Python
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Create and set the working directory
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install Python dependencies into the current directory
RUN pip install --upgrade pip
RUN pip install -r requirements.txt --target=.
RUN pip install -e .
# Ensure the .env file is present
COPY examples02/.env /app/examples02/.env
# Make sure the scripts are executable
RUN chmod +x examples02/doit.sh
# Use bash as the entrypoint
ENTRYPOINT ["/bin/bash"]
# Default command: run doit.sh with arguments
CMD ["examples02/doit.sh"]
doit.sh
Handles the execution of either the knowledge creation or querying scripts based on input arguments.
#!/bin/bash
set -e # Exit immediately if a command exits with a non-zero status
# Check if one argument is provided
if [ "$#" -ne 1 ]; then
echo "Usage: $0 {create|query}"
exit 1
fi
SCRIPT=$1
if [ "$SCRIPT" == "create" ]; then
echo "Running knowledge_creation.py..."
python examples02/knowledge_creation.py
elif [ "$SCRIPT" == "query" ]; then
echo "Running usage_example02.py..."
python examples02/usage_example02.py
else
echo "Invalid argument. Use 'create' or 'query'."
exit 1
fi
Usage Examples:
- Create Vector Store with FAISS:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="faiss" schogini-examples create
- Create Vector Store with ChromaDB:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="chroma" schogini-examples create
- Query Vector Store with FAISS:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="faiss" schogini-examples query
- Query Vector Store with ChromaDB:
docker run --rm -e OPENAI_API_KEY="your_openai_api_key_here" -e VECTOR_STORE_TYPE="chroma" schogini-examples query
Project Structure
SchoginiAI/
โโโ SchoginiAI/
โ โโโ __init__.py
โ โโโ main.py
โโโ examples02/
โ โโโ usage_example02.py
โ โโโ knowledge_creation.py
โ โโโ .env
โโโ tests/
โ โโโ test_main.py
โโโ .gitignore
โโโ LICENSE
โโโ README.md
โโโ requirements.txt
โโโ setup.py
โโโ Dockerfile
โโโ doit.sh
โโโ build.sh
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
- Fork the repository.
- Create your feature branch: git checkout -b feature/YourFeature
- Commit your changes: git commit -m 'Add some feature'
- Push to the branch: git push origin feature/YourFeature
- Open a pull request.
.gitignore
Ensure you have a .gitignore file to exclude unnecessary or sensitive files from your GitHub repository.
# Python
__pycache__/
*.py[cod]
# Distribution / packaging
build/
dist/
*.egg-info/
# Environment
venv/
.env/
# Vector Stores
faiss_store/
chroma_store/
# OS generated files
.DS_Store
# IDE configs
.vscode/
.idea/
# Secrets
.pypirc
.env
Additional Resources
- LangChain Documentation
- OpenAI API Documentation
- FAISS Documentation
- ChromaDB Documentation
- Python-Dotenv Documentation
Summary
By configuring your SchoginiAI project to select the vector store type (faiss or chroma) through the .env file, you achieve a more streamlined and maintainable setup. This approach centralizes configuration, reduces the need for repetitive command-line arguments, and aligns with best practices for environment-specific settings.
Key Actions Taken:
- Environment Variable Configuration: Added VECTOR_STORE_TYPE to the `.env` file to specify the desired vector store.
- Script Modifications: Removed command-line argument parsing for vector store selection and updated the scripts to read VECTOR_STORE_TYPE from environment variables.
- Docker Adjustments: Ensured the `.env` file is copied into the Docker image and modified doit.sh to eliminate the need for vector store type arguments.
- Verification: Provided steps to verify that the changes work both locally and within Docker.
- Documentation: Updated README.md to reflect the new configuration method.
- Testing: Recommended implementing unit tests to ensure correct vector store selection based on `.env` settings.
Next Steps:
- Implement Unit Tests: Ensure that vector store selections work as intended.
- Continuous Integration (CI): Set up CI pipelines to automatically test configurations.
- Monitor Dependencies: Keep your packages updated to avoid future deprecations or compatibility issues.
- User Documentation: Make sure all users are aware of the `.env` configuration for vector store selection.
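The unit tests recommended above might look like the following sketch. `select_vector_store` is a hypothetical helper mirroring the env lookup the scripts would use, not SchoginiAI's actual API, and the default of 'faiss' is an assumption based on the sample `.env`:

```python
# Sketch of unit tests for env-driven vector store selection.
# select_vector_store is a hypothetical helper, not SchoginiAI's API.
import os
import unittest

def select_vector_store():
    store = os.getenv("VECTOR_STORE_TYPE", "faiss").lower()
    if store not in ("faiss", "chroma"):
        raise ValueError(f"Unsupported VECTOR_STORE_TYPE: {store!r}")
    return store

class TestVectorStoreSelection(unittest.TestCase):
    def setUp(self):
        # Preserve any pre-existing value so tests don't leak state
        self._saved = os.environ.pop("VECTOR_STORE_TYPE", None)

    def tearDown(self):
        if self._saved is not None:
            os.environ["VECTOR_STORE_TYPE"] = self._saved
        else:
            os.environ.pop("VECTOR_STORE_TYPE", None)

    def test_defaults_to_faiss(self):
        self.assertEqual(select_vector_store(), "faiss")

    def test_reads_chroma_from_env(self):
        os.environ["VECTOR_STORE_TYPE"] = "chroma"
        self.assertEqual(select_vector_store(), "chroma")

    def test_rejects_unknown_store(self):
        os.environ["VECTOR_STORE_TYPE"] = "pinecone"
        with self.assertRaises(ValueError):
            select_vector_store()
```

Run with `python -m unittest` (or pytest) from the project root.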
Feel free to reach out if you need further assistance or encounter any issues during implementation!
By following this guide, your SchoginiAI module is now equipped to handle both FAISS and ChromaDB vector stores seamlessly, ensuring compatibility with the latest LangChain updates and maintaining best practices for security and maintainability.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file schoginiai-0.1.8.tar.gz.
File metadata
- Download URL: schoginiai-0.1.8.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 28a7f651d82182719adf6fc30705e1923fc89bdd97b8de24e41374d59de86525 |
| MD5 | de9c0cd1e64edb73cb5f753832c034eb |
| BLAKE2b-256 | 3c28067cd0047e7c92e8a429028514ef23be048db60aa39b76a2ae4a7fe5c010 |
File details
Details for the file SchoginiAI-0.1.8-py3-none-any.whl.
File metadata
- Download URL: SchoginiAI-0.1.8-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a7b656316af14eb4928bea305bebfecb900f330ae7057b8928860a0b1b9ee7b |
| MD5 | 774b671df1f4f936a088ca99d86f26d2 |
| BLAKE2b-256 | a8b83721c0670a3ac0917e48494366e6af585d5ce7327e74cac026a205231e61 |