Knowledge Management System that connects to your RAG system

Simba - Your Knowledge Management System

Connect your knowledge to any RAG system

📖 Overview

Simba is an open-source, portable Knowledge Management System (KMS) designed specifically for seamless integration with Retrieval-Augmented Generation (RAG) systems. With its intuitive UI, modular architecture, and powerful SDK, Simba simplifies knowledge management, allowing developers to focus on building advanced AI solutions.

🚀 Features

  • 🔌 Powerful SDK: Comprehensive Python SDK for easy integration.
  • 🧩 Modular Architecture: Flexible integration of vector stores, embedding models, chunkers, and parsers.
  • 🖥️ Modern UI: User-friendly interface for managing document chunks.
  • 🔗 Seamless Integration: Effortlessly connects with any RAG-based system.
  • 👨‍💻 Developer-Centric: Simplifies complex knowledge management tasks.
  • 📦 Open Source & Extensible: Community-driven with extensive customization options.

🎥 Demo

Watch the demo

🛠️ Getting Started

📋 Prerequisites

Ensure you have the following installed:

🔌 Quickstart: Simba SDK Usage

pip install simba-client

Leverage Simba's SDK for powerful programmatic access:

from simba_sdk import SimbaClient

# Requires simba-core to be installed and the Simba server to be running.
client = SimbaClient(api_url="http://localhost:8000")

# Upload a document; the response is indexed as a list, so take the first entry's ID.
document = client.documents.create(file_path="path/to/your/document.pdf")
document_id = document[0]["id"]

# Parse the uploaded document synchronously with the docling parser.
parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True)

# Query the indexed chunks.
retrieval_results = client.retriever.retrieve(query="your-query")

for result in retrieval_results["documents"]:
    print(f"Content: {result['page_content']}")
    print(f"Source: {result['metadata']['source']}")
    print("====" * 10)

Explore more in the Simba SDK documentation.
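
The quickstart loop reads `page_content` and `metadata['source']` from each retrieved result. A minimal sketch of a helper that groups retrieved chunks by their source, assuming the response shape shown in the quickstart; `group_by_source` and the sample data are hypothetical and not part of the SDK:

```python
from collections import defaultdict


def group_by_source(retrieval_results: dict) -> dict[str, list[str]]:
    """Group chunk texts by the document they came from.

    Assumes the response shape used in the quickstart:
    {"documents": [{"page_content": str, "metadata": {"source": str}}, ...]}
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for doc in retrieval_results.get("documents", []):
        # Fall back to "unknown" when a chunk carries no source metadata.
        source = doc.get("metadata", {}).get("source", "unknown")
        grouped[source].append(doc["page_content"])
    return dict(grouped)


# Hypothetical sample mimicking the output of client.retriever.retrieve(...).
sample = {
    "documents": [
        {"page_content": "Chunk A", "metadata": {"source": "guide.pdf"}},
        {"page_content": "Chunk B", "metadata": {"source": "faq.pdf"}},
        {"page_content": "Chunk C", "metadata": {"source": "guide.pdf"}},
    ]
}

for source, chunks in group_by_source(sample).items():
    print(f"{source}: {len(chunks)} chunk(s)")
```

Grouping by source makes it easier to see which documents dominate the results for a given query.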

📦 Installation

Install Simba core:

pip install simba-core

Or clone and set up the repository:

git clone https://github.com/GitHamza0206/simba.git
cd simba
poetry config virtualenvs.in-project true
poetry install
source .venv/bin/activate

🔑 Configuration

Create a .env file:

OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
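
The `.env` file is a list of `KEY=value` lines. A minimal sketch of parsing such a file with the standard library, as a way to check that the expected keys are present before starting Simba; `parse_env` is a hypothetical helper, and Simba's own loading mechanism may differ:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs in the value stay intact.
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


env_text = """\
OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
"""

settings = parse_env(env_text)
required = ["OPENAI_API_KEY", "REDIS_HOST", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND"]
missing = [name for name in required if name not in settings]
print("missing keys:", missing)
```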

Configure config.yaml:

# config.yaml

project:
  name: "Simba"
  version: "1.0.0"
  api_version: "/api/v1"

paths:
  base_dir: null  # Will be set programmatically
  faiss_index_dir: "vector_stores/faiss_index"
  vector_store_dir: "vector_stores"

llm:
  provider: "openai"
  model_name: "gpt-4o-mini"
  temperature: 0.0
  max_tokens: null
  streaming: true
  additional_params: {}

embedding:
  provider: "huggingface"
  model_name: "BAAI/bge-base-en-v1.5"
  device: "mps"  # Apple Silicon; set to "cpu" for container compatibility
  additional_params: {}

vector_store:
  provider: "faiss"
  collection_name: "simba_collection"

  additional_params: {}

chunking:
  chunk_size: 512
  chunk_overlap: 200

retrieval:
  method: "hybrid" # Options: default, semantic, keyword, hybrid, ensemble, reranked
  k: 5
  # Method-specific parameters
  params:
    # Semantic retrieval parameters
    score_threshold: 0.5
    
    # Hybrid retrieval parameters
    prioritize_semantic: true
    
    # Ensemble retrieval parameters
    weights: [0.7, 0.3]  # Weights for semantic and keyword retrievers
    
    # Reranking parameters
    reranker_model: colbert
    reranker_threshold: 0.7

# Database configuration
database:
  provider: litedb # Options: litedb, sqlite
  additional_params: {}

celery: 
  broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
  result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
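
The `celery` entries use shell-style `${VAR:-default}` placeholders: the environment variable is used when it is set and non-empty, and the text after `:-` is used otherwise. A minimal sketch of that substitution rule in Python; `expand_placeholders` is a hypothetical helper for illustration, not Simba's actual config loader:

```python
import re

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand_placeholders(value: str, env: dict[str, str]) -> str:
    """Replace ${VAR:-default} with env[VAR], or the default if VAR is unset or empty."""

    def _substitute(match: re.Match) -> str:
        current = env.get(match.group("name"), "")
        if current:
            return current
        return match.group("default") or ""

    return _PLACEHOLDER.sub(_substitute, value)


template = "${CELERY_BROKER_URL:-redis://redis:6379/0}"

# Variable unset: the default from config.yaml is used.
print(expand_placeholders(template, {}))
# Variable set (as in the .env example): the environment value wins.
print(expand_placeholders(template, {"CELERY_BROKER_URL": "redis://localhost:6379/0"}))
```

With the `.env` values shown earlier, the broker resolves to `localhost`; without them it falls back to the `redis` hostname given in the defaults.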

🚀 Running Simba

Start the server, frontend, and parsers:

simba server
simba front
simba parsers
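
The SDK quickstart expects the server to be reachable at `http://localhost:8000`. A minimal sketch of a readiness check that waits for a TCP port to accept connections before any SDK calls are made; `wait_for_port` is a hypothetical helper, and the host and port are assumptions taken from the quickstart URL:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Assumes `simba server` was started in another terminal.
ready = wait_for_port("localhost", 8000, timeout=2.0)
print("server reachable" if ready else "server not reachable")
```

An open port only shows that a process is listening; it does not confirm that the API itself is healthy.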

🐳 Docker Deployment

Deploy Simba using Docker:

  • CPU:
    DEVICE=cpu make build
    DEVICE=cpu make up
  • NVIDIA GPU:
    DEVICE=cuda make build
    DEVICE=cuda make up
  • Apple Silicon:
    DEVICE=cpu make build
    DEVICE=cpu make up
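
The three cases above map a host type to a `DEVICE` value: `cuda` for NVIDIA GPUs and `cpu` otherwise, including Apple Silicon. A minimal sketch of that mapping; `pick_device` and `docker_commands` are hypothetical helpers, and detecting an NVIDIA GPU by looking for `nvidia-smi` on the PATH is an assumption, not something the Makefile is known to do:

```python
import shutil


def pick_device(has_nvidia_gpu: bool) -> str:
    """Return the DEVICE value from the list above: cuda with an NVIDIA GPU, else cpu."""
    return "cuda" if has_nvidia_gpu else "cpu"


def docker_commands(device: str) -> list[str]:
    """Build the two make invocations shown above for a given device."""
    return [f"DEVICE={device} make build", f"DEVICE={device} make up"]


# Assumption: an NVIDIA driver installation puts `nvidia-smi` on the PATH.
detected = shutil.which("nvidia-smi") is not None
for command in docker_commands(pick_device(detected)):
    print(command)
```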

🏁 Roadmap

  • 💻 pip install simba-core
  • 🔧 pip install simba-sdk
  • 🌐 www.simba-docs.com
  • 🔒 Auth & access management
  • 🕸️ Web scraping
  • ☁️ Cloud integrations (Azure/AWS/GCP)
  • 📚 Additional parsers and chunkers
  • 🎨 Enhanced UX/UI

🤝 Contributing

We welcome contributions! Follow these steps:

  • Fork the repository
  • Create a feature or bugfix branch
  • Commit clearly documented changes
  • Submit a pull request

💬 Support & Contact

For support or inquiries, open an issue on GitHub or contact Hamza Zerouali.
