Knowledge Management System that connects to your RAG system

Project description

Simba - Your Knowledge Management System

Connect your knowledge to any RAG system

📖 Overview

Simba is an open-source, portable Knowledge Management System (KMS) designed specifically for seamless integration with Retrieval-Augmented Generation (RAG) systems. With its intuitive UI, modular architecture, and powerful SDK, Simba simplifies knowledge management, allowing developers to focus on building advanced AI solutions.

🚀 Features

  • 🔌 Powerful SDK: Comprehensive Python SDK for easy integration.
  • 🧩 Modular Architecture: Flexible integration of vector stores, embedding models, chunkers, and parsers.
  • 🖥️ Modern UI: User-friendly interface for managing document chunks.
  • 🔗 Seamless Integration: Effortlessly connects with any RAG-based system.
  • 👨‍💻 Developer-Centric: Simplifies complex knowledge management tasks.
  • 📦 Open Source & Extensible: Community-driven with extensive customization options.

🎥 Demo

Watch the demo

🛠️ Getting Started

📋 Prerequisites

Ensure you have the following installed:

🔌 Quickstart: Simba SDK Usage

pip install simba-client

Use the SDK to upload, parse, and query documents programmatically:

from simba_sdk import SimbaClient

# Requires a running Simba server: install simba-core and start the server first.
client = SimbaClient(api_url="http://localhost:8000")

# Upload a document; the response is a list, so read the ID from its first entry.
document = client.documents.create(file_path="path/to/your/document.pdf")
document_id = document[0]["id"]

# Parse the uploaded document synchronously with the docling parser.
parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True)

# Query the indexed content.
retrieval_results = client.retriever.retrieve(query="your-query")

# Print each retrieved chunk together with its source.
for result in retrieval_results["documents"]:
    print(f"Content: {result['page_content']}")
    print(f"Metadata: {result['metadata']['source']}")
    print("====" * 10)

Explore more in the Simba SDK documentation.
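
The quickstart loop reads the retrieval response as a dict with a "documents" list whose entries carry page_content and metadata["source"]. As a minimal sketch, assuming only that shape, the results can be joined into a single context string for a downstream prompt. The build_context helper, its max_chars budget, and the sample data are hypothetical and not part of the Simba SDK.

```python
from typing import Any


def build_context(retrieval_results: dict[str, Any], max_chars: int = 2000) -> str:
    """Join retrieved chunks into one string, tagging each chunk with its source."""
    parts: list[str] = []
    used = 0
    for doc in retrieval_results.get("documents", []):
        source = doc.get("metadata", {}).get("source", "unknown")
        entry = f"[{source}]\n{doc['page_content']}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the character budget
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)


# Hand-written sample in the shape the quickstart loop expects (not real output).
sample = {
    "documents": [
        {"page_content": "Simba manages document chunks.", "metadata": {"source": "a.pdf"}},
        {"page_content": "It plugs into RAG pipelines.", "metadata": {"source": "b.pdf"}},
    ]
}
print(build_context(sample))
```

With a live server, the same helper could be called on the value returned by client.retriever.retrieve, provided the response keeps the shape shown in the quickstart.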

📦 Installation

Install Simba core:

pip install simba-core

Or clone the repository and set it up with Poetry:

git clone https://github.com/GitHamza0206/simba.git
cd simba
poetry config virtualenvs.in-project true
poetry install
source .venv/bin/activate

🔑 Configuration

Create a .env file:

OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
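
Before starting any services, it can help to check that the .env file defines every key listed above. The following is a rough sketch using only the standard library; parse_env, missing_keys, and REQUIRED_KEYS are illustrative names, and treating all four keys as required is an assumption rather than documented Simba behavior.

```python
# Keys taken from the .env example above; treating all of them as required is an assumption.
REQUIRED_KEYS = ("OPENAI_API_KEY", "REDIS_HOST", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


ENV_TEXT = """\
OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
"""

print(missing_keys(parse_env(ENV_TEXT)))
```

To check a real file, read it with pathlib.Path(".env").read_text() and pass the text to parse_env.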

Configure config.yaml:

# config.yaml

project:
  name: "Simba"
  version: "1.0.0"
  api_version: "/api/v1"

paths:
  base_dir: null  # Will be set programmatically
  faiss_index_dir: "vector_stores/faiss_index"
  vector_store_dir: "vector_stores"

llm:
  provider: "openai"
  model_name: "gpt-4o-mini"
  temperature: 0.0
  max_tokens: null
  streaming: true
  additional_params: {}

embedding:
  provider: "huggingface"
  model_name: "BAAI/bge-base-en-v1.5"
  device: "mps"  # Apple Silicon (Metal); set to "cpu" for container compatibility
  additional_params: {}

vector_store:
  provider: "faiss"
  collection_name: "simba_collection"

  additional_params: {}

chunking:
  chunk_size: 512
  chunk_overlap: 200

retrieval:
  method: "hybrid" # Options: default, semantic, keyword, hybrid, ensemble, reranked
  k: 5
  # Method-specific parameters
  params:
    # Semantic retrieval parameters
    score_threshold: 0.5
    
    # Hybrid retrieval parameters
    prioritize_semantic: true
    
    # Ensemble retrieval parameters
    weights: [0.7, 0.3]  # Weights for semantic and keyword retrievers
    
    # Reranking parameters
    reranker_model: colbert
    reranker_threshold: 0.7

# Database configuration
database:
  provider: litedb # Options: litedb, sqlite
  additional_params: {}

celery: 
  broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
  result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
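
The celery entries use the shell-style ${VAR:-default} form: take the environment variable if it is set and non-empty, otherwise fall back to the default. How Simba itself resolves these placeholders is not described here, so the following is only a sketch of those semantics; the expand function is a hypothetical helper, not part of Simba.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${NAME} and ${NAME:-default} placeholders using shell-like rules."""
    env = os.environ if env is None else env

    def _substitute(match: re.Match) -> str:
        current = env.get(match.group("name"))
        if current:  # set and non-empty wins, as with the shell ":-" operator
            return current
        default = match.group("default")
        return default if default is not None else ""

    return _PLACEHOLDER.sub(_substitute, value)


# With no variable set, the default from config.yaml is used.
print(expand("${CELERY_BROKER_URL:-redis://redis:6379/0}", {}))
# With the .env value present, it overrides the default.
print(expand("${CELERY_BROKER_URL:-redis://redis:6379/0}",
             {"CELERY_BROKER_URL": "redis://localhost:6379/0"}))
```

Under these rules the defaults point at a host named redis, while the .env example above overrides them with localhost.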

🚀 Running Simba

Start the server, frontend, and parsers:

simba server
simba front
simba parsers
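
Scripts that use the SDK right after startup may want to wait until the server accepts connections. The sketch below polls a TCP port with the standard library; wait_for_port is a hypothetical helper, and port 8000 is assumed only because the quickstart uses http://localhost:8000 as api_url.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 8000) could be called before constructing SimbaClient. A successful TCP connection only shows that something is listening on the port, not that the API is fully initialized.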

🐳 Docker Deployment

Deploy Simba using Docker:

  • CPU:
    DEVICE=cpu make build
    DEVICE=cpu make up
  • NVIDIA GPU:
    DEVICE=cuda make build
    DEVICE=cuda make up
  • Apple Silicon:
    DEVICE=cpu make build
    DEVICE=cpu make up

🏁 Roadmap

  • 💻 pip install simba-core
  • 🔧 pip install simba-sdk
  • 🌐 www.simba-docs.com
  • 🔒 Auth & access management
  • 🕸️ Web scraping
  • ☁️ Cloud integrations (Azure/AWS/GCP)
  • 📚 Additional parsers and chunkers
  • 🎨 Enhanced UX/UI

🤝 Contributing

We welcome contributions! Follow these steps:

  • Fork the repository
  • Create a feature or bugfix branch
  • Commit clearly documented changes
  • Submit a pull request

💬 Support & Contact

For support or inquiries, open an issue on GitHub or contact Hamza Zerouali.

Download files

Source Distribution

simba_core-0.3.0.tar.gz (336.7 kB view details)

Uploaded Source

Built Distribution

simba_core-0.3.0-py3-none-any.whl (373.4 kB view details)

Uploaded Python 3

File details

Details for the file simba_core-0.3.0.tar.gz.

File metadata

  • Download URL: simba_core-0.3.0.tar.gz
  • Upload date:
  • Size: 336.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for simba_core-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       b18bef12e581dc32897855d824cac9cefd634b69b98616c879f344ed86eee937
MD5          c913dc6d3d340f07f32522509f953dc7
BLAKE2b-256  239c622f5099e10fd75b86cacc79db4664555f2a397d0ab51801ff142701717e

File details

Details for the file simba_core-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: simba_core-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 373.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for simba_core-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       6eb87876111556af1d3c98f1ecd77c081c0c147efd5c73a1430f32073c5fc1b6
MD5          4aae3562f7f0b5fe8af65366cce6850f
BLAKE2b-256  82491ac6efae593e1a3ef3d7b5354a9dc207737d942c0f221c4a79acc94c8879
