Cléa-API: a framework for document loading and hybrid search that combines vector search with metadata-based filtering. CRUD operations are exposed through FastAPI.

Cléa-API 🚀

Hybrid document-search framework for PostgreSQL + pgvector

Cléa-API loads multi-format documents, segments them, vectorizes them, and provides ready-to-use hybrid search (vector similarity + SQL filters). It can be used:

  • through REST endpoints (FastAPI);
  • as a Python library (extraction, pipeline, search);
  • with a PostgreSQL + pgvector database that is auto-indexed per corpus.

Quick contents

Topic                     Link
HTML docs (MkDocs)        https://WillIsback.github.io/clea-api
Structure & concepts      docs/index.md
Extraction guide          docs/doc_loader.md
Database & indexes        docs/database.md
Hybrid search             docs/search.md
End-to-end pipeline       docs/pipeline.md

Important: this README is not built by MkDocs, so it only contains getting-started information. The full documentation lives in the docs/ folder.


Key features

  • 🔄 Multi-format loading: PDF, DOCX, HTML, JSON, TXT, …

  • 🧩 Hierarchical segmentation: Section ▶ Paragraph ▶ Chunk.

  • 🔍 Hybrid search: ivfflat or HNSW index + Cross-Encoder rerank.

  • “One-liner” pipeline:

    from pipeline import process_and_store
    process_and_store("rapport.pdf", theme="R&D")
    
  • 📦 Modular architecture: add an extractor or an engine in a few lines.

  • 🐳 Docker-ready & CI-friendly (PyTest tests, MkDocs docs).
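
To make the "hybrid search" idea concrete, here is a minimal pure-Python sketch of the two steps it combines: a metadata filter followed by a vector-similarity ranking. It is an illustration only, not Cléa-API's implementation: the `Chunk` class, the `hybrid_search` function, and the toy embeddings are hypothetical, the exhaustive scan stands in for what an ivfflat or HNSW index would approximate in pgvector, and the Cross-Encoder rerank step is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored text chunk and its metadata."""
    text: str
    theme: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(chunks, query_embedding, theme=None, top_k=3):
    """Filter on metadata first, then rank the survivors by similarity."""
    # Step 1: metadata filter (the role a SQL WHERE clause would play).
    candidates = [c for c in chunks if theme is None or c.theme == theme]
    # Step 2: exhaustive vector ranking (an ANN index approximates this).
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy corpus with 2-dimensional embeddings, for illustration only.
corpus = [
    Chunk("Risk analysis for the purchase", "Achat", [0.9, 0.1]),
    Chunk("Lab notebook", "R&D", [0.8, 0.2]),
    Chunk("Supplier quote", "Achat", [0.1, 0.9]),
]

for hit in hybrid_search(corpus, [1.0, 0.0], theme="Achat", top_k=2):
    print(hit.text)
```

Filtering before ranking keeps the similarity computation restricted to chunks that already satisfy the metadata constraint, which is the main reason to combine the two kinds of search.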


Repository layout

.
├── doc_loader/        # Extraction & loading
├── vectordb/          # SQLAlchemy models + search
├── pipeline/          # End-to-end orchestrator
├── docs/              # MkDocs documentation
├── demo/              # Sample files
├── start.sh           # API startup script
├── Dockerfile         # Image build
└── ...

Installation

Prerequisites

  • Python ≥ 3.11
  • PostgreSQL ≥ 14 with the pgvector extension
  • (Optional) WSL 2 + openSUSE Tumbleweed

Steps

# 1. Clone
git clone https://github.com/<your-gh-user>/clea-api.git
cd clea-api

# 2. Dependencies
uv pip install -r requirements.txt   # ↳ uses the 'uv' package manager

# 3. Environment variables
cp .env.sample .env   # then edit as needed

# 4. Database initialization
uv run python -m clea_vectordb.init_db

# 5. Start the API
./start.sh            # ➜ http://localhost:8080

Quick usage

Simple upload

curl -X POST http://localhost:8080/doc_loader/upload-file \
     -F "file=@demo/devis.pdf" -F "theme=Achat"
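
For callers who prefer Python to curl, the same upload can be sketched with the standard library alone. The endpoint path and the `file` and `theme` form fields are taken from the curl command above; the helper names `build_upload_request` and `send_upload` are hypothetical, and the response format is not documented here, so the sketch returns the raw bytes.

```python
import uuid
from urllib.request import Request, urlopen


def build_upload_request(filename: str, content: bytes, theme: str,
                         base_url: str = "http://localhost:8080") -> Request:
    """Build a multipart/form-data POST mirroring the curl upload command."""
    boundary = uuid.uuid4().hex
    # Text field "theme".
    theme_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="theme"\r\n\r\n'
        f"{theme}\r\n"
    ).encode("utf-8")
    # File field "file": headers, then the raw bytes of the document.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + content + b"\r\n"
    closing = f"--{boundary}--\r\n".encode("utf-8")
    return Request(
        f"{base_url}/doc_loader/upload-file",
        data=theme_part + file_part + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send_upload(request: Request) -> bytes:
    """Send the request to a running API and return the raw response body."""
    with urlopen(request) as response:
        return response.read()
```

Against a running server, something like `send_upload(build_upload_request("devis.pdf", open("demo/devis.pdf", "rb").read(), "Achat"))` would be the equivalent of the curl call.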

Full pipeline (upload → segment → index)

curl -X POST http://localhost:8080/pipeline/process-and-store \
     -F "file=@demo/devis.pdf" -F "theme=Achat" -F "max_length=800"
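
The `max_length` field presumably caps the size of each segment. As a rough illustration of the "segment" step, the sketch below splits text into paragraphs and then into chunks of at most `max_length` characters. It is not Cléa-API's segmenter: the function name `split_into_chunks` is hypothetical, the unit (characters rather than tokens) is an assumption, and the Section level of the hierarchy is omitted.

```python
def split_into_chunks(text: str, max_length: int = 800) -> list[str]:
    """Split text into paragraphs, then pack whole words into chunks.

    Each chunk holds at most max_length characters, except when a single
    word is itself longer than max_length (it is kept whole).
    """
    chunks: list[str] = []
    # Paragraph level: blank lines separate paragraphs.
    for paragraph in text.split("\n\n"):
        current = ""
        # Chunk level: add words until the next one would overflow.
        for word in paragraph.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_length and current:
                chunks.append(current)
                current = word
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


sample = "alpha beta gamma\n\ndelta"
print(split_into_chunks(sample, max_length=10))
```

Chunks never span two paragraphs in this sketch, which keeps each chunk attached to a single parent paragraph as the hierarchy suggests.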

Hybrid search

curl -X POST http://localhost:8080/search/hybrid_search \
     -H "Content-Type: application/json" \
     -d '{"query":"analyse risques", "top_k":8}'
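
The same query can be issued from Python with the standard library. The endpoint path and the `query` and `top_k` fields come from the curl command above; the helper names `build_search_request` and `run_search` are hypothetical, and the sketch assumes the API answers with a JSON body, which this README does not specify.

```python
import json
from urllib.request import Request, urlopen


def build_search_request(query: str, top_k: int = 8,
                         base_url: str = "http://localhost:8080") -> Request:
    """Build the JSON POST request mirroring the curl search command."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return Request(
        f"{base_url}/search/hybrid_search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(query: str, top_k: int = 8):
    """Send the search to a running API and decode the (assumed) JSON reply."""
    with urlopen(build_search_request(query, top_k)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API started on port 8080, `run_search("analyse risques")` would send the same request as the curl example.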

Tests

uv run pytest           # all unit tests

Docker deployment

docker build -t clea-api .
docker run -p 8080:8080 clea-api

Contributing 🤝

  1. Fork → branch (feat/my-feature)
  2. uv run pytest && mkdocs build must both pass
  3. Open a clear and concise Pull Request

License

Distributed under the MIT license; see LICENSE.
