Official Python SDK for the OCR Facture France API - automatic extraction of invoice data via OCR

OCR Facture API - Python SDK

Official Python SDK for the OCR Facture France API. It simplifies integrating automatic invoice data extraction into your Python applications.

🚀 Installation

pip install ocr-facture-api

📖 Usage

Basic configuration

from ocr_facture_api import OCRFactureAPI

# Initialize the client
api = OCRFactureAPI(
    api_key="your_rapidapi_api_key",
    base_url="https://ocr-facture-api-production.up.railway.app"  # Optional
)

Extracting data from an invoice

From a file

# From a local file
result = api.extract_from_file("facture.pdf", language="fra")

# Access the extracted data
invoice_data = result["extracted_data"]
print(f"Invoice number: {invoice_data['invoice_number']}")
print(f"Total incl. VAT: {invoice_data['total_ttc']}€")
print(f"Date: {invoice_data['date']}")

# Confidence scores
confidence = result["confidence_scores"]
print(f"Invoice number confidence: {confidence['invoice_number']}")
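Confidence scores are most useful when they drive a review step. The sketch below lists the fields whose score falls under a threshold so they can be checked by hand. It assumes the scores are numbers between 0 and 1 and that 0.8 is a sensible cut-off; neither is stated by the SDK, and `low_confidence_fields` is a hypothetical helper, not part of the package. The sample values are made up.

```python
def low_confidence_fields(result, threshold=0.8):
    """Return the names of extracted fields scored below `threshold`.

    Assumption: scores are floats in [0, 1]; adjust the threshold if the
    API uses another scale.
    """
    scores = result.get("confidence_scores", {})
    return sorted(
        name
        for name, score in scores.items()
        if isinstance(score, (int, float)) and score < threshold
    )


# Sample response shaped like the example above (values are invented).
sample = {
    "extracted_data": {"invoice_number": "FA-2024-001", "total_ttc": 120.0},
    "confidence_scores": {"invoice_number": 0.95, "total_ttc": 0.62},
}
to_review = low_confidence_fields(sample)
```

With the sample above, only `total_ttc` would be flagged for manual review.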

From a base64-encoded image

import base64

with open("facture.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

result = api.extract_from_base64(image_data, language="fra")

Batch processing

# Process several invoices in a single request
files = ["facture1.pdf", "facture2.pdf", "facture3.pdf"]
batch_result = api.batch_extract(files, language="fra")

# Iterate over the results
for i, result in enumerate(batch_result["results"], start=1):
    if result["success"]:
        print(f"Invoice {i}: {result['extracted_data']['invoice_number']}")
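The API reference lists a limit of 10 files per `batch_extract` call, so longer lists have to be split. The following is a minimal sketch of one way to do that. `chunked` and `batch_extract_all` are hypothetical helpers, and the code assumes every response carries a `"results"` list as in the example above.

```python
def chunked(items, size=10):
    """Yield consecutive slices of `items` containing at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_extract_all(api, files, language="fra"):
    """Call api.batch_extract once per group of 10 files and merge the results.

    `api` is any object exposing batch_extract(files, language=...), such as
    the OCRFactureAPI client shown above.
    """
    results = []
    for group in chunked(list(files), 10):
        response = api.batch_extract(group, language=language)
        results.extend(response["results"])
    return results
```

Results come back in the same order as the input list, so they can still be matched to file names by position.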

French compliance validation

# Extract and validate in a single call
result = api.extract_from_file("facture.pdf", check_compliance=True)

# Or validate separately
invoice_data = result["extracted_data"]
compliance = api.check_compliance(invoice_data)

if compliance["compliance"]["compliant"]:
    print("✅ Invoice is compliant")
else:
    print(f"❌ Missing fields: {compliance['compliance']['missing_fields']}")

Factur-X generation

# Extract the data
result = api.extract_from_file("facture.pdf")
invoice_data = result["extracted_data"]

# Generate the Factur-X XML
facturx_result = api.generate_facturx(invoice_data)
xml_content = facturx_result["xml"]

# Save the XML
with open("facture_facturx.xml", "w", encoding="utf-8") as f:
    f.write(xml_content)
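Before writing the XML to disk, a cheap local check can catch an empty or truncated payload. The sketch below only tests that the string is well-formed XML and reports the root element name, which for Factur-X (based on the UN/CEFACT Cross Industry Invoice format) is `CrossIndustryInvoice`. It does not replace `validate_facturx_xml`, and `xml_root_name` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def xml_root_name(xml_content):
    """Return the local name of the root element, or None if parsing fails."""
    data = xml_content.encode("utf-8") if isinstance(xml_content, str) else xml_content
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    # Tags look like "{namespace}LocalName"; keep only the local part.
    return root.tag.rsplit("}", 1)[-1]


# Stand-in document: only the root element, in the CII namespace.
stub = (
    '<rsm:CrossIndustryInvoice '
    'xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"/>'
)
root_name = xml_root_name(stub)
```

A `None` result, or a root name other than `CrossIndustryInvoice`, is a signal to inspect the response before saving it.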

VAT validation

# `result` comes from an earlier extract_from_file() call
invoice_data = result["extracted_data"]
vat_validation = api.validate_vat(invoice_data)

if vat_validation["validation"]["valid"]:
    print("✅ VAT is valid")
else:
    print(f"❌ Errors: {vat_validation['validation']['errors']}")
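What `validate_vat` checks on the server is not described here. As an independent local sanity check, the arithmetic relation between the amount excluding VAT, the VAT rate, and the amount including VAT can be verified with `Decimal` to avoid float rounding. This is a sketch only: `vat_amounts_consistent` is a hypothetical helper, the amounts are passed as plain arguments because the response field names for the net amount and the rate are not documented above, and the one-cent tolerance is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def vat_amounts_consistent(total_ht, vat_rate, total_ttc, tolerance="0.01"):
    """Check that total_ht * (1 + vat_rate / 100) matches total_ttc.

    vat_rate is a percentage (20 for 20 %). Amounts may be str, int or float.
    """
    ht = Decimal(str(total_ht))
    rate = Decimal(str(vat_rate))
    ttc = Decimal(str(total_ttc))
    expected = (ht * (1 + rate / 100)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return abs(expected - ttc) <= Decimal(tolerance)


consistent = vat_amounts_consistent("100.00", "20", "120.00")
inconsistent = vat_amounts_consistent("100.00", "20", "125.00")
```

An invoice mixing several VAT rates would need the check applied per rate, which this sketch does not attempt.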

SIRET enrichment

# Enrich with data from the Sirene API
enrichment = api.enrich_siret("12345678901234")
print(f"Company name: {enrichment['enrichment']['uniteLegale']['denominationUniteLegale']}")
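A SIRET number has 14 digits and carries a Luhn checksum, so obviously malformed values can be rejected locally before spending an API call. The sketch below implements that check; `is_plausible_siret` is a hypothetical helper, not part of the SDK. It ignores the known special case of La Poste establishments, which follow a different rule, and passing the check does not mean the establishment exists.

```python
def is_plausible_siret(siret):
    """Return True if `siret` has 14 digits and a valid Luhn checksum."""
    digits = siret.replace(" ", "")
    if len(digits) != 14 or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


well_formed = is_plausible_siret("732 829 320 00074")
placeholder = is_plausible_siret("12345678901234")
```

Note that the placeholder value used in the example above fails the checksum, so it is only suitable as an illustration of the call.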

Error handling

from ocr_facture_api import (
    OCRFactureAPIError,
    OCRFactureAuthError,
    OCRFactureRateLimitError,
    OCRFactureValidationError,
)

# Catch the specific errors first; OCRFactureAPIError is handled last as the fallback
try:
    result = api.extract_from_file("facture.pdf")
except OCRFactureAuthError:
    print("❌ Invalid API key")
except OCRFactureRateLimitError as e:
    print(f"❌ Quota exceeded. Retry in {e.retry_after} seconds")
except OCRFactureValidationError as e:
    print(f"❌ Validation error: {e.message}")
except OCRFactureAPIError as e:
    print(f"❌ API error: {e.message}")
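Since the rate-limit error exposes `retry_after`, a caller can wait and try again instead of failing. The following is a minimal sketch of such a retry loop. `call_with_retry` is a hypothetical helper; the exception class is passed in as a parameter so the function does not depend on the SDK, and the default wait of one second when `retry_after` is missing is an assumption.

```python
import time


def call_with_retry(func, rate_limit_error, max_attempts=3,
                    default_wait=1.0, sleep=time.sleep):
    """Call func(), sleeping and retrying when `rate_limit_error` is raised.

    The wait time is taken from the exception's `retry_after` attribute when
    present. The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except rate_limit_error as exc:
            if attempt == max_attempts:
                raise
            wait = getattr(exc, "retry_after", None) or default_wait
            sleep(float(wait))


# Usage with the SDK (not executed here):
# result = call_with_retry(
#     lambda: api.extract_from_file("facture.pdf"),
#     OCRFactureRateLimitError,
# )
```

Other error types propagate immediately, which keeps authentication and validation failures from being retried pointlessly.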

Idempotency

import uuid

# Use an idempotency key to avoid duplicate processing
idempotency_key = str(uuid.uuid4())
result = api.extract_from_file(
    "facture.pdf",
    idempotency_key=idempotency_key
)

# Reusing the same key returns the same result without reprocessing
result2 = api.extract_from_file(
    "facture.pdf",
    idempotency_key=idempotency_key
)  # Immediate result served from the cache
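A random UUID only helps if the key is stored somewhere between runs. One alternative is to derive the key from the file content, so the same invoice always maps to the same key. The sketch below does this with SHA-256. It assumes the API accepts any string as an idempotency key, which is not confirmed above, and `idempotency_key_for` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def idempotency_key_for(file_path, namespace="invoice-extract"):
    """Return a stable hex key derived from `namespace` and the file bytes.

    Put anything that changes the expected result (for example the language)
    into `namespace` so different options never share a key.
    """
    digest = hashlib.sha256()
    digest.update(namespace.encode("utf-8"))
    with Path(file_path).open("rb") as handle:
        # Read in blocks so large PDFs are not loaded into memory at once.
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


# Usage with the SDK (not executed here):
# key = idempotency_key_for("facture.pdf", namespace="extract:fra")
# result = api.extract_from_file("facture.pdf", idempotency_key=key)
```

How long the service keeps a cached result for a given key is not documented here, so a repeated call may still trigger reprocessing after some time.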

📚 API Reference

Main methods

  • extract_from_file(file_path, language="fra", check_compliance=False, idempotency_key=None) - Extract from a file
  • extract_from_base64(base64_string, language="fra", check_compliance=False, idempotency_key=None) - Extract from a base64 string
  • batch_extract(files, language="fra", idempotency_key=None) - Batch processing (max 10 files)
  • check_compliance(invoice_data) - French compliance validation
  • validate_vat(invoice_data) - VAT validation
  • enrich_siret(siret) - SIRET enrichment
  • validate_vies(vat_number) - VIES validation
  • generate_facturx(invoice_data) - Factur-X XML generation
  • parse_facturx(file_path) - Extract the XML from a PDF/A-3 file
  • validate_facturx_xml(xml_content) - Factur-X XML validation
  • get_supported_languages() - List of supported languages
  • get_quota() - Remaining quota information
  • health_check() - API health status

🌍 Supported languages

  • fra - French (default)
  • eng - English
  • deu - German
  • spa - Spanish
  • ita - Italian
  • por - Portuguese

📝 Complete examples

Example 1: Automated invoice processing

from ocr_facture_api import OCRFactureAPI
import os

api = OCRFactureAPI(api_key=os.getenv("OCR_FACTURE_API_KEY"))

# Process every invoice in a folder
factures_dir = "./factures"
for filename in os.listdir(factures_dir):
    # Match extensions case-insensitively so "SCAN.PDF" is not skipped
    if filename.lower().endswith(('.pdf', '.jpg', '.png')):
        filepath = os.path.join(factures_dir, filename)
        try:
            result = api.extract_from_file(filepath, check_compliance=True)

            invoice_data = result["extracted_data"]
            print(f"\n📄 {filename}")
            print(f"  Number: {invoice_data.get('invoice_number')}")
            print(f"  Date: {invoice_data.get('date')}")
            print(f"  Total incl. VAT: {invoice_data.get('total_ttc')}€")

            # Check compliance; tolerate a response without a "compliance" entry
            compliance = result.get("compliance", {})
            if compliance.get("compliant"):
                print("  ✅ Compliant")
            else:
                print(f"  ⚠️ Non-compliant: {compliance.get('missing_fields', [])}")

        except Exception as e:
            print(f"❌ Error for {filename}: {e}")

Example 2: Export to CSV

import csv
from ocr_facture_api import OCRFactureAPI

api = OCRFactureAPI(api_key="your_key")

# Process several invoices
files = ["facture1.pdf", "facture2.pdf", "facture3.pdf"]
batch_result = api.batch_extract(files)

# Export to CSV
with open("factures_export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Number", "Date", "Vendor", "Total incl. VAT", "Confidence"])

    for result in batch_result["results"]:
        if result["success"]:
            data = result["extracted_data"]
            writer.writerow([
                data.get("invoice_number"),
                data.get("date"),
                data.get("vendor"),
                data.get("total_ttc"),
                # Fall back to 0 when no confidence score is returned
                result.get("confidence_scores", {}).get("total_ttc", 0)
            ])

📄 License

MIT License

🤝 Contributing

Contributions are welcome! Feel free to open an issue or a pull request.
