Skip to main content

Librairie outils IA Lexia par Lexfluent

Project description

Libraire python Lexfluent RevolutionAI

Création/Révision Auteur date
Création Jacques MASSA 2 décembre 2024
Modification jacques MASSA 10 mars 2025

Présentation

La librairie pyLexfluent propose toutes les fonctionnalités IA dans les domaines juridique et document.

  • Classification : Entraînement et inférence
  • Extraction de données : ODP, CNI, IBAN, Document juridique, Certificat d'Urbanisme, Extrait Acte de naissance, Extrait Acte de Décés,Extrait Acte de Mariage
  • Augmentation des données : Finance

Installations Prérequises

pip install setuptools
pip install wheel
pip install scikit-learn
pip install matplotlib
pip install tqdm
pip install pytesseract
pip install pillow>=10.1.0
pip install jax>=0.4.38
pip install jaxlib>=0.4.38
pip install mediapipe
pip install opencv-python
pip install pandas
pip install tensorrt
pip install tensorrt-lean
pip install tensorrt-dispatch
pip install tensorflow
pip install tf-keras
pip install tensorflow-hub
pip install torch torchvision torchaudio
pip install sentence_transformers
pip install spacy[cuda12x]
pip install ocrmypdf
pip install easyocr
pip install pdf2image
pip install pdfplumber
pip install langchain-community
pip install pymongo
python -m spacy download fr_core_news_lg

Il y peut y avoir un conflit de version avec cuDNN requis par TensforFlow et Torch Dans ce cas il faut supprimer nvidia-cuDNN-cu12 apporté par PIP

pip uninstall nvidia-cudnn-cu12

Téléchargement modèles

SPACY

python -m spacy download fr_core_news_lg

Update et installations requises

    apt-get update 
    apt-get upgrade
    apt install software-properties-common -y
    apt-get install poppler-utils -y
    add-apt-repository ppa:alex-p/tesseract-ocr5
    apt-get install libc6 -y
    apt-get install poppler-utils -y
    apt-get install tesseract-ocr -y
    apt-get install tesseract-ocr-fra -y
    apt-get install tesseract-ocr-eng -y
    apt-get install tesseract-ocr-ita -y
    apt-get install tesseract-ocr-spa -y
    apt-get install tesseract-ocr-deu -y
    apt-get install tesseract-ocr-cos -y
    apt-get install tesseract-ocr-lat -y
    apt-get install automake libtool -y
    apt-get install libleptonica-dev -y
    apt-get install ffmpeg libsm6 libxext6  -y
    apt-get install ocrmypdf -y    

GPU issue

Si problème : Successful NUMA node read from SysFS had negative value (-1)

for a in /sys/bus/pci/devices/*; do echo 0 |  tee -a $a/numa_node; done

Exemples d'utilisation

Classification

Code

import logging
import sys

from lxf.services.measure_time import measure_time_async
from lxf.services.try_safe import try_safe_execute_asyncio



from lxf.ai.classification.classifier import get_classification
from lxf.domain.predictions import  Predictions

import lxf.settings as settings 
from lxf.settings import set_looging_level, get_logging_level
set_logging_level(logging.DEBUG)
###################################################################

logger = logging.getLogger('test classifier')
fh = logging.FileHandler('./logs/test_classifier.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

@measure_time_async
async def do_test(file_name) -> Predictions :
    """
    """
    return await get_classification(file_name=file_name,max_pages=10)


if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True) 
    pdf_path = "data/ODP.pdf"
    iban_pdf="data/RIBB.pdf"
    result = try_safe_execute_asyncio(logger=logger,func=do_test,file_name=iban_pdf) #asyncio.run(do_test(iban_pdf))
    print(result)    
    result = try_safe_execute_asyncio(logger=logger,func=do_test,file_name=pdf_path) #asyncio.run(do_test(pdf_path))
    print(result)

Code

import logging
import asyncio
import os
import sys



import lxf.settings as settings
from lxf.setting import set_logging_level, get_logging_level
set_logging_level(logging.DEBUG)
settings.enable_tqdm=False

from lxf.domain.loan import Pret
from lxf.extractors.finance import odp_extractor
from lxf.extractors.finance import iban_extractor

from lxf.services.try_safe import  try_safe_execute_async



###################################################################

logger = logging.getLogger('test_finance')
fh = logging.FileHandler('./logs/test_finance.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

async def do_test_odp(file_path:str)->Pret:
    result = await try_safe_execute_async(logger,odp_extractor.extract_data,file_path=file_path)
    return result
    
async def do_test_iban(file_path:str)->str :
    """
    """
    result = await try_safe_execute_async(logger,iban_extractor.extract_data,file_path=file_path)
    return result

if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True) 
    pdf_path = "data/ODP.pdf"
    # pret:Pret=  asyncio.run(do_test_odp(file_path=pdf_path))
    # if pret!=None:
    #     print(pret.emprunteurs)
    iban_pdf="data/rib pm.pdf"
    txt = asyncio.run(do_test_iban(file_path=iban_pdf))
    print(txt)
    

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pylexfluent-0.1.72.tar.gz (108.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pylexfluent-0.1.72-py3-none-any.whl (131.0 kB view details)

Uploaded Python 3

File details

Details for the file pylexfluent-0.1.72.tar.gz.

File metadata

  • Download URL: pylexfluent-0.1.72.tar.gz
  • Upload date:
  • Size: 108.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.1.72.tar.gz
Algorithm Hash digest
SHA256 d28ee5a7fe4a37c5921b5abf5a936379bca7c80c0c4c5ac1e49bafb21722f970
MD5 a476c9436e56df7dbdb02c2eaade53bf
BLAKE2b-256 66d8ee3f5f348468b569ff0bfd1023004cbf208448bfa28f3d0db1f19d1a04d8

See more details on using hashes here.

File details

Details for the file pylexfluent-0.1.72-py3-none-any.whl.

File metadata

  • Download URL: pylexfluent-0.1.72-py3-none-any.whl
  • Upload date:
  • Size: 131.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.1.72-py3-none-any.whl
Algorithm Hash digest
SHA256 ceb3d3107ec08556b7dabbb583a7df46a402b5f7e9f518a4791739c42d625691
MD5 f1ea1b2be9deb58a1a9dbbcc4bd2f788
BLAKE2b-256 8748bf610c2d176e7617dac8d40e61a68dee78f0a1f4a8d1d7ee679749004f04

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page