Skip to main content

Librairie outils IA Lexia par Lexfluent

Project description

Libraire python Lexfluent RevolutionAI

Création/Révision Auteur date
Création Jacques MASSA 2 décembre 2024
Modification jacques MASSA 10 mars 2025

Présentation

La librairie pyLexfluent propose toutes les fonctionnalités IA dans les domaines juridique et document.

  • Classification : Entraînement et inférence
  • Extraction de données : ODP, CNI, IBAN, Document juridique, Certificat d'Urbanisme, Extrait Acte de naissance, Extrait Acte de Décés,Extrait Acte de Mariage
  • Augmentation des données : Finance

Installations Prérequises

pip install setuptools
pip install wheel
pip install scikit-learn
pip install matplotlib
pip install tqdm
pip install pytesseract
pip install pillow>=10.1.0
pip install jax>=0.4.38
pip install jaxlib>=0.4.38
pip install mediapipe
pip install opencv-python
pip install pandas
pip install tensorrt
pip install tensorrt-lean
pip install tensorrt-dispatch
pip install tensorflow
pip install tf-keras
pip install tensorflow-hub
pip install torch torchvision torchaudio
pip install sentence_transformers
pip install spacy[cuda12x]
pip install ocrmypdf
pip install easyocr
pip install pdf2image
pip install pdfplumber
pip install langchain-community
pip install pymongo
python -m spacy download fr_core_news_lg

Il y peut y avoir un conflit de version avec cuDNN requis par TensforFlow et Torch Dans ce cas il faut supprimer nvidia-cuDNN-cu12 apporté par PIP

pip uninstall nvidia-cudnn-cu12

Téléchargement modèles

SPACY

python -m spacy download fr_core_news_lg

Update et installations requises

    apt-get update 
    apt-get upgrade
    apt install software-properties-common -y
    apt-get install poppler-utils -y
    add-apt-repository ppa:alex-p/tesseract-ocr5
    apt-get install libc6 -y
    apt-get install poppler-utils -y
    apt-get install tesseract-ocr -y
    apt-get install tesseract-ocr-fra -y
    apt-get install tesseract-ocr-eng -y
    apt-get install tesseract-ocr-ita -y
    apt-get install tesseract-ocr-spa -y
    apt-get install tesseract-ocr-deu -y
    apt-get install tesseract-ocr-cos -y
    apt-get install tesseract-ocr-lat -y
    apt-get install automake libtool -y
    apt-get install libleptonica-dev -y
    apt-get install ffmpeg libsm6 libxext6  -y
    apt-get install ocrmypdf -y    

GPU issue

Si problème : Successful NUMA node read from SysFS had negative value (-1)

for a in /sys/bus/pci/devices/*; do echo 0 |  tee -a $a/numa_node; done

Exemples d'utilisation

Classification

Code

import logging
import sys

from lxf.services.measure_time import measure_time_async
from lxf.services.try_safe import try_safe_execute_asyncio



from lxf.ai.classification.classifier import get_classification
from lxf.domain.predictions import  Predictions

import lxf.settings as settings 
from lxf.settings import set_looging_level, get_logging_level
set_logging_level(logging.DEBUG)
###################################################################

logger = logging.getLogger('test classifier')
fh = logging.FileHandler('./logs/test_classifier.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

@measure_time_async
async def do_test(file_name) -> Predictions :
    """
    """
    return await get_classification(file_name=file_name,max_pages=10)


if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True) 
    pdf_path = "data/ODP.pdf"
    iban_pdf="data/RIBB.pdf"
    result = try_safe_execute_asyncio(logger=logger,func=do_test,file_name=iban_pdf) #asyncio.run(do_test(iban_pdf))
    print(result)    
    result = try_safe_execute_asyncio(logger=logger,func=do_test,file_name=pdf_path) #asyncio.run(do_test(pdf_path))
    print(result)

Code

import logging
import asyncio
import os
import sys



import lxf.settings as settings
from lxf.setting import set_logging_level, get_logging_level
set_logging_level(logging.DEBUG)
settings.enable_tqdm=False

from lxf.domain.loan import Pret
from lxf.extractors.finance import odp_extractor
from lxf.extractors.finance import iban_extractor

from lxf.services.try_safe import  try_safe_execute_async



###################################################################

logger = logging.getLogger('test_finance')
fh = logging.FileHandler('./logs/test_finance.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

async def do_test_odp(file_path:str)->Pret:
    result = await try_safe_execute_async(logger,odp_extractor.extract_data,file_path=file_path)
    return result
    
async def do_test_iban(file_path:str)->str :
    """
    """
    result = await try_safe_execute_async(logger,iban_extractor.extract_data,file_path=file_path)
    return result

if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True) 
    pdf_path = "data/ODP.pdf"
    # pret:Pret=  asyncio.run(do_test_odp(file_path=pdf_path))
    # if pret!=None:
    #     print(pret.emprunteurs)
    iban_pdf="data/rib pm.pdf"
    txt = asyncio.run(do_test_iban(file_path=iban_pdf))
    print(txt)
    

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pylexfluent-0.1.70.tar.gz (108.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pylexfluent-0.1.70-py3-none-any.whl (130.8 kB view details)

Uploaded Python 3

File details

Details for the file pylexfluent-0.1.70.tar.gz.

File metadata

  • Download URL: pylexfluent-0.1.70.tar.gz
  • Upload date:
  • Size: 108.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.1.70.tar.gz
Algorithm Hash digest
SHA256 9e64535fffe4177bc1be0d1f1927c7d9c61cf43428c1b92b1ce3894e71cd199c
MD5 5460d5e7dd370b6c0ecbf725e231de02
BLAKE2b-256 ae0fe5f91d270793d26290b8dca555743cc75f9c9ad5fb83d7bd5c6769c8db48

See more details on using hashes here.

File details

Details for the file pylexfluent-0.1.70-py3-none-any.whl.

File metadata

  • Download URL: pylexfluent-0.1.70-py3-none-any.whl
  • Upload date:
  • Size: 130.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.1.70-py3-none-any.whl
Algorithm Hash digest
SHA256 632533244f4dbb90ab7532ca4d711542f1e84d79a372bcacb3f7e2758381c4c6
MD5 c1408376050423f0c8dad01c18b2b67a
BLAKE2b-256 49248de8728c8f690da9fcf932dadbb0d08f56771187763b2c6716940e52bec0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page