
Lexia AI tools library by Lexfluent

Project description

Lexfluent RevolutionAI Python library

Creation/Revision    Author          Date
Creation             Jacques MASSA   December 2, 2024
Modification         Jacques MASSA   March 10, 2025

Overview

The pyLexfluent library provides AI features for the legal and document-processing domains.

  • Classification: training and inference
  • Data extraction: ODP, CNI, IBAN, legal document, urban planning certificate (Certificat d'Urbanisme), birth certificate extract, death certificate extract, marriage certificate extract
  • Data augmentation: finance

Prerequisite installations

pip install setuptools
pip install wheel
pip install scikit-learn
pip install matplotlib
pip install tqdm
pip install pytesseract
pip install "pillow>=10.1.0"
pip install "jax>=0.4.38"
pip install "jaxlib>=0.4.38"
pip install mediapipe
pip install opencv-python
pip install pandas
pip install tensorrt
pip install tensorrt-lean
pip install tensorrt-dispatch
pip install tensorflow
pip install tf-keras
pip install tensorflow-hub
pip install torch torchvision torchaudio
pip install sentence_transformers
pip install "spacy[cuda12x]"
pip install ocrmypdf
pip install easyocr
pip install pdf2image
pip install pdfplumber
pip install langchain-community
pip install pymongo
python -m spacy download fr_core_news_lg

The cuDNN versions required by TensorFlow and Torch may conflict. In that case, uninstall the nvidia-cudnn-cu12 package that pip installed:

pip uninstall nvidia-cudnn-cu12
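
Before uninstalling anything, it can help to see which versions are actually present. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical and is not part of pyLexfluent.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version, or None if absent}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distributions involved in the cuDNN conflict described above.
    names = ["tensorflow", "torch", "nvidia-cudnn-cu12"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

If nvidia-cudnn-cu12 is reported as not installed after the uninstall, the pip-provided copy is gone.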

Model downloads

spaCy

python -m spacy download fr_core_news_lg
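
spaCy models are installed as regular Python packages, so one way to confirm that the download succeeded is to check that the package can be found. This is a sketch under that assumption; model_is_installed is a hypothetical helper name.

```python
import importlib.util


def model_is_installed(model_name: str) -> bool:
    """Return True if a top-level package with this name can be located."""
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    name = "fr_core_news_lg"
    status = "installed" if model_is_installed(name) else "missing"
    print(f"{name}: {status}")
```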

System update and required packages

    apt-get update
    apt-get upgrade -y
    apt install software-properties-common -y
    add-apt-repository ppa:alex-p/tesseract-ocr5 -y
    apt-get install libc6 -y
    apt-get install poppler-utils -y
    apt-get install tesseract-ocr -y
    apt-get install tesseract-ocr-fra -y
    apt-get install tesseract-ocr-eng -y
    apt-get install tesseract-ocr-ita -y
    apt-get install tesseract-ocr-spa -y
    apt-get install tesseract-ocr-deu -y
    apt-get install tesseract-ocr-cos -y
    apt-get install tesseract-ocr-lat -y
    apt-get install automake libtool -y
    apt-get install libleptonica-dev -y
    apt-get install ffmpeg libsm6 libxext6 -y
    apt-get install ocrmypdf -y
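
Once the system packages are installed, a quick check that the main executables are on the PATH can save debugging time later. The sketch below is illustrative only: the tool list is an assumption drawn from the packages above (pdftoppm ships with poppler-utils), and missing_tools is a hypothetical helper name.

```python
import shutil

# Executables provided by the packages installed above (assumed list).
REQUIRED_TOOLS = ("tesseract", "pdftoppm", "ocrmypdf", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables were found.")
```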

GPU issue

If you see the message "Successful NUMA node read from SysFS had negative value (-1)", set the NUMA node of every PCI device to 0:

for a in /sys/bus/pci/devices/*; do echo 0 | tee -a "$a/numa_node"; done
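
To see which devices are affected before or after running the loop above, the numa_node files can be read directly. This is a minimal sketch; the function name devices_with_negative_numa is hypothetical, and the base directory is a parameter so the logic can be tried on a test directory.

```python
from pathlib import Path


def devices_with_negative_numa(base="/sys/bus/pci/devices"):
    """Return the PCI device names whose numa_node file holds a negative value."""
    affected = []
    for node_file in sorted(Path(base).glob("*/numa_node")):
        try:
            value = int(node_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric entry: skip it
        if value < 0:
            affected.append(node_file.parent.name)
    return affected


if __name__ == "__main__":
    print(devices_with_negative_numa())
```

An empty list means no device reports a negative NUMA node.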

Usage examples

Classification

Code

import logging
import os
import sys

from lxf.services.measure_time import measure_time_async
from lxf.services.try_safe import try_safe_execute_asyncio

from lxf.ai.classification.classifier import get_classification
from lxf.domain.predictions import Predictions

import lxf.settings as settings
from lxf.settings import set_logging_level, get_logging_level
set_logging_level(logging.DEBUG)
###################################################################
# Log to a file. FileHandler does not create missing directories,
# so make sure ./logs exists first.
os.makedirs('./logs', exist_ok=True)
logger = logging.getLogger('test classifier')
fh = logging.FileHandler('./logs/test_classifier.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

@measure_time_async
async def do_test(file_name) -> Predictions:
    """Classify the given document, reading at most 10 pages."""
    return await get_classification(file_name=file_name, max_pages=10)


if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True)
    pdf_path = "data/ODP.pdf"
    iban_pdf = "data/RIBB.pdf"
    result = try_safe_execute_asyncio(logger=logger, func=do_test, file_name=iban_pdf)  # asyncio.run(do_test(iban_pdf))
    print(result)
    result = try_safe_execute_asyncio(logger=logger, func=do_test, file_name=pdf_path)  # asyncio.run(do_test(pdf_path))
    print(result)
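
The implementation of measure_time_async is not shown here. As an illustration only, a timing decorator for coroutines is commonly written as below; timed_async and fake_classification are hypothetical names, and the real lxf decorator may behave differently.

```python
import asyncio
import functools
import time


def timed_async(func):
    """Print how long the wrapped coroutine function took to complete."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.3f} s")
    return wrapper


@timed_async
async def fake_classification(file_name):
    # Stand-in for a real classifier call; it only sleeps briefly.
    await asyncio.sleep(0.01)
    return {"file": file_name, "label": "placeholder"}


if __name__ == "__main__":
    print(asyncio.run(fake_classification("data/ODP.pdf")))
```

Using functools.wraps keeps the wrapped function's name and docstring, which keeps log messages readable.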

Data extraction (finance)

Code

import logging
import asyncio
import os
import sys

import lxf.settings as settings
from lxf.settings import set_logging_level, get_logging_level
set_logging_level(logging.DEBUG)
settings.enable_tqdm = False

from lxf.domain.loan import Pret
from lxf.extractors.finance import odp_extractor
from lxf.extractors.finance import iban_extractor

from lxf.services.try_safe import try_safe_execute_async

###################################################################
# Log to a file. FileHandler does not create missing directories,
# so make sure ./logs exists first.
os.makedirs('./logs', exist_ok=True)
logger = logging.getLogger('test_finance')
fh = logging.FileHandler('./logs/test_finance.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

async def do_test_odp(file_path: str) -> Pret:
    """Extract loan data from an ODP document."""
    result = await try_safe_execute_async(logger, odp_extractor.extract_data, file_path=file_path)
    return result

async def do_test_iban(file_path: str) -> str:
    """Extract IBAN data from a bank details document."""
    result = await try_safe_execute_async(logger, iban_extractor.extract_data, file_path=file_path)
    return result

if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True)
    pdf_path = "data/ODP.pdf"
    # pret: Pret = asyncio.run(do_test_odp(file_path=pdf_path))
    # if pret is not None:
    #     print(pret.emprunteurs)
    iban_pdf = "data/rib pm.pdf"
    txt = asyncio.run(do_test_iban(file_path=iban_pdf))
    print(txt)
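
OCR output can contain misread characters, so it may be worth validating an extracted IBAN before using it. The sketch below applies the standard IBAN mod-97 check. It assumes the extractor result can be reduced to a plain IBAN string; normalize_iban and is_valid_iban are hypothetical helpers, not part of pyLexfluent.

```python
import re


def normalize_iban(text: str) -> str:
    """Remove all whitespace and upper-case the candidate IBAN."""
    return re.sub(r"\s+", "", text).upper()


def is_valid_iban(text: str) -> bool:
    """Check the overall IBAN shape and its mod-97 checksum."""
    iban = normalize_iban(text)
    # Two-letter country code, two check digits, 11 to 30 BBAN characters.
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    # Move the first four characters to the end, then map A..Z to 10..35.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


if __name__ == "__main__":
    print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))
```

This check does not verify country-specific lengths or that the account exists; it only catches malformed strings and most single-character errors.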
    


Download files

Source Distribution

pylexfluent-0.1.48.tar.gz (96.4 kB)

  • Upload date:
  • Size: 96.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

Algorithm    Hash digest
SHA256       eebd4b738cd8358513531d5e0fb77445618e577aa9246f94a2db746275f8eb9b
MD5          381fced8ee74449747cde999f00943eb
BLAKE2b-256  f13bcd745c4848a6bcfe26e6af45b3fb73cecf0def92bd93a38d5d74e3e25fac

Built Distribution

pylexfluent-0.1.48-py3-none-any.whl (115.1 kB)

  • Upload date:
  • Size: 115.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

Algorithm    Hash digest
SHA256       8e5201fb73f0c0da1344ac67d3186eea25f5d63dd6bc325223bfea9ba7df83c4
MD5          30ff49e9fac3c00150d9652b80220fa3
BLAKE2b-256  7031091eea62d451ff52d7dfa20229507bc6052bf85febd573a8eae64226dec0
