Skip to main content

Librairie outils IA Lexia par Lexfluent

Project description

Libraire python Lexfluent RevolutionAI

Création/Révision Auteur date
Création Jacques MASSA 2 décembre 2024
Modification jacques MASSA 10 mars 2025
Modification jacques MASSA 4 janvier 2026

Présentation

La librairie pyLexfluent propose toutes les fonctionnalités IA dans les domaines juridique et document.

  • Classification : Entraînement et inférence
  • Extraction de données : ODP, CNI, IBAN, Document juridique, Certificat d'Urbanisme, Extrait Acte de naissance, Extrait Acte de Décés,Extrait Acte de Mariage
  • Augmentation des données : Finance

Installations Prérequises

"pip install setuptools",
"pip install wheel",
"pip install scikit-learn",
"pip install matplotlib",
"pip install tqdm",
"pip install pytesseract ",
"pip install pillow>=10.1.0",
"pip install jax==0.4.38",
"pip install jaxlib==0.4.38",
"pip install mediapipe",
"pip install opencv-python", 
"pip install pandas",
"pip install tensorrt",
"pip install tensorrt-lean",
"pip install tensorrt-dispatch",
"pip install tensorflow",
"pip install tf-keras",
"pip install tensorflow-hub",
"pip install torch",
"pip install torchvision",
"pip install torchaudio",
"pip install sentence-transformers",
"pip install spacy[cuda12x]",
"pip install ocrmypdf",
"pip install pdf2image",
"pip install pdfplumber",
"pip install langchain-community",
"pip install langchain-ollama",
"pip install langchain-openai",
"pip install pymongo",
"pip install openpyxl",
"pip install easyocr",
"pip install docling[all]"
python -m spacy download fr_core_news_lg

Il y peut y avoir un conflit de version avec cuDNN requis par TensforFlow et Torch Dans ce cas il faut supprimer nvidia-cuDNN-cu12 apporté par PIP

pip uninstall nvidia-cudnn-cu12

Prerequis système

Update et installations requises

    apt-get update 
    apt-get upgrade
    apt install software-properties-common -y
    apt-get install poppler-utils -y
    add-apt-repository ppa:alex-p/tesseract-ocr5
    apt-get install libc6 -y
    apt-get install poppler-utils -y
    apt-get install tesseract-ocr -y
    apt-get install tesseract-ocr-fra -y
    apt-get install tesseract-ocr-eng -y
    apt-get install tesseract-ocr-ita -y
    apt-get install tesseract-ocr-spa -y
    apt-get install tesseract-ocr-deu -y
    apt-get install tesseract-ocr-cos -y
    apt-get install tesseract-ocr-lat -y
    apt-get install automake libtool -y
    apt-get install libleptonica-dev -y
    apt-get install ffmpeg libsm6 libxext6  -y
    apt-get install ocrmypdf -y    

JBIG2

Installing the JBIG2 encoder Most Linux distributions do not include a JBIG2 encoder since JBIG2 encoding was patented for a long time. All known JBIG2 US patents have expired as of 2017, but it is possible that unknown patents exist.

JBIG2 encoding is recommended for OCRmyPDF and is used to losslessly create smaller PDFs. If JBIG2 encoding is not available, lower quality CCITT encoding will be used for monochrome images.

JBIG2 decoding is not patented and is performed automatically by most PDF viewers. It is widely supported and has been part of the PDF specification since 2001.

JBIG encoding is automatically provided by these OCRmyPDF packages: - Docker image (both Ubuntu and Alpine) - Snap package - ArchLinux AUR package - Alpine Linux package - Homebrew on macOS

For all other platforms, you would need to build the JBIG2 encoder from source:

git clone https://github.com/agl/jbig2enc
cd jbig2enc
./autogen.sh
./configure && make
[sudo] make install

Dependencies include libtoolize and libleptonica, which on Ubuntu systems are packaged as libtool and libleptonica-dev. On Fedora (35) they are packaged as libtool and leptonica-devel. For this to work, please make sure to install autotools, automake, libtool, pkg-config and leptonica first if not already installed. Other dependencies might be required depending on your system.

[sudo] apt install autotools-dev automake libtool libleptonica-dev pkg-config

Téléchargement modèles

SPACY

python -m spacy download fr_core_news_lg

GPU issue

Si problème : Successful NUMA node read from SysFS had negative value (-1)

for a in /sys/bus/pci/devices/*; do echo 0 |  tee -a $a/numa_node; done

Exemples d'utilisation

Classification

Code

import logging
import sys

from lxf.services.measure_time import measure_time_async
from lxf.services.try_safe import try_safe_execute_asyncio



from lxf.ai.classification.classifier import get_classification
from lxf.domain.predictions import  Predictions

import lxf.settings as settings 
from lxf.settings import set_looging_level, get_logging_level
set_logging_level(logging.DEBUG)
###################################################################

logger = logging.getLogger('test classifier')
fh = logging.FileHandler('./logs/test_classifier.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

@measure_time_async
async def do_test(file_name) -> Predictions :
    """
    """
    return await get_classification(file_name=file_name,max_pages=10)


if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True) 
    pdf_path = "data/ODP.pdf"
    iban_pdf="data/RIBB.pdf"
    result = try_safe_execute_asyncio(logger=logger,func=do_test,file_name=iban_pdf) #asyncio.run(do_test(iban_pdf))
    print(result)    
    result = try_safe_execute_asyncio(logger=logger,func=do_test,file_name=pdf_path) #asyncio.run(do_test(pdf_path))
    print(result)

Code

import logging
import asyncio
import os
import sys



import lxf.settings as settings
from lxf.setting import set_logging_level, get_logging_level
set_logging_level(logging.DEBUG)
settings.enable_tqdm=False

from lxf.domain.loan import Pret
from lxf.extractors.finance import odp_extractor
from lxf.extractors.finance import iban_extractor

from lxf.services.try_safe import  try_safe_execute_async



###################################################################

logger = logging.getLogger('test_finance')
fh = logging.FileHandler('./logs/test_finance.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

async def do_test_odp(file_path:str)->Pret:
    result = await try_safe_execute_async(logger,odp_extractor.extract_data,file_path=file_path)
    return result
    
async def do_test_iban(file_path:str)->str :
    """
    """
    result = await try_safe_execute_async(logger,iban_extractor.extract_data,file_path=file_path)
    return result

if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True) 
    pdf_path = "data/ODP.pdf"
    # pret:Pret=  asyncio.run(do_test_odp(file_path=pdf_path))
    # if pret!=None:
    #     print(pret.emprunteurs)
    iban_pdf="data/rib pm.pdf"
    txt = asyncio.run(do_test_iban(file_path=iban_pdf))
    print(txt)
    

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pylexfluent-0.1.85.tar.gz (118.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pylexfluent-0.1.85-py3-none-any.whl (143.7 kB view details)

Uploaded Python 3

File details

Details for the file pylexfluent-0.1.85.tar.gz.

File metadata

  • Download URL: pylexfluent-0.1.85.tar.gz
  • Upload date:
  • Size: 118.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.1.85.tar.gz
Algorithm Hash digest
SHA256 8cf91ee20560081e7200a0456775e8e0c184d1089b6925d3abba61abfb32a6fe
MD5 763b131e371e3eb45ae57d29062f7865
BLAKE2b-256 7e6f002804b0a31dd808284b29cc4d89ee9ffa09a37bbb1167d5651585532db3

See more details on using hashes here.

File details

Details for the file pylexfluent-0.1.85-py3-none-any.whl.

File metadata

  • Download URL: pylexfluent-0.1.85-py3-none-any.whl
  • Upload date:
  • Size: 143.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.1.85-py3-none-any.whl
Algorithm Hash digest
SHA256 2041eadb0653ff694317d81a718e831435edd81fc41d1c052395c63e035f6420
MD5 b2b09738242413411c6106aab5eb6fd7
BLAKE2b-256 20948e35bcb1aefcd779df438a9216a6494b2408b419d5ab89cf8cc072e36e98

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page