Lexia AI tools library by Lexfluent
Lexfluent RevolutionAI Python library
| Creation/Revision | Author | Date |
|---|---|---|
| Creation | Jacques MASSA | 2 December 2024 |
| Revision | Jacques MASSA | 10 March 2025 |
| Revision | Jacques MASSA | 4 January 2026 |
Overview
The pyLexfluent library provides AI features for the legal and document-processing domains:
- Classification: training and inference
- Data extraction: ODP, CNI (French national identity card), IBAN, legal documents, Certificat d'Urbanisme (town-planning certificate), birth certificate extracts, death certificate extracts, marriage certificate extracts
- Data augmentation: finance
Prerequisite Python packages
```bash
pip install setuptools
pip install wheel
pip install scikit-learn
pip install matplotlib
pip install tqdm
pip install pytesseract
pip install "pillow>=10.1.0"
pip install jax==0.4.38
pip install jaxlib==0.4.38
pip install mediapipe
pip install opencv-python
pip install pandas
pip install tensorrt
pip install tensorrt-lean
pip install tensorrt-dispatch
pip install tensorflow
pip install tf-keras
pip install tensorflow-hub
pip install torch
pip install torchvision
pip install torchaudio
pip install sentence-transformers
pip install "spacy[cuda12x]"
pip install ocrmypdf
pip install pdf2image
pip install pdfplumber
pip install langchain-community
pip install langchain-ollama
pip install langchain-openai
pip install pymongo
pip install openpyxl
pip install easyocr
pip install "docling[all]"
```
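To confirm that the packages above are actually present in the active environment, here is a minimal sketch using only the standard library (`importlib.metadata`). The `REQUIRED` list is an illustrative subset of the pip commands above, not an official requirements list, and version pins such as `jax==0.4.38` are not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative subset of the distributions installed above (extend as needed).
REQUIRED = [
    "scikit-learn", "pytesseract", "pillow", "jax", "jaxlib", "opencv-python",
    "pandas", "tensorflow", "torch", "sentence-transformers", "spacy",
    "ocrmypdf", "pdf2image", "pdfplumber", "pymongo", "easyocr", "docling",
]


def installed_versions(names):
    """Return {distribution name: installed version, or None if missing}."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:25} {ver or 'MISSING'}")
```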
There may be a version conflict between the cuDNN versions required by TensorFlow and PyTorch. In that case, remove the nvidia-cudnn-cu12 package installed by pip:
```bash
pip uninstall nvidia-cudnn-cu12
```
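Before uninstalling anything, it can help to see which cuDNN each side reports. The sketch below is a diagnostic, not part of pyLexfluent. It assumes PyTorch's `torch.backends.cudnn.version()` and TensorFlow's `tf.sysconfig.get_build_info()` (available in recent TensorFlow 2.x releases). TensorFlow's value is the cuDNN version it was built against, and a framework that is missing or fails to load is reported as an error instead of stopping the script.

```python
from importlib.metadata import PackageNotFoundError, version


def cudnn_report():
    """Collect the cuDNN versions visible to pip, PyTorch and TensorFlow."""
    report = {}
    try:
        report["nvidia-cudnn-cu12 (pip)"] = version("nvidia-cudnn-cu12")
    except PackageNotFoundError:
        report["nvidia-cudnn-cu12 (pip)"] = None
    try:
        import torch
        # Integer such as 90100, or None when cuDNN is not available.
        report["torch"] = torch.backends.cudnn.version()
    except Exception as exc:  # ImportError or a CUDA/cuDNN loading error
        report["torch"] = f"error: {exc}"
    try:
        import tensorflow as tf
        # Build-time cuDNN version; the key is absent on CPU-only builds.
        report["tensorflow"] = tf.sysconfig.get_build_info().get("cudnn_version")
    except Exception as exc:
        report["tensorflow"] = f"error: {exc}"
    return report


if __name__ == "__main__":
    for source, ver in cudnn_report().items():
        print(f"{source:25} {ver}")
```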
System prerequisites
Update and required packages
```bash
apt-get update
apt-get upgrade -y
apt install software-properties-common -y
add-apt-repository -y ppa:alex-p/tesseract-ocr5
# Refresh the package lists so the PPA's Tesseract 5 packages are picked up
apt-get update
apt-get install libc6 -y
apt-get install poppler-utils -y
apt-get install tesseract-ocr -y
apt-get install tesseract-ocr-fra -y
apt-get install tesseract-ocr-eng -y
apt-get install tesseract-ocr-ita -y
apt-get install tesseract-ocr-spa -y
apt-get install tesseract-ocr-deu -y
apt-get install tesseract-ocr-cos -y
apt-get install tesseract-ocr-lat -y
apt-get install automake libtool -y
apt-get install libleptonica-dev -y
apt-get install ffmpeg libsm6 libxext6 -y
apt-get install ocrmypdf -y
```
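A quick way to check that the system packages above are in place is to look for their executables and the installed Tesseract language packs. This is a minimal sketch under two assumptions: `pdftoppm` (from poppler-utils) stands in for Poppler, and the language codes are those of the `tesseract-ocr-*` packages listed above.

```python
import shutil
import subprocess

# Executables expected from the apt packages above.
TOOLS = ["tesseract", "pdftoppm", "ocrmypdf", "ffmpeg"]
# Language codes matching the tesseract-ocr-* packages above.
LANGS = ["fra", "eng", "ita", "spa", "deu", "cos", "lat"]


def check_tools(tools=TOOLS):
    """Map each executable name to its path on PATH, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tesseract_langs(langs=LANGS):
    """Return the requested Tesseract language codes that are not installed."""
    if shutil.which("tesseract") is None:
        return list(langs)
    proc = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True)
    # Output is a header line ("List of available languages ...") followed by
    # one code per line; both streams are read in case it goes to stderr.
    lines = (proc.stdout + proc.stderr).splitlines()
    available = {line.strip() for line in lines
                 if line.strip() and not line.startswith("List of")}
    return [lang for lang in langs if lang not in available]


if __name__ == "__main__":
    print(check_tools())
    print("Missing Tesseract languages:", missing_tesseract_langs())
```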
JBIG2
Installing the JBIG2 encoder
Most Linux distributions do not include a JBIG2 encoder, since JBIG2 encoding was patented for a long time. All known JBIG2 US patents have expired as of 2017, but it is possible that unknown patents exist.
JBIG2 encoding is recommended for OCRmyPDF and is used to losslessly create smaller PDFs. If JBIG2 encoding is not available, lower-quality CCITT encoding will be used for monochrome images.
JBIG2 decoding is not patented and is performed automatically by most PDF viewers. It is widely supported and has been part of the PDF specification since 2001.
JBIG2 encoding is automatically provided by these OCRmyPDF packages:
- Docker image (both Ubuntu and Alpine)
- Snap package
- ArchLinux AUR package
- Alpine Linux package
- Homebrew on macOS
For all other platforms, you need to build the JBIG2 encoder from source:
```bash
git clone https://github.com/agl/jbig2enc
cd jbig2enc
./autogen.sh
./configure && make
[sudo] make install
```
Dependencies include libtoolize and libleptonica, which on Ubuntu systems are packaged as libtool and libleptonica-dev. On Fedora (35) they are packaged as libtool and leptonica-devel. For this to work, please make sure to install autotools, automake, libtool, pkg-config and leptonica first if not already installed. Other dependencies might be required depending on your system.
```bash
[sudo] apt install autotools-dev automake libtool libleptonica-dev pkg-config
```
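Once built and installed, the encoder should be available as a `jbig2` executable. A minimal check, assuming that binary name and that the install prefix (typically `/usr/local/bin`) is on `PATH`:

```python
import shutil


def jbig2_available():
    """True if a `jbig2` executable (built from jbig2enc) is found on PATH."""
    return shutil.which("jbig2") is not None


if __name__ == "__main__":
    if jbig2_available():
        print("JBIG2 encoder found: OCRmyPDF can create smaller, lossless monochrome images.")
    else:
        print("JBIG2 encoder not found: OCRmyPDF will fall back to CCITT encoding.")
```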
Model downloads
spaCy
```bash
python -m spacy download fr_core_news_lg
```
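spaCy pipelines such as `fr_core_news_lg` are installed as regular Python packages, so their presence can be checked without loading the (large) model. A minimal sketch; the sample sentence is illustrative only:

```python
import importlib.util


def spacy_model_installed(name="fr_core_news_lg"):
    """Detect an installed spaCy pipeline package without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if spacy_model_installed():
        import spacy

        nlp = spacy.load("fr_core_news_lg")
        doc = nlp("Le contrat est signé à Paris.")
        print([(ent.text, ent.label_) for ent in doc.ents])
    else:
        print("Run: python -m spacy download fr_core_news_lg")
```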
GPU issue
If you get the message `Successful NUMA node read from SysFS had negative value (-1)`, run the following as root:
```bash
for a in /sys/bus/pci/devices/*; do echo 0 | tee -a $a/numa_node; done
```
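The loop writes `0` to every PCI device's `numa_node` file; this requires root, and the values are reset at the next reboot. To see which devices actually report `-1` before changing anything, a read-only sketch:

```python
from pathlib import Path


def devices_with_unknown_numa(root="/sys/bus/pci/devices"):
    """List PCI device names whose numa_node file reads -1."""
    root = Path(root)
    if not root.is_dir():
        return []
    bad = []
    for node_file in sorted(root.glob("*/numa_node")):
        try:
            if node_file.read_text().strip() == "-1":
                bad.append(node_file.parent.name)
        except OSError:
            continue  # unreadable entry: skip it
    return bad


if __name__ == "__main__":
    print(devices_with_unknown_numa())
```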
Usage examples
Classification
```python
import logging
import os
import sys

from lxf.services.measure_time import measure_time_async
from lxf.services.try_safe import try_safe_execute_asyncio
from lxf.ai.classification.classifier import get_classification
from lxf.domain.predictions import Predictions
from lxf.settings import set_logging_level, get_logging_level

# Set the library-wide logging level before configuring the handlers
set_logging_level(logging.DEBUG)

###################################################################
# Log to ./logs/test_classifier.log (create the directory if needed)
os.makedirs("./logs", exist_ok=True)
logger = logging.getLogger('test classifier')
fh = logging.FileHandler('./logs/test_classifier.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

@measure_time_async
async def do_test(file_name) -> Predictions:
    """Classify the document at file_name (max_pages=10 caps the pages analysed)."""
    return await get_classification(file_name=file_name, max_pages=10)

if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True)
    pdf_path = "data/ODP.pdf"
    iban_pdf = "data/RIBB.pdf"
    result = try_safe_execute_asyncio(logger=logger, func=do_test, file_name=iban_pdf)  # or: asyncio.run(do_test(iban_pdf))
    print(result)
    result = try_safe_execute_asyncio(logger=logger, func=do_test, file_name=pdf_path)  # or: asyncio.run(do_test(pdf_path))
    print(result)
```
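`measure_time_async` and `try_safe_execute_asyncio` come from `lxf.services`, and their source is not shown here. The following is a hypothetical sketch of what helpers with these names might do, inferred only from how the example uses them and from the inline comment pointing to `asyncio.run(do_test(...))`; it is not the library's implementation. `fake_classification` is a stand-in for `get_classification`.

```python
import asyncio
import functools
import logging
import time


def measure_time_async(func):
    """Hypothetical timing decorator for coroutine functions (not lxf's code)."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logging.getLogger(func.__module__).info("%s took %.3f s", func.__qualname__, elapsed)
    return wrapper


def try_safe_execute_asyncio(logger, func, *args, **kwargs):
    """Hypothetical: run the coroutine func(*args, **kwargs) with asyncio.run and
    return its result; if it raises, log the traceback and return None."""
    try:
        return asyncio.run(func(*args, **kwargs))
    except Exception:
        logger.exception("Error while executing %s", getattr(func, "__qualname__", func))
        return None


@measure_time_async
async def fake_classification(file_name: str) -> str:
    """Stand-in for get_classification: sleeps instead of classifying."""
    await asyncio.sleep(0.01)
    return f"classified {file_name}"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("test classifier")
    print(try_safe_execute_asyncio(log, fake_classification, file_name="data/ODP.pdf"))
```

Because `asyncio.run` starts a new event loop, a helper built this way can only be called from synchronous code, which matches its use under `if __name__ == "__main__":` above.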
Data extraction
```python
import asyncio
import logging
import os
import sys

import lxf.settings as settings
from lxf.settings import set_logging_level, get_logging_level

set_logging_level(logging.DEBUG)
# Disable tqdm progress bars (set before the extractor imports below)
settings.enable_tqdm = False

from lxf.domain.loan import Pret
from lxf.extractors.finance import odp_extractor
from lxf.extractors.finance import iban_extractor
from lxf.services.try_safe import try_safe_execute_async

###################################################################
# Log to ./logs/test_finance.log (create the directory if needed)
os.makedirs("./logs", exist_ok=True)
logger = logging.getLogger('test_finance')
fh = logging.FileHandler('./logs/test_finance.log')
fh.setLevel(get_logging_level())
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.setLevel(get_logging_level())
logger.addHandler(fh)
#################################################################

async def do_test_odp(file_path: str) -> Pret:
    """Extract loan data (Pret) from an ODP document."""
    result = await try_safe_execute_async(logger, odp_extractor.extract_data, file_path=file_path)
    return result

async def do_test_iban(file_path: str) -> str:
    """Extract the IBAN from a bank details (RIB) document."""
    result = await try_safe_execute_async(logger, iban_extractor.extract_data, file_path=file_path)
    return result

if __name__ == "__main__":
    sys.stdout.reconfigure(line_buffering=True)
    pdf_path = "data/ODP.pdf"
    # pret: Pret = asyncio.run(do_test_odp(file_path=pdf_path))
    # if pret is not None:
    #     print(pret.emprunteurs)
    iban_pdf = "data/rib pm.pdf"
    txt = asyncio.run(do_test_iban(file_path=iban_pdf))
    print(txt)
```
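The extraction example awaits `try_safe_execute_async` inside its own coroutines instead of using the `asyncio.run`-based helper. A hypothetical sketch of such an awaitable wrapper follows, also showing how two wrapped extractions could run concurrently with `asyncio.gather`. The extractor functions are stand-ins, and whether the real `odp_extractor` and `iban_extractor` are safe to run concurrently is an assumption to verify.

```python
import asyncio
import logging


async def try_safe_execute_async(logger, func, *args, **kwargs):
    """Hypothetical: await func(*args, **kwargs); on error, log it and return None."""
    try:
        return await func(*args, **kwargs)
    except Exception:
        logger.exception("Error while executing %s", getattr(func, "__qualname__", func))
        return None


# Hypothetical stand-ins for odp_extractor.extract_data and iban_extractor.extract_data.
async def fake_extract_odp(file_path: str):
    await asyncio.sleep(0)
    raise ValueError(f"no data found in {file_path}")


async def fake_extract_iban(file_path: str) -> str:
    await asyncio.sleep(0)
    return f"IBAN extracted from {file_path}"


async def main():
    log = logging.getLogger("test_finance")
    # Both extractions run concurrently; since each call is wrapped,
    # a failure becomes None instead of propagating out of gather().
    pret, iban = await asyncio.gather(
        try_safe_execute_async(log, fake_extract_odp, file_path="data/ODP.pdf"),
        try_safe_execute_async(log, fake_extract_iban, file_path="data/rib pm.pdf"),
    )
    return pret, iban


if __name__ == "__main__":
    print(asyncio.run(main()))
```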
File details
Details for the file pylexfluent-0.1.84.tar.gz.
File metadata
- Download URL: pylexfluent-0.1.84.tar.gz
- Size: 118.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 86bc5bdc5a05ce222979259aa687f06189e7f6b323a2c9d6f83bbc61c612d67f |
| MD5 | 58bad2c4b5fe47225ef2e05f5d82d2b5 |
| BLAKE2b-256 | 886fc6cc49ae24ceb0f44ce5db85bcf3fc8fc79ce3659955299e3d10c5fd69d2 |

File details
Details for the file pylexfluent-0.1.84-py3-none-any.whl.
File metadata
- Download URL: pylexfluent-0.1.84-py3-none-any.whl
- Size: 143.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 19b5a9678509f9c65135ce5658bcff4933ca5cce12c9eda1edbe7355448da4a0 |
| MD5 | c0c426e4ba9222ec6bc549f55e3d3315 |
| BLAKE2b-256 | a28413e5c83f31559f0c78c884cfb36da1ad5ff17de56738e34565979dbad0f2 |
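To check a downloaded distribution against the SHA256 digests published above, here is a minimal sketch using `hashlib`. The file is hashed in chunks, and PyPI's BLAKE2b-256 digest corresponds to BLAKE2b with a 32-byte output.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests published above for the 0.1.84 distribution files.
EXPECTED_SHA256 = {
    "pylexfluent-0.1.84.tar.gz": "86bc5bdc5a05ce222979259aa687f06189e7f6b323a2c9d6f83bbc61c612d67f",
    "pylexfluent-0.1.84-py3-none-any.whl": "19b5a9678509f9c65135ce5658bcff4933ca5cce12c9eda1edbe7355448da4a0",
}


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so it is never loaded into memory all at once."""
    if algorithm == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path):
    """Compare a file's SHA256 with the published digest for its file name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    if expected is None:
        raise KeyError(f"no published digest for {Path(path).name}")
    return file_digest(path) == expected


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, "OK" if verify(p) else "MISMATCH")
```

For example, download the files (for instance with `pip download pylexfluent==0.1.84 --no-deps -d dist/`) and pass their paths as arguments. `pip hash <file>` prints the same SHA256 digest.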