Document data extractor
Lexfluent RevolutionAI Python library
Author: Jacques MASSA. Created on 2 December 2024.
Overview
This library provides:
- document classification using the jupiterB0 model
- extraction of data contained in documents of known classes (loan offers, IBAN, national ID cards (CNI), etc.).
Prerequisite installations
pip install setuptools wheel
pip install pdfplumber
pip install "spacy[cuda12x]"
pip install tqdm
pip install opencv-python
pip install pytesseract
pip install pdf2image
pip install pillow==10.0.1
pip install pandas
pip install scikit-learn
pip install matplotlib
pip install tensorflow==2.17.0
pip install tf-keras==2.17.0
pip install tensorflow_hub
pip install tensorrt
pip install langchain-community
pip install ocrmypdf
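After running the commands above, it can be useful to confirm that every prerequisite is actually importable and that the pinned versions match. The sketch below is one way to do that with the standard library only; the helper names `missing_modules` and `pin_mismatches` are hypothetical and not part of pylexfluent, and the build tools `setuptools` and `wheel` are left out of the check.

```python
from importlib import metadata
from importlib.util import find_spec

# pip distribution name -> importable module name (they differ for some packages).
PREREQUISITES = {
    "pdfplumber": "pdfplumber",
    "spacy": "spacy",
    "tqdm": "tqdm",
    "opencv-python": "cv2",
    "pytesseract": "pytesseract",
    "pdf2image": "pdf2image",
    "pillow": "PIL",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "tensorflow": "tensorflow",
    "tf-keras": "tf_keras",
    "tensorflow_hub": "tensorflow_hub",
    "tensorrt": "tensorrt",
    "langchain-community": "langchain_community",
    "ocrmypdf": "ocrmypdf",
}

# Versions pinned in the install commands above.
PINNED = {"pillow": "10.0.1", "tensorflow": "2.17.0", "tf-keras": "2.17.0"}


def missing_modules(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


def pin_mismatches(pins):
    """Map each pinned distribution to its installed version when it differs.

    A value of None means the distribution is not installed at all.
    """
    mismatches = {}
    for dist, wanted in pins.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[dist] = installed
    return mismatches


if __name__ == "__main__":
    print("Missing modules:", missing_modules(PREREQUISITES.values()))
    print("Pin mismatches:", pin_mismatches(PINNED))
```

Two empty results suggest the Python side of the environment is complete; the system packages are checked separately.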
Model downloads
spaCy
python -m spacy download fr_core_news_lg
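To check that the French model was downloaded correctly, it can be loaded with `spacy.load`. This is a minimal sketch: the helper `load_french_model` and the sample sentence are hypothetical, and it says nothing about how pylexfluent itself uses the model.

```python
def load_french_model(name="fr_core_news_lg"):
    """Return the loaded spaCy pipeline, or None if spaCy or the model is missing."""
    try:
        import spacy  # imported lazily so the check also works without spaCy

        return spacy.load(name)
    except (ImportError, OSError):
        # ImportError: spaCy is not installed; OSError: the model package is absent.
        return None


if __name__ == "__main__":
    nlp = load_french_model()
    if nlp is None:
        print("fr_core_news_lg is not available; run the download command above.")
    else:
        # Hypothetical sample sentence, for illustration only.
        doc = nlp("Jacques Dupont habite à Paris.")
        print([(ent.text, ent.label_) for ent in doc.ents])
```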
System update and required packages
apt-get update
apt-get upgrade -y
apt install software-properties-common -y
apt-get install poppler-utils -y
add-apt-repository -y ppa:alex-p/tesseract-ocr5
apt-get install libc6 -y
apt-get install tesseract-ocr -y
apt-get install tesseract-ocr-fra -y
apt-get install tesseract-ocr-eng -y
apt-get install tesseract-ocr-ita -y
apt-get install tesseract-ocr-spa -y
apt-get install tesseract-ocr-deu -y
apt-get install tesseract-ocr-cos -y
apt-get install tesseract-ocr-lat -y
apt-get install automake libtool -y
apt-get install libleptonica-dev -y
apt-get install ffmpeg libsm6 libxext6 -y
apt-get install ocrmypdf -y
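Once the packages above are installed, the Tesseract language packs can be verified by parsing the output of `tesseract --list-langs`. The following is a sketch under the assumption that the command prints one header line followed by one language code per line; the helper names `parse_langs` and `missing_langs` are hypothetical.

```python
import shutil
import subprocess

# Language codes matching the tesseract-ocr-* packages installed above.
REQUIRED_LANGS = ["fra", "eng", "ita", "spa", "deu", "cos", "lat"]


def parse_langs(listing):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in listing.splitlines()]
    # Drop blank lines and the "List of available languages ..." header.
    return [ln for ln in lines if ln and not ln.lower().startswith("list of")]


def missing_langs(listing, required=REQUIRED_LANGS):
    """Return the required language codes that are absent from the listing."""
    installed = set(parse_langs(listing))
    return [lang for lang in required if lang not in installed]


if __name__ == "__main__":
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH")
    else:
        result = subprocess.run(
            ["tesseract", "--list-langs"], capture_output=True, text=True
        )
        # Some versions write the listing to stderr, so both streams are read.
        print("Missing language packs:", missing_langs(result.stdout + result.stderr))
```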
GPU issue
If you see the message "Successful NUMA node read from SysFS had negative value (-1)", set the NUMA node of every PCI device to 0 (run as root):
for a in /sys/bus/pci/devices/*; do echo 0 | tee -a "$a/numa_node"; done
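Before applying the workaround, it may help to see which PCI devices actually report a negative NUMA node. The read-only sketch below scans the same sysfs files that the loop above writes to; the function name `devices_with_negative_numa` is hypothetical, and the `root` parameter exists only so the function can be pointed at another directory.

```python
from pathlib import Path


def devices_with_negative_numa(root="/sys/bus/pci/devices"):
    """Return the names of PCI devices whose numa_node file holds a negative value."""
    affected = []
    for node_file in sorted(Path(root).glob("*/numa_node")):
        try:
            value = int(node_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
        if value < 0:
            affected.append(node_file.parent.name)
    return affected


if __name__ == "__main__":
    print(devices_with_negative_numa())
```

An empty list means no device reports a negative value, in which case the loop above should not be needed.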
Download files
Source Distribution
pylexfluent-0.0.22.tar.gz (25.6 MB)
Built Distribution
pylexfluent-0.0.22-py3-none-any.whl (119.2 kB)
File details
Details for the file pylexfluent-0.0.22.tar.gz.
File metadata
- Download URL: pylexfluent-0.0.22.tar.gz
- Upload date:
- Size: 25.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f5f306de9e0aaeffe3e41e9b08b21fddb5390fee8f8cb9edc5b4a493b1495ce |
| MD5 | 57620f0185670057d4f7ca9c7abe3f83 |
| BLAKE2b-256 | 011376d2aa5d0b920fb942ccbcadab6aec7b6e6394f45d4e8cd64f07c70e5645 |
File details
Details for the file pylexfluent-0.0.22-py3-none-any.whl.
File metadata
- Download URL: pylexfluent-0.0.22-py3-none-any.whl
- Upload date:
- Size: 119.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e84215ede98af1391d4c24ee2beb94ec4807662c695098b6572813a5820c367b |
| MD5 | 90c802426e54b0979404eeafd7baa1d9 |
| BLAKE2b-256 | cdabe57dfdb8e13134186bb0b148a925139325df18fadc270fb56952f3d90a27 |