Document data extractor
Project description
Lexfluent RevolutionAI Python library
Author: Jacques MASSA. Created on 2 December 2024.
Overview
This library provides:
- document classification using the jupiterB0 model
- extraction of the data contained in documents of known classes (loan offers, IBAN, CNI (French national identity card), etc.).
Prerequisite installations
pip install setuptools wheel
pip install pdfplumber
pip install "spacy[cuda12x]"
pip install tqdm
pip install opencv-python
pip install pytesseract
pip install pdf2image
pip install pillow==10.0.1
pip install pandas
pip install scikit-learn
pip install matplotlib
pip install tensorflow==2.17.0
pip install tf-keras==2.17.0
pip install tensorflow_hub
pip install tensorrt
pip install langchain-community
pip install ocrmypdf
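Once the packages above are installed, a quick import check can confirm that nothing was skipped. The following is a minimal sketch, not part of pylexfluent: the helper name `missing_modules` and the `PIP_TO_MODULE` table are illustrative assumptions, and the import names are the usual ones for these packages (for example `opencv-python` is imported as `cv2`). The build tools `setuptools` and `wheel` are left out.

```python
from importlib.util import find_spec

# pip package name -> top-level module name usually used to import it.
# This mapping is an assumption for illustration; pylexfluent may not
# import every one of these modules directly.
PIP_TO_MODULE = {
    "pdfplumber": "pdfplumber",
    "spacy": "spacy",
    "tqdm": "tqdm",
    "opencv-python": "cv2",
    "pytesseract": "pytesseract",
    "pdf2image": "pdf2image",
    "pillow": "PIL",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "tensorflow": "tensorflow",
    "tf-keras": "tf_keras",
    "tensorflow_hub": "tensorflow_hub",
    "tensorrt": "tensorrt",
    "langchain-community": "langchain_community",
    "ocrmypdf": "ocrmypdf",
}


def missing_modules(module_names):
    """Return the top-level module names that cannot be located."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = set(missing_modules(PIP_TO_MODULE.values()))
    for pip_name, module_name in PIP_TO_MODULE.items():
        status = "MISSING" if module_name in missing else "ok"
        print(f"{pip_name:22s} {status}")
```

The check only locates each module without importing it, so it stays fast even for heavy packages such as TensorFlow.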
Model downloads
spaCy
python -m spacy download fr_core_news_lg
System update and required packages
apt-get update
apt-get upgrade
apt install software-properties-common -y
apt-get install poppler-utils -y
add-apt-repository ppa:alex-p/tesseract-ocr5 -y
apt-get install libc6 -y
apt-get install tesseract-ocr -y
apt-get install tesseract-ocr-fra -y
apt-get install tesseract-ocr-eng -y
apt-get install tesseract-ocr-ita -y
apt-get install tesseract-ocr-spa -y
apt-get install tesseract-ocr-deu -y
apt-get install tesseract-ocr-cos -y
apt-get install tesseract-ocr-lat -y
apt-get install automake libtool -y
apt-get install libleptonica-dev -y
apt-get install ffmpeg libsm6 libxext6 -y
apt-get install ocrmypdf -y
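After the system packages are installed, it can help to confirm that the command-line tools they provide are reachable on `PATH`. The sketch below is an illustration only: the helper name `missing_binaries` and the `BINARY_TO_PACKAGE` table are assumptions, and the table lists just a few executables known to ship with the packages above (`pdftoppm` comes from `poppler-utils`, `tesseract` from `tesseract-ocr`).

```python
import shutil

# Executable name -> apt package that provides it.
BINARY_TO_PACKAGE = {
    "tesseract": "tesseract-ocr",
    "pdftoppm": "poppler-utils",
    "ffmpeg": "ffmpeg",
    "ocrmypdf": "ocrmypdf",
}


def missing_binaries(names):
    """Return the executable names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for name in missing_binaries(BINARY_TO_PACKAGE):
        print(f"{name} not found; try: apt-get install {BINARY_TO_PACKAGE[name]} -y")
```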
GPU issue
If you see the message "Successful NUMA node read from SysFS had negative value (-1)", run:
for a in /sys/bus/pci/devices/*; do echo 0 | tee -a $a/numa_node; done
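The shell loop above writes 0 into every `numa_node` file, which normally requires root. To see which devices actually report a negative value before changing anything, a read-only check is enough. This is a minimal sketch under the assumption that the standard Linux sysfs layout is present; the function name `negative_numa_nodes` is hypothetical.

```python
from pathlib import Path


def negative_numa_nodes(devices_root="/sys/bus/pci/devices"):
    """Return the PCI device names whose numa_node file holds a negative value."""
    root = Path(devices_root)
    if not root.is_dir():
        return []
    flagged = []
    for node_file in sorted(root.glob("*/numa_node")):
        try:
            value = int(node_file.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or malformed entries are skipped, not treated as errors.
            continue
        if value < 0:
            flagged.append(node_file.parent.name)
    return flagged


if __name__ == "__main__":
    for device in negative_numa_nodes():
        print(f"{device}: numa_node is negative")
```

If the list is empty, the message most likely has another cause and the shell loop will not help.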
Download files
Source Distribution
pylexfluent-0.0.21.tar.gz (25.6 MB)
Built Distribution
pylexfluent-0.0.21-py3-none-any.whl (119.2 kB)
File details
Details for the file pylexfluent-0.0.21.tar.gz.
File metadata
- Download URL: pylexfluent-0.0.21.tar.gz
- Upload date:
- Size: 25.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c15493b106c7fa19e4445b1e340f12b45cc25c04eae69ec3056a800aa99caf6 |
| MD5 | c3c7689d255acbbfed2c6d41f3c8a2eb |
| BLAKE2b-256 | 63c6315d8a3db1a9aaaa665eba6e170adf273c5996ad08e8b44cdb01e285707d |
File details
Details for the file pylexfluent-0.0.21-py3-none-any.whl.
File metadata
- Download URL: pylexfluent-0.0.21-py3-none-any.whl
- Upload date:
- Size: 119.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb0b9258443a49dc310b108fe4d5ca3687e22d6228ff288e605d9047b9b7a0f2 |
| MD5 | 70c22f555bb2db1f8a113f1d411b8475 |
| BLAKE2b-256 | adf902c33c81eea17b5329d79948436eb08a085bea2676dd7e08fd45b8e83b91 |
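The SHA256 digests listed for the two files can be used to verify a download before installing it. The following is a minimal sketch using only the standard library; the function names `sha256_of` and `verify_download` are illustrative, and the files are assumed to sit in the current directory under their published names.

```python
import hashlib
import os

# SHA256 digests copied from the file details on this page.
EXPECTED_SHA256 = {
    "pylexfluent-0.0.21.tar.gz":
        "9c15493b106c7fa19e4445b1e340f12b45cc25c04eae69ec3056a800aa99caf6",
    "pylexfluent-0.0.21-py3-none-any.whl":
        "cb0b9258443a49dc310b108fe4d5ca3687e22d6228ff288e605d9047b9b7a0f2",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True when the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    for filename, expected in EXPECTED_SHA256.items():
        if os.path.exists(filename):
            result = "ok" if verify_download(filename, expected) else "MISMATCH"
            print(f"{filename}: {result}")
```

Reading in chunks matters mainly for the 25.6 MB source distribution; the wheel is small enough to hash in one read.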