
Document data extractor


Lexfluent RevolutionAI Python library

Author: Jacques MASSA. Created on December 2, 2024.


Overview

This library provides:

  • document classification using the jupiterB0 model
  • extraction of the data contained in documents of known classes (loan offers, IBAN, CNI (French national identity card), etc.).

Prerequisite installations


    pip install setuptools wheel 
    pip install pdfplumber 
    pip install spacy[cuda12x]
    pip install tqdm 
    pip install opencv-python
    pip install pytesseract
    pip install pdf2image
    pip install pillow==10.0.1
    pip install pandas
    pip install scikit-learn
    pip install matplotlib
    pip install tensorflow==2.17.0
    pip install tf-keras==2.17.0
    pip install tensorflow_hub
    pip install tensorrt
    pip install langchain-community
    pip install ocrmypdf
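
After running the commands above, a quick check can confirm that the packages are visible to the interpreter. The following is a minimal sketch, not part of pylexfluent: the function name `missing_modules` is hypothetical, and the import names are assumed from the usual mapping between pip package names and module names (for example `opencv-python` is imported as `cv2`).

```python
import importlib.util

# Import names assumed for the pip packages listed above
# (opencv-python -> cv2, pillow -> PIL, scikit-learn -> sklearn, ...).
REQUIRED_MODULES = [
    "pdfplumber", "spacy", "tqdm", "cv2", "pytesseract", "pdf2image",
    "PIL", "pandas", "sklearn", "matplotlib", "tensorflow", "tf_keras",
    "tensorflow_hub", "tensorrt", "langchain_community", "ocrmypdf",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All prerequisite modules were found.")
```

Because the check only locates modules without importing them, it stays fast and does not initialise TensorFlow or the GPU. It also cannot detect a package that is installed but broken.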

Model downloads

SPACY

    python -m spacy download fr_core_news_lg
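
To verify that the model was downloaded, it can be loaded once from Python. This is a sketch under the assumption that the model is installed as a regular Python package named `fr_core_news_lg`, which is how `spacy download` normally installs it. The helper `load_french_model` is a hypothetical name and not part of pylexfluent.

```python
import importlib.util


def load_french_model(name="fr_core_news_lg"):
    """Return the loaded spaCy pipeline, or None if spaCy or the model is absent."""
    if importlib.util.find_spec("spacy") is None:
        return None
    if importlib.util.find_spec(name) is None:
        return None
    import spacy  # imported lazily so the check works without spaCy installed

    return spacy.load(name)


if __name__ == "__main__":
    nlp = load_french_model()
    if nlp is None:
        print("Model not found; run: python -m spacy download fr_core_news_lg")
    else:
        doc = nlp("Jacques habite à Paris.")
        print([(ent.text, ent.label_) for ent in doc.ents])
```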

System update and required installations

    apt-get update 
    apt-get upgrade
    apt install software-properties-common -y
    apt-get install poppler-utils -y
    add-apt-repository ppa:alex-p/tesseract-ocr5
    apt-get install libc6 -y
    apt-get install tesseract-ocr -y
    apt-get install tesseract-ocr-fra -y
    apt-get install tesseract-ocr-eng -y
    apt-get install tesseract-ocr-ita -y
    apt-get install tesseract-ocr-spa -y
    apt-get install tesseract-ocr-deu -y
    apt-get install tesseract-ocr-cos -y
    apt-get install tesseract-ocr-lat -y
    apt-get install automake libtool -y
    apt-get install libleptonica-dev -y
    apt-get install ffmpeg libsm6 libxext6  -y
    apt-get install ocrmypdf -y    
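
The system packages above can be checked from Python as well. The sketch below looks for the executables on `PATH` and compares the output of `tesseract --list-langs` with the language packs installed above (`fra`, `eng`, `ita`, `spa`, `deu`, `cos`, `lat`). The function names are hypothetical, and the binary names (`pdftoppm` for poppler-utils, for instance) are assumptions about what each package provides.

```python
import shutil
import subprocess

# Executables assumed to be provided by the apt packages listed above.
REQUIRED_BINARIES = ["tesseract", "pdftoppm", "ocrmypdf", "ffmpeg"]
# Language codes matching the tesseract-ocr-* packages listed above.
REQUIRED_LANGS = {"fra", "eng", "ita", "spa", "deu", "cos", "lat"}


def missing_binaries(names):
    """Return the executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def parse_tesseract_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if line and not line.startswith("List of"):
            langs.add(line)
    return langs


def missing_langs(output, required=REQUIRED_LANGS):
    """Return the required language codes absent from the tesseract output."""
    return sorted(set(required) - parse_tesseract_langs(output))


if __name__ == "__main__":
    print("Missing binaries:", missing_binaries(REQUIRED_BINARIES))
    if shutil.which("tesseract"):
        result = subprocess.run(
            ["tesseract", "--list-langs"],
            capture_output=True, text=True, check=False,
        )
        # Some versions print the list on stderr, so both streams are read.
        print("Missing languages:", missing_langs(result.stdout + result.stderr))
```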

GPU issue

If the following message appears: Successful NUMA node read from SysFS had negative value (-1)

    for a in /sys/bus/pci/devices/*; do echo 0 | tee -a "$a/numa_node"; done
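
Before applying the loop above, it can be useful to see which PCI devices actually report a negative NUMA node. The following read-only sketch lists them; it assumes the standard Linux sysfs layout used by the loop above, and `negative_numa_nodes` is a hypothetical helper name.

```python
from pathlib import Path


def negative_numa_nodes(devices_dir="/sys/bus/pci/devices"):
    """Return the names of PCI devices whose numa_node value is negative."""
    affected = []
    for device in sorted(Path(devices_dir).iterdir()):
        node_file = device / "numa_node"
        try:
            value = int(node_file.read_text().strip())
        except (OSError, ValueError):
            # Device without a readable or numeric numa_node entry: skip it.
            continue
        if value < 0:
            affected.append(device.name)
    return affected


if __name__ == "__main__":
    if Path("/sys/bus/pci/devices").is_dir():
        for name in negative_numa_nodes():
            print("negative numa_node:", name)
```

The function only reads the files, so it does not need root privileges, unlike the write performed by the shell loop.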


Download files


Source Distribution

pylexfluent-0.0.21.tar.gz (25.6 MB)

Uploaded Source

Built Distribution


pylexfluent-0.0.21-py3-none-any.whl (119.2 kB)

Uploaded Python 3

File details

Details for the file pylexfluent-0.0.21.tar.gz.

File metadata

  • Download URL: pylexfluent-0.0.21.tar.gz
  • Upload date:
  • Size: 25.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.0.21.tar.gz
Algorithm Hash digest
SHA256 9c15493b106c7fa19e4445b1e340f12b45cc25c04eae69ec3056a800aa99caf6
MD5 c3c7689d255acbbfed2c6d41f3c8a2eb
BLAKE2b-256 63c6315d8a3db1a9aaaa665eba6e170adf273c5996ad08e8b44cdb01e285707d


File details

Details for the file pylexfluent-0.0.21-py3-none-any.whl.

File metadata

  • Download URL: pylexfluent-0.0.21-py3-none-any.whl
  • Upload date:
  • Size: 119.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.0.21-py3-none-any.whl
Algorithm Hash digest
SHA256 cb0b9258443a49dc310b108fe4d5ca3687e22d6228ff288e605d9047b9b7a0f2
MD5 70c22f555bb2db1f8a113f1d411b8475
BLAKE2b-256 adf902c33c81eea17b5329d79948436eb08a085bea2676dd7e08fd45b8e83b91

