
Document data extractor


Lexfluent RevolutionAI Python library

Author: Jacques MASSA. Created on 2 December 2024.


Overview

This library provides:

  • document classification using the jupiterB0 model;
  • extraction of data contained in documents of known classes (loan offers, IBAN, French national identity cards (CNI), etc.).

Prerequisite installations


    pip install setuptools wheel 
    pip install pdfplumber 
    pip install spacy[cuda12x]
    pip install tqdm 
    pip install opencv-python
    pip install pytesseract
    pip install pdf2image
    pip install pillow==10.0.1
    pip install pandas
    pip install scikit-learn
    pip install matplotlib
    pip install tensorflow==2.17.0
    pip install tf-keras==2.17.0
    pip install tensorflow_hub
    pip install tensorrt
    pip install langchain-community
    pip install ocrmypdf
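Before using the library, it can help to confirm that every package above is importable. The sketch below is not part of pylexfluent: the helper `missing_packages` and the `PREREQUISITES` table are hypothetical, and the mapping from pip names to import names (for example `opencv-python` to `cv2`, `pillow` to `PIL`, `scikit-learn` to `sklearn`) reflects how these packages are usually imported. It only checks that each module can be found, not that the pinned versions are installed.

```python
"""Hypothetical helper (not part of pylexfluent): report missing prerequisites."""
import importlib.util

# pip package name -> top-level import name (assumed mapping, see above)
PREREQUISITES = {
    "setuptools": "setuptools",
    "wheel": "wheel",
    "pdfplumber": "pdfplumber",
    "spacy": "spacy",
    "tqdm": "tqdm",
    "opencv-python": "cv2",
    "pytesseract": "pytesseract",
    "pdf2image": "pdf2image",
    "pillow": "PIL",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "tensorflow": "tensorflow",
    "tf-keras": "tf_keras",
    "tensorflow_hub": "tensorflow_hub",
    "tensorrt": "tensorrt",
    "langchain-community": "langchain_community",
    "ocrmypdf": "ocrmypdf",
}


def missing_packages(requirements: dict[str, str]) -> list[str]:
    """Return the pip names whose import name cannot be found."""
    # find_spec locates a top-level module without importing it
    return [pip_name for pip_name, module in requirements.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(PREREQUISITES)
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All prerequisite packages are importable.")
```

Run it with the same interpreter that was used for `pip install`; a package reported as missing usually means it was installed into a different environment.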

Model downloads

spaCy

    python -m spacy download fr_core_news_lg
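`python -m spacy download` installs the model as an ordinary Python package, so its presence can be checked by name before loading it. A minimal sketch, assuming the model is only used through spaCy's public API (`spacy.prefer_gpu`, `spacy.load`); the helpers `model_installed` and `load_french_model` are hypothetical and are not pylexfluent functions. `spacy.prefer_gpu()` stays on the CPU when no usable GPU is found.

```python
"""Hypothetical check that the fr_core_news_lg spaCy model is installed."""
import importlib.util

MODEL_NAME = "fr_core_news_lg"


def model_installed(name: str = MODEL_NAME) -> bool:
    """True if the model package can be found on the import path."""
    return importlib.util.find_spec(name) is not None


def load_french_model(name: str = MODEL_NAME):
    """Load the model with spaCy, using the GPU when one is available."""
    import spacy  # imported lazily so model_installed() works without spaCy

    spacy.prefer_gpu()  # returns False and keeps the CPU if no GPU is usable
    return spacy.load(name)


if __name__ == "__main__":
    if model_installed():
        nlp = load_french_model()
        doc = nlp("Offre de prêt émise le 2 décembre 2024.")
        print([(ent.text, ent.label_) for ent in doc.ents])
    else:
        print(f"Run: python -m spacy download {MODEL_NAME}")
```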

System update and required packages

    apt-get update
    apt-get upgrade -y
    apt-get install software-properties-common -y
    apt-get install poppler-utils -y
    add-apt-repository -y ppa:alex-p/tesseract-ocr5
    apt-get install libc6 -y
    apt-get install tesseract-ocr -y
    apt-get install tesseract-ocr-fra -y
    apt-get install tesseract-ocr-eng -y
    apt-get install tesseract-ocr-ita -y
    apt-get install tesseract-ocr-spa -y
    apt-get install tesseract-ocr-deu -y
    apt-get install tesseract-ocr-cos -y
    apt-get install tesseract-ocr-lat -y
    apt-get install automake libtool -y
    apt-get install libleptonica-dev -y
    apt-get install ffmpeg libsm6 libxext6 -y
    apt-get install ocrmypdf -y
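The system packages above can be checked in the same spirit. This sketch assumes that `pdftoppm` and `pdfinfo` (from poppler-utils), `tesseract`, `ocrmypdf` and `ffmpeg` end up on `PATH`, and it reads the installed language packs from `tesseract --list-langs`; the helper names are hypothetical. Both output streams are parsed, in case the language list is printed on stderr.

```python
"""Hypothetical check of system tools and Tesseract language packs."""
import shutil
import subprocess

# Binaries assumed to be provided by the apt packages above
REQUIRED_BINARIES = ["tesseract", "pdftoppm", "pdfinfo", "ocrmypdf", "ffmpeg"]
# Tesseract codes matching the tesseract-ocr-<lang> packages above
REQUIRED_LANGS = {"fra", "eng", "ita", "spa", "deu", "cos", "lat"}


def parse_list_langs(output: str) -> set[str]:
    """Parse `tesseract --list-langs` output into a set of language codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header
        if line and not line.startswith("List of available languages"):
            langs.add(line)
    return langs


def missing_system_requirements() -> tuple[list[str], set[str]]:
    """Return (missing binaries, missing Tesseract languages)."""
    missing_bins = [b for b in REQUIRED_BINARIES if shutil.which(b) is None]
    if "tesseract" in missing_bins:
        return missing_bins, set(REQUIRED_LANGS)
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=False)
    # Read both streams in case the list is printed on stderr
    installed = parse_list_langs(result.stdout + result.stderr)
    return missing_bins, REQUIRED_LANGS - installed


if __name__ == "__main__":
    bins, langs = missing_system_requirements()
    print("Missing binaries:", bins or "none")
    print("Missing Tesseract languages:", sorted(langs) or "none")
```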

GPU issue

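If the following message appears: Successful NUMA node read from SysFS had negative value (-1)

set the NUMA node of every PCI device to 0 (run as root; the setting does not persist across reboots):

    for a in /sys/bus/pci/devices/*; do echo 0 | tee -a "$a/numa_node"; done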
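Before writing to sysfs, the affected devices can be listed. A minimal read-only sketch, assuming the standard Linux sysfs layout under `/sys/bus/pci/devices`, where each device directory exposes a `numa_node` file; the function name `devices_with_unknown_numa` is hypothetical. It changes nothing, so it can run without root.

```python
"""Hypothetical diagnostic: list PCI devices whose NUMA node is unknown (-1)."""
from pathlib import Path

PCI_DEVICES = Path("/sys/bus/pci/devices")


def devices_with_unknown_numa(root: Path = PCI_DEVICES) -> list[str]:
    """Return the names of devices whose numa_node file reads -1."""
    unknown: list[str] = []
    if not root.is_dir():
        return unknown  # not Linux, or sysfs is not mounted
    for device in sorted(root.iterdir()):
        node_file = device / "numa_node"
        try:
            value = node_file.read_text().strip()
        except OSError:
            continue  # device exposes no readable numa_node file
        if value == "-1":
            unknown.append(device.name)
    return unknown


if __name__ == "__main__":
    for name in devices_with_unknown_numa():
        print(f"{name}: numa_node = -1")
```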


Download files

Download the file for your platform.

Source Distribution

pylexfluent-0.0.22.tar.gz (25.6 MB)

Built Distribution


pylexfluent-0.0.22-py3-none-any.whl (119.2 kB)

File details

Details for the file pylexfluent-0.0.22.tar.gz.

File metadata

  • Download URL: pylexfluent-0.0.22.tar.gz
  • Upload date:
  • Size: 25.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.0.22.tar.gz
Algorithm     Hash digest
SHA256        5f5f306de9e0aaeffe3e41e9b08b21fddb5390fee8f8cb9edc5b4a493b1495ce
MD5           57620f0185670057d4f7ca9c7abe3f83
BLAKE2b-256   011376d2aa5d0b920fb942ccbcadab6aec7b6e6394f45d4e8cd64f07c70e5645


File details

Details for the file pylexfluent-0.0.22-py3-none-any.whl.

File metadata

  • Download URL: pylexfluent-0.0.22-py3-none-any.whl
  • Upload date:
  • Size: 119.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for pylexfluent-0.0.22-py3-none-any.whl
Algorithm     Hash digest
SHA256        e84215ede98af1391d4c24ee2beb94ec4807662c695098b6572813a5820c367b
MD5           90c802426e54b0979404eeafd7baa1d9
BLAKE2b-256   cdabe57dfdb8e13134186bb0b148a925139325df18fadc270fb56952f3d90a27
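A downloaded file can be checked against the SHA256 digests listed above. A minimal sketch using only the standard library; `EXPECTED_SHA256`, `sha256_of` and `verify` are hypothetical names, and the digests are copied from the hash tables for release 0.0.22.

```python
"""Hypothetical helper: check a downloaded pylexfluent file against its SHA256."""
import hashlib
import sys
from pathlib import Path

# SHA256 digests published for release 0.0.22
EXPECTED_SHA256 = {
    "pylexfluent-0.0.22.tar.gz":
        "5f5f306de9e0aaeffe3e41e9b08b21fddb5390fee8f8cb9edc5b4a493b1495ce",
    "pylexfluent-0.0.22-py3-none-any.whl":
        "e84215ede98af1391d4c24ee2beb94ec4807662c695098b6572813a5820c367b",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large archive is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        p = Path(arg)
        print(f"{p.name}: {'OK' if verify(p) else 'MISMATCH or unknown file'}")
```

pip can enforce the same check itself in hash-checking mode, by adding `--hash=sha256:<digest>` to the requirement line and installing with `--require-hashes`; in that mode every dependency must also be pinned with a hash.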

