Ruppell is a Python package that helps extract text from documents.
Ruppell: powerful Python text extractor toolkit
What is it?
Ruppell is a Python package for extracting text from documents.
Main Features
Here are just a few of the things that ruppell does well:
- Create datasets from multiple files.
- Extract text from documents (pdf, docx, jpeg, jpg, png).
- Create a pandas DataFrame from a folder of documents.
- Convert documents to .txt files.
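The features above amount to choosing an extractor by file extension and collecting the results. The following is a minimal sketch of that idea, not ruppell's actual implementation: `folder_to_records`, `EXTRACTORS`, and the `extract_*` helpers are hypothetical names, and the third-party calls are those of the dependencies listed in this README (pdfminer.six, docx2txt, Pillow, pytesseract), imported lazily so the sketch loads even when they are not installed.

```python
from pathlib import Path


def extract_pdf(path):
    # pdfminer.six high-level helper; returns the PDF's text as a string.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


def extract_docx(path):
    # docx2txt returns the text content of a .docx file.
    import docx2txt
    return docx2txt.process(str(path))


def extract_image(path, lang="eng"):
    # OCR through pytesseract; requires a local Tesseract installation.
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path), lang=lang)


# Hypothetical mapping from file extension to extractor function.
EXTRACTORS = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
    ".png": extract_image,
    ".jpg": extract_image,
    ".jpeg": extract_image,
}


def folder_to_records(folder, extractors=EXTRACTORS):
    """Return one {"file", "text"} dict per supported file in `folder`."""
    records = []
    for path in sorted(Path(folder).iterdir()):
        extractor = extractors.get(path.suffix.lower())
        if extractor is None or not path.is_file():
            continue  # skip sub-folders and unsupported file types
        records.append({"file": path.name, "text": extractor(path)})
    return records
```

Under these assumptions, `pandas.DataFrame(folder_to_records("docs/"))` would give one row per document, and writing each record's `text` to a file named after `file` would cover the conversion to .txt.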
Where to get it
The latest released version is available from the Python Package Index (PyPI).
pip install ruppell
Dependencies
- Pillow
- Pytesseract
- Pdfminer.six
- Docx2txt
- Pandas
- Python >= 3.6
Example
>>> import ruppell
>>> ruppell.image_to_string('image.png')
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer id bibendum sapien.'
Supported Languages
Language codes follow ISO 639-2/B or ISO 639-2/T.
All language codes here.
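As an illustration of how such codes are commonly handed to the OCR engine: Tesseract, which pytesseract wraps, takes three-letter codes such as `eng` or `spa` and lets several be combined with `+`. This README does not say whether ruppell exposes a language parameter, so the helper below, `build_lang_arg`, is hypothetical. It checks only the three-letter ISO 639-2 shape, which means it would reject Tesseract's script-specific names such as `chi_sim`.

```python
import re


def build_lang_arg(codes):
    """Join three-letter language codes into a Tesseract-style string."""
    cleaned = [code.strip().lower() for code in codes]
    for code in cleaned:
        # Shape check only: three ASCII letters, as in ISO 639-2.
        if not re.fullmatch(r"[a-z]{3}", code):
            raise ValueError("not a three-letter language code: %r" % code)
    return "+".join(cleaned)
```

The resulting string, for example `build_lang_arg(["eng", "spa"])`, is the kind of value that `pytesseract.image_to_string` accepts through its `lang` argument.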
Contributing
If you think Ruppell can be made more powerful, please contribute to this project and help us improve it for other developers.
Open a pull request or start a discussion in the issues. Thanks a lot.
Author
Jorge Melgarejo, melgarejo.colarte@gmail.com
License