
Ruppell is a Python package that helps extract text from documents.


Ruppell: a powerful Python text-extraction toolkit

What is it?

Ruppell is a Python package for extracting text from documents.

Main Features

Here are just a few of the things that Ruppell does well:

  • Create datasets from multiple files.
  • Extract text from documents (PDF, DOCX, JPEG/JPG, PNG).
  • Create a pandas DataFrame from a folder of documents.
  • Convert documents to .txt files.
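The dataset and DataFrame features above boil down to "one record per document". The sketch below shows that idea without relying on ruppell's internals (only `image_to_string` is documented on this page): `build_dataset` and its `extract` callable are hypothetical names, and the extension list is simply copied from the feature list.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Extensions copied from the feature list; what a given ruppell release
# actually supports is an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".jpeg", ".jpg", ".png"}


def build_dataset(folder: str, extract: Callable[[Path], str]) -> List[Dict[str, str]]:
    """Hypothetical helper: one record per supported file directly in `folder`.

    `extract` is any callable that turns a file path into text.
    """
    records = []
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and file types outside the supported list.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            records.append({"file": path.name, "text": extract(path)})
    return records
```

With pandas installed, `pandas.DataFrame(records)` turns the returned list of dicts into a DataFrame with `file` and `text` columns.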

Where to get it

Binary installers for the latest released version are available on the Python Package Index (PyPI):

pip install ruppell

Dependencies

Example

>>> import ruppell
>>> ruppell.image_to_string('image.png')
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer id bibendum sapien.'
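Building on the example above, the following sketch converts every image in a folder to a `.txt` file with the same base name. It assumes, as the example suggests, that `ruppell.image_to_string` takes a file path and returns a string. `images_to_txt` is a hypothetical helper, not part of ruppell, and its `extract` parameter lets you swap in another function (or a stub for testing).

```python
from pathlib import Path
from typing import Callable, List, Optional

IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def images_to_txt(folder: str, extract: Optional[Callable[[str], str]] = None) -> List[Path]:
    """Hypothetical helper: write a .txt file next to each image in `folder`.

    Returns the paths of the .txt files that were written.
    """
    if extract is None:
        # Only image_to_string appears in the documented example; it is
        # imported lazily so the sketch also works with a stub extractor.
        import ruppell
        extract = ruppell.image_to_string
    written = []
    # sorted() reads the whole directory first, so new .txt files are not revisited.
    for image in sorted(Path(folder).iterdir()):
        if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
            target = image.with_suffix(".txt")
            target.write_text(extract(str(image)), encoding="utf-8")
            written.append(target)
    return written
```

Images that share a base name (for example `scan.jpg` and `scan.png`) map to the same `scan.txt`, so the later one in sorted order overwrites the earlier.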

Supported Languages

Language codes follow ISO 639-2, in either its bibliographic (ISO 639-2/B) or terminology (ISO 639-2/T) form.

The full list of language codes is available here.
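For most languages the two ISO 639-2 forms are identical (English is `eng` in both), but 20 languages have distinct B and T codes, such as German (`ger`/`deu`) and French (`fre`/`fra`). The hypothetical helper below normalizes a B code to its T form. This page does not say whether ruppell needs such normalization, so treat it as an illustration of the two code sets rather than part of the package.

```python
# The 20 ISO 639-2 languages whose bibliographic (B) code differs from the
# terminology (T) code. Every other ISO 639-2 code is the same in both forms.
B_TO_T = {
    "alb": "sqi",  # Albanian
    "arm": "hye",  # Armenian
    "baq": "eus",  # Basque
    "bur": "mya",  # Burmese
    "chi": "zho",  # Chinese
    "cze": "ces",  # Czech
    "dut": "nld",  # Dutch
    "fre": "fra",  # French
    "geo": "kat",  # Georgian
    "ger": "deu",  # German
    "gre": "ell",  # Greek, Modern
    "ice": "isl",  # Icelandic
    "mac": "mkd",  # Macedonian
    "mao": "mri",  # Maori
    "may": "msa",  # Malay
    "per": "fas",  # Persian
    "rum": "ron",  # Romanian
    "slo": "slk",  # Slovak
    "tib": "bod",  # Tibetan
    "wel": "cym",  # Welsh
}


def to_terminology_code(code: str) -> str:
    """Hypothetical helper: return the ISO 639-2/T form of `code`.

    B codes are translated; T codes and codes shared by both forms are
    returned unchanged (lower-cased, surrounding whitespace removed).
    """
    normalized = code.strip().lower()
    return B_TO_T.get(normalized, normalized)
```

For example, `to_terminology_code("ger")` returns `"deu"`, while `"eng"` passes through unchanged.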

Contributing

If you think Ruppell could be more powerful, please contribute to the project and help us improve it for other developers.

Open a pull request or start a discussion in the issues. Thanks a lot!

Author

Jorge Melgarejo, melgarejo.colarte@gmail.com

License

MIT



Download files


Source Distribution

ruppell-1.0.0.tar.gz (4.7 kB)

Built Distribution

ruppell-1.0.0-py3-none-any.whl (5.8 kB, Python 3)
