
Converter from URLs, PDFs, and Wikipedia pages to clean text documents, one sentence per line.

Project description

sentify is a simple and fast tool that converts documents to clean, one-sentence-per-line text files, ready for NLP tools and Generative AI processing.

It currently handles local and remote txt and pdf files, as well as Wikipedia pages given by their titles.
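To make the target output format concrete, here is a minimal sketch of turning raw text into one sentence per line. It is not sentify's implementation and does not use its API; the function name one_sentence_per_line and the naive regex-based splitting rule are assumptions chosen only for illustration.

```python
import re


def one_sentence_per_line(text: str) -> str:
    """Hypothetical helper: naive illustration of the output format only."""
    # Collapse all whitespace runs (including line breaks) into single spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    # Split after '.', '!' or '?' when the next token starts with
    # an uppercase letter or a digit.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", flat)
    # Drop empty pieces and emit one sentence per line.
    return "\n".join(p for p in parts if p)


if __name__ == "__main__":
    print(one_sentence_per_line("Hello world.  This is\na test! Is it? yes."))
```

A rule this simple mis-splits abbreviations such as "Dr. Smith", which is one reason to rely on a dedicated tool rather than a hand-rolled regex.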

See the function sentify in sentify.main for the simple, all-in-one API.

See tests/tests.py for examples that exercise the API on several use cases.

Enjoy,

Paul Tarau

January, 2024


Source Distribution

sentify-0.3.4.tar.gz (5.6 kB)
