A Python library for extracting text content from common document formats, including DOCX, PPTX, XLSX, and PDF.

Project description

Any Document Extractor

Features

  • Supports multiple document formats (PPTX, DOCX, PDF, XLSX, and others; see Supported Formats)
  • Returns clean extracted text

Installation

pip install any-document-extractor

Usage

Basic usage example:

from anydocumentextractor import DocumentExtractor


def main(fp: str):
    """Return the text extracted from the document at fp."""
    extractor = DocumentExtractor(fp)
    return extractor.extract()


if __name__ == '__main__':
    fp = 'text.docx'  # Path to any supported document
    content = main(fp)
    print(content)
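
The basic example does not cover a missing or unreadable file. The following is a minimal sketch of one way to guard the call, not part of the library itself. The helper names `safe_extract` and `extract_with_library` are hypothetical, and because this page does not document which exceptions `extract()` may raise, the sketch catches `Exception` broadly and assumes the result is a string.

```python
from pathlib import Path
from typing import Callable, Optional


def safe_extract(fp: str, extract: Callable[[str], str]) -> Optional[str]:
    """Return the extracted text, or None if the file is missing or extraction fails."""
    if not Path(fp).is_file():
        print(f"File not found: {fp}")
        return None
    try:
        return extract(fp)
    except Exception as exc:  # assumption: the library's exception types are not documented here
        print(f"Could not extract {fp}: {exc}")
        return None


def extract_with_library(fp: str) -> str:
    """Hypothetical adapter around the usage shown in this README."""
    # Imported lazily so safe_extract can be used and tested without the package installed.
    from anydocumentextractor import DocumentExtractor
    return DocumentExtractor(fp).extract()


if __name__ == '__main__':
    content = safe_extract('text.docx', extract_with_library)
    if content is not None:
        print(content)
```

Passing the extraction function as an argument keeps the file check and error handling independent of the library, so they can be exercised with a stub.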

Supported Formats

  • Microsoft Office: PPTX, DOCX, XLSX
  • OpenDocument: ODT, ODP
  • PDF documents
  • Plain text files
  • And more...
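
When processing a folder, it can help to skip files that are unlikely to be supported before calling the extractor. The sketch below filters by file extension using only the formats named in the list above. The `.txt` suffix for plain text files is an assumption, the set is necessarily incomplete because the list ends with "And more...", and the names `SUPPORTED_SUFFIXES`, `is_supported`, and `supported_files` are hypothetical helpers rather than part of the library.

```python
from pathlib import Path
from typing import List

# Extensions taken from the format list above; ".txt" is an assumption for plain text files.
SUPPORTED_SUFFIXES = {".pptx", ".docx", ".xlsx", ".odt", ".odp", ".pdf", ".txt"}


def is_supported(fp: str) -> bool:
    """Return True if the file extension is one of the formats listed in this README."""
    return Path(fp).suffix.lower() in SUPPORTED_SUFFIXES


def supported_files(folder: str) -> List[str]:
    """Return the sorted paths of files in folder whose extension looks supported."""
    return [
        str(path)
        for path in sorted(Path(folder).iterdir())
        if path.is_file() and is_supported(str(path))
    ]
```

The check is case-insensitive, so `Report.DOCX` and `report.docx` are treated the same way. It looks only at the file name and does not inspect file contents.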

License

MIT License - Free for commercial and personal use.

Download files

Source Distribution

any_document_extractor-0.1.1.tar.gz (2.5 kB)

Uploaded Source

Built Distribution

any_document_extractor-0.1.1-py3-none-any.whl (2.7 kB)

Uploaded Python 3

File details

Details for the file any_document_extractor-0.1.1.tar.gz.

File metadata

  • Download URL: any_document_extractor-0.1.1.tar.gz
  • Size: 2.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for any_document_extractor-0.1.1.tar.gz
Algorithm Hash digest
SHA256 942b59dea70248bd56e9e3a06cac613825b823ecab84c46bba8a4fbb0d800a4b
MD5 32d225c0577c9a961f9ce3e49776910f
BLAKE2b-256 c78cd7f3b829dcf448f51870f7a96d89958d2ae1fca0ab5524c76f1091cb2c82
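
A downloaded file can be checked against the SHA256 digest listed above. The following is a small sketch using Python's standard `hashlib` module. The function names `sha256_of` and `matches_digest` are hypothetical, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved in the current directory, `matches_digest('any_document_extractor-0.1.1.tar.gz', '942b59dea70248bd56e9e3a06cac613825b823ecab84c46bba8a4fbb0d800a4b')` should return `True` for an intact download.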

File details

Details for the file any_document_extractor-0.1.1-py3-none-any.whl.

File hashes

Hashes for any_document_extractor-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1d65a34819a2efe68e83aef42035b22ae0e04bd3cbac704fc8dc6cf40bf21b93
MD5 f5d709a7aab1ebee6aad0a1672422b51
BLAKE2b-256 95bed3228c3fa67276dc8b4cbca22e276fcdc1a638f37760dd44947fb658135a
