
A package to generate comprehensive insights from documents using NLP techniques.

Project description

Document Insights Generator

The Document Insights Generator is a Python package that uses natural language processing (NLP) techniques to extract valuable insights from text documents. The tool supports PDF and Word (.docx) documents.

Features

- Text extraction from PDF and DOCX documents.
- Keyword extraction using TF-IDF.
- Named Entity Recognition (NER) using the dslim/bert-base-NER transformer model.
- Topic modeling using Latent Dirichlet Allocation (LDA).
- Answers questions about the document content using the GPT-2 model from the OpenAI API.
- Provides references based on the document’s content.

Installation

You can install the Document Insights Generator from PyPI:

    pip install documentinsightsgenerator

This will also install the required dependencies.

Usage

Here is a basic example of using the Document Insights Generator:

    from documentinsightsgenerator import DocumentInsightsGenerator

    # Initialize the DocumentInsightsGenerator with the API key
    dig = DocumentInsightsGenerator(api_key="your-openai-api-key")

    # Load a document
    dig.load_document("path/to/your/document.pdf")

    # Ask a question about the document
    answer = dig.answer_question("What is the main topic of the document?")
    print(f"Answer: {answer}\n")

For more detailed examples, please refer to the examples directory.
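The TF-IDF keyword extraction listed under Features can be illustrated with a small standalone sketch. This is not the package's own code: how the package computes its scores is not documented here, and the function names `tokenize` and `tfidf_keywords` are hypothetical. The sketch uses only the Python standard library and a smoothed inverse document frequency, so a term is ranked higher when it is frequent in one document and rare in the others.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(documents, doc_index, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for one document.

    Hypothetical helper for illustration; not part of the package API.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    target = tokenized[doc_index]
    if not target:
        return []
    term_counts = Counter(target)

    scores = {}
    for term, count in term_counts.items():
        tf = count / len(target)
        # Smoothed IDF: terms found in every document keep a weight of 1.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        scores[term] = tf * idf

    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "the invoice lists the invoice total",
    ]
    print(tfidf_keywords(docs, doc_index=2, top_n=1))
```

The sketch does no stop-word filtering, so very common words such as "the" can still rank among the top terms of a short document. A real pipeline would normally remove them first.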

Contributing

We welcome contributions! Please see our contributing guidelines for more details.

License

This project is licensed under the terms of the MIT license. See LICENSE for more information.

Project details


Release history

This version

0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

DocumentInsightsGenerator-0.1.tar.gz (7.2 kB)

Uploaded Source

Built Distribution

DocumentInsightsGenerator-0.1-py3-none-any.whl (8.2 kB)

Uploaded Python 3
