A package to generate comprehensive insights from documents using NLP techniques.
Document Insights Generator

The Document Insights Generator is a Python package that uses natural language processing (NLP) techniques to extract valuable insights from text documents. The tool supports PDF and Word (.docx) documents.
Features

- Text extraction from PDF and DOCX documents.
- Keyword extraction using TF-IDF.
- Named Entity Recognition (NER) using the dslim/bert-base-NER transformer model.
- Topic modeling using Latent Dirichlet Allocation (LDA).
- Question answering about the document content using the GPT-2 model from the OpenAI API.
- References based on the document's content.

Installation

You can install the Document Insights Generator from PyPI:
```bash
pip install documentinsightsgenerator
```

This will also install the required dependencies.
Usage

Here is a basic example of using the Document Insights Generator:
```python
from documentinsightsgenerator import DocumentInsightsGenerator

# Initialize the DocumentInsightsGenerator with the API key
dig = DocumentInsightsGenerator(api_key="your-openai-api-key")

# Load a document
dig.load_document("path/to/your/document.pdf")

# Ask a question about the document
answer = dig.answer_question("What is the main topic of the document?")
print(f"Answer: {answer}\n")
```

For more detailed examples, please refer to the examples directory.
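To give a feel for the TF-IDF keyword extraction named in the feature list, here is a minimal standalone sketch. It does not use or reproduce this package's code: the `tokenize` and `top_keywords` helpers, the tiny stop-word set, and the sample sentences are all assumptions made for illustration, and the scoring uses a common smoothed-IDF variant that may differ from what the package does internally.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this illustration only.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(documents, doc_index, k=3):
    """Return the k highest-scoring TF-IDF terms for one document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    target = tokenized[doc_index]
    if not target:
        return []

    scores = {}
    for term, count in Counter(target).items():
        tf = count / len(target)
        # Smoothed IDF stays positive even for terms found in every document.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        scores[term] = tf * idf

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    "The invoice lists the invoice number and the payment terms.",
    "The report describes the experiment and the results.",
    "The contract defines the payment schedule.",
]
print(top_keywords(docs, 0, k=2))  # ['invoice', 'lists']
```

In this sketch "invoice" ranks first because it is repeated within its document and appears in no other, while "payment" is down-weighted because a second document also uses it.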
Contributing

We welcome contributions! Please see our contributing guidelines for more details.

License

This project is licensed under the terms of the MIT license. See LICENSE for more information.
File details
Details for the file DocumentInsightsGenerator-0.1.tar.gz.
File metadata
- Download URL: DocumentInsightsGenerator-0.1.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e8445b155f3ebc8278620459857fce0def28930bf12426f5928a4e581d57bcbb
MD5 | 7606cbe92f94c62d1b461af88ae2743d
BLAKE2b-256 | 11bb6557c99e6b2c519eccb0dcda137a3c2978ba2740b2314a3d0134c678c940
File details
Details for the file DocumentInsightsGenerator-0.1-py3-none-any.whl.
File metadata
- Download URL: DocumentInsightsGenerator-0.1-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | d6fec488c423e8970fd7d5cc11b4d6233795f5eb7735e00b5bd5f4d012f7ab74
MD5 | 417b337dafdef78db54abc901fd1b1b8
BLAKE2b-256 | 2cf1229ec2b0237b6f03555b42eb4c219a6adae053c595ba1a3dc889b615b10b