A flexible text summarization library supporting extractive and abstractive methods.
pysuma
🚀 PySummarizer - The Ultimate Text Summarization Library
PySummarizer (installed as the pysuma package) is a text summarization library that supports both extractive and abstractive summarization. It extracts text from PDFs and writes the key points as a structured, bullet-point summary, which suits academic research, content summarization, and automated pipelines.
🌟 Features
✅ Supports Extractive & Abstractive Summarization
✅ Seamlessly extracts text from PDFs & generates structured summaries
✅ Utilizes TextRank, LSA, and LexRank for extractive summarization
✅ Automatically adjusts bullet points based on text length
✅ Customizable summary length to meet your needs
✅ Command-line interface (CLI) for quick and efficient summarization
✅ Easy-to-use and integrates effortlessly into Python projects
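To give a feel for what graph-based extractive summarization does, here is a minimal TextRank-style sketch in plain Python. It is an illustration under stated assumptions, not PySummarizer's internal implementation: the helper names (split_sentences, similarity, textrank_summary), the naive sentence splitter, and the default parameters are all hypothetical.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity, normalized by log sentence lengths."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    # A one-word sentence has log(1) == 0, which can zero the denominator.
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, top_n=3, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences

    # Build a symmetric weight matrix over all sentence pairs.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sentences[i], sentences[j])
    out_sums = [sum(row) for row in weights]

    # Weighted PageRank iteration with a fixed number of steps.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Sentences that share vocabulary with many other sentences collect the highest scores, while a sentence with no overlap stays at the minimum score and is dropped first.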
📌 Installation
Get started in seconds! Install PySummarizer via pip:
pip install pysuma
🚀 Quick Start Guide
Transform long PDFs into structured summaries in just a few lines of code.
1️⃣ Import the Library
import pysuma as pyss
2️⃣ Define PDF Paths
pdf_path = "sample.pdf" # Input PDF file
output_file = "summary.txt" # Output summary file
3️⃣ Generate a Summarized Report
pyss.summarize_pdf(
    pdf_path,
    output_file,
    method="textrank",          # Choose from "textrank", "lsa", "lexrank"
    summary_type="extractive",  # Select either "extractive" or "abstractive"
)
4️⃣ Run & Retrieve Your Summary
Save the code from steps 1-3 as script.py, then run it:
python script.py
✅ The summary will be saved in summary.txt, formatted as bullet points.
📄 Summary Output Example
PySummarizer automatically structures summaries into bullet points for better readability:
• Learning enables acquiring new skills and knowledge.
• Supervised learning requires labeled datasets for training.
• Decision trees classify data using entropy and information gain.
• Reinforcement learning optimizes decision-making using rewards and penalties.
...
(Total bullet points depend on text length)
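If you want to post-process the summary in Python, the bullet lines can be read back into a list. The sketch below assumes the output file uses the "•" marker shown in the example above; the function name parse_bullets is hypothetical and not part of the pysuma API.

```python
def parse_bullets(summary_text, marker="•"):
    """Collect the text of every line that starts with the bullet marker."""
    bullets = []
    for line in summary_text.splitlines():
        line = line.strip()
        if line.startswith(marker):
            # Drop the marker and any whitespace that follows it.
            bullets.append(line[len(marker):].strip())
    return bullets
```

A typical use would be parse_bullets(open("summary.txt", encoding="utf-8").read()), which skips any line that is not a bullet.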
📊 Adaptive Bullet Point Summarization
PySummarizer dynamically adjusts the number of bullet points based on text length:
| Text Length (Characters) | Number of Bullet Points |
|---|---|
| 0 - 2500 | 10 |
| 2501 - 5000 | 20 |
| 5001 - 7500 | 30 |
| More than 7500 | 50 |
🔹 Longer documents get more bullet points, so the summary scales with the length of the source text.
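The thresholds in the table can be written as a small lookup function. This is a sketch of the documented mapping only; the function name bullet_count is hypothetical, and how PySummarizer computes the value internally is not shown in this README.

```python
def bullet_count(text_length):
    """Map a text length in characters to the number of bullet points."""
    if text_length < 0:
        raise ValueError("text_length must be non-negative")
    if text_length <= 2500:
        return 10
    if text_length <= 5000:
        return 20
    if text_length <= 7500:
        return 30
    return 50
```

For example, a 4,000-character document falls in the 2501 - 5000 row and would get 20 bullet points.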
💻 CLI Usage (Command Line)
Use PySummarizer directly from the terminal for quick PDF summarization:
pysuma sample.pdf summary.txt --method textrank --summary_type extractive
✔️ Perfect for automation & large-scale text processing.
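For batch jobs, the CLI call above can be scripted from Python. The following is a minimal sketch that reuses only the arguments and flags shown in this section; the helper names (build_commands, run_all) and the "_summary.txt" naming scheme are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_dir, out_dir, method="textrank", summary_type="extractive"):
    """Build one pysuma command (as an argument list) per PDF in pdf_dir."""
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    commands = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        # Hypothetical naming scheme: report.pdf -> report_summary.txt
        output = out_dir / f"{pdf.stem}_summary.txt"
        commands.append([
            "pysuma", str(pdf), str(output),
            "--method", method,
            "--summary_type", summary_type,
        ])
    return commands


def run_all(commands):
    """Run each command in turn; stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes it easy to print and inspect the list before processing a large folder, for example with run_all(build_commands("pdfs", "summaries")).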
📜 License
PySummarizer is released under the MIT License, making it open-source and free for personal and commercial use.
🎯 Transform lengthy PDFs into structured insights with PySummarizer today!
🔗 Contribute or explore more: GitHub Repository
---
File details
Details for the file pysuma-0.1.0.tar.gz.
File metadata
- Download URL: pysuma-0.1.0.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8b76453f76173937553c24cb9bb3614864458af96a45b0983439305849a18f1e |
| MD5 | 40222be4649605e8a08293518ecf70e1 |
| BLAKE2b-256 | 6c17d95d260dc73359af8b46d97bad806c5106f3b805e22bc0f7dc573e7f4db0 |
File details
Details for the file pysuma-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pysuma-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d648f828124909aaa936ff51342f8942e77746f764adafaff70f6f3c3b2542e |
| MD5 | 822ec9f56cebf83d39837ee950f50118 |
| BLAKE2b-256 | dd6a375af3888d68810e80dae8a73c23cf9b5fe2df44a2d8bab843a471575762 |