Extracts data from PDF files and saves it to Excel files.
📄 pdfsp
pdfsp is a Python package that extracts tables from PDF files and saves them to Excel. It also provides a simple Streamlit app for interactive viewing of the extracted data.
🚀 Features
- Extracts tabular data from PDFs using pdfplumber
- Converts tables into pandas DataFrames
- Saves output as .xlsx Excel files using openpyxl
- Ensures column names are unique to prevent issues
- Visualizes DataFrames with streamlit
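The column-name step can be pictured with a small sketch. This is not pdfsp's actual implementation: `make_unique` is a hypothetical helper, and both the `_1`, `_2` suffix scheme and the `column` placeholder for blank headers are assumptions chosen for illustration.

```python
def make_unique(names):
    """Return a list in which every column name is a distinct string."""
    taken = set()   # names already handed out
    counts = {}     # base name -> last numeric suffix tried for it
    result = []
    for raw in names:
        # Blank header cells may arrive as None or as empty strings.
        base = "" if raw is None else str(raw).strip()
        if not base:
            base = "column"  # assumed placeholder for blank headers
        candidate = base
        n = counts.get(base, 0)
        # Keep bumping the suffix until the name is free.
        while candidate in taken:
            n += 1
            candidate = f"{base}_{n}"
        counts[base] = n
        taken.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["a", "b", "a", None, "", "a"]))
```

With pandas, the result could be assigned back with `df.columns = make_unique(df.columns)` before the DataFrame is written to Excel.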
📦 Installation
Make sure you're using Python 3.10 or newer, then install with:
pip install pdfsp -U
Python script
# pdf.py
from pdfsp import extract_tables
source_folder = "."       # folder to scan for PDF files
output_folder = "output"  # folder that receives the Excel files
extract_tables(source_folder, output_folder)
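For readers curious how pdfplumber, pandas, and openpyxl can fit together for a single PDF, here is a rough sketch. It is not the source of `extract_tables`: the helper names `split_header` and `pdf_to_excel`, the choice to treat the first row of each table as the header, and the output file naming are all assumptions made for illustration. It also skips the unique-column-name step.

```python
from pathlib import Path


def split_header(raw_table):
    """Split a table given as a list of rows into (header, body)."""
    if not raw_table:
        return [], []
    header, *body = raw_table
    return list(header), [list(row) for row in body]


def pdf_to_excel(pdf_path, output_folder):
    """Write each table found in pdf_path to its own .xlsx file (sketch)."""
    # Third-party imports live here so split_header works without them.
    import pandas as pd
    import pdfplumber

    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            # extract_tables() returns one list of rows per detected table.
            for table_no, raw in enumerate(page.extract_tables(), start=1):
                header, body = split_header(raw)
                if not header:
                    continue
                df = pd.DataFrame(body, columns=header)
                # Hypothetical naming scheme: <pdf name>_p<page>_t<table>.xlsx
                name = f"{Path(pdf_path).stem}_p{page_no}_t{table_no}.xlsx"
                target = out_dir / name
                df.to_excel(target, index=False, engine="openpyxl")
                written.append(target)
    return written
```

A call such as `pdf_to_excel("some.pdf", "output")` would then return the list of Excel files it wrote.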
From the console / terminal / command line
# all tables from all PDF files in the current folder, saved to the current folder
pdfsp . .
# all tables from all PDF files in someFolder, saved to SomeOutFolder
pdfsp someFolder SomeOutFolder
# all tables from some.pdf, saved to the current folder
pdfsp some.pdf .
# all tables from some.pdf, saved to toThisFolder
pdfsp some.pdf toThisFolder
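The commands above accept either a folder or a single PDF as the first argument. One way such an argument could be resolved into a list of files is sketched below. `collect_pdfs` is a hypothetical helper, not part of pdfsp; the sketch assumes subfolders are not searched and that the `.pdf` extension is matched case-insensitively, neither of which is confirmed for the real tool.

```python
from pathlib import Path


def collect_pdfs(source):
    """Return the PDF files named by `source` (a file or a folder)."""
    path = Path(source)
    if path.is_file():
        # A single file is accepted only if it looks like a PDF.
        return [path] if path.suffix.lower() == ".pdf" else []
    if path.is_dir():
        # Top-level files only; subfolders are ignored in this sketch.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    raise FileNotFoundError(f"No such file or folder: {source}")
```

For example, `collect_pdfs(".")` lists the PDFs in the current folder, while `collect_pdfs("some.pdf")` returns a one-element list if that file exists.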
Example of the summary report:
=== 📊 Extraction Summary Report ===
✅ Successful Files: 3
- data/report1.pdf → 🗂️ 5 tables extracted
- data/summary2.pdf → 🗂️ 3 tables extracted
- data/financials.pdf → 🗂️ 7 tables extracted
❌ Failed Files: 1
- data/corrupted.pdf
⚠️ Some files failed to process. See details above.