Extracts data from PDF files and saves it to Excel files.
📄 pdfsp
pdfsp is a Python package that extracts tables from PDF files and saves them to Excel. It also provides a simple Streamlit app for interactive viewing of the extracted data.
🚀 Features
- Extracts tabular data from PDFs using `pdfplumber`
- Converts tables into `pandas` DataFrames
- Saves output as `.xlsx` Excel files using `openpyxl`
- Ensures column names are unique, so repeated header cells don't produce duplicate DataFrame columns
- Visualizes DataFrames with `streamlit`
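pdfsp's exact renaming rule isn't described here. As a sketch of the column-uniqueness step, the hypothetical helper `make_unique` below cleans header cells (pdfplumber can return `None` for missing cells, and multi-line headers contain newlines), gives blank headers positional names such as `column_3`, and suffixes repeats with `_1`, `_2`, and so on. The naming and suffixing scheme is an assumption, not pdfsp's documented behavior.

```python
def make_unique(columns: list) -> list[str]:
    """Return cleaned, de-duplicated column names (hypothetical helper, not pdfsp's API)."""
    used: set[str] = set()
    counters: dict[str, int] = {}
    result: list[str] = []
    for position, raw in enumerate(columns):
        # Collapse internal whitespace/newlines; treat None as an empty header cell.
        base = "" if raw is None else " ".join(str(raw).split())
        if not base:
            base = f"column_{position}"  # positional fallback for blank headers
        candidate = base
        # Append _1, _2, ... until the name has not been used yet.
        while candidate in used:
            counters[base] = counters.get(base, 0) + 1
            candidate = f"{base}_{counters[base]}"
        used.add(candidate)
        result.append(candidate)
    return result


header = ["Name", "Total\nAmount", "Total Amount", None, ""]
print(make_unique(header))
# → ['Name', 'Total Amount', 'Total Amount_1', 'column_3', 'column_4']
```

With pandas, the cleaned header would then be passed as `pd.DataFrame(rows, columns=make_unique(header))`.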
📦 Installation
Make sure you're using Python 3.10 or newer, then install with:

```bash
pip install pdfsp
```
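From Python

```python
from pdfsp import extract_tables

source_folder = "."       # folder containing the PDF files
output_folder = "output"  # folder where the Excel files are written

extract_tables(source_folder, output_folder)
```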
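pdfsp's internals aren't shown here, so the following is only a sketch of how the pdfplumber → pandas → openpyxl pipeline from the feature list could be wired behind a folder-to-folder call like the one above. The function names `extract_folder` and `xlsx_path_for`, the one-workbook-per-PDF layout, and the `table_N` sheet names are all assumptions for illustration.

```python
from pathlib import Path


def xlsx_path_for(pdf_path: Path, output_folder: Path) -> Path:
    """Assumed naming rule: reports/q1.pdf -> <output_folder>/q1.xlsx."""
    return output_folder / f"{pdf_path.stem}.xlsx"


def extract_folder(source_folder: str, output_folder: str) -> list[Path]:
    """Hypothetical sketch: write the tables of each PDF to one workbook per PDF."""
    # Third-party packages (pip install pdfplumber pandas openpyxl), imported
    # inside the function so xlsx_path_for stays usable without them.
    import pandas as pd
    import pdfplumber

    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for pdf_path in sorted(Path(source_folder).glob("*.pdf")):
        frames = []
        with pdfplumber.open(str(pdf_path)) as pdf:
            for page in pdf.pages:
                # Each table is a list of rows; each row is a list of cell strings (or None).
                for table in page.extract_tables():
                    if len(table) < 2:
                        continue  # skip tables without a header row plus data
                    header, *rows = table
                    # Blank header cells get positional names; duplicate names
                    # should also be made unique before real use.
                    columns = [cell or f"column_{i}" for i, cell in enumerate(header)]
                    frames.append(pd.DataFrame(rows, columns=columns))
        if not frames:
            continue  # no tables found, so no workbook for this PDF
        target = xlsx_path_for(pdf_path, out_dir)
        # engine="openpyxl" makes pandas write the .xlsx file through openpyxl.
        with pd.ExcelWriter(target, engine="openpyxl") as writer:
            for n, frame in enumerate(frames, start=1):
                frame.to_excel(writer, sheet_name=f"table_{n}", index=False)
        written.append(target)
    return written
```

Writing one sheet per table keeps tables with different column layouts apart; a single concatenated sheet would be the alternative if all tables share a header.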
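From the console, pass the source folder followed by the output folder:

```bash
# Read PDFs from the current folder and write the Excel files there too
pdfsp . .
# Read PDFs from someFolder and write the Excel files to SomeOutFolder
pdfsp someFolder SomeOutFolder
```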
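The two positional arguments appear to correspond to the `source_folder` and `output_folder` parameters of `extract_tables`. A minimal sketch of such a command-line front end using the standard-library `argparse` module follows; `build_parser` and `main` are hypothetical names, and this is not pdfsp's actual entry point.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `pdfsp SOURCE OUTPUT`."""
    parser = argparse.ArgumentParser(
        prog="pdfsp", description="Extract tables from PDF files into Excel files."
    )
    parser.add_argument("source", help="folder containing the PDF files")
    parser.add_argument("output", help="folder for the generated .xlsx files")
    return parser


def main(argv=None) -> None:
    """Parse arguments (sys.argv[1:] when argv is None) and run the extraction."""
    args = build_parser().parse_args(argv)
    from pdfsp import extract_tables  # documented Python API, imported on demand

    extract_tables(args.source, args.output)
```

Keeping the parser in its own function lets it be tested without touching any PDFs.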
Download files
Source Distribution
- pdfsp-0.1.1.tar.gz (63.8 kB)
Built Distribution
- pdfsp-0.1.1-py3-none-any.whl (4.4 kB)
File details
Details for the file pdfsp-0.1.1.tar.gz.
File metadata
- Download URL: pdfsp-0.1.1.tar.gz
- Upload date:
- Size: 63.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dfd1fdda6c57f21d3ac97c3e01f77f5dbb8fad38586e7fb4c511e79cee050147 |
| MD5 | 5e2f7d5a6e5b9accd58c500a2b547fd3 |
| BLAKE2b-256 | ddd36feb8b24826ac4ad918231584a02c409ecd8bb2214ad52f1fa547c21f1db |
File details
Details for the file pdfsp-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pdfsp-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35ea312bdd66d7e8aca0753e08e1bf57387ad25e7a93aaa5dd6cd16f47003eb3 |
| MD5 | 71ca02aa29f039eacd8f594b46f3328b |
| BLAKE2b-256 | 89b4f827c1b550b8e3feac4575c616e9adcc75e2f52c4059380c979f97a4a99f |
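The published digests let you check that a downloaded file is intact. Below is a small standard-library sketch using `hashlib`; the helper names `sha256_of` and `verify_download` are illustrative, not part of pdfsp.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with a published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_download(Path("pdfsp-0.1.1-py3-none-any.whl"), "35ea312bdd66d7e8aca0753e08e1bf57387ad25e7a93aaa5dd6cd16f47003eb3")` should return `True` for an unmodified copy of the wheel listed above.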