Extracts data from PDF files and saves it to Excel files.

📄 pdfsp

pdfsp is a Python package that extracts tables from PDF files and saves them to Excel. It also provides a simple Streamlit app for interactive viewing of the extracted data.

🚀 Features

Extracts tabular data from PDFs using pdfplumber
Converts tables into pandas DataFrames
Saves output as .xlsx Excel files using openpyxl
Ensures column names are unique, so duplicate headers do not clash in the output
Visualizes DataFrames with streamlit
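
The unique-column-name step can be pictured with the sketch below. It is not taken from pdfsp's source: the helper name `make_unique` and the `_1`, `_2` suffix scheme are assumptions chosen for illustration, and the package may resolve duplicates differently.

```python
def make_unique(names):
    """Return a copy of `names` in which repeated names get a numeric suffix.

    Hypothetical helper for illustration; not part of the pdfsp API.
    """
    used = set()      # every name already emitted
    counters = {}     # last suffix tried for each base name
    result = []
    for name in names:
        # Empty header cells may arrive as None; treat them as empty strings.
        base = "" if name is None else str(name)
        candidate = base
        n = counters.get(base, 0)
        # Keep incrementing the suffix until the name is unused.
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        counters[base] = n
        used.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["Date", "Amount", "Amount", "Amount"]))
# ['Date', 'Amount', 'Amount_1', 'Amount_2']
```

Checking candidates against the set of names already emitted, rather than only counting repeats, keeps the result unique even when a header such as `Amount_1` is already present in the table.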

📦 Installation

Make sure you're using Python 3.10 or newer, then install with:

pip install pdfsp -U

Python script

# pdf.py
from pdfsp import extract_tables, Options

# Define extraction options
source_folder = "."
output_folder = "output"
combine_tables = True

options = Options(
    source_folder=source_folder,
    output_folder=output_folder,
    combine=combine_tables
)

# Run the table extraction
extract_tables(options)
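
After the script above has run, a quick way to see what was written is to list the workbooks in the output folder. The sketch below assumes only what is stated here, namely that results are saved as .xlsx files under `output_folder`. The helper name `list_workbooks` is hypothetical, and because the file naming scheme is not documented in this README, the sketch simply searches for every .xlsx file.

```python
from pathlib import Path


def list_workbooks(output_folder):
    """Return the .xlsx files found under `output_folder`, sorted by path.

    Hypothetical helper for illustration; not part of the pdfsp API.
    """
    folder = Path(output_folder)
    if not folder.is_dir():
        # Nothing has been written yet (or the folder name is wrong).
        return []
    # rglob also covers subfolders, in case output is grouped per PDF.
    return sorted(p for p in folder.rglob("*.xlsx") if p.is_file())


if __name__ == "__main__":
    for workbook in list_workbooks("output"):
        print(workbook)
```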

From the command line (console / terminal)

# Extract all tables from all PDF files in the current folder and save them to the current folder
pdfsp . .

# Extract and COMBINE large tables (spanning multiple pages) into single files, saved to the current folder
pdfsp . . --combine

# Extract and COMBINE tables, skipping the first row of each table (e.g., header rows)
pdfsp . . --combine --skiprows=1

# Extract all tables from PDF files in 'someFolder' and save them to 'SomeOutFolder'
pdfsp someFolder SomeOutFolder

# Extract all tables from 'some.pdf' and save them to the current folder
pdfsp some.pdf .

# Extract all tables from 'some.pdf' and save them to 'toThisFolder'
pdfsp some.pdf toThisFolder
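
The same commands can be driven from a Python script, for example in a scheduled job. The sketch below only assembles the arguments shown above (`--combine` and `--skiprows=N`). The helper names `build_pdfsp_command` and `run_pdfsp` are hypothetical, and it is an assumption that `pdfsp` is on the `PATH` and signals failure with a non-zero exit status. The examples above only show `--skiprows` together with `--combine`, so whether it works on its own is not confirmed here.

```python
import subprocess


def build_pdfsp_command(source, output, combine=False, skiprows=None):
    """Assemble the argument list for one pdfsp call (hypothetical helper)."""
    cmd = ["pdfsp", str(source), str(output)]
    if combine:
        cmd.append("--combine")
    if skiprows is not None:
        cmd.append(f"--skiprows={skiprows}")
    return cmd


def run_pdfsp(source, output, **kwargs):
    """Run pdfsp; raises CalledProcessError if the exit status is non-zero."""
    return subprocess.run(build_pdfsp_command(source, output, **kwargs), check=True)


if __name__ == "__main__":
    print(build_pdfsp_command(".", ".", combine=True, skiprows=1))
    # ['pdfsp', '.', '.', '--combine', '--skiprows=1']
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.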

Example summary report:

=== 📊 Extraction Summary Report ===
✅ Successful Files: 3
   - pdfs/report1.pdf → 🗂️ 5 tables extracted
   - pdfs/summary2.pdf → 🗂️ 3 tables extracted
   - pdfs/report2.pdf → 🗂️ 7 tables extracted

❌ Failed Files: 1
   - pdfs/corrupted.pdf

⚠️ Some files failed to process. See details above.
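
If the report needs to be checked by a script rather than by eye, the per-file table counts can be pulled out of the text. The sketch below is written against the sample report shown above only. The function name `parse_successes` is hypothetical, and the report layout is assumed to stay as in the sample, which this README does not guarantee.

```python
import re

# Matches lines such as "   - pdfs/report1.pdf → 🗂️ 5 tables extracted".
SUCCESS_LINE = re.compile(
    r"^\s*-\s+(?P<path>.+?)\s+→\s+.*?(?P<count>\d+)\s+tables? extracted\s*$"
)


def parse_successes(report):
    """Map each successfully processed file to its extracted table count."""
    counts = {}
    for line in report.splitlines():
        match = SUCCESS_LINE.match(line)
        if match:
            counts[match.group("path")] = int(match.group("count"))
    return counts


sample = """\
✅ Successful Files: 3
   - pdfs/report1.pdf → 🗂️ 5 tables extracted
   - pdfs/summary2.pdf → 🗂️ 3 tables extracted
   - pdfs/report2.pdf → 🗂️ 7 tables extracted

❌ Failed Files: 1
   - pdfs/corrupted.pdf
"""

print(parse_successes(sample))
```

Lines for failed files carry no table count, so they do not match the pattern and are left out of the result.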

Release history

0.1.14 (May 12, 2025, this version)
0.1.13 (May 12, 2025)
0.1.12 (May 12, 2025)
0.1.11 (May 11, 2025)
0.1.10 (May 11, 2025)
0.1.9 (May 11, 2025)
0.1.8 (May 11, 2025)
0.1.7 (May 11, 2025)
0.1.6 (May 11, 2025)
0.1.5 (May 9, 2025)
0.1.4 (Apr 16, 2025)
0.1.3 (Apr 11, 2025)
0.1.2 (Apr 11, 2025)
0.1.1 (Apr 10, 2025)
0.1.0 (Apr 10, 2025)
