Extract text from PDF files into a dataframe
Project description
Pdf2Df
This is a simple Python package that creates a dataframe from the text extracted from PDF files.
To install:
$ pip install pdf2df
$ pip install PyMuPDF==1.16.14
Get Started
To use the package, import Pdf2df, create an instance pointing at your PDF file or folder, and call get_text() to get the dataframe:
from pdf2df import Pdf2df
sfd = Pdf2df(path, page=True, single_file=False)
df = sfd.get_text()
Arguments
- path (str) : Location of the PDFs. It can be a single file or a folder containing multiple PDF files.
- page (bool) : If True, each page of the PDF gets its own row in the dataframe; if False, all the text of the PDF goes in a single row.
- single_file (bool) : Tells the method whether the path is a single PDF file or a folder containing multiple PDF files.
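The arguments above can be tied together in a small wrapper. This is only a sketch: `pdf2df_arguments` and `extract_text` are hypothetical helper names that are not part of pdf2df, and the sketch assumes that `single_file=True` means the path is a single PDF file and `single_file=False` means it is a folder, which the description above suggests but does not state outright. Only `Pdf2df(path, page=..., single_file=...)` and `get_text()` come from the package itself.

```python
from pathlib import Path


def pdf2df_arguments(path, page=True):
    """Build the arguments for Pdf2df from a file or folder path.

    Hypothetical helper, not part of pdf2df. Assumes single_file=True
    means "path is one PDF file" and False means "path is a folder".
    """
    target = Path(path)
    if target.is_file():
        single_file = True
    elif target.is_dir():
        single_file = False
    else:
        raise FileNotFoundError(f"No file or folder at {target}")
    return {"path": str(target), "page": page, "single_file": single_file}


def extract_text(path, page=True):
    """Return the dataframe that Pdf2df produces for the given path."""
    # Imported lazily so the helper above works without pdf2df installed.
    from pdf2df import Pdf2df

    args = pdf2df_arguments(path, page=page)
    # Same call shape as in the Get Started example.
    extractor = Pdf2df(args["path"], page=args["page"],
                       single_file=args["single_file"])
    return extractor.get_text()
```

With this in place, `extract_text("reports/")` would request one row per page for every PDF in the folder, and `extract_text("report.pdf", page=False)` would request a single row for one file. Whether the package expects a trailing separator on folder paths is not documented here, so treat the path handling as an assumption to verify.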
Download files
Source Distribution
pdf2df-0.0.3.tar.gz (2.9 kB)
Built Distribution
File details
Details for the file pdf2df-0.0.3.tar.gz.
File metadata
- Download URL: pdf2df-0.0.3.tar.gz
- Upload date:
- Size: 2.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 55c6297c00220f3dace131434d88350d0246d7e3bcc4689d6f1d39ad31171eac |
| MD5 | 7c44f9cb0850f74966327b97ff52861b |
| BLAKE2b-256 | 210de1af2566ff823a6df577fcfc9cba738b6e485a76a624d818f6016bf0aabd |
File details
Details for the file pdf2df-0.0.3-py3-none-any.whl.
File metadata
- Download URL: pdf2df-0.0.3-py3-none-any.whl
- Upload date:
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b29fd5956ed20c183a0d499b14743fe17ace5c8027f4713258b4d8bd7e898398 |
| MD5 | 8e3b83bb4788a5a98a7de2973cbe90b2 |
| BLAKE2b-256 | 996d1504d23452f48209965909ca68f2ad3a50d24f0dea1be91e86887a28a33a |