Extract text from PDF files into a dataframe
Project description
Pdf2Df
This is a simple Python package that creates a dataframe from the text extracted from PDF files.
To install:
$ pip install pdf2df
Get Started
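To use the package, import the Pdf2df class, point it at your PDF file(s), and call get_text() to get the dataframe:
from pdf2df import Pdf2df
path = "path/to/pdf_folder"  # placeholder: a folder containing PDF files
sfd = Pdf2df(path, page=True, single_file=False)  # one row per page, path is a folder
df = sfd.get_text()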
Arguments
- path (str): Location of the input, either a single PDF file or a folder containing multiple PDF files.
- page (bool): If True, each page of a PDF becomes its own row in the dataframe; if False, all the text from a PDF is placed in a single row.
- single_file (bool): Tells the method whether path points to a single PDF file (True) or to a folder containing multiple PDF files (False).
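pdf2df's internals are not described on this page. As a rough illustration of how the three arguments could shape the output, here is a stdlib-only sketch. Everything in it is an assumption: `collect_pdfs`, `build_records`, the injected `extract_pages` callable, and the `file`/`page`/`text` keys are hypothetical names, not part of pdf2df, and the real package may behave differently. It returns a list of dicts, which `pandas.DataFrame(records)` would turn into a dataframe.

```python
from pathlib import Path
from typing import Callable, Dict, List, Union

# Hypothetical extractor type: takes a PDF path, returns one text string per page.
PageExtractor = Callable[[Path], List[str]]


def collect_pdfs(path: Union[str, Path], single_file: bool) -> List[Path]:
    """Resolve `path` to the PDF file(s) to process (assumed meaning of single_file)."""
    p = Path(path)
    if single_file:
        # path points directly at one PDF file
        return [p]
    # path is a folder: take every *.pdf file directly inside it, in sorted order
    return sorted(f for f in p.iterdir() if f.is_file() and f.suffix.lower() == ".pdf")


def build_records(
    path: Union[str, Path],
    extract_pages: PageExtractor,
    page: bool = True,
    single_file: bool = False,
) -> List[Dict[str, object]]:
    """Hypothetical sketch: one record per page (page=True) or per file (page=False)."""
    records: List[Dict[str, object]] = []
    for pdf in collect_pdfs(path, single_file):
        pages = extract_pages(pdf)
        if page:
            # one row per page, numbered from 1
            for number, text in enumerate(pages, start=1):
                records.append({"file": pdf.name, "page": number, "text": text})
        else:
            # all pages of this file joined into a single row
            records.append({"file": pdf.name, "text": "\n".join(pages)})
    return records
```

To try it without a PDF library, pass a stub such as `lambda f: ["page one", "page two"]` as `extract_pages`. With pypdf installed, a real extractor could be `lambda f: [pg.extract_text() for pg in PdfReader(f).pages]` after `from pypdf import PdfReader`. Injecting the extractor keeps the row-layout logic independent of any particular PDF library.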
Download files
Source Distribution
pdf2df-0.0.2.tar.gz (2.8 kB)
Built Distribution
pdf2df-0.0.2-py3-none-any.whl (3.8 kB)
File details
Details for the file pdf2df-0.0.2.tar.gz.
File metadata
- Download URL: pdf2df-0.0.2.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | acfeb70e2536f46572e6459c719fc324d0252bf6f41b69fa9d28f7c3f1bd63f9 |
| MD5 | bde72e8d666e17af3f0786746fd6e12b |
| BLAKE2b-256 | 8b1aa98e150441ec649fe682d0a00d828154e5b4ee085c264b4534207200d270 |
File details
Details for the file pdf2df-0.0.2-py3-none-any.whl.
File metadata
- Download URL: pdf2df-0.0.2-py3-none-any.whl
- Upload date:
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 706724b5ade29c50aee78e6afd159b8c4bc722a31d6e5aa45ebd4d1d7a349d2f |
| MD5 | e60396fe70956058c00a4489f58ba352 |
| BLAKE2b-256 | 6544c186ff4cfa109ac8bd2747d17af3a5dffecfa8d81463a6ad0c8dd595d27c |
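The SHA256 digests listed for each file let you confirm that a download is intact. A minimal stdlib sketch, assuming the source archive has already been saved as pdf2df-0.0.2.tar.gz in the current directory; `sha256_of` and `verify` are hypothetical helper names. The file is read in chunks so it never has to fit in memory at once.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed on this page for pdf2df-0.0.2.tar.gz
EXPECTED_SHA256 = "acfeb70e2536f46572e6459c719fc324d0252bf6f41b69fa9d28f7c3f1bd63f9"


def sha256_of(path, chunk_size=65536):
    """Return the lowercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("pdf2df-0.0.2.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
```

For the wheel, compare against 706724b5ade29c50aee78e6afd159b8c4bc722a31d6e5aa45ebd4d1d7a349d2f instead. pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file), in which case every requirement, including dependencies, must carry a hash.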