Convert PDFs into pandas DataFrames, remove restrictions, and set or crack PDF passwords
pip install pdferli
Tested against Windows 10 / Python 3.10 / Anaconda
crack_password(file, chars, processes=4, minlen=None, maxlen=None, verbose=True)
Attempt to crack a PDF password using a brute-force approach.
Args:
file (str): Path to the encrypted PDF file.
chars (iterable): Characters to build candidate passwords from.
processes (int, optional): Number of parallel processes used for cracking. Defaults to 4.
minlen (int, optional): Minimum length of generated passwords. If None, 1 is used.
maxlen (int, optional): Maximum length of generated passwords. If None, the length of chars + 1 is used.
verbose (bool, optional): Whether to display progress information. Defaults to True.
Returns:
str or None: The cracked password, or None if no password was found.
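To give a feel for how chars, minlen and maxlen shape a brute-force search, here is a minimal standard-library sketch. It is not pdferli's implementation: the helper names candidate_passwords and search_space_size are hypothetical, and treating maxlen as inclusive is an assumption.

```python
from itertools import product


def candidate_passwords(chars, minlen, maxlen):
    """Yield every string over `chars` with length minlen..maxlen (inclusive).

    Hypothetical helper for illustration only; pdferli may enumerate
    candidates in a different order or with different bounds.
    """
    chars = list(chars)
    for length in range(minlen, maxlen + 1):
        for combo in product(chars, repeat=length):
            yield "".join(combo)


def search_space_size(n_chars, minlen, maxlen):
    """Count the candidates the generator above would produce."""
    return sum(n_chars ** length for length in range(minlen, maxlen + 1))


# Ten digits, passwords of one to four characters.
pin_space = search_space_size(10, 1, 4)
first_candidates = list(candidate_passwords("01", 1, 2))
```

The search space grows exponentially with maxlen, so a small character set and a tight length range keep a brute-force run practical.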
get_pdfdf(path, normalize_content=False, **kwargs)
Extract structured data from a PDF document and return it as a pandas DataFrame.
Args:
path (str): Path to the PDF file.
normalize_content (bool, optional): Whether to normalize content extraction. Defaults to False.
**kwargs: Additional keyword arguments for pikepdf.open and extract_pages methods.
Returns:
pandas.DataFrame: DataFrame containing extracted structured data from the PDF.
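As a rough idea of how the returned DataFrame might be used, the sketch below rebuilds the visible text from character rows. The DataFrame is a hand-made stand-in with three of the columns shown in the sample output later in this description (aa_element_index, aa_element_type, aa_text); the helper visible_text is hypothetical and not part of pdferli.

```python
import pandas as pd

# Hand-made stand-in for a get_pdfdf result. Only three columns from the
# sample output are reproduced; a real result has many more.
df = pd.DataFrame(
    {
        "aa_element_index": [0, 1, 2, 3, 4],
        "aa_element_type": ["LTChar", "LTAnno", "LTChar", "LTAnno", "LTChar"],
        "aa_text": ["A", "\n", "P", "\n", "E"],
    }
)


def visible_text(frame):
    """Join the text of all LTChar rows in element order (hypothetical helper)."""
    chars = frame.loc[frame["aa_element_type"] == "LTChar"]
    chars = chars.sort_values("aa_element_index")
    return "".join(chars["aa_text"])


text = visible_text(df)
```

Filtering on aa_element_type drops the LTAnno rows, which in the sample output carry only line breaks and no font or position data.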
put_password_encryption(inputfile, outputfile, password)
Encrypt a PDF file using a specified password.
Args:
inputfile (str): Path to the input PDF file.
outputfile (str): Path to the output encrypted PDF file.
password (str): Password for encryption.
remove_restrictions(inputfile, outputfile, **kwargs)
Remove encryption and restrictions from a PDF file.
Args:
inputfile (str): Path to the input encrypted PDF file.
outputfile (str): Path to the output decrypted PDF file.
**kwargs: Additional keyword arguments for pikepdf.save method.
Examples:
from time import perf_counter
from pdferli import (
    crack_password,
    put_password_encryption,
    remove_restrictions,
    get_pdfdf,
)

# Encrypt a copy of the sample PDF with the password "1234".
put_password_encryption(
    r"C:\sample.pdf",
    r"C:\sample4.pdf",
    password="1234",
)

# Write an unrestricted copy, then load the PDF into a DataFrame.
path = r"C:\Arquivo.pdf"
remove_restrictions(path, "c:\\norestrictions.pdf")
df = get_pdfdf(path, normalize_content=False)

if __name__ == "__main__":  # necessary for crack_password since it uses multiprocessing
    start = perf_counter()
    x = crack_password(
        file=r"C:\sample4.pdf",
        chars=list("0123456789"),
        processes=4,
        minlen=0,
        maxlen=None,
        verbose=True,
    )
    print(perf_counter() - start)  # elapsed time in seconds
    print(x)  # the cracked password, or None
# output df
aa_adv aa_bits aa_colorspace aa_element_index aa_element_type aa_evenodd aa_fill aa_fontname aa_height aa_imagemask aa_linewidth aa_name aa_size aa_srcsize aa_stream aa_stroke aa_text aa_text_element aa_text_line aa_upright aa_width aa_x0 aa_x1 aa_y0 aa_y1 bb_hierachy_element bb_hierachy_page
0 31.968 <NA> <NA> 0 LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> A APENAS VISUALIZAÇÃO A True 11.336388 126.431281 137.767669 242.012331 298.558504 (0, 0, 0) (0, 0)
1 <NA> <NA> <NA> 1 LTAnno <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> \n <NA> False <NA> <NA> <NA> <NA> <NA> (0, 0, 0) (0, 0)
2 31.968 <NA> <NA> 2 LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> P APENAS VISUALIZAÇÃO P True 11.336388 149.036174 160.372561 264.617224 321.163396 (0, 0, 0) (0, 0)
3 <NA> <NA> <NA> 3 LTAnno <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> \n <NA> False <NA> <NA> <NA> <NA> <NA> (0, 0, 0) (0, 0)
4 31.968 <NA> <NA> 4 LTChar <NA> <NA> ArialMT 56.546172 <NA> <NA> <NA> 56.546172 <NA> <NA> <NA> E APENAS VISUALIZAÇÃO E True 11.336388 171.641066 182.977454 287.222116 343.768289 (0, 0, 0) (0, 0)