Convert PDFs into pandas DataFrames, remove restrictions, put/crack PDF passwords

pip install pdferli

Tested against Windows 10 / Python 3.10 / Anaconda

crack_password(file, chars, processes=4, minlen=None, maxlen=None, verbose=True)
	Attempt to crack a PDF password using a brute-force approach.
	
	Args:
		file (str): Path to the encrypted PDF file.
		chars (iterable): List of characters to generate passwords from.
		processes (int, optional): Number of parallel processes for password cracking. Defaults to 4.
		minlen (int, optional): Minimum length of generated passwords. Defaults to None, which is treated as 1.
		maxlen (int, optional): Maximum length of generated passwords. Defaults to None, which is treated as the length of chars + 1.
		verbose (bool, optional): Whether to display progress information. Defaults to True.
	
	Returns:
		str or None: The cracked password if successful, otherwise None.
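
To give a feel for how quickly a brute-force search grows, here is a minimal sketch of candidate generation with `itertools.product`. It is not pdferli's actual implementation: `iter_candidates` and `count_candidates` are hypothetical helper names, and treating `maxlen` as inclusive is an assumption.

```python
from itertools import product


def iter_candidates(chars, minlen=1, maxlen=None):
    """Yield every string over `chars` with length minlen..maxlen (inclusive)."""
    chars = list(chars)
    if maxlen is None:
        # Assumption: mirror the documented default of len(chars) + 1.
        maxlen = len(chars) + 1
    for length in range(minlen, maxlen + 1):
        for combo in product(chars, repeat=length):
            yield "".join(combo)


def count_candidates(n_chars, minlen, maxlen):
    """Number of candidates: sum of n_chars**k for k in minlen..maxlen."""
    return sum(n_chars**length for length in range(minlen, maxlen + 1))


candidates = list(iter_candidates("01", minlen=1, maxlen=2))
print(candidates)  # ['0', '1', '00', '01', '10', '11']
print(count_candidates(10, 1, 4))  # 11110
```

Under these assumptions, ten digits with the default maxlen of 11 would give 111,111,111,110 candidates, so passing an explicit maxlen is likely worthwhile whenever the password length is roughly known.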


get_pdfdf(path, normalize_content=False, **kwargs)
	Extract structured data from a PDF document and return it as a pandas DataFrame.
	
	Args:
		path (str): Path to the PDF file.
		normalize_content (bool, optional): Whether to normalize content extraction. Defaults to False.
		**kwargs: Additional keyword arguments for pikepdf.open and extract_pages methods.
	
	Returns:
		pandas.DataFrame: DataFrame containing extracted structured data from the PDF.

put_password_encryption(inputfile, outputfile, password)
	Encrypt a PDF file using a specified password.
	
	Args:
		inputfile (str): Path to the input PDF file.
		outputfile (str): Path to the output encrypted PDF file.
		password (str): Password for encryption.


remove_restrictions(inputfile, outputfile, **kwargs)
	Remove encryption and restrictions from a PDF file.
	
	Args:
		inputfile (str): Path to the input encrypted PDF file.
		outputfile (str): Path to the output decrypted PDF file.
		**kwargs: Additional keyword arguments for pikepdf.save method.


Examples:

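from time import perf_counter

from pdferli import (
    crack_password,
    put_password_encryption,
    remove_restrictions,
    get_pdfdf,
)

# The guard is necessary for crack_password since it uses multiprocessing.
# On Windows, worker processes re-import this module, so the other calls are
# kept inside the guard as well to stop them from running again in each worker.
if __name__ == "__main__":
    # Encrypt a PDF with a password
    put_password_encryption(
        r"C:\sample.pdf",
        r"C:\sample4.pdf",
        password="1234",
    )

    # Remove restrictions, then load the PDF into a DataFrame
    path = r"C:\Arquivo.pdf"
    remove_restrictions(path, r"C:\norestrictions.pdf")
    df = get_pdfdf(path, normalize_content=False)
    print(df)

    # Brute-force the password of the encrypted file and time the search
    start = perf_counter()
    password = crack_password(
        file=r"C:\sample4.pdf",
        chars=list("0123456789"),
        processes=4,
        minlen=0,
        maxlen=None,
        verbose=True,
    )
    print(perf_counter() - start)
    print(password)
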



# output df
   aa_adv  aa_bits aa_colorspace  aa_element_index aa_element_type  aa_evenodd  aa_fill aa_fontname  aa_height aa_imagemask  aa_linewidth aa_name    aa_size aa_srcsize aa_stream  aa_stroke aa_text      aa_text_element aa_text_line  aa_upright   aa_width       aa_x0       aa_x1       aa_y0       aa_y1 bb_hierachy_element bb_hierachy_page
0  31.968     <NA>          <NA>                 0          LTChar        <NA>     <NA>     ArialMT  56.546172         <NA>          <NA>    <NA>  56.546172       <NA>      <NA>       <NA>       A  APENAS VISUALIZAÇÃO            A        True  11.336388  126.431281  137.767669  242.012331  298.558504           (0, 0, 0)           (0, 0)
1    <NA>     <NA>          <NA>                 1          LTAnno        <NA>     <NA>        <NA>       <NA>         <NA>          <NA>    <NA>       <NA>       <NA>      <NA>       <NA>                           \n         <NA>       False       <NA>        <NA>        <NA>        <NA>        <NA>           (0, 0, 0)           (0, 0)
2  31.968     <NA>          <NA>                 2          LTChar        <NA>     <NA>     ArialMT  56.546172         <NA>          <NA>    <NA>  56.546172       <NA>      <NA>       <NA>       P  APENAS VISUALIZAÇÃO            P        True  11.336388  149.036174  160.372561  264.617224  321.163396           (0, 0, 0)           (0, 0)
3    <NA>     <NA>          <NA>                 3          LTAnno        <NA>     <NA>        <NA>       <NA>         <NA>          <NA>    <NA>       <NA>       <NA>      <NA>       <NA>                           \n         <NA>       False       <NA>        <NA>        <NA>        <NA>        <NA>           (0, 0, 0)           (0, 0)
4  31.968     <NA>          <NA>                 4          LTChar        <NA>     <NA>     ArialMT  56.546172         <NA>          <NA>    <NA>  56.546172       <NA>      <NA>       <NA>       E  APENAS VISUALIZAÇÃO            E        True  11.336388  171.641066  182.977454  287.222116  343.768289           (0, 0, 0)           (0, 0)
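
As a rough illustration of working with this frame, the sketch below rebuilds a text string from the per-character rows. It uses a small hand-built DataFrame as a stand-in for the result of get_pdfdf, with only a few of the columns from the sample output above and shortened values. Sorting by aa_x0 to recover reading order is an assumption that only suits a single horizontal line of text.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by get_pdfdf, using a few of the
# columns visible in the sample output (values shortened for illustration).
df = pd.DataFrame(
    {
        "aa_element_type": ["LTChar", "LTAnno", "LTChar", "LTAnno", "LTChar"],
        "aa_text": ["A", "\n", "P", "\n", "E"],
        "aa_fontname": ["ArialMT", None, "ArialMT", None, "ArialMT"],
        "aa_x0": [126.43, None, 149.04, None, 171.64],
    }
)

# Keep only real glyphs; the LTAnno rows in the sample carry newlines, not glyphs.
chars = df.loc[df["aa_element_type"] == "LTChar"]

# Order glyphs left to right and join them back into a string.
text = "".join(chars.sort_values("aa_x0")["aa_text"])
print(text)  # APE
```

For multi-line pages, grouping by a line-level column such as aa_text_line or by rounded aa_y0 before sorting would probably be needed.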
