For scraping data from PDF

Project description

PDFMaster Package

This is a PDFTableMaster package.

Parameters to adjust

    pdfTable.set_parameters({'upperBoundry': 10, 'lowerBoundry': 10, 'margin': 3})

--> upperBoundry and lowerBoundry set the upper and lower boundaries on the vertical axis used to identify rows
--> These values should be adjusted to fit the PDF table you are about to scrape
--> margin defines the horizontal boundaries of the table (used to identify columns)
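To make the role of these three values more concrete, here is a minimal sketch of tolerance-based row and column grouping. It is not the package's implementation: the word record `(x0, x1, y, text)`, the helper names `group_rows` and `split_columns`, and the exact way the bounds and margin are applied are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical word record: (x0, x1, y, text). Not the package's data format.
Word = Tuple[float, float, float, str]


def group_rows(words: List[Word], upper: float = 10, lower: float = 10) -> List[List[Word]]:
    """Put each word into the first row whose anchor y lies within the band."""
    rows: List[Tuple[float, List[Word]]] = []  # (anchor_y, words in that row)
    for word in words:
        y = word[2]
        for anchor, members in rows:
            if anchor - lower <= y <= anchor + upper:
                members.append(word)
                break
        else:
            # No existing row is close enough vertically: start a new one.
            rows.append((y, [word]))
    rows.sort(key=lambda r: r[0])
    # Sort each row's words left to right (tuples sort by x0 first).
    return [sorted(members) for _, members in rows]


def split_columns(row: List[Word], margin: float = 3) -> List[str]:
    """Merge words whose horizontal gap is at most `margin` into one cell."""
    cells: List[str] = []
    prev_x1 = None
    for x0, x1, _y, text in row:
        if prev_x1 is not None and x0 - prev_x1 <= margin:
            cells[-1] += " " + text
        else:
            cells.append(text)
        prev_x1 = x1
    return cells


words = [
    (0, 20, 100, "LKA"), (22, 40, 102, "001"), (60, 80, 99, "Colombo"),
    (0, 20, 130, "LKA"), (60, 80, 131, "Kandy"),
]
table = [split_columns(r, margin=3) for r in group_rows(words, upper=10, lower=10)]
print(table)
```

In this sketch, larger vertical bounds pull slightly misaligned words into the same row, and a larger margin merges neighbouring words into a single cell. That is the kind of trade-off to keep in mind when tuning the real parameters for a given PDF.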

The project gives you an unstructured table (lists inside a list)
--> Implement the CleanMaster class that comes with the package to define how the cleaning should be done
--> Refer to example.py for a clear understanding of how to use this class
--> cleanListMaster(), a method of the CleanMaster class, is where this functionality is defined

    class clean(CleanMaster):
        def cleanListMaster(self, rows):
            # You have to implement this method with rules to filter out rows.
            finalPageList = []
            for row in rows:
                # Keep only rows with exactly 6 cells ...
                if len(row) == 6:
                    code = row[0].strip()
                    # ... whose first cell is a 12-character code starting with "LKA".
                    if code.startswith("LKA") and len(code) == 12:
                        finalPageList.append(clean.removeComma(row))
            return finalPageList
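The filtering rule above can be tried without a PDF by running it on a few hand-written rows. The sketch below uses a stand-in base class, because the import path of CleanMaster is not shown here; `CleanMasterStandIn`, the behaviour of its `removeComma` (dropping commas from every cell), and the sample rows are all assumptions for illustration, not the package's actual code.

```python
class CleanMasterStandIn:
    """Stand-in for the package's CleanMaster base class (assumption)."""

    @staticmethod
    def removeComma(row):
        # Assumed behaviour: drop commas from every cell, e.g. "1,250" -> "1250".
        return [cell.replace(",", "") for cell in row]


class Clean(CleanMasterStandIn):
    def cleanListMaster(self, rows):
        kept = []
        for row in rows:
            code = row[0].strip() if row else ""
            # Same rule as above: 6 cells, first cell is a 12-character "LKA" code.
            if len(row) == 6 and code.startswith("LKA") and len(code) == 12:
                kept.append(self.removeComma(row))
        return kept


# Made-up rows of the "lists inside a list" shape described above.
raw_rows = [
    ["LKA000000001", "Colombo", "1,250", "3,400", "12", "ok"],  # matches the rule
    ["Code", "City", "A", "B", "C", "Status"],                  # header row
    ["LKA000000002", "Kandy", "980"],                           # too few cells
    ["LKA123", "Galle", "1", "2", "3", "ok"],                   # code is not 12 characters
]
cleaned = Clean().cleanListMaster(raw_rows)
print(cleaned)
```

Only the first sample row survives, with the commas removed from its numeric cells. Header rows, short rows, and rows with a malformed code are dropped.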

Download files

Source Distribution

PDFMaster-0.0.1.tar.gz (8.0 kB)

Uploaded Source

Built Distribution

PDFMaster-0.0.1-py3-none-any.whl (8.4 kB)

Uploaded Python 3

File details

Details for the file PDFMaster-0.0.1.tar.gz.

File metadata

  • Download URL: PDFMaster-0.0.1.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.4

File hashes

Hashes for PDFMaster-0.0.1.tar.gz
Algorithm Hash digest
SHA256 9a1076e565d4dde067f365c41c2b1e6a258fdb9eda71720d6c75e6ab403591c5
MD5 3bb18a689fd37cb73db60853cda7923b
BLAKE2b-256 aa21d2393df7b7efc026dea778bd5508727f53ee50e14bfdc29e183ee6c491a9

File details

Details for the file PDFMaster-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: PDFMaster-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.4

File hashes

Hashes for PDFMaster-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0ac030eefdf202dc802e1befdfbd8653d2ceacdb4eb7f0feadbe65482131ce3b
MD5 85c1e1a96a6e7b4c9ab6392d750aeae8
BLAKE2b-256 68608123e0b64b57920c247f52e5436bfe19f1e8f4444aaf0b53b19ccb65ccbb
