For scraping data from PDF
Project description
PDFMaster Package
This is the PDFTableMaster package, used to scrape tables from PDF files.
Parameters to adjust:

    pdfTable.set_parameters({'upperBoundry': 10, 'lowerBoundry': 10, 'margin': 3})

--> upperBoundry and lowerBoundry set the upper and lower boundaries along the vertical axis that are used to identify rows
--> These values should be adjusted to fit the PDF table you are about to scrape
--> margin defines the horizontal boundaries of the table (used to identify columns)
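The role of the three parameters can be illustrated with a small, self-contained sketch. This is not the package's own implementation: the `(x, y, text)` fragment shape and the helper names `group_rows` and `column_starts` are assumptions made for illustration. It only shows the general idea of using vertical tolerances to collect fragments into rows and a horizontal tolerance to find column starts.

```python
# Illustrative sketch only; not PDFMaster's actual algorithm.
# A fragment is a hypothetical (x, y, text) tuple for a piece of text on a page.

def group_rows(fragments, upper=10, lower=10):
    """Group fragments whose y lies within [anchor - lower, anchor + upper]."""
    rows = []  # each entry is (anchor_y, [fragments in that row])
    for frag in sorted(fragments, key=lambda f: f[1]):
        y = frag[1]
        for anchor_y, members in rows:
            if anchor_y - lower <= y <= anchor_y + upper:
                members.append(frag)
                break
        else:
            # No existing row is close enough, so this fragment starts a new row
            rows.append((y, [frag]))
    # Order each row left to right and keep only the text
    return [
        [text for _, _, text in sorted(members, key=lambda f: f[0])]
        for _, members in rows
    ]


def column_starts(fragments, margin=3):
    """Merge x positions that differ by at most `margin` into one column start."""
    starts = []
    for x in sorted(f[0] for f in fragments):
        if not starts or x - starts[-1] > margin:
            starts.append(x)
    return starts


fragments = [
    (50, 100, "LKA000000001"),
    (200, 103, "Colombo"),
    (51, 130, "LKA000000002"),
    (201, 128, "Kandy"),
]
table = group_rows(fragments, upper=10, lower=10)
columns = column_starts(fragments, margin=3)
```

With tolerances of 10, fragments at y = 100 and y = 103 land in the same row, while y = 128 starts a new one. If rows of your table get merged or split, that is the kind of effect you would tune with `upperBoundry` and `lowerBoundry`.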
The project gives you an unstructured table (a list of lists).
--> You should subclass the CleanMaster class that comes with the package to define how the cleaning should be done
--> Refer to example.py for a clear understanding of how you can use this class
--> Implement cleanListMaster(), a method of the CleanMaster class, to define this functionality
    # CleanMaster is provided by the package (see example.py)
    class clean(CleanMaster):
        def cleanListMaster(self, rows):
            # You have to implement this method with rules to filter out rows
            finalPageList = []
            for row in rows:
                # Keep only rows with exactly 6 columns
                if len(row) == 6:
                    code = row[0].strip()
                    # First column must be a 12-character code starting with "LKA"
                    if code.startswith("LKA") and len(code) == 12:
                        finalPageList.append(clean.removeComma(row))
            return finalPageList
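To see what these filter rules do on concrete input, here is a runnable sketch that replaces the package's base class with a stand-in. The stub `CleanMaster` and the behaviour of `removeComma` (stripping commas from every cell) are assumptions made so the example runs without the package; the real class may behave differently. The sample rows are invented for illustration.

```python
class CleanMaster:
    """Stand-in for the package's base class (assumption, for illustration)."""

    @staticmethod
    def removeComma(row):
        # Hypothetical behaviour: drop commas from every cell
        return [cell.replace(",", "") for cell in row]


class clean(CleanMaster):
    def cleanListMaster(self, rows):
        finalPageList = []
        for row in rows:
            if len(row) == 6:
                code = row[0].strip()
                if code.startswith("LKA") and len(code) == 12:
                    finalPageList.append(clean.removeComma(row))
        return finalPageList


rows = [
    ["Code", "Name", "Qty", "Price", "Value", "Note"],            # header row
    ["LKA123456789", "Item A", "1,000", "2.50", "2,500.00", ""],  # valid row
    ["LKA123", "Short code", "1", "1", "1", ""],                  # code too short
    ["Total", "3,500.00"],                                        # too few columns
]
cleaned = clean().cleanListMaster(rows)
```

Only the second row passes both checks, so `cleaned` holds that single row with the commas removed from its cells.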
Download files
Source Distribution: PDFMaster-0.0.1.tar.gz
Built Distribution: PDFMaster-0.0.1-py3-none-any.whl
File details
Details for the file PDFMaster-0.0.1.tar.gz.
File metadata
- Download URL: PDFMaster-0.0.1.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9a1076e565d4dde067f365c41c2b1e6a258fdb9eda71720d6c75e6ab403591c5
MD5 | 3bb18a689fd37cb73db60853cda7923b
BLAKE2b-256 | aa21d2393df7b7efc026dea778bd5508727f53ee50e14bfdc29e183ee6c491a9
File details
Details for the file PDFMaster-0.0.1-py3-none-any.whl.
File metadata
- Download URL: PDFMaster-0.0.1-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0ac030eefdf202dc802e1befdfbd8653d2ceacdb4eb7f0feadbe65482131ce3b
MD5 | 85c1e1a96a6e7b4c9ab6392d750aeae8
BLAKE2b-256 | 68608123e0b64b57920c247f52e5436bfe19f1e8f4444aaf0b53b19ccb65ccbb