Download PDF or Save page as PDF
Download PDF function for scrapy
Installation
Install scrapy-save-as-pdf using pip:
    $ pip install scrapy-save-as-pdf
Configuration
- Add the following settings to the settings.py of your Scrapy project:
    PROXY = ""
    CHROME_DRIVER_PATH = '/snap/bin/chromium.chromedriver'
    PDF_SAVE_PATH = "./pdfs"
    PDF_SAVE_AS_PDF = False
    PDF_DOWNLOAD_TIMEOUT = 60
    PDF_PRINT_OPTIONS = {
        'landscape': False,
        'displayHeaderFooter': False,
        'printBackground': True,
        'preferCSSPageSize': True,
    }
- Enable the pipeline by adding it to ITEM_PIPELINES in your settings.py file:
    ITEM_PIPELINES = {
        'scrapy_save_as_pdf.pipelines.SaveAsPdfPipeline': -1,
    }
Choose the order value so that this pipeline runs after your preprocessing pipelines and before your persistence pipelines (for example, one that saves items to a database).
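The ordering rule above could look like the following sketch. Scrapy runs item pipelines from the lowest order value to the highest, so the PDF pipeline gets a value between the other two. The two `myproject.pipelines` entries and the specific numbers are hypothetical placeholders, not part of this package.

```python
# settings.py (sketch): lower values run first.
ITEM_PIPELINES = {
    # Hypothetical preprocessing pipeline: runs first.
    "myproject.pipelines.PreprocessPipeline": 100,
    # The pipeline from this package: runs after preprocessing.
    "scrapy_save_as_pdf.pipelines.SaveAsPdfPipeline": 200,
    # Hypothetical persistence pipeline: runs last, so it sees the saved PDF location.
    "myproject.pipelines.DatabasePipeline": 300,
}
```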
Usage
Set the pdf_url and/or url field in your yielded item:
    import scrapy


    class MySpider(scrapy.Spider):
        name = "myspider"  # placeholder; Scrapy requires every spider to have a name
        start_urls = [
            "http://example.com",
        ]

        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(url, self.parse)

        def parse(self, response):
            # Each item carries the page URL and/or a direct PDF URL.
            yield {
                "url": "http://example.com/cate1/page1.html",
                "pdf_url": "http://example.com/cate1/page1.pdf",
            }
            yield {
                "url": "http://example.com/cate1/page2.html",
                "pdf_url": "http://example.com/cate1/page2.pdf",
            }
The pdf_url field is then populated with the location of the downloaded PDF file. If pdf_url already held a value, that value is moved to the origin_pdf_url field. You can handle both fields in a later pipeline.
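As an illustration of handling these fields in a later pipeline, here is a minimal sketch. The class name `PdfLocationPipeline` is hypothetical, and the code assumes only what is described above: `pdf_url` holds the saved file location and `origin_pdf_url` is present when the item originally had a `pdf_url`.

```python
import logging

logger = logging.getLogger(__name__)


class PdfLocationPipeline:
    """Hypothetical pipeline meant to run after SaveAsPdfPipeline."""

    def process_item(self, item, spider):
        # Location of the downloaded PDF file, as set by the earlier pipeline.
        saved_path = item.get("pdf_url")
        # Original remote URL; None if the item had no pdf_url to begin with.
        original_url = item.get("origin_pdf_url")
        logger.info("PDF saved at %s (original URL: %s)", saved_path, original_url)
        # Return the item unchanged so later pipelines can still use it.
        return item
```

To use a pipeline like this, register it in ITEM_PIPELINES with a higher order value than SaveAsPdfPipeline.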
Getting help
Please use the GitHub issue tracker.
Contributing
PRs are always welcome.
Changes
0.1.0 (2020-12-25)
Initial release