Scrapy downloader middleware that stores response HTML files to disk.

Project description

This is a Scrapy downloader middleware that stores response HTML to disk.


Turn the middleware on, e.g. by adding it to Scrapy's DOWNLOADER_MIDDLEWARES setting:

    'scrapy_html_storage.HtmlStorageMiddleware': 10,
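A minimal sketch of how the entry above might sit in a project's settings module, assuming the standard Scrapy DOWNLOADER_MIDDLEWARES dictionary is used; the priority 10 is taken from the line above, and the file name settings.py is the usual Scrapy default, not something this page confirms.

```python
# settings.py (sketch): register the middleware with priority 10.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_html_storage.HtmlStorageMiddleware': 10,
}
```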

By default, no responses are saved to disk. You must mark each request whose response HTML should be saved:

def parse(self, response):
    """Processes start urls.

    Args:
        response (HtmlResponse): scrapy HTML response object.
    """
    yield scrapy.Request(
        'http://example.com/',  # placeholder URL
        meta={
            'save_html': True,
        },
    )
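When many requests need the flag, a small helper can keep the callbacks tidy. The sketch below assumes the flag is read from the request's meta dictionary; the helper name html_storage_meta is hypothetical and not part of scrapy-html-storage.

```python
def html_storage_meta(meta=None, save=True):
    """Return a copy of ``meta`` with the ``save_html`` flag set.

    Hypothetical helper; assumes the middleware reads the flag from
    the request's meta dictionary.
    """
    result = dict(meta or {})  # copy so the caller's dict is not mutated
    result['save_html'] = save
    return result


# Inside a spider callback (requires scrapy):
#     yield scrapy.Request(url, meta=html_storage_meta({'page': 1}))
```

Copying the dictionary keeps the caller's meta untouched, so the same base dictionary can be reused for requests that should not be saved.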

The path of the file where the HTML is stored is resolved by the spider method response_html_path, e.g.:

class TargetSpider(scrapy.Spider):
    def response_html_path(self, request):
        """
        Args:
            request (scrapy.http.request.Request): request that produced the
                response.
        """
        return 'html/last_response.html'
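Returning a fixed path means each saved response overwrites the previous one. One possible way to get a distinct file per request is to derive the path from the request URL. This is only a sketch: the function name html_path_for_url and the directory layout are assumptions, and whether the middleware creates missing directories is not stated here, so check that before relying on nested paths.

```python
import hashlib
from urllib.parse import urlparse


def html_path_for_url(url, root='html'):
    """Build a per-URL path such as 'html/<host>/<sha1 of url>.html'."""
    # Replace ':' so that hosts with a port stay valid file names everywhere.
    host = (urlparse(url).netloc or 'unknown-host').replace(':', '_')
    digest = hashlib.sha1(url.encode('utf-8')).hexdigest()
    return '{}/{}/{}.html'.format(root, host, digest)


# In the spider (requires scrapy):
#     def response_html_path(self, request):
#         return html_path_for_url(request.url)
```

Hashing the full URL keeps file names short and free of characters that are awkward on disk, at the cost of names that are not human-readable.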


The HTML storage downloader middleware supports the following options:

  • gzip_output (bool) - if True, the HTML output is stored gzip-compressed. Defaults to False.
  • save_html_on_status (list) - list of response status codes for which HTML is saved. If the list is empty or not provided, HTML is saved for every status code.
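The save_html_on_status rule can be restated as a small predicate. This is an illustration of the behaviour described in the list above, not the middleware's actual code; the function name status_allowed is hypothetical.

```python
def status_allowed(status, save_html_on_status=None):
    """Mirror the documented rule: an empty or missing list allows every status."""
    if not save_html_on_status:
        return True
    return status in save_html_on_status
```

With the list [200, 202], a 404 response would be skipped, while with no list at all it would be saved.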


Sample option values:

    'gzip_output': True,
    'save_html_on_status': [200, 202]
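With gzip_output enabled, the stored files need to be decompressed before they can be inspected. A minimal sketch using the standard library gzip module follows; the helper name read_stored_html and the UTF-8 default are assumptions, since the encoding of a stored page depends on the site that served it.

```python
import gzip


def read_stored_html(path, gzipped=False, encoding='utf-8'):
    """Read a stored HTML file, decompressing it first when gzipped is True."""
    opener = gzip.open if gzipped else open
    with opener(path, 'rb') as f:
        return f.read().decode(encoding)
```

Pass gzipped=True for files written while gzip_output was on, and the default for plain files.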

Download files

Files for scrapy-html-storage, version 0.4.0

Filename, size                              File type   Python version
scrapy-html-storage-0.4.0.tar.gz (3.0 kB)   Source      None
