Skip to main content

Utilities for running scrapy on heroku

Project description

A package to assist with running scrapy on heroku. This is accomplished by providing a custom application configuration at scrapy_heroku.app.application that launches the scrapyd web service using the PORT environment variable and a multi-process work queue implemented on a Postgres database specified by the DATABASE_URL environment variable.

Configuration

Create a git repo that has a scrapy project at the root (scrapy.cfg should be at the top level). Edit your scrapy.cfg to include the following:

[scrapyd]
application = scrapy_heroku.app.application

[deploy]
url = http://<YOUR_HEROKU_APP_NAME>.herokuapp.com:80/
project = <YOUR_PROJECT_NAME>
username = <A_USER_NAME>
password = <A_PASSWORD>

Add a requirements.txt file that includes scrapy-heroku in it. It is strongly recommended that you version pin scrapy-heroku as well as the version of scrapy that your project is developed against (pip freeze > requirements.txt). Finally create a Procfile that consists of:

web: scrapy server

Make sure you have a postgres database that has been promoted to DATABASE_URL

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapy-heroku-0.7.1.tar.gz (5.2 kB view details)

Uploaded Source

File details

Details for the file scrapy-heroku-0.7.1.tar.gz.

File metadata

File hashes

Hashes for scrapy-heroku-0.7.1.tar.gz
Algorithm Hash digest
SHA256 a4d7d090442b55aaf83cf2957264ed210e869f7419fabb40b625ce47284a9a2b
MD5 7b60d1bc913c4c9bd4192c563d88a53e
BLAKE2b-256 3b19876c22d5971aa48b86f7f25ee8abd6326d48f7abf7bf20f1fb799ece95e1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page