Periodically capture external data relating to GitHub hosted Open Source libraries

Project Description

Quickly and easily store data about your open source projects on GitHub and various Package Managers.


All updates to this project are documented in our CHANGELOG.


Environment Variables

First, sign up for a free SendGrid account.

Next, update your environment with your SENDGRID_API_KEY.

Initial Setup

echo "export SENDGRID_API_KEY='YOUR_API_KEY'" > sendgrid.env
echo "sendgrid.env" >> .gitignore
source ./sendgrid.env
git clone
cd sendgrid-open-source-library-external-data
virtualenv venv
cp .env_sample .env

Update your settings in .env
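
The layout of .env is not shown here. As a rough sketch, assuming it holds simple KEY=VALUE lines (optionally prefixed with export, like the sendgrid.env file created above), the settings could be loaded into the process environment as follows. The helper names parse_env_lines and load_env_file are hypothetical and not part of this project.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate shell-style lines such as: export SENDGRID_API_KEY='...'
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def load_env_file(path=".env"):
    """Copy settings from the file into os.environ, keeping values already set."""
    with open(path) as handle:
        for key, value in parse_env_lines(handle).items():
            os.environ.setdefault(key, value)
```

Using setdefault means a variable already exported in the shell takes precedence over the file, which keeps local overrides easy.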

mysql -u USERNAME -p -e 'CREATE DATABASE IF NOT EXISTS `open-source-library-data-collector`;'
mysql -u USERNAME -p open-source-library-data-collector < db/data_schema.sql
cp config_sample.yml config.yml

Update the settings in config.yml

source venv/bin/activate
pip install -r requirements.txt

Update the code to suit your needs. The functions update_package_manager_data and update_db were customized for our particular needs. You will want to either override those functions in a subclass in your own application or modify them directly. We will remove these customizations in a future release. Here is the GitHub issue for reference.
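
One way to override the two functions named above is through a subclass. The sketch below is only illustrative: DataCollector is a hypothetical stand-in for whichever class in this project defines update_package_manager_data and update_db, and the argument lists and return values are assumptions rather than the project's actual signatures.

```python
class DataCollector:
    """Hypothetical stand-in for the project's class; real names and signatures may differ."""

    def update_package_manager_data(self, package_manager_urls):
        raise NotImplementedError

    def update_db(self, records):
        raise NotImplementedError


class MyCollector(DataCollector):
    """Replaces the two customized functions with application-specific behavior."""

    def __init__(self):
        self.saved = []

    def update_package_manager_data(self, package_manager_urls):
        # Placeholder: build one record per URL instead of querying a package manager.
        return [{"url": url, "downloads": None} for url in package_manager_urls]

    def update_db(self, records):
        # Placeholder: keep records in memory instead of writing to MySQL.
        self.saved.extend(records)
        return len(self.saved)
```

Keeping the customizations in a subclass leaves the library code untouched, so upgrading to a later release should need fewer manual merges.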

To run:

source venv/bin/activate


Heroku Deploy

heroku login
heroku create
heroku addons:create cleardb:ignite

Access the cleardb DB and create the tables in db/data_schema.sql

heroku config:add ENV=prod
heroku config:add GITHUB_TOKEN=<<your_github_token>>
heroku config:add SENDGRID_API_KEY=<<your_sendgrid_api_key>>
heroku addons:create scheduler:standard

Configure the scheduler addon in your Heroku dashboard to run python at your desired frequency.

Test by running heroku run worker
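
Before the worker runs, it can help to confirm that the config vars set above are actually present. The following is a minimal sketch, assuming the application reads ENV, GITHUB_TOKEN and SENDGRID_API_KEY from the environment. The function name missing_config is hypothetical and not part of this project.

```python
import os

# Names taken from the heroku config:add commands above.
REQUIRED_VARS = ("ENV", "GITHUB_TOKEN", "SENDGRID_API_KEY")


def missing_config(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling missing_config() at startup and exiting with a clear message when the list is non-empty makes a misconfigured dyno fail fast instead of partway through a collection run.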


If you are interested in the future direction of this project, please take a look at our milestones. We would love to hear your feedback.


open-source-library-data-collector is guided and supported by the SendGrid Developer Experience Team.

open-source-library-data-collector is maintained and funded by SendGrid, Inc. The names and logos for open-source-library-data-collector are trademarks of SendGrid, Inc.

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

File Name                                                           Python Version  File Type  Upload Date
open_source_library_data_collector-1.1.0-py2-none-any.whl (7.2 kB)  2.7             Wheel      Oct 12, 2016
open_source_library_data_collector-1.1.0.tar.gz (4.5 kB)            -               Source     Oct 12, 2016
