DataBrewer

The missing datasets manager.

DataBrewer lets you search for and discover datasets. Inspired by Homebrew, it maintains an index of known datasets that you can download with a single command. It will also provide an API so you can do the same from Python, for example in an IPython notebook, instead of downloading datasets manually.

Quickstart

Install databrewer:

pip install databrewer

Update the recipes index:

databrewer update

Search for some keywords:

databrewer search nyc taxi

Example output:

andresmh-nyc-taxi-trips - NYC Taxi Trips. Data obtained through a FOIA request
nyc-tlc-taxi            - This dataset includes trip records from all trips
                          completed in yellow and green taxis in NYC in 2014 and
                          select months of 2015.

Let’s check the nyc-tlc-taxi dataset:

databrewer info nyc-tlc-taxi

We can either download the entire dataset (which is huge!):

databrewer download nyc-tlc-taxi

Or download only a subset of its files using selectors:

databrewer download "nyc-tlc-taxi[green][2014-*]"

Note

Here * is the standard glob wildcard and [green] acts as a selector. The available selectors depend on how the recipe is defined. When using selectors, enclose the name in quotes, because most shells treat brackets and * as special characters.
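
To illustrate both points, here is a small Python sketch using the standard library's fnmatch and shlex modules. It assumes, hypothetically, that the [2014-*] part is matched against month keys such as 2014-01; the real keys and selector rules depend on the recipe and on DataBrewer's own implementation, which this sketch does not reproduce.

```python
import fnmatch
import shlex

# Hypothetical keys for the green files of the nyc-tlc-taxi recipe.
# The real keys depend on how the recipe is defined.
green_keys = ["2013-12", "2014-01", "2014-02", "2014-12", "2015-01"]

# "*" is the standard glob wildcard, so "2014-*" keeps every key from 2014.
selected = fnmatch.filter(green_keys, "2014-*")
print(selected)  # ['2014-01', '2014-02', '2014-12']

# "[", "]" and "*" are special in most shells, so the whole name must be
# quoted. shlex.quote adds POSIX-shell quoting when building a command line.
command = "databrewer download " + shlex.quote("nyc-tlc-taxi[green][2014-*]")
print(command)  # databrewer download 'nyc-tlc-taxi[green][2014-*]'
```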

Finally, you will need to know where the files are located for further processing:

databrewer download "nyc-tlc-taxi[green][2014-*]"

Example output:

/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-01.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-02.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-03.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-04.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-05.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-06.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-07.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-08.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-09.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-10.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-11.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-12.csv
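
For further processing in Python, a minimal sketch like the following can list the downloaded files. It assumes the download location shown in the example output above (~/.databrewer/datasets/<dataset name>/); dataset_files is a hypothetical helper, not part of DataBrewer.

```python
from pathlib import Path


def dataset_files(name, pattern="*", root=None):
    """Return the sorted paths of downloaded files for a dataset.

    Hypothetical helper: assumes files are stored under <root>/<name>/,
    where root defaults to ~/.databrewer/datasets as in the example output.
    """
    base = Path(root) if root is not None else Path.home() / ".databrewer" / "datasets"
    directory = base / name
    # Nothing downloaded yet for this dataset: return an empty list.
    if not directory.is_dir():
        return []
    return sorted(directory.glob(pattern))


# List the green 2014 trip files downloaded earlier.
for path in dataset_files("nyc-tlc-taxi", "green_tripdata_2014-*.csv"):
    print(path)
```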

Datasets

The aim is to index both well-known and lesser-known datasets. There are no plans to standardize the dataset format, as we want to keep each dataset as published by its authors.

Recipes

Datasets are defined in recipes, which contain information about each dataset and where to find it.

These recipes are community maintained and hosted in the databrewer-recipes repository.

Roadmap

  • Include an API. For now DataBrewer only provides a command-line interface, but in the near future it will include an API so you can search, download and load datasets directly in your Python code.
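
Until that API exists, one possible workaround is to drive the CLI from Python with the standard subprocess module. The sketch below assumes the databrewer executable is on your PATH and uses only the commands shown in the Quickstart; run_databrewer is a hypothetical helper, not part of the package.

```python
import subprocess


def run_databrewer(*args, executable="databrewer"):
    """Run a databrewer CLI command and return its output as a list of lines.

    Hypothetical helper: assumes the `databrewer` executable is on PATH.
    Arguments are passed as a list, so no shell is involved and brackets
    or "*" in selector names are passed through unchanged.
    """
    result = subprocess.run(
        [executable, *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.splitlines()
```

For example, run_databrewer("search", "nyc", "taxi") returns the search results as a list of lines, and run_databrewer("download", "nyc-tlc-taxi[green][2014-*]") needs no extra quoting because no shell parses the arguments.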

Contributing

You can help in the following ways:

See CONTRIBUTING.rst for more information.

History

0.1.1 (2017-05-05)

Fix packaging issues.

0.1.0 (2017-05-05)

  • First release on PyPI.
Release History

0.1.1 (this version)

0.1.0.dev1

0.1.0.dev0

Download Files

File                                   Size     Python   Type    Upload date
databrewer-0.1.1-py2.py3-none-any.whl  14.1 kB  py2.py3  Wheel   May 5, 2017
databrewer-0.1.1.tar.gz                22.0 kB           Source  May 5, 2017
