The missing datasets manager.

DataBrewer

DataBrewer lets you search and discover datasets. Inspired by Homebrew, it creates an index of known datasets that you can download with a single command. It will provide an API to do the same from, for example, an IPython notebook, so you no longer have to download datasets manually.

Quickstart

Install databrewer:

pip install databrewer

Update the recipes index:

databrewer update

Search for some keywords:

databrewer search nyc taxi

Example output:

andresmh-nyc-taxi-trips - NYC Taxi Trips. Data obtained through a FOIA request
nyc-tlc-taxi            - This dataset includes trip records from all trips
                          completed in yellow and green taxis in NYC in 2014 and
                          select months of 2015.

Let’s check the nyc-tlc-taxi dataset:

databrewer info nyc-tlc-taxi

We can either download the entire dataset (which is huge!):

databrewer download nyc-tlc-taxi

Or download just a subset of the files in the dataset:

databrewer download "nyc-tlc-taxi[green][2014-*]"
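The bracketed terms read like shell-style patterns applied to the dataset's files. As an illustration only (not DataBrewer's actual implementation, whose matching rules are not described here), the following Python sketch assumes that each bracketed term must match one of the underscore-separated parts of a file name. The helper names `parse_selector` and `select_files` are hypothetical.

```python
import re
from fnmatch import fnmatch


def parse_selector(selector):
    """Split 'name[a][b]' into ('name', ['a', 'b'])."""
    name = selector.split("[", 1)[0]
    terms = re.findall(r"\[([^\]]*)\]", selector)
    return name, terms


def select_files(filenames, terms):
    """Keep file names where every term matches some part of the name.

    Assumption: parts are the underscore-separated pieces of the file
    name without its extension, and terms are shell-style globs.
    """
    selected = []
    for filename in filenames:
        stem = filename.rsplit(".", 1)[0]
        parts = stem.split("_")
        if all(any(fnmatch(part, term) for part in parts) for term in terms):
            selected.append(filename)
    return selected


name, terms = parse_selector("nyc-tlc-taxi[green][2014-*]")
files = [
    "green_tripdata_2014-01.csv",
    "green_tripdata_2015-01.csv",
    "yellow_tripdata_2014-01.csv",
]
print(name, terms)
print(select_files(files, terms))
```

Under these assumptions only the green 2014 file is kept, which is consistent with the file names DataBrewer lists for this selector.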

Finally, you need to know where the files are located for further processing:

databrewer download "nyc-tlc-taxi[green][2014-*]"

Example output:

/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-01.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-02.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-03.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-04.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-05.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-06.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-07.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-08.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-09.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-10.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-11.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-12.csv
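For further processing in Python, the downloaded files can be collected from that directory with the standard library. This is a minimal sketch that assumes the `~/.databrewer/datasets/<dataset>/` layout seen in the example output above. The function name `dataset_files` is hypothetical and not part of DataBrewer.

```python
from pathlib import Path


def dataset_files(dataset, pattern="*", root=None):
    """Return sorted paths of downloaded files for a dataset.

    Assumption: files live under ~/.databrewer/datasets/<dataset>/,
    as in the example output; DataBrewer may use a different location
    on other setups.
    """
    if root is None:
        root = Path.home() / ".databrewer" / "datasets"
    return sorted((Path(root) / dataset).glob(pattern))


# Prints nothing if the dataset has not been downloaded yet.
for path in dataset_files("nyc-tlc-taxi", "green_tripdata_2014-*.csv"):
    print(path)
```

The `root` parameter exists so the lookup can be pointed at another directory, for example in tests.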

Datasets

The aim is to index known and not-so-known datasets. There are no plans to standardize the dataset format, as we want to keep each dataset as published by its authors.

Recipes

Datasets are defined in recipes, which contain information about the dataset and where to find it.

These recipes are community-maintained and hosted in the databrewer-recipes repository.

Roadmap

  • Include an API. For now DataBrewer only provides a CLI, but in the near future it will include an API so you can search, download, and load datasets directly in your Python code.
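Until such an API exists, the CLI can be driven from Python code or a notebook through `subprocess`. The sketch below is a stopgap under stated assumptions, not the planned API: it only builds and runs the `search` and `download` commands shown in the Quickstart, and the helper names `build_command` and `run_databrewer` are hypothetical.

```python
import subprocess


def build_command(subcommand, *args):
    """Build the argument list for a databrewer CLI call."""
    return ["databrewer", subcommand, *args]


def run_databrewer(subcommand, *args):
    """Run the CLI and return its standard output as text.

    Raises subprocess.CalledProcessError if the command fails and
    FileNotFoundError if databrewer is not installed.
    """
    result = subprocess.run(
        build_command(subcommand, *args),
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


# Passing arguments as a list avoids shell quoting of "[...]" selectors.
print(build_command("search", "nyc", "taxi"))
print(build_command("download", "nyc-tlc-taxi[green][2014-*]"))
```

With DataBrewer installed, `run_databrewer("search", "nyc", "taxi")` would return whatever the CLI prints for that search.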

Contributing

You can help by the following means:

See CONTRIBUTING.rst for more information.

History

0.1.1 (2017-05-05)

  • Fix packaging issues.

0.1.0 (2017-05-05)

  • First release on PyPI.
