The missing datasets manager.
DataBrewer
Free software: MIT license
Documentation: https://databrewer.readthedocs.org.
DataBrewer lets you search for and discover datasets. Inspired by Homebrew, it maintains an index of known datasets that you can download with a single command. It will also provide an API so that you can do the same from, for example, an IPython notebook, and no longer have to download datasets manually.
Quickstart
Install databrewer:
pip install databrewer
Update the recipes index:
databrewer update
Search for some keywords:
databrewer search nyc taxi
Example output:
andresmh-nyc-taxi-trips - NYC Taxi Trips. Data obtained through a FOIA request
nyc-tlc-taxi - This dataset includes trip records from all trips completed in yellow and green taxis in NYC in 2014 and select months of 2015.
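A minimal sketch of how a keyword search over a recipe index might work, assuming that every keyword must appear (case-insensitively) somewhere in a dataset's name or description. `IndexEntry` and `search` are hypothetical names used for illustration, not DataBrewer's internals; the two entries reuse the example output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IndexEntry:
    """One entry of a hypothetical in-memory recipe index."""
    name: str
    description: str


def search(index: list[IndexEntry], *keywords: str) -> list[IndexEntry]:
    """Return entries whose name or description contains every keyword.

    Assumption: plain case-insensitive substring matching with AND semantics;
    the real `databrewer search` may rank or match differently.
    """
    terms = [k.lower() for k in keywords]
    hits = []
    for entry in index:
        haystack = f"{entry.name} {entry.description}".lower()
        if all(term in haystack for term in terms):
            hits.append(entry)
    return hits


index = [
    IndexEntry("andresmh-nyc-taxi-trips",
               "NYC Taxi Trips. Data obtained through a FOIA request"),
    IndexEntry("nyc-tlc-taxi",
               "This dataset includes trip records from all trips completed "
               "in yellow and green taxis in NYC in 2014 and select months of 2015."),
]

# Print matches in the same "name - description" layout as the CLI example.
for entry in search(index, "nyc", "taxi"):
    print(f"{entry.name} - {entry.description}")
```

Requiring all keywords keeps multi-word queries such as `nyc taxi` narrow; an OR-style search would be a one-line change (`any` instead of `all`).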
Let’s check the nyc-tlc-taxi dataset:
databrewer info nyc-tlc-taxi
We can either download the entire dataset (which is huge!):
databrewer download nyc-tlc-taxi
Or download only some of its files by selecting a subset:
databrewer download "nyc-tlc-taxi[green][2014-*]"
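The selector grammar is not spelled out here, so the following is a sketch under an assumed interpretation: a dataset name followed by bracketed filters, where each filter is treated as a glob matched anywhere in a file name, and a file is kept only if it matches every filter. `parse_selector` and `select_files` are hypothetical helpers, not part of DataBrewer.

```python
from __future__ import annotations

import fnmatch
import re

# Hypothetical grammar: a dataset name followed by zero or more [filter] groups.
_SELECTOR = re.compile(r"^(?P<name>[^\[\]]+)(?P<filters>(\[[^\[\]]*\])*)$")


def parse_selector(selector: str) -> tuple[str, list[str]]:
    """Split 'name[f1][f2]' into ('name', ['f1', 'f2'])."""
    match = _SELECTOR.match(selector)
    if match is None:
        raise ValueError(f"invalid selector: {selector!r}")
    filters = re.findall(r"\[([^\[\]]*)\]", match.group("filters"))
    return match.group("name"), filters


def select_files(filenames: list[str], filters: list[str]) -> list[str]:
    """Keep file names matching every filter as a glob anywhere in the name."""
    return [
        fn for fn in filenames
        if all(fnmatch.fnmatchcase(fn, f"*{flt}*") for flt in filters)
    ]


files = [
    "green_tripdata_2014-01.csv",
    "green_tripdata_2015-01.csv",
    "yellow_tripdata_2014-01.csv",
]
name, filters = parse_selector("nyc-tlc-taxi[green][2014-*]")
print(name, select_files(files, filters))  # prints: nyc-tlc-taxi ['green_tripdata_2014-01.csv']
```

With no brackets, the filter list is empty and every file is selected, which corresponds to downloading the entire dataset.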
Finally, you need to know where the files are located for further processing:
databrewer download "nyc-tlc-taxi[green][2014-*]"
Example output:
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-01.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-02.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-03.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-04.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-05.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-06.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-07.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-08.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-09.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-10.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-11.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-12.csv
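Once downloaded, the files can be processed with ordinary tools. The sketch below assumes the layout shown in the example output (`~/.databrewer/datasets/<dataset>/`); `dataset_files` and `count_rows` are hypothetical helpers, and the default root may differ on your machine.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Iterable

# Assumption: downloads live under ~/.databrewer/datasets/<dataset>/, matching
# the example output; pass a different `root` if your setup differs.
DEFAULT_ROOT = Path.home() / ".databrewer" / "datasets"


def dataset_files(dataset: str, pattern: str = "*", root: Path = DEFAULT_ROOT) -> list[Path]:
    """Return the downloaded files of `dataset` matching a glob pattern, sorted."""
    base = root / dataset
    if not base.is_dir():
        return []  # nothing downloaded yet, or a different root directory
    return sorted(p for p in base.glob(pattern) if p.is_file())


def count_rows(lines: Iterable[str]) -> int:
    """Count CSV records, excluding the header row.

    csv.reader is used instead of counting raw lines so that quoted fields
    containing newlines are counted as a single record.
    """
    return max(sum(1 for _ in csv.reader(lines)) - 1, 0)


if __name__ == "__main__":
    # Print the record count of each green 2014 trip file found locally.
    for path in dataset_files("nyc-tlc-taxi", "green_tripdata_2014-*.csv"):
        with path.open(newline="") as fh:
            print(path.name, count_rows(fh))
```

Streaming rows through `csv.reader` keeps memory use flat, which matters for a dataset this large.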
Datasets
The aim is to index known and not-so-known datasets. There are no plans to standardize the dataset format, as we want to keep each dataset as published by its authors.
Recipes
Datasets are defined in recipes, which contain information about each dataset and where to find it.
These recipes are community maintained and hosted in the databrewer-recipes repository.
Roadmap
Include an API. For now DataBrewer only provides a command-line interface, but in the near future it will include an API so you can search, download and load datasets directly from your Python code.
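Until that API exists, one workaround is to call the CLI from Python with the standard `subprocess` module. The wrapper below is a hypothetical sketch (`databrewer_command` and `run_databrewer` are illustrative names): it passes through subcommands such as those in the Quickstart and returns raw output lines without assuming any particular output format.

```python
from __future__ import annotations

import shutil
import subprocess


def databrewer_command(*args: str) -> list[str]:
    """Build the argument list for a databrewer CLI invocation."""
    return ["databrewer", *args]


def run_databrewer(*args: str) -> list[str]:
    """Run a databrewer subcommand and return its non-empty stdout lines.

    Raises RuntimeError if the executable is not on PATH, and
    subprocess.CalledProcessError if the command exits with an error.
    """
    if shutil.which("databrewer") is None:
        raise RuntimeError("databrewer is not installed or not on PATH")
    result = subprocess.run(
        databrewer_command(*args),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise on a non-zero exit status
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

For example, `run_databrewer("search", "nyc", "taxi")` returns the same lines the shell command prints. Passing an argument list rather than a single string avoids shell quoting issues with selectors like `nyc-tlc-taxi[green][2014-*]`.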
Contributing
You can help in the following ways:
See CONTRIBUTING.rst for more information.
History
0.1.1 (2017-05-05)
Fix packaging issues.
0.1.0 (2017-05-05)
First release on PyPI.