Get clean datasets from DataHerb to boost your data science and data analysis projects
The Python Package for DataHerb
A DataHerb Core Service to Create and Load Datasets.
Install
pip install dataherb
Usage
Load Data into DataFrame
# Load the package
from dataherb.flora import Flora
# Initialize Flora service
# The Flora service holds all the dataset metadata
dataherb = Flora()
# Search datasets with keyword(s)
geo_datasets = dataherb.search("geo")
print(geo_datasets)
# Get a specific file from a dataset and load as DataFrame
tz_df = (
    dataherb.herb("geonames_timezone")
    .leaves.get("dataset/geonames_timezone.csv")
    .data
)
print(tz_df)
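The chained lookup above fails with a fairly opaque error if either the dataset or the file name is misspelled. The following is a minimal sketch of a wrapper that reports which step failed. `load_leaf` is a hypothetical helper, not part of the package, and it rests on two assumptions not confirmed by this README: that `herb()` returns `None` for an unknown dataset, and that `leaves` is dict-like so that `.get()` returns `None` for an unknown path.

```python
def load_leaf(flora, herb_id, leaf_path):
    """Return the data of one file (leaf) from one dataset (herb).

    `flora` is an initialized Flora service, as in the example above.
    Hypothetical helper; the None checks below are assumptions about
    how the service reports missing datasets and files.
    """
    herb = flora.herb(herb_id)
    if herb is None:
        raise KeyError(f"dataset not found: {herb_id!r}")

    # Assumption: `leaves` is dict-like, so .get() gives None when missing
    leaf = herb.leaves.get(leaf_path)
    if leaf is None:
        raise KeyError(
            f"file {leaf_path!r} not found in dataset {herb_id!r}"
        )

    return leaf.data
```

With the service initialized as above, `load_leaf(dataherb, "geonames_timezone", "dataset/geonames_timezone.csv")` would be the equivalent call.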
Create Dataset Using Command Line Tool
We provide a template for dataset creation.
Before creating a dataset, it is recommended that the user reads the intro.
Use the following command line tool to create the metadata template.
dataherb create
Understanding DataHerb
What is DataHerb
DataHerb is an open data initiative to make open datasets easier to access.
- A DataHerb or Herb is a dataset. A dataset comes with the data files, and the metadata of the data files.
- A DataHerb Leaf or Leaf is a data file in the DataHerb.
- A Flora is the combination of all the DataHerbs.
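The relationship between these three terms can be sketched as a small data model. This is only an illustration of the vocabulary: the `Toy*` classes and the substring-based `search` rule are hypothetical and are not the package's actual classes or search logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyLeaf:
    """One data file inside a dataset."""
    path: str


@dataclass
class ToyHerb:
    """A dataset: its data files plus the metadata describing them."""
    id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    leaves: Dict[str, ToyLeaf] = field(default_factory=dict)


@dataclass
class ToyFlora:
    """The collection of all datasets."""
    herbs: List[ToyHerb] = field(default_factory=list)

    def search(self, keyword: str) -> List[ToyHerb]:
        # Toy rule: case-insensitive substring match on the dataset id
        return [h for h in self.herbs if keyword.lower() in h.id.lower()]


leaf_path = "dataset/geonames_timezone.csv"
tz = ToyHerb(id="geonames_timezone", leaves={leaf_path: ToyLeaf(leaf_path)})
flora = ToyFlora(herbs=[tz])
print([h.id for h in flora.search("geo")])
```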
In many data projects, finding the right datasets to enhance your data is one of the most time-consuming parts. DataHerb adds flavor to your data project.
What is DataHerb Flora
We designed the following workflow to share and index datasets.
This repository is used to list datasets (the "Listings" in the DataHerb flora repository).
How to Add Your Dataset
Simply create a yml file in the flora folder to link to your dataset repository. Your dataset repository should have a .dataherb folder and a metadata.yml file in it.
The indexing part will be done by GitHub Actions.
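Before linking a dataset repository, it can help to confirm locally that it has the expected layout. The sketch below assumes that metadata.yml sits inside the .dataherb folder; `has_dataherb_metadata` is a hypothetical helper, not a function provided by the package, and it checks only that the file exists, not its contents.

```python
from pathlib import Path


def has_dataherb_metadata(repo_dir):
    """Return True if repo_dir contains .dataherb/metadata.yml.

    Assumption: metadata.yml is located inside the .dataherb folder.
    """
    return (Path(repo_dir) / ".dataherb" / "metadata.yml").is_file()
```

For example, `has_dataherb_metadata(".")` checks the current working directory.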
Development
- Create a conda environment.
- Install requirements:
pip install -r requirements.txt
Documentation
The documentation for this package is located at docs.
HISTORY.rst is used to list changes of the package.
File details
Details for the file dataherb-0.0.4.tar.gz.
File metadata
- Download URL: dataherb-0.0.4.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0.post20200209 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5cbb4134a441a4d6190cc5dbf07eba5878ec3b3e7510d98a641f86662209bc81
MD5 | c49056040b857ade1594dc4f9a9fbe16
BLAKE2b-256 | 307fea51442660a68e7681f25aa8c30278de57e0bfbab8adafa3641c303faf49
File details
Details for the file dataherb-0.0.4-py3-none-any.whl.
File metadata
- Download URL: dataherb-0.0.4-py3-none-any.whl
- Upload date:
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0.post20200209 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | fe9e2a9ba0f32fa12ba09cada7fcf388ddbb65e8caef05077877d8bb2f23ca61
MD5 | 62a86762f4b12b42f445c8a636fd0f7f
BLAKE2b-256 | c54af59d13467f773f70fcf78a5f53ff99b795f2909b87544d017b935878f473