Get clean datasets from DataHerb to boost your data science and data analysis projects

The Python Package for DataHerb

A DataHerb Core Service to Create and Load Datasets.

Install

pip install dataherb

Usage

Load Data into DataFrame

# Load the package
from dataherb.flora import Flora

# Initialize Flora service
# The Flora service holds all the dataset metadata
dataherb = Flora()

# Search datasets with keyword(s)
geo_datasets = dataherb.search("geo")
print(geo_datasets)

# Get a specific file from a dataset and load as DataFrame
tz_df = dataherb.herb(
    "geonames_timezone"
).leaves.get(
    "dataset/geonames_timezone.csv"
).data
print(tz_df)

Create Dataset Using Command Line Tool

We provide a template for dataset creation.

Before creating a dataset, we recommend reading the intro.

Use the following command line tool to create the metadata template.

dataherb create

Understanding DataHerb

What is DataHerb

DataHerb is an open data initiative to make access to open datasets easier.

  • A DataHerb or Herb is a dataset. A dataset comes with its data files and the metadata describing them.
  • A DataHerb Leaf or Leaf is a data file in the DataHerb.
  • A Flora is the combination of all the DataHerbs.
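
The relationship between the three terms can be sketched as plain data structures. This is only an illustrative model, not the package's implementation: the class names `Leaf`, `Herb`, and `FloraSketch`, their fields, the description text, and the keyword-matching rule in `search` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Leaf:
    """One data file inside a dataset (hypothetical model)."""
    path: str


@dataclass
class Herb:
    """A dataset: metadata plus its data files (hypothetical model)."""
    id: str
    description: str
    leaves: Dict[str, Leaf] = field(default_factory=dict)


@dataclass
class FloraSketch:
    """The collection of all datasets (hypothetical model)."""
    herbs: List[Herb] = field(default_factory=list)

    def search(self, keyword: str) -> List[Herb]:
        # Assumed rule: case-insensitive substring match on id or description.
        needle = keyword.lower()
        return [
            h for h in self.herbs
            if needle in h.id.lower() or needle in h.description.lower()
        ]


# The id and file path come from the usage example; the description is made up.
timezone = Herb(
    id="geonames_timezone",
    description="Time zone data",
    leaves={"dataset/geonames_timezone.csv": Leaf("dataset/geonames_timezone.csv")},
)
flora = FloraSketch(herbs=[timezone])
matches = flora.search("geo")
print([h.id for h in matches])
```

In this model a Flora holds only metadata about each Herb, which matches the description of the Flora service in the usage example; how the real package stores and searches that metadata may differ.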

In many data projects, finding the right datasets to enhance your data is one of the most time-consuming parts. DataHerb adds flavor to your data project.

What is DataHerb Flora

We designed the following workflow to share and index datasets.

DataHerb Workflow

This repository is used to list datasets (see the listings in the DataHerb flora repository).

How to Add Your Dataset

A Complete Tutorial

Simply create a yml file in the flora folder to link to your dataset repository. Your dataset repository should have a .dataherb folder and a metadata.yml file in it.
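
Before linking a repository, it can help to check that it has the expected layout. The sketch below assumes the metadata file sits at `.dataherb/metadata.yml` inside the repository (the sentence above can also be read as placing it at the repository root). The function name `has_dataherb_metadata` is hypothetical and not part of the `dataherb` package.

```python
from pathlib import Path
import tempfile


def has_dataherb_metadata(repo_root: Path) -> bool:
    """Return True if repo_root contains .dataherb/metadata.yml.

    The file location is an assumption made for this sketch.
    """
    return (repo_root / ".dataherb" / "metadata.yml").is_file()


# Demonstrate on a throwaway directory standing in for a dataset repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    before = has_dataherb_metadata(repo)
    (repo / ".dataherb").mkdir()
    (repo / ".dataherb" / "metadata.yml").write_text("# metadata goes here\n")
    after = has_dataherb_metadata(repo)

print(before, after)
```

The check only confirms that the file exists; it does not validate the contents of the metadata.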

The indexing part will be done by GitHub Actions.

Development

  1. Create a conda environment.
  2. Install requirements: pip install -r requirements.txt

Documentation

The documentation for this package is located at docs.

HISTORY.rst is used to list changes of the package.
