A tool for quickly generating dummy data

DataBuilder

Have you ever needed dummy data to demonstrate basic data analysis or machine learning topics?

DataBuilder can save you time by creating customized dummy data sets within minutes.


Installation

pip install databuilder
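
To confirm the install worked before running the example, a quick check like the one below may help. It is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and the import name `databuilder` is taken from the Quick Example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found for import."""
    # find_spec returns None when no top-level module of that name is importable
    return importlib.util.find_spec(package_name) is not None


# Prints True if the current environment can import `databuilder`
print(is_installed("databuilder"))
```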

Quick Example

import databuilder as db

# make a dummy dataset about "our employees"
config = {
    'fields': {
        'empID':        db.ID(),
        'first_name':   db.Name(first_only=True),
        'last_name':    db.Name(last_only=True),
        'department':   db.Group(["Sales", "Acct", "Mktg", "IT"]),
        'salary':       db.NormalDist(50000, 10000),
        'hire_date':    db.Date("1990-01-01", "2020-12-31")
    }
}

# create a pandas DataFrame with
# the fields defined in `config`
df = db.create_df(config, n=200)

print(df.head(2))
#
#   Example output:
#         empID first_name last_name department  salary  hire_date
#      0      1      Frank      Ward         IT   69210 2004-05-05
#      1      2    Barbara    George       Mktg   46744 2019-05-20
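
For intuition about the kind of values these field types describe, here is a rough standard-library approximation. It is not DataBuilder's implementation. It assumes that IDs count up from 1 (as in the example output), that `NormalDist(50000, 10000)` means a mean of 50000 and a standard deviation of 10000, and that `Date` picks uniformly between the two bounds. Name fields are left out, and `make_rows` is a hypothetical helper.

```python
import random
from datetime import date, timedelta


def make_rows(n, seed=0):
    """Build n employee-like dicts using only the standard library."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    start, end = date(1990, 1, 1), date(2020, 12, 31)
    span_days = (end - start).days
    departments = ["Sales", "Acct", "Mktg", "IT"]

    rows = []
    for i in range(1, n + 1):
        rows.append({
            "empID": i,                                    # sequential ID (assumed)
            "department": rng.choice(departments),         # one label per row
            "salary": round(rng.gauss(50000, 10000)),      # mean, std dev (assumed)
            "hire_date": start + timedelta(days=rng.randint(0, span_days)),
        })
    return rows


rows = make_rows(200)
print(rows[0])
```

Writing this by hand for every new demo is the repetitive part that a declarative `config` is meant to replace.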

Complete Usage Guide

Detailed docs on how to use DataBuilder can be found in the docs/ folder of this repo.


Download files


Source Distribution

databuilder-0.0.2.tar.gz (7.9 kB)


Built Distribution

databuilder-0.0.2-py3-none-any.whl (8.8 kB)

