an elegant datasets factory

Project description

Features

  • Schema oriented datasets builder

How to Use it

Terminal:

    # Import the package into any Python app
    import rawbuilder as rw

    # Initialize the dataset object as ds
    ds = rw.DataSet(
        size=1000,
        task='user',
    )

    # Build the dataset
    ds.build()

    # Optional arguments
    ds = rw.DataSet(
        size=1000,
        task='user',
        schema_path='where/to/read/schema/from',
        schema_dict='{"user": {"id": "int"}}'
    )

    df = ds.build(
        output_path='your/output/directory',
        export_csv=True,
        return_df=True
    )

Schema

  • The schema is a JSON object that describes three main components: the model names, the column names, and the data type of each column.

  • In the example below, the model name is “student”, and it contains five columns: id, first_name, last_name, email, and math_test_results.

  • Each model defined in the schema, such as “student”, is called a task; it has its own columns and data descriptions.

Student data model example:

    "student": {
        "id": "int",
        "first_name": "first_name",
        "last_name": "last_name",
        "email": "email",
        "math_test_results": "random_int between,0,30"
    }
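To make the three components concrete, the sketch below loads the student example as JSON and lists each model with its columns and data types. It uses only the standard library; `describe_schema` is a hypothetical helper written for this illustration and is not part of rawbuilder.

```python
import json

# Hypothetical helper for illustration only; not part of rawbuilder's API.
def describe_schema(schema_json):
    """Return {model_name: [(column_name, data_type), ...]} for a JSON schema."""
    schema = json.loads(schema_json)
    summary = {}
    for model_name, columns in schema.items():
        if not isinstance(columns, dict):
            raise ValueError(f"model {model_name!r} must map column names to data types")
        summary[model_name] = list(columns.items())
    return summary

# The student model from the example, wrapped in an outer JSON object.
STUDENT_SCHEMA = """
{
  "student": {
    "id": "int",
    "first_name": "first_name",
    "last_name": "last_name",
    "email": "email",
    "math_test_results": "random_int between,0,30"
  }
}
"""

summary = describe_schema(STUDENT_SCHEMA)
for model_name, columns in summary.items():
    print("model:", model_name)
    for column_name, data_type in columns:
        print("  column:", column_name, "| data type:", data_type)
```

A check like this can catch a malformed schema before it is handed to the builder.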

Data types to use in the schema

  • int: builds a column of integers from 1 to the requested dataset size.

  • decrement: builds a column of decreasing integers from the requested dataset size down to 1.

  • random_int: builds a column of random integers, between 0 and 100 by default.

  • random_float: builds a column of random floats, between 0 and 1 by default.

  • first_name: builds a column of first names.

  • last_name: builds a column of last names.

  • email: builds a column of fake email addresses.

  • password: builds a column of random password strings, with a default length of 12 characters.
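As a rough illustration of what the numeric and password types describe, the sketch below produces comparable columns with the standard library. The function `build_column`, the inclusive bounds, and the password alphabet are assumptions made for this example; it is not rawbuilder's implementation. The name and email types are left out because they need a source of fake names.

```python
import random
import string

# Illustrative generators only. The exact behaviour (for example, inclusive
# bounds and the password alphabet) is assumed; this is not rawbuilder's code.
PASSWORD_ALPHABET = string.ascii_letters + string.digits

def build_column(data_type, size, rng):
    """Return a list of `size` values for one of the simple data types."""
    if data_type == "int":
        return list(range(1, size + 1))      # 1, 2, ..., size
    if data_type == "decrement":
        return list(range(size, 0, -1))      # size, ..., 2, 1
    if data_type == "random_int":
        return [rng.randint(0, 100) for _ in range(size)]
    if data_type == "random_float":
        return [rng.random() for _ in range(size)]
    if data_type == "password":
        return ["".join(rng.choices(PASSWORD_ALPHABET, k=12)) for _ in range(size)]
    raise ValueError(f"unsupported data type: {data_type!r}")

rng = random.Random(42)  # seeded so repeated runs give the same data
type_names = ("int", "decrement", "random_int", "random_float", "password")
columns = {name: build_column(name, 5, rng) for name in type_names}
for name, values in columns.items():
    print(name, values)
```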

Data Modifiers

Combine data modifiers with the data types above to adjust values, change the nature of the data, and gain more control over the final output.

The modifier syntax is simple:

    "modifier,argument_1,arg_2,arg_*"

Use the between modifier to generate a column of random integers between 0 and 30:

    "math_test_results": "random_int between,0,30"
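The column value in that example packs a data type, a modifier, and its arguments into one string. A minimal sketch of how such a string could be split is shown here; `parse_column_spec` is a hypothetical name, and the rule that the data type is separated from the modifier by a single space is inferred from the example rather than taken from rawbuilder's source.

```python
# Hypothetical parser for illustration; the splitting rules are inferred from
# the "random_int between,0,30" example, not from rawbuilder's code.
def parse_column_spec(spec):
    """Split 'data_type modifier,arg_1,arg_2' into (data_type, modifier, args)."""
    data_type, _, modifier_part = spec.strip().partition(" ")
    if not modifier_part:
        return data_type, None, []       # plain data type, no modifier
    modifier, *args = [piece.strip() for piece in modifier_part.split(",")]
    return data_type, modifier, args

with_modifier = parse_column_spec("random_int between,0,30")
without_modifier = parse_column_spec("email")
print(with_modifier)
print(without_modifier)
```

Arguments are kept as strings here; converting them to numbers is left to whichever data type consumes them.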

All Modifiers

1) Ranges

Use this modifier to set the low end and the high end of the range for a specific data type.

Syntax:

    "between,10,1000"

Supported with

random_int:

    "math_test_results": "random_int between,0,30"

random_float:

    "heights": "random_float between,1.30,1.80"

password:

    "password": "password between,12,12"
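The sketch below shows one way the between arguments could be applied to each of the three supported data types. It rests on two assumptions that the text above does not confirm: that both bounds are inclusive, and that for password the two numbers bound the password length. `apply_between` is a hypothetical function, not rawbuilder's API.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits

# Hypothetical illustration of the "between" modifier. Inclusive bounds and
# "password bounds mean length" are assumptions, not documented behaviour.
def apply_between(data_type, low, high, size, rng):
    """Build `size` values of `data_type` constrained to the range low..high."""
    if data_type == "random_int":
        return [rng.randint(int(low), int(high)) for _ in range(size)]
    if data_type == "random_float":
        return [rng.uniform(float(low), float(high)) for _ in range(size)]
    if data_type == "password":
        lengths = [rng.randint(int(low), int(high)) for _ in range(size)]
        return ["".join(rng.choices(ALPHABET, k=n)) for n in lengths]
    raise ValueError(f"'between' is not supported for {data_type!r}")

rng = random.Random(7)  # seeded for repeatable output
scores = apply_between("random_int", "0", "30", 4, rng)
heights = apply_between("random_float", "1.30", "1.80", 4, rng)
passwords = apply_between("password", "12", "12", 4, rng)
print(scores)
print(heights)
print(passwords)
```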

History

0.0.4 (2021-11-13)

  • Data modifiers

0.0.3 (2021-11-05)

  • Migrate to JSON

  • Generate simple datasets

0.0.2 (2021-11-05)

  • Proof of concept

0.0.1 (2021-10-24)

  • First release on PyPI.
