an elegant datasets factory
Free software: MIT license
Documentation: https://rawbuilder.readthedocs.io.
Features
Schema oriented datasets builder
How to Use it
- Terminal:
    # Import the package into any Python app
    import rawbuilder as rw

    # Init the dataset object as ds
    ds = rw.DataSet(
        size=1000,
        task='user',
    )

    # Build the dataset
    ds.build()

    # Optionals
    ds = rw.DataSet(
        size=1000,
        task='user',
        schema_path='where/to/read/schema/from',
        schema_dict='{"user": {"id": "int"}}',
    )

    df = ds.build(
        output_path='your/output/directory',
        export_csv=True,
        return_df=True,
    )
Schema
The schema is a JSON object that describes three main components: the model names, the column names, and the data type of each column.
In the example below, the model name is “student” and it contains 5 columns [id, first_name, last_name, email, math_test_results].
Each top-level property of the schema, such as “student”, is called a task; it has its own columns and data descriptions.
- Student data model example:
    "student": {
        "id": "int",
        "first_name": "first_name",
        "last_name": "last_name",
        "email": "email",
        "math_test_results": "random_int between,0,30"
    }
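As a rough illustration of the structure described above, the sketch below writes the student model to a JSON file and reads it back using only the standard library. The file name `schema.json`, the temporary directory, and the helper `describe` are assumptions made for this example; how rawbuilder expects files to be laid out under `schema_path` is not stated here.

```python
import json
import tempfile
from pathlib import Path

# The student model from the example above, expressed as a Python dict.
schema = {
    "student": {
        "id": "int",
        "first_name": "first_name",
        "last_name": "last_name",
        "email": "email",
        "math_test_results": "random_int between,0,30",
    }
}


def describe(schema):
    """Hypothetical helper: map each task to its (column, description) pairs."""
    return {task: list(columns.items()) for task, columns in schema.items()}


# Assumed file name and location, chosen only for this illustration.
schema_file = Path(tempfile.mkdtemp()) / "schema.json"
schema_file.write_text(json.dumps(schema, indent=2))

# Read the file back and list the columns of every task.
loaded = json.loads(schema_file.read_text())
summary = describe(loaded)
for task, columns in summary.items():
    print(task, [name for name, _ in columns])
```

Keeping one task per top-level key makes it easy to add more models to the same file later.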
Data types to use in the schema
int: build a column of integers between 1 and requested dataset size.
decrement: build a column of decremented integers between the requested size and 1.
random_int: build a column of random integers between 0 and 100 by default.
random_float: build a column of random floats between 0 and 1 by default.
first_name: build a column of first names.
last_name: build a column of last names.
email: build a column of fake emails.
password: build a column of random string passwords with a default length of 12 characters.
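To make the numeric data types concrete, here is a minimal plain-Python sketch of what each description above implies. It is not rawbuilder's implementation: the function names are hypothetical, and treating both ends of each range as inclusive is an assumption.

```python
import random


def build_int(size):
    """Integers from 1 up to the requested dataset size."""
    return list(range(1, size + 1))


def build_decrement(size):
    """Integers counting down from the requested size to 1."""
    return list(range(size, 0, -1))


def build_random_int(size, low=0, high=100):
    """Random integers, 0 to 100 by default (inclusive bounds assumed)."""
    return [random.randint(low, high) for _ in range(size)]


def build_random_float(size, low=0.0, high=1.0):
    """Random floats, 0 to 1 by default."""
    return [random.uniform(low, high) for _ in range(size)]


ids = build_int(5)
countdown = build_decrement(5)
scores = build_random_int(5)
ratios = build_random_float(5)
```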
Data Modifiers
Combine data modifiers with the above data types to adjust values, change the nature of the data, and gain more control over the final output.
- The modifier syntax is simple:
"modifier,argument_1,arg_2,arg_*"
- Use the modifier between to generate a column of random integers between 0 and 30:
"math_test_results": "random_int between,0,30"
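The modifier syntax above can be read as a data type, a space, and then a comma-separated modifier with its arguments. The following sketch splits such a description into its parts; `parse_description` is a hypothetical helper written for this example, and rawbuilder's own parsing may differ.

```python
def parse_description(description):
    """Split 'data_type modifier,arg_1,arg_2' into (data_type, modifier, args)."""
    parts = description.split(None, 1)  # split once on the first whitespace
    data_type = parts[0]
    if len(parts) == 1:
        # No modifier given, e.g. "email"
        return data_type, None, []
    modifier, *args = [piece.strip() for piece in parts[1].split(",")]
    return data_type, modifier, args


with_modifier = parse_description("random_int between,0,30")
without_modifier = parse_description("email")
```

Arguments are left as strings here; converting them to numbers would depend on the data type they are applied to.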
All Modifiers
1) Ranges
Use this modifier to set the high end and low end for a specific data type.
- Syntax:
"between,10,1000"
Supported with
- random_int:
"math_test_results": "random_int between,0,30"
- random_float:
"heights": "random_float between,1.30,1.80"
- password:
"password": "password between,12,12"
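The sketch below shows one way the range modifier could behave for the three supported data types. It is an illustration under assumptions, not the library's code: `apply_between` is a hypothetical name, bounds are treated as inclusive, and for password the two arguments are assumed to bound the password length.

```python
import random
import string


def apply_between(data_type, low, high, size):
    """Generate `size` values of `data_type` limited to the range low..high."""
    if data_type == "random_int":
        return [random.randint(int(low), int(high)) for _ in range(size)]
    if data_type == "random_float":
        return [random.uniform(float(low), float(high)) for _ in range(size)]
    if data_type == "password":
        # Assumption: the range bounds the length of each password.
        alphabet = string.ascii_letters + string.digits
        return [
            "".join(
                random.choice(alphabet)
                for _ in range(random.randint(int(low), int(high)))
            )
            for _ in range(size)
        ]
    raise ValueError(f"between is not supported with {data_type!r}")


math_test_results = apply_between("random_int", "0", "30", 10)
heights = apply_between("random_float", "1.30", "1.80", 10)
passwords = apply_between("password", "12", "12", 10)
```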
History
0.0.4 (2021-11-13)
Data modifiers
0.0.3 (2021-11-05)
Migrate to JSON
Generate simple datasets
0.0.2 (2021-11-05)
Proof of concept
0.0.1 (2021-10-24)
First release on PyPI.