Skip to main content

Generate realistic raw datasets with optional DQ issues

Project description

Build Status Coverage Code Health Code Climate Requirements Status

Generate realistic raw datasets with optional DQ issues

To install run

pip install rawdata

Basic Usage

Create a random table

import rawdata.generate
colLabel = ['Year', 'Name',   'Born', 'Details' , 'Amount']
colTypes = ['DATE', 'PEOPLE', 'PLACE', 'WORD',    'CURRENCY']
tbl = rawdata.generate.TableGenerator(3, colTypes, colLabel)

> Year, name,    Age, Born,         Details,      Amount
> 2013, Douglas, 34,  Scandinavia,  Bowling Ball, $34.95
> 1999, Hunter,  65,  Sierra Leone, Fish,         12.00
> 2005, Shubha,  18,  Madagascar,   screenplay,   -$231.00

Adding Errors to a table

import rawdata.errors
t = rawdata.errors.TableWithErrors(tbl, 'BAD_STRING')

And after adding 3 random errors there are additional spaces in Douglas, a fake string in Douglas Born column, and the Born column is missing for Hunter

Year    Name       Born
-----   ---------  ----------
2013     Douglas   BAD_STRING
1999    Hunter
2005    Shubha     Madagascar

You can use columns generated via a custom list

custom_list = ['Carved Statue', '1984 Volvo', '2 metre Ball of string']
tbl = TableGenerator(5, ['PEOPLE', 'INT', custom_list], ['Name', 'Age', 'Fav Possession'])
    > Name,   Age,  Fav Possession
    > Inez,    58,  Carved Statue
    > Zane,    50,  2 metre Ball of string
    > Jered,   49,  1984 Volvo
    > Tameron, 55,  2 metre Ball of string
    > Wyatt,   68,  Carved Statue

Other functions

import rawdata.generate
n = rawdata.generate.NumberGenerator
s = rawdata.generate.StringGenerator

print('Random Number    = ', n.random_int(1,100))
    > Random Number    =  84

print('Random Letters   = ', s.random_letters(40))
    > Random Letters   =  T1CElkRAGPAmWSavbDItDbFmQIvUh26SyJE58x49

print('Random Password  = ', s.generate_password())
    > Random Password  =  peujlsmbf19966YKCX

words = rawdata.generate.get_list_words()
print(len(words), ' words : ', words[500:502])
    > 10739  words :  ['architeuthis', 'arcsine']

places = rawdata.generate.get_list_places()
print(len(places), ' places : ', places[58:60])
    > 262  places :  ['Brazil', 'British Virgin Islands']

List of Column Types (Table Generator)

'INT'      - returns a number
'CURRENCY' - returns a currency that may have strings $ / pounds
'STRING'   - returns a random string
'WORD'     - returns a word from nouns.csv
'DATE'     - returns a date
'YEAR'     - returns a year. Both year and date can have ranges set via set_range()
'PLACE'    - returns a location from country.csv
'PEOPLE'   - returns a name from names.csv
[list]     - pass any list to return a random choice from it
                (e.g. my_colours = ['Blue', 'Green', 'Orange'] )

More information is at

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution (846.3 kB view hashes)

Uploaded source

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page