Generate realistic raw datasets with optional DQ issues

To install, run:

pip install rawdata

Basic Usage

Create a random table

import rawdata.generate
colLabel = ['Year', 'Name',   'Born', 'Details' , 'Amount']
colTypes = ['DATE', 'PEOPLE', 'PLACE', 'WORD',    'CURRENCY']
tbl = rawdata.generate.TableGenerator(3, colTypes, colLabel)
print(tbl)

> Year, Name,    Born,         Details,      Amount
> 2013, Douglas, Scandinavia,  Bowling Ball, $34.95
> 1999, Hunter,  Sierra Leone, Fish,         12.00
> 2005, Shubha,  Madagascar,   screenplay,   -$231.00

Adding Errors to a table

import rawdata.errors
t = rawdata.errors.TableWithErrors(tbl, 'BAD_STRING')
t.add_errors(3)
print(t.tbl)

After adding 3 random errors, the table has extra leading spaces in front of Douglas, the fake string BAD_STRING in Douglas's Born column, and a missing Born value for Hunter (first three columns shown):

Year    Name       Born
-----   ---------  ----------
2013     Douglas   BAD_STRING
1999    Hunter
2005    Shubha     Madagascar
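
To make the three kinds of error above concrete, here is a minimal standard-library sketch of error injection on a list-of-rows table. It does not use rawdata and is not how TableWithErrors is implemented; the function name `inject_errors`, its parameters, and the choice of exactly three mutations are assumptions made for illustration.

```python
import random

def inject_errors(rows, num_errors, bad_string="BAD_STRING", seed=None):
    """Return a copy of rows with num_errors distinct cells corrupted.

    Hypothetical helper, not part of rawdata. Each chosen cell gets one of:
    an extra leading space, replacement with bad_string, or a blank value.
    """
    rng = random.Random(seed)
    out = [list(r) for r in rows]  # copy so the input table is untouched
    cells = [(r, c) for r in range(len(out)) for c in range(len(out[r]))]
    for r, c in rng.sample(cells, min(num_errors, len(cells))):
        kind = rng.choice(["pad", "replace", "blank"])
        if kind == "pad":
            out[r][c] = " " + str(out[r][c])
        elif kind == "replace":
            out[r][c] = bad_string
        else:
            out[r][c] = ""
    return out

clean = [
    ["2013", "Douglas", "Scandinavia"],
    ["1999", "Hunter", "Sierra Leone"],
    ["2005", "Shubha", "Madagascar"],
]
dirty = inject_errors(clean, 3, seed=1)

# Count the cells that differ between the clean and the dirty table.
changed = sum(a != b for old, new in zip(clean, dirty) for a, b in zip(old, new))
print(changed, "cells changed")
```

Because the three corrupted cells are sampled without repetition and every mutation alters a non-empty value, exactly three cells differ from the clean table. Which cells are hit depends on the seed.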

You can also generate a column from a custom list by passing the list in place of a column type

custom_list = ['Carved Statue', '1984 Volvo', '2 metre Ball of string']
tbl = rawdata.generate.TableGenerator(5, ['PEOPLE', 'INT', custom_list], ['Name', 'Age', 'Fav Possession'])
print(tbl)
    > Name,   Age,  Fav Possession
    > Inez,    58,  Carved Statue
    > Zane,    50,  2 metre Ball of string
    > Jered,   49,  1984 Volvo
    > Tameron, 55,  2 metre Ball of string
    > Wyatt,   68,  Carved Statue

Other functions

import rawdata.generate
n = rawdata.generate.NumberGenerator
s = rawdata.generate.StringGenerator

print('Random Number    = ', n.random_int(1,100))
    > Random Number    =  84

print('Random Letters   = ', s.random_letters(40))
    > Random Letters   =  T1CElkRAGPAmWSavbDItDbFmQIvUh26SyJE58x49

print('Random Password  = ', s.generate_password())
    > Random Password  =  peujlsmbf19966YKCX

words = rawdata.generate.get_list_words()
print(len(words), ' words : ', words[500:502])
    > 10739  words :  ['architeuthis', 'arcsine']

places = rawdata.generate.get_list_places()
print(len(places), ' places : ', places[58:60])
    > 262  places :  ['Brazil', 'British Virgin Islands']
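
For comparison, similar helpers can be sketched with only the standard library. This is not rawdata's implementation: `random_letters` and `generate_password` below are stand-alone functions written for illustration, and the password layout (9 lowercase letters, 5 digits, 4 uppercase letters) is only a guess based on the sample output above.

```python
import random
import string

def random_letters(length, rng=random):
    """Return `length` random ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))

def generate_password(rng=random):
    """Lowercase block + digit block + uppercase block (assumed layout)."""
    lower = "".join(rng.choice(string.ascii_lowercase) for _ in range(9))
    digits = "".join(rng.choice(string.digits) for _ in range(5))
    upper = "".join(rng.choice(string.ascii_uppercase) for _ in range(4))
    return lower + digits + upper

print("Random Letters   = ", random_letters(40))
print("Random Password  = ", generate_password())
```

The `random` module is fine for test data like this, but it is not suitable for real credentials; Python's `secrets` module is intended for that purpose.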

List of Column Types (Table Generator)

'INT'      - returns a number
'CURRENCY' - returns a currency amount that may include a currency string ($ / pounds)
'STRING'   - returns a random string
'WORD'     - returns a word from nouns.csv
'DATE'     - returns a date
'YEAR'     - returns a year. Both year and date can have ranges set via set_range()
'PLACE'    - returns a location from country.csv
'PEOPLE'   - returns a name from names.csv
[list]     - pass any list to return a random choice from it
                (e.g. my_colours = ['Blue', 'Green', 'Orange'] )
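
The column-type list above amounts to a mapping from a type name (or a list) to a value generator. The sketch below shows that idea with the standard library only. It is not rawdata's code: `make_value` and `make_table` are hypothetical names, the value ranges are arbitrary, and only a few of the types are covered.

```python
import random
import string

def make_value(col_type, rng):
    """Produce one cell for a column spec (hypothetical, not rawdata's API)."""
    if isinstance(col_type, list):          # custom list: pick one entry
        return rng.choice(col_type)
    if col_type == "INT":
        return rng.randint(0, 100)          # range chosen arbitrarily
    if col_type == "YEAR":
        return rng.randint(1990, 2020)      # range chosen arbitrarily
    if col_type == "STRING":
        return "".join(rng.choice(string.ascii_letters) for _ in range(8))
    raise ValueError("unsupported column type: %r" % (col_type,))

def make_table(num_rows, col_types, col_labels, seed=None):
    """Return a header row followed by num_rows generated rows."""
    if len(col_types) != len(col_labels):
        raise ValueError("need one label per column type")
    rng = random.Random(seed)
    rows = [[make_value(t, rng) for t in col_types] for _ in range(num_rows)]
    return [list(col_labels)] + rows

my_colours = ["Blue", "Green", "Orange"]
table = make_table(3, ["YEAR", "INT", my_colours], ["Year", "Age", "Colour"], seed=0)
for row in table:
    print(row)
```

Checking that the number of labels matches the number of column types catches the easy mistake of a header that does not line up with the generated columns.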

More information is at https://github.com/acutesoftware/rawdata

Latest release: rawdata-0.1.0.zip (846.3 kB, source distribution), uploaded Oct 9, 2016