Relative Data Generator: generates related data for multiple tables that depend on each other
Fakeme
Data Generator for Chained and Relative Data
Documentation in progress: https://fakeme.readthedocs.io/en/latest/
How to use
pip install fakeme
Check examples: https://github.com/xnuinside/fakeme/tree/master/examples
What is Fakeme?
Fakeme is a tool that tries to understand your data based on schemas and field names, and generates data that matches what is expected.
It creates a dependency graph and generates related data.
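The idea of a dependency graph can be illustrated with a small sketch. This is not Fakeme's internal code; it only shows, using the standard library's graphlib (Python 3.9+), how tables could be ordered so that a table is generated after the tables it takes values from. The dependency map and the user_report table are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each table lists the tables it takes key values from.
dependencies = {
    "users": set(),
    "user_data": {"users"},
    "user_report": {"users", "user_data"},  # hypothetical third table
}


def generation_order(deps):
    """Return table names so that every table comes after the tables it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = generation_order(dependencies)
print(order)
```

With such an order, the key values of a parent table already exist when a dependent table is generated.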
Fakeme is oriented toward generating data that depends on values in other tables/datasets: data that is knitted together like real data.
Fakeme can help if you want to generate several tables whose columns must contain values that you will use as join keys.
For example, the user_data table has a field user_id, and the users table contains a list of users with a column id. You want to join them on user_id = id.
Fakeme will generate 2 tables with the same values in those 2 columns.
The columns do not need to have the same name: you can define dependencies between tables with alias names.
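To make the users / user_data example concrete, here is a minimal plain-Python sketch of what "same values in those 2 columns" means. It does not use Fakeme and is not how Fakeme is implemented; the helper names (generate_users, generate_user_data, join) and the extra columns name and event are assumptions made for illustration.

```python
import random


def generate_users(count):
    """Build the parent table: one row per user with a unique string id."""
    return [{"id": f"user-{i:04d}", "name": f"name_{i}"} for i in range(count)]


def generate_user_data(users, count, seed=0):
    """Build the child table; every user_id is drawn from the users' id column."""
    rng = random.Random(seed)
    ids = [row["id"] for row in users]
    return [{"user_id": rng.choice(ids), "event": f"event_{i}"} for i in range(count)]


def join(left, right, left_key, right_key):
    """Inner join of two lists of dicts on left_key = right_key."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


users = generate_users(5)
user_data = generate_user_data(users, 20)
joined = join(user_data, users, "user_id", "id")
print(len(joined))  # every user_data row finds its user, so no row is lost
```

Because the key columns share values, the join on user_id = id keeps every row of user_data even though the two columns have different names.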
TODO in next releases:
Add integration with simple-ddl-parser to generate data from different SQL dialects
Add integration with py-models-parser to generate data from different Python models
Fix cases in todo folder
Improve test coverage
What you can do
Define that fields in your datasets must contain similar values.
You can set how many values must intersect. For example, suppose you want to emulate data for an email validation pipeline: you have one dataset with incoming messages and you need to find all emails that were not previously added to your contacts table.
So you will have an incoming messages table in which, for example, only 70% of the emails exist in the contacts table.
You can use multiple columns as a key (dependency) in another table. For example, player_final_report must contain, for each player, the same values as in other tables: you have a player table with player details and an in_game_player_activity table with all player activities, and for testing purposes it is critical to generate player_final_report with 1-to-1 data from the other 2 tables.
Union tables. You can generate tables that contain all rows from other tables.
You can define your own generator for fields in Python.
You can define your own output format
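The partial-intersection case from the list above (an incoming messages table where only 70% of the emails exist in contacts) can be sketched in plain Python. This is an illustration of the idea, not Fakeme's code: values_with_overlap, its parameters, and the example.com / example.org addresses are hypothetical, and the sketch reads "70%" as the share of incoming emails that are already in contacts.

```python
import random


def values_with_overlap(existing, count, matches, make_new, seed=0):
    """Return `count` values; a `matches` share is taken from `existing`,
    the rest are new values produced by `make_new`."""
    rng = random.Random(seed)
    n_match = round(count * matches)
    matched = [rng.choice(existing) for _ in range(n_match)]
    fresh = [make_new(i) for i in range(count - n_match)]
    result = matched + fresh
    rng.shuffle(result)  # mix known and unknown values
    return result


contacts = [f"contact{i}@example.com" for i in range(50)]
incoming = values_with_overlap(
    contacts, 100, 0.7, lambda i: f"new{i}@example.org"
)

known = set(contacts)
share = sum(email in known for email in incoming) / len(incoming)
print(share)
```

The remaining 30% of incoming emails are the ones the emulated validation pipeline is expected to find as "not in contacts".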
Examples
You can find usage examples in ‘fakeme/examples/’ folder.
Example from fakeme/examples/generate_data_related_to_existed_files:
from fakeme import Fakeme

# To use existing data, provide the with_data argument:
# a list with the paths to the data files.
# A data file must be in the same format as the .json or .csv output of fakeme,
# so it must be [{"column_name": value, "column_name2": value2 ..},
#                {"column_name": value, "column_name2": value2 ..}..]
# Please check the example in countries.json

cities_schema = [{"name": "name"},
                 {"name": "country_id"},
                 {"name": "id"}]

# All fields are strings, so no type is defined: String is the default column type.

tables_list = [('cities', cities_schema)]

Fakeme(
    tables=tables_list,
    with_data=['countries.json'],
    rls={'cities': {  # this means: for table 'cities'
        'country_id': {  # and column 'country_id' in table 'cities'
            'table': 'countries.json',  # take data from countries.json
            'alias': 'id',  # with alias name 'id'
            'matches': 1  # all values in country_id must come from the countries.id column
        }
    }},
    settings={'row_numbers': 1300}  # we want several cities for each country,
                                    # so we need more rows
).run()

# Run it and you will see a list of cities generated with the same ids as in countries.json
Docs: https://fakeme.readthedocs.io/en/latest/
Changelog
v0.2.2
Fixes:
generate_data_related_to_existed_files example now works well (generating data based on already existing files).
Added integration tests to run examples
Examples are cleaned up, non-working samples moved to ‘todo’
v0.2.1
Now you can define tables as Table class objects if that is easier for you.
from fakeme import Table

Table(name='table_name_example', schema='path/to/schema.json')

# or

user_schema = [{'name': 'id'},
               {'name': 'title'},
               {'name': 'rights', 'type': 'list', 'alias': 'right_id'},
               {'name': 'description'}]
Table(name='table_name_example', schema=user_schema)
Samples in tests: tests/unittests/test_core.py
Relationships between tables were corrected
v0.1.0
Added code changes to support Python 3.8 and above (related to changes in the Python multiprocessing module)
Added a test runner on GitHub
Autoaliasing was fixed
Added some unit tests