A Python package to create fake data with relationships between tables.

PyDataFaker is a Python package for creating fake data with relationships between tables. Fake data is useful for many applications, such as building product demos or testing software.

Python already has a great package for creating fake data called Faker (https://faker.readthedocs.io/en/master/). Faker is well suited to creating individual units of fake data, but building more complicated datasets whose records are related to one another can be time-consuming.

Imagine you are developing new enterprise resource planning (ERP) software to challenge SAP. You may need to create some fake data to test your application. You will need an invoice table, a vendor listing, a purchase order table, and more. PyDataFaker allows you to quickly create these tables and generates the relationships between them!

PyDataFaker is currently under development. At this time it is possible to create the following entities:

  • Business: create a fake business with common ERP-like tables
  • School: create a fake school

More entities are currently being developed. If you have ideas for additional entities that should be included, please submit an issue here: https://github.com/SamEdwardes/pydatafaker/issues.

Installation

pip install pydatafaker

Documentation

Documentation can be found at https://pydatafaker.readthedocs.io/en/latest/index.html. The package is distributed through PyPI at https://pypi.org/project/pydatafaker/.

Usage

Business

The business module allows you to create fake business data. Calling business.create_business() will return a dictionary of related tables.

import pandas as pd
from pydatafaker import business
biz = business.create_business()
biz.keys()
dict_keys(['vendor_table', 'po_table', 'invoice_summary_table', 'invoice_line_item_table', 'employee_table', 'contract_table', 'rate_sheet_table', 'timesheet_table'])

Each value inside the dictionary contains a Pandas DataFrame.

biz['invoice_summary_table']
     invoice_id  amount  invoice_date  po_id     vendor_id
0    inv_00001   59157   2011-01-20    po_00001  vendor_00001
1    inv_00002   87796   2007-09-06    po_00002  vendor_00002
2    inv_00003   57963   2000-03-06    po_00003  vendor_00003
3    inv_00004   59409   2001-03-31    po_00004  vendor_00004
4    inv_00005   86614   2002-01-12    po_00005  vendor_00005
...  ...         ...     ...           ...       ...
445  inv_00446   83316   2012-09-02    po_00087  vendor_00087
446  inv_00447   45707   2008-07-10    po_00101  vendor_00098
447  inv_00448   111932  2002-09-26    po_00158  vendor_00012
448  inv_00449   35104   2012-09-21    po_00133  vendor_00075
449  inv_00450   15397   2015-12-15    po_00054  vendor_00054

450 rows × 5 columns
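Because the result is a plain dictionary of DataFrames, a quick overview of every table can be built with ordinary pandas. The following is a minimal sketch, not part of PyDataFaker: `summarize_tables` is a hypothetical helper, and `biz` here is a small hand-built stand-in (a few rows copied from the output above) so the snippet runs without the package installed. With the package installed, `biz = business.create_business()` could presumably be used instead.

```python
import pandas as pd

# Stand-in for the dictionary returned by business.create_business().
# Assumption: only two tables and a handful of columns are reproduced here.
biz = {
    "vendor_table": pd.DataFrame(
        {
            "vendor_id": ["vendor_00001", "vendor_00002"],
            "vendor_name": ["Smith-Scott", "Walker-Morgan"],
        }
    ),
    "invoice_summary_table": pd.DataFrame(
        {
            "invoice_id": ["inv_00001", "inv_00002", "inv_00003"],
            "amount": [59157, 87796, 57963],
            "vendor_id": ["vendor_00001", "vendor_00002", "vendor_00001"],
        }
    ),
}


def summarize_tables(tables):
    """Return one row per table: its name, row count, and column names.

    Hypothetical helper for illustration; not a PyDataFaker function.
    """
    rows = [
        {"table": name, "rows": len(df), "columns": ", ".join(df.columns)}
        for name, df in tables.items()
    ]
    return pd.DataFrame(rows)


summary = summarize_tables(biz)
print(summary.to_string(index=False))
```
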

Tables can be joined together to add additional details.

invoice_summary = biz['invoice_summary_table']
vendors = biz['vendor_table']
pd.merge(invoice_summary, vendors, how='left', on='vendor_id')
     invoice_id  amount  invoice_date  po_id     vendor_id     vendor_name                  vendor_description                    address                                            phone                   email
0    inv_00001   59157   2011-01-20    po_00001  vendor_00001  Smith-Scott                  Front-line multimedia emulation       75343 Harper Corners Suite 581\nJuanberg, AK 0...  (193)898-1652x129       ftodd@example.org
1    inv_00002   87796   2007-09-06    po_00002  vendor_00002  Walker-Morgan                Cross-platform radical solution       941 Susan Isle\nThorntonberg, KS 82841             +1-636-744-9620x3991    rdunn@example.com
2    inv_00003   57963   2000-03-06    po_00003  vendor_00003  Noble and Sons               Configurable demand-driven emulation  1442 Jason Rapid Apt. 409\nEast Jade, RI 44983     477-214-2021x973        tinaschmidt@example.com
3    inv_00004   59409   2001-03-31    po_00004  vendor_00004  Baker, Walker and Davenport  Focused analyzing synergy             89120 Kimberly Extensions\nSouth Annettetown, ...  (643)621-7544x290       sarahstephenson@example.com
4    inv_00005   86614   2002-01-12    po_00005  vendor_00005  Patterson LLC                Profound maximized productivity       880 Bryan Tunnel Apt. 542\nKaylabury, AK 50221     586-422-7311x0127       littleyesenia@example.net
...  ...         ...     ...           ...       ...           ...                          ...                                   ...                                                ...                     ...
445  inv_00446   83316   2012-09-02    po_00087  vendor_00087  Wagner-Gutierrez             Multi-lateral motivating projection   8771 Roger Road Suite 781\nDanielton, ID 88428     001-023-820-3050x78454  colliernicole@example.net
446  inv_00447   45707   2008-07-10    po_00101  vendor_00098  Simmons-Leonard              Focused reciprocal secured line       9010 Ashley Mountains\nMarthaton, VT 68298         391-162-6024            serranonancy@example.org
447  inv_00448   111932  2002-09-26    po_00158  vendor_00012  Welch LLC                    Versatile methodical interface        4016 Brianna Road\nPort Andrealand, AR 22214       +1-837-862-5571x172     williamoliver@example.com
448  inv_00449   35104   2012-09-21    po_00133  vendor_00075  Franklin-Bennett             Digitized holistic methodology        68125 Vega Plains Apt. 062\nEast Emily, OK 80097   001-979-468-2358x530    leroymoore@example.org
449  inv_00450   15397   2015-12-15    po_00054  vendor_00054  Barton-Oneill                Mandatory 4thgeneration hierarchy     107 Julie Passage Suite 904\nSouth George, OH ...  (491)397-7771x41615     jacksonrachel@example.com

450 rows × 10 columns
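Since the point of the generated tables is that their keys line up, it can be worth checking that before relying on a join. The following is a rough sketch using standard pandas options (`indicator=True` and `validate="many_to_one"` on `pd.merge`); `find_orphans` is a hypothetical helper, and the two small DataFrames are stand-ins built from rows shown above rather than real PyDataFaker output.

```python
import pandas as pd

# Stand-ins for biz['invoice_summary_table'] and biz['vendor_table']
# (assumption: only a few rows and columns are reproduced).
invoice_summary = pd.DataFrame(
    {
        "invoice_id": ["inv_00001", "inv_00002", "inv_00003"],
        "amount": [59157, 87796, 57963],
        "vendor_id": ["vendor_00001", "vendor_00002", "vendor_00003"],
    }
)
vendors = pd.DataFrame(
    {
        "vendor_id": ["vendor_00001", "vendor_00002", "vendor_00003"],
        "vendor_name": ["Smith-Scott", "Walker-Morgan", "Noble and Sons"],
    }
)


def find_orphans(child, parent, key):
    """Return rows of `child` whose `key` value has no match in `parent`.

    Hypothetical helper for illustration; not a PyDataFaker function.
    """
    merged = pd.merge(
        child,
        parent[[key]].drop_duplicates(),
        how="left",
        on=key,
        indicator=True,  # adds a "_merge" column describing each row's match
    )
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


# An empty result means every invoice points at an existing vendor.
orphans = find_orphans(invoice_summary, vendors, "vendor_id")

# validate="many_to_one" makes pandas raise pandas.errors.MergeError
# if vendor_id is not unique in the vendor table.
joined = pd.merge(
    invoice_summary, vendors, how="left", on="vendor_id", validate="many_to_one"
)
print(orphans.empty, len(joined))
```

The same check applies to any other pair of tables that share a key column, such as `po_id`.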

School

import pandas as pd
from pydatafaker import school
skool = school.create_school()
skool.keys()
skool['student_table']
     student_id    name             grade  teacher_id
0    student_0001  Tyler Campbell   1      teacher_0007
1    student_0003  Melissa Coleman  1      teacher_0010
2    student_0011  Crystal Church   1      teacher_0014
3    student_0017  Paul Gray        1      teacher_0007
4    student_0023  Joshua Morales   1      teacher_0010
...  ...           ...              ...    ...
31   student_0258  Nicole Hoffman   7      teacher_0015
32   student_0261  Joseph Lewis     7      teacher_0009
33   student_0294  Susan Jacobs     7      teacher_0015
34   student_0299  Mark Whitehead   7      teacher_0009
35   student_0300  Melissa Sosa     7      teacher_0015

300 rows × 4 columns
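The `teacher_id` column is what ties each student to a teacher, so it can be used to derive class sizes. The following is a minimal sketch with a stand-in DataFrame made from the first five rows shown above; with the package installed, `skool['student_table']` could presumably be used in place of `students`.

```python
import pandas as pd

# Stand-in for skool['student_table'] (assumption: first five rows only).
students = pd.DataFrame(
    {
        "student_id": [
            "student_0001",
            "student_0003",
            "student_0011",
            "student_0017",
            "student_0023",
        ],
        "name": [
            "Tyler Campbell",
            "Melissa Coleman",
            "Crystal Church",
            "Paul Gray",
            "Joshua Morales",
        ],
        "grade": [1, 1, 1, 1, 1],
        "teacher_id": [
            "teacher_0007",
            "teacher_0010",
            "teacher_0014",
            "teacher_0007",
            "teacher_0010",
        ],
    }
)

# Count students per (grade, teacher) pair; groupby sorts by the keys.
class_sizes = (
    students.groupby(["grade", "teacher_id"])
    .size()
    .reset_index(name="n_students")
)
print(class_sizes.to_string(index=False))
```
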

Contributing

Please see docs/source/contributing.rst.

Credits

Developed by:

  • Sam Edwardes

Logo:

Download files

Source Distribution

pydatafaker-0.1.2.tar.gz (16.3 kB)

Built Distribution

pydatafaker-0.1.2-py3-none-any.whl (13.5 kB, Python 3)
