Package to generate random transactional data

RandomDataGen - Random Data Generator Package

RandomDataGen is a package that generates random transactional data. You can use it to practise Pandas operations or customer segmentation techniques such as RFM analysis.

With this package you can create a table with transactional data containing:

  • consumer_id: ID of the customer who makes the transaction;
  • transaction_created_at: Date of transaction;
  • transaction_payment_value: Monetary value of transaction.

All the fields are customizable.

How the data is generated

The consumer_id field is generated by a range function, returning a sequence of integers from 1 to n_consumers:

consumer_ids = range(1, n_consumers + 1)

The transaction_created_at field is generated with the Pandas function date_range, which returns n_rows evenly spaced timestamps between the first and last transaction dates (see the Pandas documentation for details):

created_at_list = list(pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows))
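As a small illustration of what date_range does with the periods argument, the sketch below uses made-up dates and a tiny row count (these values are assumptions for the example, not package defaults):

```python
import pandas as pd

# Five evenly spaced timestamps; both endpoints are included.
created_at_list = list(
    pd.date_range(start="2020-01-01", end="2020-01-05", periods=5)
)

for ts in created_at_list:
    print(ts)
```

Because the timestamps are evenly spaced, the resulting column is sorted in time. With the values from the example in "How to use" (1000 rows between 2020-01-01 and 2021-01-01), consecutive timestamps would be roughly 8 hours 47 minutes apart.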

The transaction_payment_value field is sampled from a normal distribution whose mean is the transaction_mean_value parameter and whose standard deviation is the transaction_std_value parameter:

list(np.random.normal(transaction_mean_value, transaction_std_value, n_rows))
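The three steps above can be combined into one table. The following is only a rough sketch, not the package's implementation: the helper name sketch_transactions and the seed argument are hypothetical, and the way consumer IDs are assigned to rows (uniform random draws with replacement) is an assumption, since this README does not describe that step.

```python
import numpy as np
import pandas as pd


def sketch_transactions(n_rows, n_consumers, mean_value, std_value,
                        first_date, last_date, seed=0):
    """Hypothetical helper for illustration; not part of the package API."""
    rng = np.random.default_rng(seed)
    consumer_ids = range(1, n_consumers + 1)
    return pd.DataFrame(
        {
            # Assumption: each row gets an ID drawn uniformly with replacement.
            "consumer_id": rng.choice(list(consumer_ids), size=n_rows),
            # Evenly spaced timestamps, endpoints included.
            "transaction_created_at": pd.date_range(
                start=first_date, end=last_date, periods=n_rows
            ),
            # Normally distributed payment values.
            "transaction_payment_value": rng.normal(mean_value, std_value, n_rows),
        }
    )


df = sketch_transactions(1000, 100, 100, 10, "2020-01-01", "2021-01-01")
print(df.head())
```

Note that a normal distribution is unbounded, so a large standard deviation relative to the mean can produce negative payment values.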

How to use

You can get started with RandomDataGen using this example:

from random_data_gen.data_generator import TransactionalDataGenerator

TRGenerator = TransactionalDataGenerator(
    n_rows=1000,
    n_consumers=100,
    transaction_mean_value=100,
    transaction_std_value=10,
    first_transaction_date="2020-01-01",
    last_transaction_date="2021-01-01",
)

df = TRGenerator()

This snippet configures a dataframe with 1000 rows, 100 unique consumers, a mean transaction value of 100 monetary units, a standard deviation of 10 monetary units, and transactions dated from 2020-01-01 to 2021-01-01.

The returned dataframe has the following form (the values shown are illustrative):

| consumer_id |     transaction_created_at    | transaction_payment_value |
|:-----------:|:-----------------------------:|:-------------------------:|
|     234     | 2020-01-01 00:00:00.000000000 |           120.10          |
|      43     | 2020-01-01 08:47:34.054054054 |           87.10           |
|     321     | 2021-10-23 10:27:12.092356134 |           12.98           |
|     3123    | 2020-12-30 21:37:17.837837840 |           12.84           |

The number of rows in this dataframe is set by the n_rows parameter.
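As an example of the kind of study this data is meant for, here is a minimal sketch of an RFM-style aggregation with Pandas. It uses a tiny hand-made dataframe with the same three columns rather than the generator's output, and the output column names (recency_days, frequency, monetary) and the snapshot date convention are assumptions chosen for the example.

```python
import pandas as pd

# Tiny hand-made frame with the same three columns, for illustration only.
df = pd.DataFrame(
    {
        "consumer_id": [1, 1, 2, 3, 3, 3],
        "transaction_created_at": pd.to_datetime(
            ["2020-01-05", "2020-03-01", "2020-02-10",
             "2020-01-20", "2020-02-20", "2020-03-10"]
        ),
        "transaction_payment_value": [100.0, 90.0, 110.0, 95.0, 105.0, 100.0],
    }
)

# Reference date for recency: the day after the latest transaction.
snapshot = df["transaction_created_at"].max() + pd.Timedelta(days=1)

# One row per consumer: days since last purchase, purchase count, total spend.
rfm = df.groupby("consumer_id").agg(
    recency_days=("transaction_created_at", lambda s: (snapshot - s.max()).days),
    frequency=("transaction_created_at", "count"),
    monetary=("transaction_payment_value", "sum"),
)
print(rfm)
```

The same groupby should work unchanged on a generated dataframe, since it relies only on the three column names listed above.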

Contribute

To contribute you need to install Poetry.

After installing, you need to clone this repo and run the following command:

poetry install -n

Before pushing your code to the repo, apply the project style to the new code by running:

make format

After that, run:

make check

This command will check your code with flake8 and pytest.
