Package to generate random transactional data
RandomDataGen - Random Data Generator Package
This is a package to generate random transactional data. You can use it to study Pandas operations or customer segmentation methods like RFM (recency, frequency, monetary value).
With this package you can create a table with transactional data containing:
- consumer_id: ID identifying the customer who makes the transaction;
- transaction_created_at: Date of transaction;
- transaction_payment_value: Monetary value of transaction.
All the fields are customizable.
How the data is generated
The consumer_id field is generated by a range function, returning a sequence of integers from 1 to n_consumers:
consumer_ids = range(1, n_consumers + 1)
The transaction_created_at field is generated by the Pandas function date_range (see the Pandas documentation for details), which returns n_rows evenly spaced timestamps between first_transaction_date and last_transaction_date:
created_at_list = list(pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows))
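As a small illustration of how date_range spaces timestamps when it receives start, end, and periods, the sketch below uses shortened, made-up dates rather than anything taken from the package: five periods across one day give one timestamp every six hours.

```python
import pandas as pd

# Five evenly spaced timestamps between two dates (illustrative values only).
created_at_list = list(
    pd.date_range(start="2020-01-01", end="2020-01-02", periods=5)
)

for ts in created_at_list:
    print(ts)
# 2020-01-01 00:00:00
# 2020-01-01 06:00:00
# 2020-01-01 12:00:00
# 2020-01-01 18:00:00
# 2020-01-02 00:00:00
```

Because the timestamps are generated in increasing order, the resulting rows are already sorted chronologically.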
The transaction_payment_value field is sampled from a normal distribution whose mean is the transaction_mean_value parameter and whose standard deviation is the transaction_std_value parameter:
list(np.random.normal(transaction_mean_value, transaction_std_value, n_rows))
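The three steps above can be combined into a single table. The following is a minimal sketch, not the package's actual implementation. It assumes that each row receives a consumer ID drawn uniformly at random (the description above does not say how IDs are assigned to rows), and it swaps np.random.normal for NumPy's seeded Generator so that runs are reproducible. The function name sketch_transactions and the seed parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def sketch_transactions(
    n_rows,
    n_consumers,
    transaction_mean_value,
    transaction_std_value,
    first_transaction_date,
    last_transaction_date,
    seed=None,
):
    """Hypothetical helper; not part of random_data_gen."""
    rng = np.random.default_rng(seed)

    # Candidate IDs: integers from 1 to n_consumers.
    consumer_ids = list(range(1, n_consumers + 1))
    # ASSUMPTION: one ID per row, drawn uniformly at random with replacement.
    row_consumer_ids = rng.choice(consumer_ids, size=n_rows)

    # n_rows evenly spaced timestamps between the two dates.
    created_at_list = list(
        pd.date_range(
            start=first_transaction_date,
            end=last_transaction_date,
            periods=n_rows,
        )
    )

    # Payment values drawn from a normal distribution.
    payment_values = rng.normal(
        transaction_mean_value, transaction_std_value, n_rows
    )

    return pd.DataFrame(
        {
            "consumer_id": row_consumer_ids,
            "transaction_created_at": created_at_list,
            "transaction_payment_value": payment_values,
        }
    )


df = sketch_transactions(1000, 100, 100, 10, "2020-01-01", "2021-01-01", seed=42)
print(df.head())
```

Note that a normal distribution is unbounded, so a standard deviation that is large relative to the mean can produce negative payment values.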
How to use
You can get started with RandomDataGen using this example code:
from random_data_gen.data_generator import TransactionalDataGenerator
TRGenerator = TransactionalDataGenerator(
n_rows=1000,
n_consumers=100,
transaction_mean_value=100,
transaction_std_value=10,
first_transaction_date="2020-01-01",
last_transaction_date="2021-01-01",
)
df = TRGenerator()
In this snippet we define a dataframe with 1000 rows, 100 unique consumers, a mean transaction value of 100 monetary units, a standard deviation of 10 monetary units, a first transaction date of 2020-01-01, and a last transaction date of 2021-01-01.
The returned dataframe has the following form (the values below are illustrative):
| consumer_id | transaction_created_at | transaction_payment_value |
|:-----------:|:-----------------------------:|:-------------------------:|
| 234 | 2020-01-01 00:00:00.000000000 | 120.10 |
| 43 | 2020-01-01 08:47:34.054054054 | 87.10 |
| 321 | 2021-10-23 10:27:12.092356134 | 12.98 |
| 3123 | 2020-12-30 21:37:17.837837840 | 12.84 |
The number of rows in this dataframe is defined by the n_rows parameter.
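Since the package is meant for studying techniques such as RFM, here is a minimal sketch of an RFM-style aggregation over a dataframe with these three columns. To stay self-contained it uses a tiny hand-written table with made-up values instead of the generator's output. The output column names recency_days, frequency, and monetary are hypothetical, and taking the latest transaction in the data as the reference date for recency is an assumption.

```python
import pandas as pd

# Tiny hand-written table with the same three columns (made-up values).
df = pd.DataFrame(
    {
        "consumer_id": [1, 1, 2, 3, 3, 3],
        "transaction_created_at": pd.to_datetime(
            [
                "2020-01-05",
                "2020-03-01",
                "2020-02-10",
                "2020-01-20",
                "2020-02-15",
                "2020-03-10",
            ]
        ),
        "transaction_payment_value": [100.0, 90.0, 110.0, 95.0, 105.0, 100.0],
    }
)

# ASSUMPTION: recency is measured against the latest transaction in the data.
snapshot = df["transaction_created_at"].max()

rfm = df.groupby("consumer_id").agg(
    # Days since each consumer's most recent transaction.
    recency_days=("transaction_created_at", lambda s: (snapshot - s.max()).days),
    # Number of transactions per consumer.
    frequency=("transaction_created_at", "count"),
    # Total amount spent per consumer.
    monetary=("transaction_payment_value", "sum"),
)
print(rfm)
```

The same aggregation should apply unchanged to the dataframe returned by the generator, since it relies only on the three column names listed above.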
Contribute
To contribute you need to install Poetry.
After installing, you need to clone this repo and run the following command:
poetry install -n
Before sending your code to the repo, run the following command to apply the project style to the new code:
make format
After that, run:
make check
This command will check your code with flake8 and pytest.