Package to generate random transactional data
RandomDataGen - Random Data Generator Package
This package generates random transactional data. You can use it to practice Pandas operations or customer segmentation methods such as RFM.
With this package you can create a table with transactional data containing:
- consumer_id: ID of the customer who made the transaction;
- transaction_created_at: date of the transaction;
- transaction_payment_value: monetary value of the transaction.
All the fields are customizable.
How the data is generated
The consumer_id field is generated by a range function, returning a sequence of integers from 1 to n_consumers:
consumer_ids = range(1, n_consumers + 1)
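The transaction_created_at field is generated with the Pandas function date_range, which returns n_rows evenly spaced timestamps between first_transaction_date and last_transaction_date (see the Pandas documentation for more about this function):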
created_at_list = list(pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows))
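To make the even spacing concrete, here is a minimal sketch of the same date_range call on a small range. The helper name evenly_spaced_timestamps is hypothetical and is not part of the package.

```python
import pandas as pd


def evenly_spaced_timestamps(start, end, n_rows):
    """Return n_rows timestamps spaced linearly from start to end (inclusive)."""
    # Hypothetical helper: wraps the pd.date_range call described above.
    return list(pd.date_range(start=start, end=end, periods=n_rows))


# Five timestamps over four days are exactly one day apart.
stamps = evenly_spaced_timestamps("2020-01-01", "2020-01-05", 5)
for stamp in stamps:
    print(stamp)
```

Because the spacing is linear, the timestamps are deterministic and sorted: only n_rows and the two boundary dates affect them.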
The transaction_payment_value field is sampled from a normal distribution whose mean is the transaction_mean_value parameter and whose standard deviation is the transaction_std_value parameter:
list(np.random.normal(transaction_mean_value, transaction_std_value, n_rows))
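The three pieces above could be combined roughly as follows. This is only a sketch under stated assumptions, not the package's actual implementation: the function name sketch_transactions is hypothetical, the way consumer IDs are assigned to rows (uniform random choice with replacement) is an assumption the text above does not confirm, and a seeded NumPy Generator is used in place of np.random.normal so the result is reproducible.

```python
import numpy as np
import pandas as pd


def sketch_transactions(n_rows, n_consumers, mean_value, std_value,
                        first_date, last_date, seed=0):
    """Build a dataframe shaped like the one described above (illustrative only)."""
    rng = np.random.default_rng(seed)  # seeded for reproducibility (assumption)
    consumer_ids = range(1, n_consumers + 1)
    return pd.DataFrame({
        # Assumption: each row gets a consumer ID drawn uniformly at random.
        "consumer_id": rng.choice(list(consumer_ids), size=n_rows),
        # Evenly spaced timestamps between the two boundary dates.
        "transaction_created_at": pd.date_range(
            start=first_date, end=last_date, periods=n_rows
        ),
        # Normally distributed payment values.
        "transaction_payment_value": rng.normal(mean_value, std_value, n_rows),
    })


sketch_df = sketch_transactions(1000, 100, 100, 10, "2020-01-01", "2021-01-01")
print(sketch_df.head())
```

Note that a normal distribution is unbounded, so negative payment values can appear when the standard deviation is large relative to the mean.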
How to use
You can get started with RandomDataGen using this example code:
from random_data_gen.data_generator import TransactionalDataGenerator
TRGenerator = TransactionalDataGenerator(
n_rows=1000,
n_consumers=100,
transaction_mean_value=100,
transaction_std_value=10,
first_transaction_date="2020-01-01",
last_transaction_date="2021-01-01",
)
df = TRGenerator()
In this snippet we defined a dataframe with 1000 rows, 100 unique consumers, a mean transaction value of 100 monetary units, a standard deviation of 10 monetary units, a first transaction date of 2020-01-01 and a last transaction date of 2021-01-01.
The returned dataframe has the following form (the values shown are illustrative):
| consumer_id | transaction_created_at | transaction_payment_value |
|:-----------:|:-----------------------------:|:-------------------------:|
| 234 | 2020-01-01 00:00:00.000000000 | 120.10 |
| 43 | 2020-01-01 08:47:34.054054054 | 87.10 |
| 321 | 2021-10-23 10:27:12.092356134 | 12.98 |
| 3123 | 2020-12-30 21:37:17.837837840 | 12.84 |
The number of rows in this dataframe is set by the n_rows parameter.
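Since the package is meant for studying techniques such as RFM, here is a minimal sketch of computing recency, frequency and monetary value from a dataframe with these three columns. The small hand-written dataframe, the snapshot date and the output column names are all assumptions made for illustration; in practice you would pass the dataframe returned by the generator.

```python
import pandas as pd

# Tiny hand-written stand-in for the generated dataframe (illustrative values).
df_example = pd.DataFrame({
    "consumer_id": [1, 1, 2, 3, 3, 3],
    "transaction_created_at": pd.to_datetime([
        "2020-01-10", "2020-06-01", "2020-03-15",
        "2020-02-01", "2020-07-20", "2020-12-30",
    ]),
    "transaction_payment_value": [100.0, 50.0, 80.0, 20.0, 30.0, 40.0],
})

# Reference date for recency: one day after the latest transaction (assumption).
snapshot = df_example["transaction_created_at"].max() + pd.Timedelta(days=1)

rfm = df_example.groupby("consumer_id").agg(
    # Days since each consumer's most recent transaction.
    recency_days=("transaction_created_at", lambda s: (snapshot - s.max()).days),
    # Number of transactions per consumer.
    frequency=("transaction_created_at", "count"),
    # Total amount spent per consumer.
    monetary=("transaction_payment_value", "sum"),
)
print(rfm)
```

The resulting table has one row per consumer and can then be scored or clustered with whichever RFM variant you are studying.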
Contribute
To contribute, you first need to install Poetry.
Once it is installed, clone this repo and run the following command:
poetry install -n
Before pushing your code to the repo, apply the project style to it by running:
make format
And after that, run:
make check
This command checks your code with flake8 and pytest.