A robust and simple library for generating synthetic datasets for ML/DL projects.
data_genix
A robust and simple library for generating synthetic datasets for machine learning and deep learning projects. Avoid the hassle of downloading and managing data files for testing and prototyping.
Installation
Clone the repository and install using pip:
git clone https://github.com/yourusername/data_genix.git
cd data_genix
pip install .
Quick Start
Generate a DataFrame containing a variety of data types with a single function call.
from data_genix import DataGenerator
# Initialize the generator
generator = DataGenerator()
# Generate a dataset with 1000 rows
df = generator.generate(
    num_rows=1000,
    numerical_whole=3,
    decimal=2,
    categorical=2,
    ordinal=1,
    boolean=1,
    datetime=1,
    text=1,
    uuid=1,
    object_types=['name', 'country', 'email', 'job']
)
print(df.head())
df.info()  # info() prints its summary itself and returns None
Features
- Numerical Data: Generate columns of whole numbers (integers) or decimals (floats).
- Categorical Data: Generate columns with a predefined set of unordered categories.
- Ordinal Data: Generate columns with a predefined set of ordered categories.
- Boolean Data: Generate columns of True/False values.
- Datetime Data: Generate columns with datetime objects.
- Text Data: Generate columns with random sentences.
- ID Data: Generate columns with unique identifiers (UUIDs).
- Coordinates: Generate paired latitude and longitude columns.
- Web Data: Generate columns for IP addresses, URLs, and phone numbers.
- Nested Data: Generate columns containing JSON-formatted strings.
- Object/Text Data: Use the Faker library to generate realistic text data such as names, addresses, and emails.
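To make the column kinds above concrete, here is a minimal standard-library sketch of what one column of each kind might hold. It is an illustration only, not data_genix's implementation: the helper make_rows, its column names, and the value ranges are all assumptions made for this example.

```python
import json
import random
import uuid
from datetime import datetime, timedelta


def make_rows(num_rows, seed=0):
    """Hypothetical helper: build a dict of synthetic columns, one list per column."""
    rng = random.Random(seed)              # local RNG so a given seed repeats exactly
    colors = ["red", "green", "blue"]      # unordered categories
    sizes = ["small", "medium", "large"]   # ordered categories: list order is the rank
    start = datetime(2024, 1, 1)
    return {
        "whole": [rng.randint(0, 100) for _ in range(num_rows)],
        "decimal": [rng.uniform(0.0, 1.0) for _ in range(num_rows)],
        "category": [rng.choice(colors) for _ in range(num_rows)],
        "ordinal": [rng.choice(sizes) for _ in range(num_rows)],
        "flag": [rng.random() < 0.5 for _ in range(num_rows)],
        "timestamp": [
            start + timedelta(minutes=rng.randint(0, 525_600))
            for _ in range(num_rows)
        ],
        # Version-4 UUIDs built from seeded random bits, so they are repeatable too
        "id": [
            str(uuid.UUID(int=rng.getrandbits(128), version=4))
            for _ in range(num_rows)
        ],
        # Coordinates come as a pair of columns
        "latitude": [rng.uniform(-90.0, 90.0) for _ in range(num_rows)],
        "longitude": [rng.uniform(-180.0, 180.0) for _ in range(num_rows)],
        # Nested data stored as a JSON-formatted string
        "nested": [
            json.dumps({"score": rng.randint(1, 5), "tags": [rng.choice(colors)]})
            for _ in range(num_rows)
        ],
    }


data = make_rows(3)
for column, values in data.items():
    print(column, values)
```

A dict of equal-length lists like this can be passed straight to pandas.DataFrame if a DataFrame is wanted.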
Supported object_types
You can use any standard Faker provider method name as a string in the object_types list. Common examples include:
- name
- email
- address
- country
- city
- job
- text
- date
- time
- phone_number
- company
- url
- credit_card_number
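One plausible way a string such as 'name' can be turned into generated values is to look the method up on a Faker instance by name. The sketch below shows that idea. The helper make_object_column is hypothetical, and data_genix may resolve object_types differently. The Faker import is guarded so the snippet still loads when Faker is not installed.

```python
try:
    from faker import Faker  # third-party package: pip install Faker
except ImportError:
    Faker = None  # the demo at the bottom is skipped without Faker


def make_object_column(fake, method_name, num_rows):
    """Hypothetical helper: call the provider method named `method_name` num_rows times."""
    provider = getattr(fake, method_name, None)
    if not callable(provider):
        raise ValueError(f"unknown provider method: {method_name!r}")
    return [provider() for _ in range(num_rows)]


if Faker is not None:
    fake = Faker()
    for method_name in ["name", "country", "email", "job"]:
        print(method_name, make_object_column(fake, method_name, 2))
```

Raising an error for an unknown name up front makes a typo in the list easier to spot than a failure deep inside generation.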