faker-pyspark is a PySpark DataFrame and Schema provider for the Faker Python package

PySpark provider for Faker

faker-pyspark is a PySpark DataFrame and Schema (StructType) provider for the Faker Python package.

Description

faker-pyspark provides PySpark-based fake data for testing purposes. "Fake" in this context really means "random": the data may look real, but I make no claims about its accuracy, so do not use it as real data!

Installation

Install with pip:

pip install faker-pyspark

Add as a provider to your Faker instance:

from faker import Faker
from faker_pyspark import PySparkProvider
fake = Faker()
fake.add_provider(PySparkProvider)
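
For repeatable tests it can help to wrap the setup above in a small helper that also seeds Faker. The sketch below is only an illustration: `make_seeded_fake` is a hypothetical name, not part of faker-pyspark. `Faker.seed()` is Faker's own class method for seeding its shared random generator. Whether that seed also makes the PySpark provider's output reproducible depends on how the provider draws its random values, which is not confirmed here.

```python
def make_seeded_fake(seed=1234):
    """Return a Faker instance with PySparkProvider registered.

    Hypothetical helper for illustration; not part of faker-pyspark.
    """
    # Imported inside the function so the helper can be defined
    # even where faker / faker-pyspark are not installed yet.
    from faker import Faker
    from faker_pyspark import PySparkProvider

    # Seed Faker's shared random generator before generating anything.
    # Assumption: the provider relies on Faker's generator for randomness.
    Faker.seed(seed)

    fake = Faker()
    fake.add_provider(PySparkProvider)
    return fake
```

With faker and faker-pyspark installed, `make_seeded_fake().pyspark_schema()` would then be called the same way as `fake.pyspark_schema()` in the examples that follow.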

PySpark DataFrame, Schema and more

>>> df           = fake.pyspark_dataframe()
>>> schema       = fake.pyspark_schema()
>>> df_updated   = fake.pyspark_update_dataframe(df)
>>> column_names = fake.pyspark_column_names()
>>> data         = fake.pyspark_data_dict_using_schema(schema)
>>> data         = fake.pyspark_data_dict()
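
To make the idea of "random data that follows a schema" concrete, here is a minimal standard-library sketch. It is not how faker-pyspark is implemented and does not use its API: the schema is represented as plain `(column name, type name)` pairs instead of a PySpark StructType, and `GENERATORS` and `random_rows_for_schema` are hypothetical names chosen for this illustration.

```python
import random

# Hypothetical mapping from a type name to a function producing a random value.
# faker-pyspark works with PySpark types; plain strings are used here only
# to keep the sketch free of external dependencies.
GENERATORS = {
    "string": lambda rng: "".join(
        rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8)
    ),
    "integer": lambda rng: rng.randint(0, 1000),
    "double": lambda rng: rng.uniform(0.0, 1.0),
    "boolean": lambda rng: rng.random() < 0.5,
}


def random_rows_for_schema(schema, n_rows=3, seed=None):
    """Build n_rows dicts with one random value per (name, type) in schema."""
    rng = random.Random(seed)  # local generator, so a seed gives repeatable rows
    return [
        {name: GENERATORS[type_name](rng) for name, type_name in schema}
        for _ in range(n_rows)
    ]


schema = [
    ("name", "string"),
    ("age", "integer"),
    ("score", "double"),
    ("active", "boolean"),
]
rows = random_rows_for_schema(schema, n_rows=2, seed=42)
for row in rows:
    print(row)
```

The values are well-typed but meaningless, which is the sense of "fake" described above: suitable for exercising code paths in tests, not for use as real data.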

CLI faker

$ faker pyspark_schema       -i faker_pyspark
$ faker pyspark_dataframe    -i faker_pyspark
$ faker pyspark_column_names -i faker_pyspark
$ faker pyspark_data_dict    -i faker_pyspark

Download files

Source Distribution: faker_pyspark-0.8.0.tar.gz (3.8 kB)

Built Distribution: faker_pyspark-0.8.0-py3-none-any.whl (4.4 kB, Python 3)
