Command-line fake data generator

Faker CLI

Faker is an awesome Python library, but I often just want a simple command I can run to generate data in a variety of formats.

With Faker CLI, you can easily generate CSV, JSON, or Parquet data with fields of your choosing.

You can also utilize pre-built templates for common data formats!

Installation

pip install faker-cli

Tip: To use Parquet or Delta Lake, install with pip install faker-cli[parquet] or pip install faker-cli[delta].

Usage

Once installed, you should have the fake command in your path. Run the following to see usage and help:

fake --help

By default, fake will generate a CSV output for you. You just specify the number of rows you want and the column types.

fake -n 10 pyint,user_name,date_this_year

BAM! You've got a CSV file with your data.

pyint,user_name,date_this_year
8649,fward,2023-03-08
3933,zharris,2023-03-20
1469,jasonellis,2023-05-16
3660,heather91,2023-02-10
9160,cameronlopez,2023-05-05
2735,candacemoore,2023-05-12
7240,zachary06,2023-01-23
9778,thomasstacey,2023-05-23
5820,kenneth36,2023-04-26
2856,michael23,2023-01-16
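
Because the default output is plain CSV with a header row, it is easy to consume from a script. Here is a minimal sketch using only Python's standard library. It assumes the first few rows of the sample output above have been captured as a string, and the variable names are illustrative rather than part of Faker CLI:

```python
import csv
import io

# First rows of the sample output above, pasted in as a string for illustration.
sample = """pyint,user_name,date_this_year
8649,fward,2023-03-08
3933,zharris,2023-03-20
1469,jasonellis,2023-05-16
"""

# DictReader uses the header row as keys, so each data row becomes a dict.
rows = list(csv.DictReader(io.StringIO(sample)))

# CSV values always arrive as strings; convert types explicitly where needed.
ids = [int(row["pyint"]) for row in rows]
print(ids)
```

The same approach should work on real output if you read it from a file or a pipe instead of a string.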

JSON

Want a JSON file? Sweet, use -f json.

fake -n 10 pyint,user_name,date_this_year -f json
{"pyint": 3854, "user_name": "cchavez", "date_this_year": "2023-01-20"}
{"pyint": 2008, "user_name": "vnguyen", "date_this_year": "2023-04-03"}
{"pyint": 1434, "user_name": "karen38", "date_this_year": "2023-03-02"}
{"pyint": 4922, "user_name": "duncanellen", "date_this_year": "2023-04-22"}
{"pyint": 230, "user_name": "tiffany72", "date_this_year": "2023-02-25"}
{"pyint": 7252, "user_name": "maydouglas", "date_this_year": "2023-04-01"}
{"pyint": 2716, "user_name": "sheilaflores", "date_this_year": "2023-03-20"}
{"pyint": 2827, "user_name": "parksandra", "date_this_year": "2023-04-01"}
{"pyint": 3353, "user_name": "melissaatkinson", "date_this_year": "2023-02-10"}
{"pyint": 5306, "user_name": "mark12", "date_this_year": "2023-04-16"}
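
Note that the output shown above is one JSON object per line (often called JSON Lines), not a single JSON array, so it is simplest to parse it line by line. Here is a minimal standard-library sketch, assuming two lines of the sample output have been captured as a string:

```python
import json

# Two lines of the sample output above, pasted in as a string for illustration.
sample = '''{"pyint": 3854, "user_name": "cchavez", "date_this_year": "2023-01-20"}
{"pyint": 2008, "user_name": "vnguyen", "date_this_year": "2023-04-03"}
'''

# Parse each non-empty line as its own JSON document.
records = [json.loads(line) for line in sample.splitlines() if line.strip()]

# Unlike CSV, numeric fields keep their type: pyint is already an int here.
user_names = [record["user_name"] for record in records]
print(user_names)
```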

Column Names

Default column names aren't good enough for you? Fine, use your own.

fake -n 10 pyint,user_name,date_this_year -f json -c id,awesome_name,last_attention_at
{"id": 6048, "awesome_name": "jtran", "last_attention_at": "2023-04-24"}
{"id": 4310, "awesome_name": "stacey99", "last_attention_at": "2023-04-27"}
{"id": 1839, "awesome_name": "jho", "last_attention_at": "2023-03-07"}
{"id": 236, "awesome_name": "melissamassey", "last_attention_at": "2023-04-17"}
{"id": 6599, "awesome_name": "mwells", "last_attention_at": "2023-04-25"}
{"id": 6071, "awesome_name": "wilcoxrick", "last_attention_at": "2023-01-17"}
{"id": 9646, "awesome_name": "michael92", "last_attention_at": "2023-04-22"}
{"id": 6986, "awesome_name": "ballen", "last_attention_at": "2023-01-08"}
{"id": 6892, "awesome_name": "jennifer61", "last_attention_at": "2023-01-03"}
{"id": 1967, "awesome_name": "jmendoza", "last_attention_at": "2023-01-23"}

Providers (beta)

While Faker is a sweet library, we all like options, don't we? Mimesis is also awesome and can be quite a bit faster than Faker. 🤫 You can switch to a different provider with -p mimesis.

Note: Providers use their own syntax for data types, so you must change your column types as necessary.

For example, to generate the same dataset as above with Mimesis:

fake -p mimesis -n 10 "numeric.integer_number(0),person.username,datetime.date(2024)" -f json -c id,awesome_name,last_attention_at

Provider Arguments

Some Faker providers (like pyint) take arguments. You can specify those too, separated by semicolons (because some arguments take a comma-separated string :)).

fake -n 10 "pyint(1;100),credit_card_number(amex),pystr_format(?#-####)" -f json -c id,credit_card_number,license_plate

Important: When using arguments with output formats like JSON, it's best to provide column headers as well with -c.

And unique values are supported as well.

fake -n 10 "unique.pyint(1;10),unique.name"
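
To make the column syntax concrete, here is a rough sketch of how a specification like the ones above could be split into provider names and arguments. It is purely illustrative and is not Faker CLI's actual parser. parse_columns is a hypothetical helper, and it treats a unique. prefix as just part of the name:

```python
def parse_columns(spec: str) -> list[tuple[str, list[str]]]:
    """Split a column spec into (provider_name, arguments) pairs.

    Hypothetical helper for illustration only, not Faker CLI's own code.
    """
    columns = []
    depth = 0
    current = ""
    # Columns are separated by commas, but only commas outside parentheses
    # count, since an argument may itself contain a comma.
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            columns.append(current)
            current = ""
        else:
            current += ch
    if current:
        columns.append(current)

    parsed = []
    for column in columns:
        name, _, rest = column.partition("(")
        # Inside the parentheses, arguments are separated by semicolons.
        inner = rest.rstrip(")")
        args = inner.split(";") if inner else []
        parsed.append((name, args))
    return parsed


result = parse_columns("pyint(1;100),credit_card_number(amex),unique.name")
print(result)
```

Splitting arguments on semicolons is what lets a single argument carry a comma-separated string without being mistaken for a column boundary.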

Parquet

OK, it had to happen, you can even write Parquet.

Install with the parquet module: pip install faker-cli[parquet]

fake -n 10 pyint,user_name,date_this_year -f parquet -o sample.parquet

You can even write straight to S3 🤭

fake -n 10 pyint,user_name,date_this_year -f parquet -o s3://YOUR_BUCKET/data/sample.parquet

Delta Lake

Data can be exported as a Delta Lake table.

Install with the delta module: pip install faker-cli[delta]

fake -n 10 pyint,user_name,date_this_year -f deltalake -o sample_data

Iceberg

And, of course, Iceberg tables!

Writing to a Glue or generic SQL catalog is currently supported.

fake -n 10 pyint,user_name,date_this_year -f iceberg -C glue://default.iceberg_sample -o s3://YOUR_BUCKET/iceberg-data/

Templates

The library includes a couple of templates that make it easier to generate certain types of fake data.

Today, the only templates that exist are for S3 Access and CloudFront logs.

Want to generate 1 MILLION S3 Access logs in ~2 minutes? Now you can. (I only generate 10 below so as not to crash your terminal.)

fake -t s3access -n 10

How about CloudFront? Go ahead.

fake -t cloudfront -n 10

Warning: Both of these templates are still being validated - please be cautious!
