Command-line fake data generator
Faker CLI
Faker is an awesome Python library, but I often just want a simple command I can run to generate data in a variety of formats.
With Faker CLI, you can easily generate CSV, JSON, or Parquet data with fields of your choosing.
You can also utilize pre-built templates for common data formats!
Installation
pip install faker-cli
[!TIP] To use Parquet or Delta Lake, use
pip install faker-cli[parquet]
or
pip install faker-cli[delta]
Usage
Once installed, you should have the fake command in your path. Run the following to see usage / help:
fake --help
By default, fake will generate CSV output for you. You just specify the number of rows you want and the column types.
fake -n 10 pyint,user_name,date_this_year
BAM! You've got a CSV file with your data.
pyint,user_name,date_this_year
8649,fward,2023-03-08
3933,zharris,2023-03-20
1469,jasonellis,2023-05-16
3660,heather91,2023-02-10
9160,cameronlopez,2023-05-05
2735,candacemoore,2023-05-12
7240,zachary06,2023-01-23
9778,thomasstacey,2023-05-23
5820,kenneth36,2023-04-26
2856,michael23,2023-01-16
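Because the output is plain CSV with a header row, any CSV reader can consume it. Here is a minimal Python sketch, assuming the output has been captured as text. The rows are copied from the sample above, and the variable names are only illustrative.

```python
import csv
import io
from datetime import date

# First rows of the sample output shown above.
csv_text = """pyint,user_name,date_this_year
8649,fward,2023-03-08
3933,zharris,2023-03-20
1469,jasonellis,2023-05-16
"""

rows = []
for record in csv.DictReader(io.StringIO(csv_text)):
    # Every CSV field is read as a string, so convert types explicitly.
    rows.append(
        {
            "pyint": int(record["pyint"]),
            "user_name": record["user_name"],
            "date_this_year": date.fromisoformat(record["date_this_year"]),
        }
    )

print(rows[0])
```

The header row doubles as the dictionary keys, so the column types you pass to fake become the field names on the reading side.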
JSON
Want a JSON file? Sweet, use -f json.
fake -n 10 pyint,user_name,date_this_year -f json
{"pyint": 3854, "user_name": "cchavez", "date_this_year": "2023-01-20"}
{"pyint": 2008, "user_name": "vnguyen", "date_this_year": "2023-04-03"}
{"pyint": 1434, "user_name": "karen38", "date_this_year": "2023-03-02"}
{"pyint": 4922, "user_name": "duncanellen", "date_this_year": "2023-04-22"}
{"pyint": 230, "user_name": "tiffany72", "date_this_year": "2023-02-25"}
{"pyint": 7252, "user_name": "maydouglas", "date_this_year": "2023-04-01"}
{"pyint": 2716, "user_name": "sheilaflores", "date_this_year": "2023-03-20"}
{"pyint": 2827, "user_name": "parksandra", "date_this_year": "2023-04-01"}
{"pyint": 3353, "user_name": "melissaatkinson", "date_this_year": "2023-02-10"}
{"pyint": 5306, "user_name": "mark12", "date_this_year": "2023-04-16"}
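Note that the sample output is one JSON object per line rather than a single JSON array, so it should be parsed line by line. A small Python sketch, assuming the output has been captured as text (the records are copied from the sample above):

```python
import json

# First records of the sample output shown above.
json_output = """{"pyint": 3854, "user_name": "cchavez", "date_this_year": "2023-01-20"}
{"pyint": 2008, "user_name": "vnguyen", "date_this_year": "2023-04-03"}
{"pyint": 1434, "user_name": "karen38", "date_this_year": "2023-03-02"}
"""

# Parse each non-empty line as its own JSON document.
records = [json.loads(line) for line in json_output.splitlines() if line.strip()]
user_names = [record["user_name"] for record in records]

print(user_names)
```

Calling json.loads on the whole text at once would fail, because several top-level objects in a row are not one valid JSON document.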
Column Names
Default column names aren't good enough for you? Fine, use your own.
fake -n 10 pyint,user_name,date_this_year -f json -c id,awesome_name,last_attention_at
{"id": 6048, "awesome_name": "jtran", "last_attention_at": "2023-04-24"}
{"id": 4310, "awesome_name": "stacey99", "last_attention_at": "2023-04-27"}
{"id": 1839, "awesome_name": "jho", "last_attention_at": "2023-03-07"}
{"id": 236, "awesome_name": "melissamassey", "last_attention_at": "2023-04-17"}
{"id": 6599, "awesome_name": "mwells", "last_attention_at": "2023-04-25"}
{"id": 6071, "awesome_name": "wilcoxrick", "last_attention_at": "2023-01-17"}
{"id": 9646, "awesome_name": "michael92", "last_attention_at": "2023-04-22"}
{"id": 6986, "awesome_name": "ballen", "last_attention_at": "2023-01-08"}
{"id": 6892, "awesome_name": "jennifer61", "last_attention_at": "2023-01-03"}
{"id": 1967, "awesome_name": "jmendoza", "last_attention_at": "2023-01-23"}
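The names passed with -c line up with the column types by position: the first name labels the first type, and so on. The Python sketch below illustrates that mapping. resolve_headers is a hypothetical helper, not part of faker-cli, and raising an error on a length mismatch is an assumption about reasonable behavior rather than something the tool is known to do.

```python
from typing import Optional


def resolve_headers(column_types: list[str], column_names: Optional[str] = None) -> list[str]:
    """Return one header per column type (hypothetical helper, not faker-cli code)."""
    if column_names is None:
        # Without custom names, the column types themselves act as headers,
        # as in the default CSV and JSON output above.
        return list(column_types)
    names = column_names.split(",")
    if len(names) != len(column_types):
        # Assumption: a mismatch is treated as an error in this sketch.
        raise ValueError("expected one column name per column type")
    return names


types = ["pyint", "user_name", "date_this_year"]
print(resolve_headers(types))
print(resolve_headers(types, "id,awesome_name,last_attention_at"))
```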
Providers (beta)
While Faker is a sweet library, we all like options, don't we? Mimesis is also awesome and can be quite a bit faster than Faker. 🤫 You can use a different provider by using -p mimesis.
[!NOTE]
Providers use their own syntax for data types, so you must change your column types as necessary.
For example, to generate the same dataset as above with Mimesis:
fake -p mimesis -n 10 "numeric.integer_number(0),person.username,datetime.date(2024)" -f json -c id,awesome_name,last_attention_at
Provider Arguments
Some Faker providers (like pyint) take arguments. You can also specify those if you like, separated by semicolons (because some arguments take a comma-separated string :))
fake -n 10 "pyint(1;100),credit_card_number(amex),pystr_format(?#-####)" -f json -c id,credit_card_number,license_plate
[!IMPORTANT] When using arguments with output formats like JSON, it's best to provide column headers as well with -c.
And unique values are supported as well.
fake -n 10 "unique.pyint(1;10),unique.name"
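To make the argument syntax concrete, here is a rough Python sketch of how such a column string could be split apart: commas separate columns, and semicolons separate arguments inside the parentheses. This is only an illustration. split_columns and parse_column are hypothetical helpers, and faker-cli's real parser may work differently, for instance by converting arguments to numbers. In this sketch a unique. prefix simply stays part of the column name.

```python
def split_columns(spec: str) -> list[str]:
    """Split a column spec on commas that are not inside parentheses."""
    columns, current, depth = [], [], 0
    for char in spec:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            columns.append("".join(current))
            current = []
        else:
            current.append(char)
    columns.append("".join(current))
    return columns


def parse_column(column: str) -> tuple[str, list[str]]:
    """Return (name, arguments); arguments are kept as raw strings."""
    name, paren, rest = column.partition("(")
    if not paren:
        return name, []
    arguments = rest[:-1] if rest.endswith(")") else rest
    return name, arguments.split(";") if arguments else []


spec = "pyint(1;100),credit_card_number(amex),pystr_format(?#-####)"
parsed = [parse_column(column) for column in split_columns(spec)]
print(parsed)
```

Using a semicolon between arguments keeps the top-level comma free to separate columns, which is why the two separators differ.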
Parquet
OK, it had to happen, you can even write Parquet.
Install with the parquet extra: pip install faker-cli[parquet]
fake -n 10 pyint,user_name,date_this_year -f parquet -o sample.parquet
You can even write straight to S3 🤭
fake -n 10 pyint,user_name,date_this_year -f parquet -o s3://YOUR_BUCKET/data/sample.parquet
Delta Lake
Data can be exported as a Delta Lake table.
Install with the delta extra: pip install faker-cli[delta]
fake -n 10 pyint,user_name,date_this_year -f deltalake -o sample_data
Templates
The library includes a couple of templates that make it easier to generate certain types of fake data.
Today, the only templates that exist are for S3 Access and CloudFront logs.
Want to generate 1 MILLION S3 Access logs in ~2 minutes? Now you can. (But I only show 10 below so as not to crash your terminal)
fake -t s3access -n 10
How about CloudFront? Go ahead.
fake -t cloudfront -n 10
Warning: Both of these templates are still being validated - please be cautious!