A Python package to generate fake tabular data. Get the data as a pandas DataFrame or export it to Parquet, Delta Lake, CSV, JSON, Excel, or SQL.

Table Faker

🌐 Web Site

tablefaker is a versatile Python package that generates realistic yet synthetic table data for various applications. If you need test data for software development, it simplifies the process with an intuitive schema definition in YAML format.

✨ Key Features

Schema Definition: Define your table schema using a simple YAML file, specifying column names, data types, fake data generation logic, and relationships.
Faker & Randomization: Utilize the Faker library and random data generation to create authentic-looking synthetic data.
Multiple Table Support: Create multiple tables with different schemas and data generation logic in a single YAML file. Define relationships between tables for foreign keys and primary keys.
Multiple Output Formats:
- Pandas DataFrame
- SQL insert script
- CSV
- Parquet
- JSON
- Excel
- Delta Lake

📦 Installation

pip install tablefaker

🧾 YAML schema reference

version: 1

config:
  locale: <locale_string>                      # e.g. en_US
  seed: <integer>                              # deterministic seed applied to random, numpy, Faker
  infer_entity_attrs_by_name: <true|false>     # enable `data: auto` name inference
  python_import:
    - <module_name>                            # modules to import (expose submodules via import)
  community_providers:
    - <community_provider_name>                # module.ClassName or package.provider

tables:
  - table_name: <table_name>
    row_count: <integer>
    start_row_id: <integer>
    export_file_count: <integer>
    export_file_row_count: <integer>
    export_file_name: <string>                 # optional: custom name for exported file (without extension)

    columns:
      - column_name: <column_name>              # string (required)
        data: <package>.<function_name>() | <hardcoded_value> | <column_reference> | auto | None
              # Examples of allowed data forms:
              #   <package>.<function_name>()   -> faker or other imported function call
              #   <hardcoded_value>            -> numeric or r"string"
              #   <column_reference>          -> reference another column by name (first_name + " " + last_name)
              #   auto                        -> resolved to copy_from_fk(...) when infer_entity_attrs_by_name is true
              #   None                        -> explicit NULL
              #   multi-line Python block using | (must return a value)
        is_primary_key: <true|false>
        type: string | int32 | int64 | float | boolean # a NumPy dtype object, a pandas ExtensionDtype, or a Python type
        parquet_type: int32 | int64 | string | timestamp[us] | decimal128(10, 2) # optional: explicit Parquet/Arrow type, only used for parquet export
        null_percentage: <float between 0.0 and 1.0>
        description: <string>

# Expression evaluation context
# Available variables inside `data` expressions:
#   fake, random, datetime, date, timedelta, time, timezone, tzinfo, UTC, MINYEAR, MAXYEAR, math, string, row_id
#
# Special helper functions:
#   foreign_key(parent_table, parent_column, distribution="uniform", param=None, parent_attr=None, weights=None, is_unique=False)
#   copy_from_fk(parent_table, foreign_key_column (this table), parent_attr)
#
# Multi-line Python block:
# Use YAML block scalar `|` and include a final `return <value>` statement.

Notes:

Parent tables must be defined before child tables.
Two-phase evaluation resolves columns that reference other columns correctly.
parquet_type is silently ignored for non-parquet exports such as CSV, JSON, Excel, SQL, and Delta Lake.
For a full example, see tests/test_table.yaml.

🧩 Sample Yaml File Minimal

tables:
  - table_name: person
    columns:
      - column_name: id
        data: row_id
      - column_name: first_name
        data: fake.first_name()
      - column_name: last_name
        data: fake.last_name()

🧪 Sample Yaml File Advanced

version: 1
config:
  locale: en_US
  python_import:
    - dateutil
    - faker-education # custom faker provider
  community_providers:
    - faker_education.SchoolProvider # custom faker provider
tables:
  - table_name: person
    row_count: 10
    start_row_id: 101                               # you can set row_id starting point
    export_file_count: 3                           # number of export files (takes precedence over export_file_row_count)
    columns:
      - column_name: id
        data: row_id                                # row_id is a built-in function
        is_primary_key: true                        # define primary key to use as a foreign key
      - column_name: first_name
        data: fake.first_name()                     # faker function
        type: string
      - column_name: last_name
        data: fake.last_name()
        type: string
      - column_name: full_name
        data: first_name + " " + last_name           # use a column to generate a new column
        is_primary_key: true
      - column_name: age
        data: fake.random_int(18, 90)
        type: int32
      - column_name: annual_income
        data: round(random.uniform(40000, 120000), 2)
        parquet_type: decimal128(10, 2)         # controls the parquet file schema only
      - column_name: street_address
        data: fake.street_address()
      - column_name: city
        data: fake.city()
      - column_name: state_abbr
        data: fake.state_abbr()
      - column_name: postcode
        data: fake.postcode()
      - column_name: gender
        data: random.choice(["male", "female"])     # Python's random module is available by default
        null_percentage: 0.5                        # share of rows set to NULL (0.5 = 50%)
      - column_name: left_handed
        data: fake.pybool()
      - column_name: today
        data: datetime.today().strftime('%Y-%m-%d') # datetime package is available by default
      - column_name: created_at
        data: datetime.today()
        parquet_type: date32
      - column_name: easter_date
        data: dateutil.easter.easter(2025).strftime('%Y-%m-%d') # python package you need to import in python_import
      - column_name: discount_eligibility           # custom python function
        data: |
          if age < 25 or age > 60:
            return True
          else:
            return False
  - table_name: employee
    row_count: 10
    export_file_row_count: 60                      # you can set export file row count
    columns:
      - column_name: id
        data: row_id
      - column_name: person_id
        data: foreign_key("person", "id")          # get primary key from another table
      - column_name: full_name
        data: foreign_key("person", "full_name")
      - column_name: hire_date
        data: fake.date_between()
        type: string
      - column_name: title
        data: random.choice(["engineer", "senior engineer", "principal engineer", "director", "senior director", "manager", "vice president", "president"])
      - column_name: salary
        data: None #NULL
        type: float
      - column_name: height
        data: r"170 cm" #string
      - column_name: weight
        data: 150 #number
      - column_name: school
        data: fake.school_name() # community provider function
      - column_name: level
        data: fake.school_level() # community provider function

full yml example

⚙️ Configuration: Determinism & Attribute Inference

🎯 Seed (deterministic)

config:
  locale: en_US
  seed: 42  # Optional: for reproducible datasets

Setting config.seed makes runs deterministic: the same seed and same YAML produce identical outputs.
The seed is applied to Python's random, NumPy (when available), and the Faker instance used by tablefaker.
Use cases: repeatable tests, CI snapshots, and reproducible examples.
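
The effect of a fixed seed can be illustrated with Python's random module alone. This is a minimal sketch of the principle, not tablefaker's internal code; the helper name generate_ages is hypothetical.

```python
import random

def generate_ages(seed, count=5):
    """Return `count` pseudo-random ages from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.randint(18, 90) for _ in range(count)]

first_run = generate_ages(42)
second_run = generate_ages(42)

# The same seed yields the same sequence on every run.
print(first_run == second_run)
```

Under the same assumption, a test suite could compare a freshly generated dataset against a stored snapshot and expect no differences as long as the seed and YAML stay unchanged.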

🧠 Attribute name inference

config:
  infer_entity_attrs_by_name: true  # Optional: auto-infer FK attributes

When enabled, columns named with the pattern <fkprefix>_<attr> will be automatically bound to the referenced parent row if a sibling <fkprefix>_id exists and is a foreign key.
Example: customer_email will be auto-resolved from the row referenced by customer_id (if customer_id is a FK to customers.customer_id).
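
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the <fkprefix>_<attr> pattern; infer_parent_attr is a hypothetical helper, and tablefaker's own matching logic may differ in details.

```python
def infer_parent_attr(column_name, fk_columns):
    """Return (fk_column, parent_attr) if `column_name` matches <fkprefix>_<attr>
    and a sibling <fkprefix>_id foreign key exists; otherwise None."""
    for fk_column in fk_columns:
        if not fk_column.endswith("_id"):
            continue
        prefix = fk_column[: -len("_id")]  # "customer_id" -> "customer"
        if column_name.startswith(prefix + "_") and column_name != fk_column:
            return fk_column, column_name[len(prefix) + 1:]
    return None

print(infer_parent_attr("customer_email", ["customer_id"]))  # ('customer_id', 'email')
print(infer_parent_attr("order_total", ["customer_id"]))     # None
```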

🔗 Cross-Table Relationships

📎 Using copy_from_fk()

- column_name: customer_email
  data: copy_from_fk("customers", "customer_id", "email")

copy_from_fk(parent_table, foreign_key_column (this table), parent_attr) copies an attribute from the parent row referenced by the foreign key. foreign_key_column is the column in the current table that is a foreign key to the parent table's primary key. parent_attr is the column in the parent table whose value you want to copy.
Useful when you need to duplicate a value from the parent instead of generating it again.
Parent tables must be defined before child tables in the YAML (no automatic backfilling).

Full parent/child example:

tables:
  - table_name: customers
    row_count: 10
    columns:
      - column_name: id
        is_primary_key: true
        data: row_id
      - column_name: email
        data: fake.email()

  - table_name: orders
    row_count: 50
    columns:
      - column_name: order_id
        data: row_id
        is_primary_key: true
      - column_name: customer_id
        data: foreign_key("customers", "id")
      - column_name: customer_email
        data: copy_from_fk("customers", "customer_id", "email") # second parameter is the column in this table that is a foreign key to the parent table
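
Conceptually, copy_from_fk is a lookup of the parent row by the child's foreign key value. The sketch below mimics that with plain dictionaries; lookup_parent_attr and the sample rows are hypothetical and not part of tablefaker.

```python
# Parent table rows keyed by primary key (hypothetical sample data).
customers = {
    1: {"id": 1, "email": "ann@example.com"},
    2: {"id": 2, "email": "bob@example.com"},
}

def lookup_parent_attr(parent_rows, child_row, foreign_key_column, parent_attr):
    """Copy `parent_attr` from the parent row referenced by the child's foreign key."""
    parent_key = child_row[foreign_key_column]
    return parent_rows[parent_key][parent_attr]

order = {"order_id": 10, "customer_id": 2}
order["customer_email"] = lookup_parent_attr(customers, order, "customer_id", "email")
print(order["customer_email"])  # bob@example.com
```

The lookup only works if the parent rows already exist, which matches the requirement that parent tables be defined first.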

⚡ Automatic attribute inference in action

config:
  infer_entity_attrs_by_name: true
tables:
  - table_name: customers
    columns:
      - column_name: customer_id
        is_primary_key: true
        data: row_id
      - column_name: email
        data: fake.email()
  - table_name: orders
    columns:
      - column_name: customer_id
        data: foreign_key("customers", "customer_id")
      - column_name: customer_email
        data: auto  # Automatically resolved from the customer_id FK

data: auto indicates that the value will be inferred by name from the referenced parent row when infer_entity_attrs_by_name is true.

📈 Foreign Key Distributions

Foreign keys support different sampling distributions to model realistic parent usage patterns.

🎲 Uniform distribution (default)

data: foreign_key("customers", "customer_id")

Backward compatible: selects parent keys uniformly at random.

🏔️ Zipf (power-law) distribution

data: foreign_key("customers", "customer_id", distribution="zipf", param=1.2)

Produces head-heavy (long-tail) distributions where a few parents appear much more frequently.
param controls concentration: higher values concentrate more on top-ranked parents.
Useful for modeling popular customers, trending products, or social systems with power-law behavior.
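
One common way to realize a Zipf-like draw is to weight the parent at rank r by 1 / r**param. The sketch below uses that formula with Python's random.choices; it is an assumption about the general technique, and how tablefaker ranks parents or samples internally is not specified here. The name zipf_weights is hypothetical.

```python
import random

def zipf_weights(n, param):
    """Weight for rank r (1-based) is 1 / r**param, so rank 1 is the most likely."""
    return [1.0 / (rank ** param) for rank in range(1, n + 1)]

rng = random.Random(42)
parent_keys = list(range(1, 11))                 # ten parent primary keys
weights = zipf_weights(len(parent_keys), 1.2)
sample = rng.choices(parent_keys, weights=weights, k=1000)

print(weights[0] > weights[1] > weights[-1])     # True: weights fall with rank
```

Raising param makes the weights drop faster, which is the sense in which higher values concentrate draws on the top-ranked parents.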

⚖️ Weighted parent distribution (attribute-based)

# quoted because the expression contains ": ", which YAML would otherwise read as a mapping
data: 'foreign_key("customers", "customer_id", distribution="weighted_parent", parent_attr="rating", weights={"5": 3, "4": 2, "3": 1})'

Weights are applied based on a parent attribute (here rating) so parents with certain attribute values are preferred.
Any parent attribute value not listed in weights defaults to weight 1.0.
Useful to prefer high-rated customers, VIP tiers, or any attribute-driven bias.
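
Attribute-based weighting can be sketched with random.choices. The helper weighted_parent_choice is hypothetical, and matching attribute values to weight keys through str() is an assumption made for this example, not confirmed tablefaker behavior.

```python
import random

def weighted_parent_choice(parents, parent_attr, weights, rng):
    """Pick one parent row; attribute values missing from `weights` get 1.0."""
    row_weights = [float(weights.get(str(row[parent_attr]), 1.0)) for row in parents]
    return rng.choices(parents, weights=row_weights, k=1)[0]

customers = [
    {"customer_id": 1, "rating": 5},
    {"customer_id": 2, "rating": 3},
    {"customer_id": 3, "rating": 1},  # rating 1 is not listed, so weight 1.0
]
rng = random.Random(7)
picked = weighted_parent_choice(customers, "rating", {"5": 3, "4": 2, "3": 1}, rng)
print(picked["customer_id"] in {1, 2, 3})  # True
```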

🔒 Unique foreign key (one-to-one relationship)

data: foreign_key("departments", "dept_id", is_unique=True)

When is_unique=True, each parent key value is selected at most once per child table, enforcing a one-to-one relationship.
The child table's row_count must not exceed the parent table's row count; otherwise an error is raised.
Each child table maintains its own independent pool — two different child tables can both use is_unique=True on the same parent without interfering with each other.
Works with all distribution types (uniform, zipf, weighted_parent).

Full example:

tables:
  - table_name: departments
    row_count: 20
    columns:
      - column_name: dept_id
        data: row_id
        is_primary_key: true
      - column_name: dept_name
        data: fake.company()

  - table_name: managers
    row_count: 20
    columns:
      - column_name: manager_id
        data: row_id
        is_primary_key: true
      - column_name: dept_id
        data: foreign_key("departments", "dept_id", is_unique=True)
      - column_name: manager_name
        data: fake.name()
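
The one-to-one constraint amounts to sampling parent keys without replacement. The following is a minimal sketch for the uniform case only; unique_foreign_keys is a hypothetical name and the error type is an assumption.

```python
import random

def unique_foreign_keys(parent_keys, child_row_count, rng):
    """Return `child_row_count` parent keys with no repeats."""
    if child_row_count > len(parent_keys):
        raise ValueError("child row_count exceeds the number of parent keys")
    return rng.sample(parent_keys, child_row_count)  # sampling without replacement

dept_ids = list(range(1, 21))                        # 20 departments
manager_dept_ids = unique_foreign_keys(dept_ids, 20, random.Random(1))
print(len(set(manager_dept_ids)) == 20)              # True: every department used once
```

Because each call draws from its own copy of the parent keys, a second child table could run the same function without being affected by the first.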

🧩 Complete example (seed, inference, weighted FK)

version: 1
config:
  locale: en_US
  seed: 4242
  infer_entity_attrs_by_name: true

tables:
  - table_name: customers
    row_count: 100
    columns:
      - column_name: customer_id
        data: row_id
        is_primary_key: true
      - column_name: email
        data: fake.unique.email()
      - column_name: rating
        data: random.choice([3, 4, 5])

  - table_name: orders
    row_count: 500
    columns:
      - column_name: order_id
        data: row_id
        is_primary_key: true
      - column_name: customer_id
        # quoted because the expression contains ": ", which YAML would otherwise read as a mapping
        data: 'foreign_key("customers", "customer_id", distribution="weighted_parent", parent_attr="rating", weights={"5": 3, "4": 2, "3": 1})'
      - column_name: customer_email
        data: auto  # Inferred from customer_id FK

📝 Notes

Parent tables must be defined before child tables (no automatic backfilling/topological sort yet).
Two-phase row evaluation ensures column order within a table does not affect correctness (you can reference other columns freely).
fake.unique behavior is deterministic only when the same Faker instance is reused and config.seed is fixed.
All sampling distributions are deterministic given a fixed seed.
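
The idea that column order does not matter can be illustrated with a small retry loop: expressions that fail because a referenced column is not yet known are evaluated again once the other columns are filled in. This is a conceptual sketch only; evaluate_row is hypothetical and this README does not describe tablefaker's actual mechanism.

```python
def evaluate_row(column_exprs, context):
    """Evaluate column expressions, retrying those that reference a missing column."""
    row, pending = {}, dict(column_exprs)
    while pending:
        progressed = False
        for name, expr in list(pending.items()):
            try:
                row[name] = eval(expr, dict(context), row)  # row values act as local names
            except NameError:
                continue  # depends on a column that has not been computed yet
            del pending[name]
            progressed = True
        if not progressed:
            raise NameError(f"unresolvable columns: {sorted(pending)}")
    return row

exprs = {
    "full_name": 'first_name + " " + last_name',  # listed before its inputs
    "first_name": '"Ada"',
    "last_name": '"Lovelace"',
}
print(evaluate_row(exprs, {})["full_name"])  # Ada Lovelace
```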

🏗️ Data Generation

You can define your dummy data generation logic in a Python function. The Faker, random and datetime packages are pre-imported and ready to use.

Use the Faker package for realistic data, e.g., fake.first_name() or fake.random_int(1, 10).
Use the random package for basic randomness, e.g., random.choice(["male", "female"]).
Use the datetime package for current date and time, e.g., datetime.today().strftime('%Y-%m-%d').
You can use a column to generate a new column, e.g., first_name + " " + last_name.
Use is_primary_key to define a primary key, e.g., is_primary_key: true.
Use foreign_key to get a primary key from another table, e.g., foreign_key("person", "id"). If you use multiple foreign key functions, you will get the primary key values from the same row.

You can write your logic in a single line or multiple lines, depending on your preference. A built-in function, row_id, provides a unique integer for each row. You can set the row_id starting point with the start_row_id keyword.

In addition, you have control over how your data is exported:

export_file_count: This keyword lets you specify the total number of output files to generate. It's especially useful when you need to split a large dataset into multiple, more manageable files.
export_file_row_count: Use this keyword to set the maximum number of rows that each exported file should contain. This ensures that each file remains within a desired size limit and is easier to handle.
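
The interaction of the two keywords can be pictured with a little arithmetic. The sketch below assumes rows are split into equal chunks with the remainder in the last file; the exact distribution tablefaker uses is not stated here, and rows_per_file is a hypothetical helper.

```python
import math

def rows_per_file(row_count, export_file_count=None, export_file_row_count=None):
    """Return the number of rows in each exported file."""
    if export_file_count:                       # takes precedence when both are set
        chunk = math.ceil(row_count / export_file_count)
    elif export_file_row_count:
        chunk = export_file_row_count
    else:
        return [row_count]                      # single file
    sizes = []
    remaining = row_count
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes

print(rows_per_file(10, export_file_count=3))       # [4, 4, 2]
print(rows_per_file(10, export_file_row_count=60))  # [10]
```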

Columns will automatically have the best-fitting data type. However, if you'd like to specify a data type, use the type keyword. You can assign data types using NumPy dtypes, Pandas Extension Dtypes, or Python native types.

If you are exporting to Parquet and need exact physical column types in the final .parquet file, use parquet_type. This uses PyArrow schema control during parquet export.

type controls the pandas DataFrame dtype used during generation.
parquet_type controls the Arrow/Parquet type written to the parquet file.
You can use both on the same column.
If parquet_type is omitted, tablefaker keeps the current behavior and infers the parquet type from the generated data.

Supported parquet_type values include:

int8, int16, int32, int64
uint8, uint16, uint32, uint64
float16, float32, float64, double
string, utf8, large_string
binary, large_binary
bool, boolean
date32, date64
time32[s], time32[ms], time64[us], time64[ns]
timestamp[s], timestamp[ms], timestamp[us], timestamp[ns]
decimal128(precision, scale)

Example:

tables:
  - table_name: transactions
    row_count: 100
    columns:
      - column_name: transaction_id
        data: row_id
        is_primary_key: true
        parquet_type: int32
      - column_name: amount
        data: round(random.uniform(1.0, 9999.99), 2)
        parquet_type: decimal128(10, 2)
      - column_name: created_at
        data: datetime.today()
        parquet_type: timestamp[us]

Here are some examples:

fake.first_name()
fake.random_int(1, 10)
random.choice(["male", "female"])
datetime.today()
911 # number
r"170 cm" # string

📖 Built In Fake Data Generators

Table Faker has several built-in functions based on the Faker and random packages. You can find these functions in the

Other built-in functions

datetime
date
timedelta
time
timezone
tzinfo
UTC
MINYEAR
MAXYEAR
math
string

💻 Example Code

import tablefaker

# exports to current folder in csv format
tablefaker.to_csv("test_table.yaml")

# exports SQL INSERT scripts for loading into your database
tablefaker.to_sql("test_table.yaml")

# SQL identifiers are emitted as provided by table_name and column_name
# so you can use database-specific styles like [schema].[dbo].[table_name].
# If your database requires escaping/quoting, provide names in that format.

# exports all tables in json format
tablefaker.to_json("test_table.yaml", "./target_folder")

# exports all tables in parquet format
tablefaker.to_parquet("test_table.yaml", "./target_folder")

# exports all tables in deltalake format
tablefaker.to_deltalake("test_table.yaml", "./target_folder")

# export single table to the provided folder
tablefaker.to_deltalake("test_table.yaml", "./target_folder/person/", table_name="person")

# exports only the first table in excel format
tablefaker.to_excel("test_table.yaml", "./target_folder/target_file.xlsx")

# get as pandas dataframes
df_dict = tablefaker.to_pandas("test_table.yaml")
person_df = df_dict["person"]
print(person_df.head(5))

🖥️ Sample CLI Command

You can use tablefaker in your terminal for ad-hoc needs or in shell scripts to automate fake data generation. The CLI reads the YAML config and supports importing Python modules via config.python_import and adding Faker community providers declared under config.community_providers (see "Custom Faker Providers" below). Custom Python functions (passed via the custom_function parameter) are only supported when using the Python API programmatically.

Supported CLI flags:

--config : path to YAML or JSON config
--file_type : csv,json,parquet,excel,sql,deltalake (default: csv)
--target : target folder or file path
--seed : integer seed to make generation deterministic
--infer-attrs : "true" or "false" to override infer_entity_attrs_by_name

# exports to current folder in csv format (reads community_providers from config)
tablefaker --config tests/test_table.yaml

# exports as sql insert script files
tablefaker --config tests/test_table.yaml --file_type sql --target ./out

# exports to current folder in excel format
tablefaker --config tests/test_table.yaml --file_type excel

# exports all tables in json format to a folder
tablefaker --config tests/test_table.yaml --file_type json --target ./target_folder

# exports a single table to a parquet file
tablefaker --config tests/test_table.yaml --file_type parquet --target ./target_folder/target_file.parquet

# pass an explicit seed and enable attribute inference
tablefaker --config tests/test_table.yaml --seed 42 --infer-attrs true

📄 Sample CSV Output

id,first_name,last_name,age,dob,salary,height,weight
1,John,Smith,35,1992-01-11,,170 cm,150
2,Charles,Shepherd,27,1987-01-02,,170 cm,150
3,Troy,Johnson,42,,,170 cm,150
4,Joshua,Hill,86,1985-07-11,,170 cm,150
5,Matthew,Johnson,31,1940-03-31,,170 cm,150

🧾 Sample Sql Output

INSERT INTO employee
(id,person_id,hire_date,title,salary,height,weight,school,level)
VALUES
(1, 4, '2020-10-09', 'principal engineer', NULL, '170 cm', 150, 'ISLIP HIGH SCHOOL', 'level 2'),
(2, 9, '2002-12-20', 'principal engineer', NULL, '170 cm', 150, 'GUY-PERKINS HIGH SCHOOL', 'level 1'),
(3, 2, '1996-01-06', 'principal engineer', NULL, '170 cm', 150, 'SPRINGLAKE-EARTH ELEM/MIDDLE SCHOOL', 'level 3');

🧰 Custom Faker Providers

You can add and use custom / community faker providers with table faker.
Here is a list of these community providers.
https://faker.readthedocs.io/en/master/communityproviders.html#

version: 1
config:
  locale: en_US
tables:
  - table_name: employee
    row_count: 5
    columns:
      - column_name: id
        data: row_id
      - column_name: person_id
        data: fake.random_int(1, 10)
      - column_name: hire_date
        data: fake.date_between()
      - column_name: school
        data: fake.school_name()  # custom provider

import tablefaker

# import the custom faker provider
from faker_education import SchoolProvider

# provide the faker provider class to the tablefaker using fake_provider
# you can add a single provider or a list of providers
tablefaker.to_csv("test_table.yaml", "./target_folder", fake_provider=SchoolProvider)
# this works with all other to_ methods as well.

🧩 Custom Functions

With Table Faker, you have the flexibility to provide your own custom functions to generate column data. This advanced feature empowers developers to create custom fake data generation logic that can pull data from a database, API, file, or any other source as needed.
You can also supply multiple functions in a list, allowing for even more versatility.
The custom function you provide should return a single value, giving you full control over your synthetic data generation.

from tablefaker import tablefaker
from faker import Faker

fake = Faker()
def get_level():
    return f"level {fake.random_int(1, 5)}"

tablefaker.to_csv("test_table.yaml", "./target_folder", custom_function=get_level)

Reference the get_level function in your YAML file:

version: 1
config:
  locale: en_US
tables:
  - table_name: employee
    row_count: 5
    columns:
      - column_name: id
        data: row_id
      - column_name: person_id
        data: fake.random_int(1, 10)
      - column_name: hire_date
        data: fake.date_between()
      - column_name: level
        data: get_level() # custom function

🧬 Generate Yaml File From Avro Schema or Csv

If you have an Avro schema, you can generate a YAML file using the avro_to_yaml function.

from tablefaker import tablefaker
tablefaker.avro_to_yaml("tests/test_person.avsc", "tests/exports/person.yaml")

You can also use a CSV file to define your columns and generate the YAML file.

from tablefaker import tablefaker
tablefaker.csv_to_yaml("tests/test_person.csv", "tests/exports/person.yaml")

📄 Sample Csv file

column_name,description,data,type,null_percentage
id,Unique identifier for the person,row_id,,
first_name,First name of the person,fake.first_name(),string,
last_name,Last name of the person,fake.last_name(),string,
age,Age of the person,fake.random_int(),int32,0.1
email,Email address of the person,fake.email(),string,0.1
is_active,Indicates if the person is active,fake.pybool(),boolean,0.2
signup_date,Date when the person signed up,fake.date(),,0.3
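
To show how such a CSV maps onto column definitions, here is a minimal sketch that reads the rows with Python's csv module and builds one dictionary per column. It is not the csv_to_yaml implementation; csv_to_columns is a hypothetical helper, and dropping empty cells is an assumption.

```python
import csv
import io

SAMPLE_CSV = """column_name,description,data,type,null_percentage
id,Unique identifier for the person,row_id,,
age,Age of the person,fake.random_int(),int32,0.1
"""

def csv_to_columns(csv_text):
    """Turn CSV rows into column dicts, dropping empty cells."""
    columns = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        column = {key: value for key, value in row.items() if value}
        if "null_percentage" in column:
            column["null_percentage"] = float(column["null_percentage"])
        columns.append(column)
    return columns

columns = csv_to_columns(SAMPLE_CSV)
print(columns[1])
```

Each resulting dictionary has the same keys as an entry under columns in the YAML schema, so it could be dumped to YAML with a library of your choice.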

🤖 Agent Skills

Agent skills enable AI agents to generate tablefaker YAML configurations and data on the fly. You can use these skills in your AI agents to create synthetic data for testing, development, or any other purpose without manually writing YAML files.

To install tablefaker agent skills, use the following command:

npx skills add necatiarslan/table-faker

❤️ Support & Donation

If you find Table Faker useful and would like to support its development, consider making a donation.

📚 Additional Resources

Faker Functions: Faker Providers
Bug Reports & Feature Requests: GitHub Issues

🗺️ Roadmap

✅ TODO

Support composite primary key
- composite keys are not unique
- composite keys are not stored together
- copy_from_fk does not support composite primary keys
add python support to export_file_name
Provide foreign keys (dictionary, array etc) as an external source
Variables
Generate template yaml file from sample data
use an ai service to generate data generation logic
make openpyxl package optional to export to excel

🔮 Future Enhancements

PyArrow table support
Avro file support
Add target file name to YAML

Follow for Updates: LinkedIn
Author: Necati Arslan | Email

Release history

1.11.1 (this version): May 20, 2026
1.11.0: Apr 22, 2026
1.10.3: Apr 15, 2026
1.10.2: Mar 31, 2026
1.10.1: Mar 27, 2026
1.10.0: Mar 27, 2026
1.9.1: Mar 14, 2026
1.9.0: Mar 12, 2026
1.8.0: Oct 10, 2025
1.7.1: Sep 20, 2025
1.7.0: Sep 18, 2025
1.6.0: Apr 13, 2025
1.5.0: Apr 11, 2025
1.4.4: Mar 14, 2025
1.4.3: Mar 14, 2025
1.4.2: Mar 11, 2025
1.4.1: Mar 11, 2025
1.4.0: Mar 10, 2025
1.3.2: Mar 7, 2025
1.3.1: Mar 7, 2025
1.3.0: Mar 7, 2025
1.2.0: Feb 28, 2025
1.1.0: Dec 7, 2024
1.0.5: Aug 27, 2024
1.0.4: Nov 11, 2023
1.0.3: Nov 10, 2023
1.0.2: Nov 5, 2023
1.0.1: Nov 3, 2023
1.0.0: Nov 2, 2023

File details

Details for the file tablefaker-1.11.1.tar.gz.

File metadata

Download URL: tablefaker-1.11.1.tar.gz
Upload date: May 20, 2026
Size: 63.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for tablefaker-1.11.1.tar.gz
Algorithm	Hash digest
SHA256	`479ce98dcca0732eedde927850640f01f2f6491abed7addd320672d5eeae641b`
MD5	`52fd37e855bd09ec7d7a82909b9b1eff`
BLAKE2b-256	`f330d3a748b9a38b75e6adea917d6f050592a0cc03c534c1891120e936575d65`

File details

Details for the file tablefaker-1.11.1-py3-none-any.whl.

File metadata

Download URL: tablefaker-1.11.1-py3-none-any.whl
Upload date: May 20, 2026
Size: 45.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for tablefaker-1.11.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b79d455a4ee2addcabb1ff245f652b5fdf31bfe5d145f1d6feb3c5367357d17e`
MD5	`66fa833e876ff9d92a85e286b12ba7c6`
BLAKE2b-256	`51c21fcde72da38fd561c9b48839656941a41a35e863c4e0b65e2047e64053e6`
