A Python package for generating semantically and syntactically correct data for RDBMS.
Project description
Data Generation Tool
Overview
This project provides a data generation tool that creates synthetic data for database tables based on predefined schemas and constraints.
Classes and Functions
DataGenerator
A class to generate synthetic data for database tables based on provided schemas and constraints.
__init__
Initialize the CheckConstraintEvaluator and set up the expression parser.
build_foreign_key_map
Build a mapping of foreign key relationships for quick lookup.
Returns: dict: A mapping where each key is a parent table, and the value is a list of child table relationships.
resolve_table_order
Resolve the order in which tables should be processed based on foreign key dependencies.
Returns: list: Ordered list of table names.
initialize_primary_keys
Initialize primary key counters for each table.
generate_initial_data
Generate initial data for all tables without enforcing constraints.
generate_composite_primary_keys
Generate data for a table with a composite primary key.
Args: table (str): Table name. num_rows (int): Number of rows to generate.
generate_primary_keys
Generate primary key values for a table row.
Args: table (str): Table name. row (dict): Row data.
enforce_constraints
Enforce all constraints on the generated data.
assign_foreign_keys
Assign foreign key values to a table row.
Args: table (str): Table name. row (dict): Row data.
fill_remaining_columns
Fill in the remaining columns of a table row.
enforce_not_null_constraints
Enforce NOT NULL constraints on a table row.
Args: table (str): Table name. row (dict): Row data.
generate_column_value
Generate a value for a column based on predefined values, mappings, and constraints.
Args: table (str): Table name. column (dict): Column schema. row (dict): Current row data. constraints (list): List of constraints relevant to the column.
Returns: Any: Generated value.
generate_value_based_on_type
Generate a value based on the SQL data type.
Args: col_type (str): Column data type.
Returns: Any: Generated value.
is_foreign_key_column
Check if a column is a foreign key in the specified table.
Args: table_p (str): Table name. col_name (str): Column name.
Returns: bool: True if the column is a foreign key, False otherwise.
enforce_unique_constraints
Enforce unique constraints on a table row.
enforce_check_constraints
Enforce CHECK constraints on a table row.
Args: table (str): Table name. row (dict): Row data.
get_column_info
Get the column schema information for a specific column.
Args: table (str): Table name. col_name (str): Column name.
Returns: dict: Column schema.
generate_data
Generate the data by running all steps.
Returns: dict: Generated data with constraints enforced.
export_as_sql_insert_query
Export the generated data as SQL INSERT queries.
Returns: str: A string containing SQL INSERT queries.
repair_data
Iterate through the data and remove any rows that violate constraints, including cascading deletions to maintain referential integrity.
repair_table_data
Repair data for a specific table.
Args: table (str): Table name.
is_row_valid
Check if a row is valid by checking all constraints.
Args: table (str): Table name. row (dict): Row data.
Returns: tuple: (is_valid, violated_constraint) is_valid (bool): True if the row is valid, False otherwise. violated_constraint (str): Description of the violated constraint, or None if valid.
remove_dependent_data
Recursively remove dependent rows in child tables.
Args: table (str): Table name where the row is removed. row (dict): The row data that was removed.
print_statistics
Print statistics about the generated data.
CheckConstraintEvaluator
A class to evaluate SQL CHECK constraints on row data.
_create_expression_parser
Create a parser for SQL expressions used in CHECK constraints.
Returns: pyparsing.ParserElement: The parser for expressions.
extract_columns_from_check
Extract column names from a CHECK constraint expression.
Args: check (str): CHECK constraint expression.
Returns: list: List of column names.
evaluate
Evaluate a CHECK constraint expression.
Args: check_expression (str): CHECK constraint expression. row (dict): Current row data.
Returns: bool: True if the constraint is satisfied, False otherwise.
convert_sql_expr_to_python
Convert a parsed SQL expression into a Python expression.
Args: parsed_expr: The parsed SQL expression. row (dict): Current row data.
Returns: str: The Python expression.
handle_operator
Handle the conversion of parsed expressions containing operators to Python expressions.
Args: parsed_expr: The parsed SQL expression containing operators. row (dict): Current row data.
Returns: str: The converted Python expression.
extract
Simulate SQL EXTRACT function.
Args: field (str): Field to extract (e.g., 'YEAR'). source (datetime.date or datetime.datetime): Date/time source.
Returns: int: Extracted value.
regexp_like
Simulate SQL REGEXP_LIKE function.
Args: value (str): The string to test. pattern (str): The regex pattern.
Returns: bool: True if the value matches the pattern.
like
Simulate SQL LIKE operator using regex.
Args: value (str): The string to match. pattern (str): The pattern, with SQL wildcards.
Returns: bool: True if the value matches the pattern.
not_like
Simulate SQL NOT LIKE operator.
Args: value (str): The string to match. pattern (str): The pattern, with SQL wildcards.
Returns: bool: True if the value does not match the pattern.
identifier_action
None
extract_numeric_ranges
Extract numeric ranges from constraints related to a specific column.
Args: constraints (list): List of constraint expressions. col_name (str): Name of the column to extract ranges for.
Returns: list: A list of tuples representing operators and their corresponding numeric values.
generate_numeric_value
Generate a numeric value based on specified ranges and column type.
Args: ranges (list): A list of tuples representing numeric ranges and their operators. col_type (str): The data type of the column.
Returns: int or float: A randomly generated numeric value within the specified range.
generate_value_matching_regex
Generate a value that matches a specified regex pattern.
Args: pattern (str): The regex pattern to match.
Returns: str: A randomly generated string that matches the given regex pattern.
extract_regex_pattern
Extract regex patterns from constraints related to a specific column.
Args: constraints (list): List of constraint expressions. col_name (str): Name of the column to extract regex patterns for.
Returns: list: A list of regex patterns found in the constraints.
extract_allowed_values
Extract allowed values from constraints related to a specific column.
Args: constraints (list): List of constraint expressions. col_name (str): Name of the column to extract allowed values for.
Returns: list: A list of allowed values specified in the constraints.
parse_create_tables
Parses SQL CREATE TABLE statements and extracts table schema details, including columns, data types, constraints, and foreign keys.
Args: sql_script (str): The SQL script containing CREATE TABLE statements.
Returns: dict: A dictionary where each key is a table name and the value is another dictionary containing columns and foreign keys.
License
This project is licensed under the MIT License.
Contributing
Contributions are welcome! Please open an issue or submit a pull request for any enhancements or bug fixes.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file intelligent-data-generator-1.0.2.tar.gz.
File metadata
- Download URL: intelligent-data-generator-1.0.2.tar.gz
- Upload date:
- Size: 19.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fd581583c8a8bb7885909424bfc616ca6ff3a31e1e00282da71248635448312f
|
|
| MD5 |
871d45e143c8845c473e8ea4c5d2261c
|
|
| BLAKE2b-256 |
6185b4b5d05a1060a78ca063d0cd1bf4de3666d81f681203f32a4fcf5af0f879
|