DataWeaver is a Python library for mapping data and transforming objects from one structure to another. It offers flexible field mapping through a simple configuration object, making data integration and transformation for analysis and processing straightforward.

DataWeaver

DataWeaver is an asynchronous data processing library that transforms and processes data entries efficiently, with a special focus on complex, nested data structures.

Features

  • Asynchronous data processing for improved performance.
  • Configuration-based processing for flexible data handling.
  • Support for JSON and YAML configurations.
  • File operations with aiofiles for non-blocking file I/O.
  • Advanced mapping capabilities, allowing for complex key transformations involving nested objects and arrays.

Advanced Mapping Capabilities

One of the standout features of this library is its ability to handle complex keys for data transformations. This allows for precise control over how nested data structures are transformed and outputted. Here's how it works:

  • Dot Notation for Nested Objects: If you want to access data within nested objects, you can use a dot (.) in the key. For example, parent.child will access the child key within a parent object.
  • Digits for Array Indices: When a nested key is a digit, the library interprets it as an array index. For example, parent.0.child accesses the child key of the first object in an array located at the parent key.
  • Automatically Creating Arrays: If the transformation requires placing items into an array based on their keys, the library will automatically create and manage these arrays for you. This is particularly useful when dealing with dynamic data structures.

Installation

Use the package manager pip to install DataWeaver.

pip install DataWeaver

Usage

This section documents the two asynchronous functions that process data entries according to a given configuration.

weave_entry Function

The weave_entry function asynchronously processes a single entry of data based on a given configuration, optionally saving the processed result to a file. This function is designed for handling individual data entries.

Parameters

  • data (dict): The input data to be processed. This should be a dictionary representing a single entry.
  • config (dict): The configuration settings used for processing the data. This dictionary should contain all necessary parameters and settings required by the load_config and process_entry functions.
  • *args: Variable length argument list. Allows for additional arguments to be passed, which might be required by future extensions or modifications without changing the function signature.
  • **kwargs: Arbitrary keyword arguments. This function looks for a specific keyword argument:
    • file_path (str, optional): If provided as a string, the function saves the processed data to the specified file path using save_result_to_file. If the file name has no extension, the result is saved as JSON. Supported extensions are json, csv, yml, and yaml.
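The extension rule for file_path can be illustrated with a small sketch. resolve_output_path is a hypothetical helper, not part of DataWeaver's API, and raising ValueError for an unsupported extension is an assumption: the README does not say what happens in that case.

```python
from pathlib import Path

# Formats listed in the README; the .json fallback mirrors the documented default.
SUPPORTED_FORMATS = {"json", "csv", "yml", "yaml"}


def resolve_output_path(file_path: str) -> Path:
    """Hypothetical helper: append .json when the path has no extension."""
    path = Path(file_path)
    if not path.suffix:
        path = path.with_suffix(".json")
    fmt = path.suffix.lstrip(".").lower()
    if fmt not in SUPPORTED_FORMATS:
        # Assumption: unsupported formats are rejected rather than ignored.
        raise ValueError(f"Unsupported output format: {fmt!r}")
    return path
```

For example, "out/result" resolves to "out/result.json", while "out/result.yaml" is kept unchanged.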

Returns

  • dict: A dictionary containing the processed data based on the input and the configuration.

Example Usage

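import asyncio
from data_weaver import weave_entry
# weave_entry is a coroutine: await it inside async code, or drive it with asyncio.run
result = asyncio.run(weave_entry(data, config, file_path="path/to/save/result.json"))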

weave_entries Function

The weave_entries function asynchronously processes a list of data entries based on a given configuration, optionally saving the processed results to a file. This function is designed for handling multiple data entries in bulk.

Parameters

  • data (list[dict]): A list of dictionaries, where each dictionary represents an input data entry to be processed.
  • config (dict): The configuration settings used for processing the data entries. This dictionary should contain all necessary parameters and settings required by the load_config and process_entries functions.
  • *args: Variable length argument list. Allows for additional arguments to be passed, which might be required by future extensions or modifications without changing the function signature.
  • **kwargs: Arbitrary keyword arguments. This function looks for a specific keyword argument:
    • file_path (str, optional): If provided as a string, the function saves the processed data to the specified file path using save_result_to_file. If the file name has no extension, the result is saved as JSON. Supported extensions are json, csv, yml, and yaml.

Returns

  • dict: A dictionary containing the processed data for all entries based on the input list and the configuration.

Example Usage

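import asyncio
from data_weaver import weave_entries
# weave_entries is a coroutine: await it inside async code, or drive it with asyncio.run
results = asyncio.run(weave_entries(data_list, config, file_path="path/to/save/results.json"))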

There are also two utility functions you can use to transform your data:

crush Function

The crush function flattens a nested dictionary or list into a flat dictionary with keys representing the paths to each value.

Parameters

  • nested_dict (dict | list): The nested dictionary or list to be flattened.
  • parent_key (str, optional): The base path for keys in the flattened dictionary. Defaults to an empty string.
  • sep (str, optional): The separator used between keys in the flattened dictionary. Defaults to a period (.).

Returns

  • dict: A flat dictionary where each key is a path composed of original keys concatenated by the specified separator, leading to the corresponding value in the nested structure.

Example Usage

from data_weaver import crush

nested = {'a': {'b': {'c': 1, 'd': 2}}, 'e': [3, 4, {'f': 5}]}
flat = crush(nested)
print(flat)
# {'a.b.c': 1, 'a.b.d': 2, 'e.0': 3, 'e.1': 4, 'e.2.f': 5}
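For intuition about what crush produces, here is a minimal recursive re-implementation. crush_sketch is a hypothetical name chosen to avoid clashing with the library's crush, and the library's real code may handle edge cases differently; for example, this sketch silently drops empty dicts and lists because they contain no leaf values.

```python
from typing import Any


def crush_sketch(nested: Any, parent_key: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts/lists into {path: leaf_value}."""
    if isinstance(nested, dict):
        items = [(str(k), v) for k, v in nested.items()]
    elif isinstance(nested, list):
        # List positions become digit path segments: "e.0", "e.1", ...
        items = [(str(i), v) for i, v in enumerate(nested)]
    else:
        # Leaf value: store it under the path built so far.
        return {parent_key: nested}
    flat: dict[str, Any] = {}
    for key, value in items:
        path = f"{parent_key}{sep}{key}" if parent_key else key
        flat.update(crush_sketch(value, path, sep))
    return flat
```

Passing parent_key prefixes every path, and sep swaps the "." separator for another string, mirroring the parameters documented above.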

construct Function

The construct function reconstructs a nested dictionary or list from a flat dictionary, where each key represents a path to its corresponding value.

Parameters

  • flat_dict (dict): The flat dictionary to be reconstructed. Keys should be paths with parts separated by periods (.), representing the structure of the resulting nested dictionary or list.

Returns

  • The reconstructed nested dictionary or list based on the paths represented by the keys in the input flat dictionary.

Example Usage

from data_weaver import construct
flat = {'a.b.c': 1, 'a.b.d': 2, 'e.0': 3, 'e.1': 4, 'e.2.f': 5}
nested = construct(flat)
print(nested)
# {'a': {'b': {'c': 1, 'd': 2}}, 'e': [3, 4, {'f': 5}]}
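A minimal sketch of the reverse operation, assuming list indices at each level are contiguous and start at 0. construct_sketch is a hypothetical name, not the library's function. It groups keys by their first path segment, rebuilds each group recursively, and turns a level whose keys are all digits into a list.

```python
from typing import Any


def construct_sketch(flat: dict[str, Any], sep: str = ".") -> Any:
    """Rebuild a nested structure from {path: value} by grouping on the first segment."""
    leaves: dict[str, Any] = {}
    groups: dict[str, dict[str, Any]] = {}
    for path, value in flat.items():
        head, _, rest = path.partition(sep)
        if rest:
            # Deeper path: hand the remainder to a recursive call for this head.
            groups.setdefault(head, {})[rest] = value
        else:
            leaves[head] = value
    merged: dict[str, Any] = dict(leaves)
    for head, sub_flat in groups.items():
        merged[head] = construct_sketch(sub_flat, sep)
    if merged and all(key.isdigit() for key in merged):
        # Only digit keys at this level: treat them as list indices, ordered numerically.
        return [merged[key] for key in sorted(merged, key=int)]
    return merged
```

Because an all-digit level becomes a list, the sketch can return a list at the top level, consistent with construct returning "a nested dictionary or list". Sparse indices such as "e.0" and "e.2" would be compacted into a two-element list here, which is why contiguity is assumed.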

Configuration

Define the mappings, additional fields, and transforms needed to process your data in a dict.

There are three main sections in the configuration file:

  • mapping: Specifies how keys in the input data map to keys in the output data. Each entry follows the pattern target_key: source_key.
  • additionalFields: Specifies extra fields to add to the output data. Each entry follows the pattern target_key: value.
  • transforms: Specifies how different fields in the data are transformed using various functions. Each key represents a field or type of fields, and the associated value describes the transformation to be applied to that field.

Here's an example that demonstrates handling complex keys:

Configuration File: Mapping

This section of the configuration file specifies how keys in the input data should be mapped to keys in the output data. Each key represents a target key in the output data, and the associated value represents the source key in the input data.

The target key can be a simple key or a complex key with nested objects and arrays. The source key can also be a simple key or a complex key with nested objects and arrays. A dot (.) is used to represent nested objects, and a digit is used to represent array indices.

    config = {
        'mapping': {
            'person.name': 'fullName',
            'person.details.age': 'age',
            'person.children.0.name': 'firstChildName'
        },
    }

Example

With this config, the object below:

{
  "fullName": "John Doe",
  "age": 30,
  "firstChildName": "Alice",
}

Will be transformed to:

{
  "person": {
    "name": "John Doe",
    "details": {
      "age": 30
    },
    "children": [
      {
        "name": "Alice"
      }
    ]
  }
}

You can also map the same field to multiple keys:

config = {
  'mapping': {
    'person.details.fullName': 'fullName',
    'person.name': 'fullName'
  }
}

This object:

{
  "fullName": "John Doe"
}

Will be transformed to:

{
  "person": {
    "name": "John Doe",
    "details": {
      "fullName": "John Doe"
    }
  }
}

And you can also map multiple fields to the same key:

config = {
  'mapping': {
    'person.name': 'name',
    'person.lastName': 'lastName',
    'person.fullName': ['name', 'lastName'],
  }
}

This object:

{
  "name": "John",
  "lastName": "Doe"
}

Will be transformed to:

{
  "person": {
    "name": "John",
    "lastName": "Doe",
    "fullName": ["John", "Doe"]
  }
}
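The three mapping rules above (nested targets, one source mapped to several targets, several sources collected into one target) can be expressed in a short pure-Python sketch. get_path, set_path, and apply_mapping are hypothetical names, not DataWeaver's API. The sketch assumes a digit segment always means a list index and that every source key exists (a missing one raises KeyError); how DataWeaver handles missing fields is not documented here.

```python
from typing import Any, Union


def get_path(data: Any, path: str) -> Any:
    """Read a value by dotted path; digit segments index into lists."""
    for part in path.split("."):
        data = data[int(part)] if part.isdigit() else data[part]
    return data


def set_path(target: dict, path: str, value: Any) -> None:
    """Write a value by dotted path, creating dicts and lists on the way."""
    parts = path.split(".")
    current: Any = target
    for part, next_part in zip(parts, parts[1:]):
        # The next segment decides whether the missing container is a list or a dict.
        child: Any = [] if next_part.isdigit() else {}
        if isinstance(current, list):
            index = int(part)
            current.extend([None] * (index + 1 - len(current)))  # pad up to index
            if current[index] is None:
                current[index] = child
            current = current[index]
        else:
            current = current.setdefault(part, child)
    last = parts[-1]
    if isinstance(current, list):
        index = int(last)
        current.extend([None] * (index + 1 - len(current)))
        current[index] = value
    else:
        current[last] = value


def apply_mapping(entry: dict, mapping: dict[str, Union[str, list[str]]]) -> dict:
    """Build the output by copying each source value to its target path."""
    output: dict = {}
    for target, source in mapping.items():
        if isinstance(source, list):
            # Several sources: collect their values into a list, in order.
            value = [get_path(entry, s) for s in source]
        else:
            value = get_path(entry, source)
        set_path(output, target, value)
    return output
```

Run against the three example configurations in this section, the sketch reproduces the transformed objects shown for each.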

Configuration File: Additional Fields

This section of the configuration specifies additional fields to add to the output data. Each key is a target key in the output data, and its value is the value assigned to that key. Target keys follow the same dot and digit notation as the mapping section; the difference is that the value is a literal to assign, not the name of a source field.

{
  "mapping": {
    "person.name": "fullName",
    "person.details.age": "age",
  },
  "additionalFields": {
      "type": "employee",
  }
}

The object below:

{
  "fullName": "John Doe",
  "age": 30,
}

Will be transformed to:

{
  "person": {
    "name": "John Doe",
    "details": {
      "age": 30
    }
  },
  "type": "employee"
}
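A minimal sketch of how additional fields could be merged into already-mapped output. add_fields is a hypothetical helper; to keep it short it only handles dict segments (no digit array indices), and overwriting any existing value at the same path is an assumption.

```python
from typing import Any


def add_fields(output: dict, additional_fields: dict[str, Any]) -> dict:
    """Write each configured constant at its dotted target path (dict segments only)."""
    for target, value in additional_fields.items():
        *parents, last = target.split(".")
        current = output
        for part in parents:
            current = current.setdefault(part, {})  # create intermediate objects on demand
        # The configured value is used as-is; it is not looked up in the input entry.
        current[last] = value
    return output
```

Applied to the mapped object {"person": {"name": "John Doe", "details": {"age": 30}}} with {"type": "employee"}, it yields the result shown above.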

Configuration File: Transforms

This section of the configuration file specifies how different fields in the data are transformed using various functions. Each key represents a field or type of fields, and the associated value describes the transformation to be applied to that field. These transformations can include formatting, concatenation, type conversion, and more.
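The README writes each transform as a string such as "concat(delimiter=' ')" but does not document how those strings are parsed. One safe approach, sketched here under that assumption, is to parse the string with Python's ast module and evaluate only literal keyword arguments. parse_transform is a hypothetical name, and rejecting positional arguments is also an assumption.

```python
import ast
from typing import Any


def parse_transform(spec: str) -> tuple[str, dict[str, Any]]:
    """Split "name(key='value', ...)" into ("name", {"key": "value", ...})."""
    if "(" not in spec:
        return spec, {}  # bare names such as "upper" take no options
    call = ast.parse(spec, mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"Not a transform call: {spec!r}")
    if call.args:
        raise ValueError("Transform options must be keyword arguments")
    # literal_eval accepts only literals, so no code from the config is executed.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, kwargs
```

Using ast.literal_eval instead of eval keeps a configuration file from running arbitrary Python.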

Function Descriptions

1. Text Case Functions:

  • capitalize: Converts the first character of the string to uppercase and the rest to lowercase. Example:

    {
      "mapping": {
        "person.name": "fullName",
      },
      "transforms": {
        "person.name": "capitalize",
      }
    }
    

    The object below:

    {
      "fullName": "john doe",
    }
    

    Will be transformed to:

    {
      "person": {
        "name": "John doe"
      }
    }
    
  • lower: Converts all characters in the string to lowercase. Example:

    {
      "mapping": {
        "person.name": "fullName",
      },
      "transforms": {
        "person.name": "lower",
      }
    }
    

    The object below:

    {
      "fullName": "JOHN DOE",
    }
    

    Will be transformed to:

    {
      "person": {
        "name": "john doe"
      }
    }
    
  • upper: Converts all characters in the string to uppercase. Example:

    {
      "mapping": {
        "person.name": "fullName",
      },
      "transforms": {
        "person.name": "upper",
      }
    }
    

    The object below:

    {
      "fullName": "john doe",
    }
    

    Will be transformed to:

    {
      "person": {
        "name": "JOHN DOE"
      }
    }
    
  • title: Converts the first character of each word to uppercase and the remaining characters of each word to lowercase. Example:

    {
      "mapping": {
        "person.name": "fullName",
      },
      "transforms": {
        "person.name": "title",
      }
    }
    

    The object below:

    {
      "fullName": "john doe",
    }
    

    Will be transformed to:

    {
      "person": {
        "name": "John Doe"
      }
    }
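The four text case transforms described above behave like Python's built-in str methods of the same names. The sketch below assumes DataWeaver delegates to them; TEXT_CASE_TRANSFORMS and apply_text_case are hypothetical names.

```python
# Hypothetical table: each documented transform name paired with the
# built-in str method whose behavior matches its description.
TEXT_CASE_TRANSFORMS = {
    "capitalize": str.capitalize,  # "john doe" -> "John doe"
    "lower": str.lower,            # "JOHN DOE" -> "john doe"
    "upper": str.upper,            # "john doe" -> "JOHN DOE"
    "title": str.title,            # "john doe" -> "John Doe"
}


def apply_text_case(name: str, value: str) -> str:
    """Look up the transform by name and apply it to the field value."""
    return TEXT_CASE_TRANSFORMS[name](value)
```

If that assumption holds, note one quirk of str.title: it treats an apostrophe as a word boundary, so "they're".title() returns "They'Re".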
    

2. String Manipulation Functions:

  • concat(delimiter=' '): Concatenates list elements into a single string with elements separated by the specified delimiter. Default is a space.

    {
      "mapping": {
        "person.fullName": ["firstName", "lastName"],
      },
      "transforms": {
        "person.fullName": "concat(delimiter=' ')",
      }
    }
    

    The object below:

    {
      "firstName": "John",
      "lastName": "Doe",
    }
    

    Will be transformed to:

    {
      "person": {
        "fullName": "John Doe",
      }
    }
    
  • prefix(string='prefix-'): Prepends the specified string to the beginning of the target string. Default prefix is "prefix-". Example:

    {
      "mapping": {
        "person.name": "fullName",
      },
      "transforms": {
        "person.name": "prefix(string='hello-')",
      }
    }
    

    The object below:

    {
      "fullName": "world",
    }
    

    Will be transformed to:

    {
      "person": {
        "name": "hello-world",
      }
    }
    
  • suffix(string='-suffix'): Appends the specified string to the end of the target string. Default suffix is "-suffix". Example:

    {
      "mapping": {
        "person.name": "fullName",
      },
      "transforms": {
        "person.name": "suffix(string='-world')",
      }
    }
    

    The object below:

    {
      "fullName": "hello",
    }
    

    Will be transformed to:

    {
      "person": {
        "name": "hello-world",
      }
    }
    
  • split(delimiter=' '): Splits the string into a list of substrings around the specified delimiter. Default is a space. Example:

    {
      "mapping": {
        "person.fullName": "fullName",
      },
      "transforms": {
        "person.fullName": "split(delimiter=' ')",
      }
    }
    

    The object below:

    {
      "fullName": "John Doe",
    }
    

    Will be transformed to:

    {
      "person": {
        "fullName": ["John", "Doe"],
      }
    }
    
  • join(delimiter=' '): Joins elements of a list into a single string with elements separated by the specified delimiter. Default is a space.

    {
      "mapping": {
        "person.fullName": ["firstName", "lastName"],
      },
      "transforms": {
        "person.fullName": "join(delimiter=' ')",
      }
    }
    

    The object below:

    {
      "firstName": "John",
      "lastName": "Doe",
    }
    

    Will be transformed to:

    {
      "person": {
        "fullName": "John Doe",
      }
    }
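A sketch of the string manipulation transforms as plain Python functions, using the option names and defaults listed above. The function signatures, with the field value passed as the first argument, are assumptions for illustration. The README describes concat and join identically, so both are implemented with str.join here; the library may differ.

```python
def concat(values: list[str], delimiter: str = " ") -> str:
    """Join list elements into one string (same behavior as join in this sketch)."""
    return delimiter.join(values)


def join(values: list[str], delimiter: str = " ") -> str:
    """Join list elements into one string, separated by delimiter."""
    return delimiter.join(values)


def prefix(value: str, string: str = "prefix-") -> str:
    """Prepend string to the field value."""
    return string + value


def suffix(value: str, string: str = "-suffix") -> str:
    """Append string to the field value."""
    return value + string


def split(value: str, delimiter: str = " ") -> list[str]:
    """Split the field value around delimiter."""
    return value.split(delimiter)
```

Each call mirrors one example above, for instance prefix("world", string="hello-") returns "hello-world".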
    

3. Replacement and Pattern Matching Functions:

  • replace(old, new): Replaces occurrences of a substring (old) within the string with another substring (new). Options must specify old and new. Example:

    {
      "mapping": {
        "person.name": "fullName",
      },
      "transforms": {
        "person.name": "replace(old='world', new='hello')",
      }
    }
    

    The object below:

    {
      "fullName": "world",
    }
    

    Will be transformed to:

    {
      "person": {
        "name": "hello",
      }
    }
    
  • regex(pattern, replace): Applies a regular expression pattern to the string and replaces matches with the specified replacement string. Options must specify both pattern and replace. Example:

    {
      "mapping": {
        "person.name": "fullName",
      },
      "transforms": {
        "person.name": "regex(pattern='[a-z]+', replace='X')",
      }
    }
    

    The object below:

    {
      "fullName": "world",
    }
    

    Will be transformed to:

    {
      "person": {
        "name": "X",
      }
    }
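The two replacement transforms map naturally onto str.replace and re.sub. replace_transform and regex_transform are hypothetical names, and whether DataWeaver passes the replace option straight to re.sub is an assumption.

```python
import re


def replace_transform(value: str, old: str, new: str) -> str:
    """Plain substring replacement: every occurrence of old becomes new."""
    return value.replace(old, new)


def regex_transform(value: str, pattern: str, replace: str) -> str:
    """Replace every non-overlapping match of pattern with the replacement string."""
    return re.sub(pattern, replace, value)
```

If the library does use re.sub this way, the replacement string supports group backreferences: replacing r"(\w+) (\w+)" with r"\2, \1" turns "John Doe" into "Doe, John".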
    

4. Type Parsing Functions:

  • parse_type(typename): Converts the value to the specified type (typename). Valid types include str, bool, int, and float. Example:

    {
      "mapping": {
        "age_mapped": "age",
        "is_student_mapped": "is_student",
        "is_teacher_mapped": "is_teacher",
        "salary_mapped": "salary",
        "student_id_mapped": "student_id",
      },
      "transforms": {
        "age_mapped": "parse_type(typename='int')",
        "is_student_mapped": "parse_type(typename='bool')",
        "is_teacher_mapped": "parse_type(typename='bool')",
        "salary_mapped": "parse_type(typename='float')",
        "student_id_mapped": "parse_type(typename='str')",
      }
    }

    The object below:

    {
      "age": "30",
      "is_student": "True",
      "is_teacher": "False",
      "salary": "3000.50",
      "student_id": 12345,
    }

    Will be transformed to:

    {
      "age_mapped": 30,
      "is_student_mapped": true,
      "is_teacher_mapped": false,
      "salary_mapped": 3000.50,
      "student_id_mapped": "12345",
    }
    

Note: bool parsing is case-insensitive, so you can pass:

  • "true", "yes", or "y" (in any letter case), which are converted to True
  • "false", "no", or "n" (in any letter case), which are converted to False
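A sketch of parse_type with the boolean rules above. A dedicated parser is needed because Python's bool() treats any non-empty string as true, so bool("False") is True. parse_bool, PARSERS, and raising ValueError for unrecognized strings are assumptions, not documented DataWeaver behavior.

```python
TRUE_VALUES = {"true", "yes", "y"}
FALSE_VALUES = {"false", "no", "n"}


def parse_bool(value: object) -> bool:
    """Case-insensitive string-to-bool conversion following the rules above."""
    if isinstance(value, bool):
        return value
    text = str(value).lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    # Assumption: anything else is rejected rather than silently coerced.
    raise ValueError(f"Cannot interpret {value!r} as a bool")


# Hypothetical dispatch table keyed by the typename option.
PARSERS = {"str": str, "int": int, "float": float, "bool": parse_bool}


def parse_type(value: object, typename: str) -> object:
    """Convert value to the type named by typename."""
    return PARSERS[typename](value)
```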

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

dataweaver-1.0.3.tar.gz (10.4 kB)

Built Distribution

DataWeaver-1.0.3-py3-none-any.whl (10.6 kB, Python 3)
