A library for converting CSV files to other formats (such as DataFrames or other CSV/Excel files) based on a JSON mapping.

Project description

Project Title

DataForgeToolkit: Flexible Data Mapping for CSV/XLSX Files

Description

DataForgeToolkit is a Python library that converts CSV or Excel files into customized DataFrames according to a user-defined JSON mapping configuration. It is suited to structured data such as financial reports or customer datasets.

Features

Versatile File Support: Seamlessly process both CSV and Excel files, providing flexibility in handling various data formats commonly encountered in data analysis tasks.

Customizable Mapping: Define transformation mappings using a JSON file, allowing for precise specification of column names, data cleaning, and value substitutions tailored to your specific data requirements.

Efficient Data Processing: Automate data preprocessing tasks such as handling missing values, standardizing column names, and applying complex value mappings with ease.

Installation

    pip install dataforgetoolkit

Usage/Examples

Define Transformation Mapping:

Create a JSON file specifying the transformation mappings for your data. Define column mappings, specify new column names, and define value substitutions as needed.
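One way to produce such a file is to build the mapping as a Python dictionary and serialize it with the standard `json` module. This is only a minimal sketch: the helper name `write_mapping` and the sample column names are hypothetical, and the key names (`transformation_mapping`, `column`, `new_name`, `value_mappings`) are taken from the example mapping in this README.

```python
import json

# Sample mapping; the key names follow the example mapping in this README.
mapping = {
    "transformation_mapping": [
        {
            "column": "Gender",
            "new_name": "Sex",
            "value_mappings": [{"MALE": "M", "FEMALE": "F"}],
        }
    ]
}


def write_mapping(path, data=mapping):
    """Write a mapping dictionary to `path` as indented JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
```

Calling `write_mapping("mapping.json")` would then create a file whose path can be passed as the mapping argument.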

Use the Toolkit:

Import the DataForgeToolkit in your Python script and utilize the map function to convert your report files:

    from dataforgetoolkit import datamapper
    # Arguments: path to the report file (CSV or XLSX), then path to the JSON mapping file
    datamapper.map('report file path csv / xlsx format', 'mapping json file path')

Access Mapped Data:

Access the transformed data as a DataFrame for further analysis or export to other formats.

Transformation Functions Available

DEFAULT_VALUE = "*"
FILTER_VALUE = "FILTER"
REPLACE_VALUE = "REPLACE_"
CONCAT_VALUE = "CONCAT"
UPPERCASE_VALUE = "UPPERCASE"
LOWERCASE_VALUE = "LOWERCASE"
REGEX_VALUE = "REGEX_"
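To make the role of these keys more concrete, the sketch below shows one plausible way a single `value_mappings` dictionary could be applied to one cell value. It is an illustration in plain Python, not the library's implementation: the function name `apply_value_mapping` is hypothetical, and the semantics of each key are assumptions inferred from the key names. `FILTER` and `CONCAT` are left out because their behavior is not described here.

```python
import re


def apply_value_mapping(value, rules):
    """Apply one value_mappings dictionary to a single cell value.

    Assumed semantics (not confirmed by the library):
      - "REPLACE_<text>": replace every occurrence of <text> with the rule value
      - "REGEX_<pattern>": replace every regex match with the rule value
      - "UPPERCASE" / "LOWERCASE": change the case of the value
      - any other key except "*": exact match, substitute the rule value
      - "*": fallback used when no other rule changed the value
    """
    result = value
    matched = False
    for key, replacement in rules.items():
        if key == "*":
            continue  # handled after all other rules
        if key.startswith("REPLACE_"):
            needle = key[len("REPLACE_"):]
            if needle in result:
                result = result.replace(needle, replacement)
                matched = True
        elif key.startswith("REGEX_"):
            pattern = key[len("REGEX_"):]
            result, count = re.subn(pattern, replacement, result)
            matched = matched or count > 0
        elif key == "UPPERCASE":
            result, matched = result.upper(), True
        elif key == "LOWERCASE":
            result, matched = result.lower(), True
        elif result == key:
            result, matched = replacement, True
    if not matched and "*" in rules:
        result = rules["*"]
    return result
```

Under these assumptions, `apply_value_mapping("MALE", {"MALE": "M", "FEMALE": "F"})` returns `"M"`, and a value that matches no rule is returned unchanged unless a `"*"` entry is present.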

JSON Transformation Mapping

Transformation mappings are specified using a JSON file. Example:

    {
      "transformation_mapping": [
        {
          "column": "Name",
          "new_name": "Student Name",
          "value_mappings": [
            { "*": "Amit Singh" }
          ]
        },
        {
          "column": "Age_Column",
          "new_name": "Age",
          "value_mappings": [
            { "FILTER": "30" }
          ]
        },
        {
          "column": "Location",
          "new_name": "Country",
          "value_mappings": [
            { "REPLACE_usa": "United States of America" }
          ]
        },
        {
          "column": "Gender",
          "new_name": "Sex",
          "value_mappings": [
            { "MALE": "M", "FEMALE": "F" }
          ]
        },
        {
          "column": "Zipcode_Column",
          "new_name": "Processed_Text_regex",
          "value_mappings": [
            { "REPLACE_hello": "hi", "REGEX_[0-9]+": "NUMBER" }
          ]
        }
      ]
    }
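Before handing a mapping file to the toolkit, it can help to check that it has the shape shown in the example. The following is a minimal sketch using only the standard library. The function name `check_mapping` is hypothetical, and treating `column`, `new_name`, and `value_mappings` as required keys is an assumption based on the example, not a documented rule.

```python
import json

# Keys that every entry in the example mapping carries (assumed to be required).
REQUIRED_KEYS = ("column", "new_name", "value_mappings")


def check_mapping(text):
    """Return a list of problems found in a mapping JSON string (empty if none)."""
    data = json.loads(text)
    if not isinstance(data, dict) or not isinstance(
        data.get("transformation_mapping"), list
    ):
        return ['top-level "transformation_mapping" must be a list']
    problems = []
    for index, entry in enumerate(data["transformation_mapping"]):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: must be a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"entry {index}: missing key {key!r}")
        rules = entry.get("value_mappings", [])
        if not isinstance(rules, list) or not all(
            isinstance(rule, dict) for rule in rules
        ):
            problems.append(f"entry {index}: value_mappings must be a list of objects")
    return problems
```

An empty list means the file matches the structure of the example; any other result names the entry and the key that looks wrong.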

Authors

Contributing

Contributions are always welcome!

Please adhere to this project's code of conduct.

To suggest a change, open a pull request (PR) or merge request (MR).

Used By

Intended Audience: Developers, Testers, BA

Download files

Built Distribution

dataforgetoolkit-1.0.6-py3-none-any.whl (5.0 kB view hashes)

Uploaded Python 3
