A library that converts CSV files to other formats (such as DataFrames or other CSV/Excel files) based on a JSON mapping.
Project Title
DataForgeToolkit: Flexible Data Mapping for CSV/XLSX Files
Description
The DataForgeToolkit is a Python library that converts CSV or Excel files into customized DataFrames according to user-defined JSON mapping configurations. It applies to any structured tabular data, such as financial reports or customer datasets.
Features
Versatile File Support: Process both CSV and Excel files, the two formats most commonly encountered in data analysis tasks.
Customizable Mapping: Define transformation mappings in a JSON file that specifies column names, data cleaning, and value substitutions for your data.
Efficient Data Processing: Automate preprocessing tasks such as handling missing values, standardizing column names, and applying value mappings.
Installation
pip install dataforgetoolkit
Usage/Examples
Define Transformation Mapping:
Create a JSON file specifying the transformation mappings for your data. Define column mappings, specify new column names, and define value substitutions as needed.
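As a minimal sketch, the mapping file can also be generated from Python with the standard json module. The keys (transformation_mapping, column, new_name, value_mappings) follow the JSON example in this README; the file name mapping.json and its location in the temporary directory are hypothetical choices for illustration.

```python
import json
import tempfile
from pathlib import Path

# Mapping structure following the JSON example in this README.
mapping = {
    "transformation_mapping": [
        {
            "column": "Gender",      # column name in the source file
            "new_name": "Sex",       # column name in the output
            "value_mappings": [{"MALE": "M", "FEMALE": "F"}],
        },
        {
            "column": "Location",
            "new_name": "Country",
            "value_mappings": [{"REPLACE_usa": "United States of America"}],
        },
    ]
}

# "mapping.json" is a hypothetical file name chosen for this example.
mapping_path = Path(tempfile.gettempdir()) / "mapping.json"
mapping_path.write_text(json.dumps(mapping, indent=2), encoding="utf-8")

# Read the file back to confirm it is valid JSON with the expected content.
loaded = json.loads(mapping_path.read_text(encoding="utf-8"))
print(list(loaded))
```

The resulting path is what would be passed as the mapping argument when calling the toolkit.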
Use the Toolkit:
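Import datamapper from dataforgetoolkit in your Python script and call its map function to convert your report files:
from dataforgetoolkit import datamapper
# First argument: path to the report file (CSV or XLSX); second argument: path to the JSON mapping file
datamapper.map('report file path csv / xlsx format', 'mapping json file path')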
Access Mapped Data:
Access the transformed data as a DataFrame for further analysis or export to other formats.
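The sketch below shows what further analysis and export could look like with pandas. It assumes that datamapper.map returns a pandas DataFrame, which this README implies but does not show explicitly; a hand-built DataFrame with made-up rows stands in for the mapped result so the example runs without the library.

```python
import io

import pandas as pd

# Stand-in for the mapped result. With the library installed this would be
# something like `df = datamapper.map(report_path, mapping_path)`, ASSUMING
# that `map` returns a pandas DataFrame.
df = pd.DataFrame(
    {
        "Student Name": ["Amit Singh", "Amit Singh"],
        "Age": [30, 30],
        "Sex": ["M", "F"],
    }
)

# Further analysis: count rows per category.
counts = df["Sex"].value_counts()

# Export: write CSV to an in-memory buffer
# (pass a file path instead of the buffer to write to disk).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Other pandas writers such as DataFrame.to_excel or DataFrame.to_json work the same way on the returned object.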
Transformation Functions Available
DEFAULT_VALUE = "*"
FILTER_VALUE = "FILTER"
REPLACE_VALUE = "REPLACE_"
CONCAT_VALUE = "CONCAT"
UPPERCASE_VALUE = "UPPERCASE"
LOWERCASE_VALUE = "LOWERCASE"
REGEX_VALUE = "REGEX_"
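This README does not spell out how each keyword is interpreted. One plausible reading, sketched below, is that keys ending in an underscore (REPLACE_, REGEX_) act as prefixes that carry an argument, the other constants are exact-match keywords, and any other key is a literal cell value. The helper classify_key, its kind names, and its dispatch order are hypothetical and are not the library's own logic.

```python
from typing import Tuple

# Constants as listed in this README.
DEFAULT_VALUE = "*"
FILTER_VALUE = "FILTER"
REPLACE_VALUE = "REPLACE_"
CONCAT_VALUE = "CONCAT"
UPPERCASE_VALUE = "UPPERCASE"
LOWERCASE_VALUE = "LOWERCASE"
REGEX_VALUE = "REGEX_"


def classify_key(key: str) -> Tuple[str, str]:
    """Return (kind, argument) for one value_mappings key.

    Hypothetical helper: the kind names and dispatch order are
    assumptions made for this sketch.
    """
    # Prefix-style keys carry their argument after the prefix.
    for prefix, kind in ((REPLACE_VALUE, "replace"), (REGEX_VALUE, "regex")):
        if key.startswith(prefix):
            return kind, key[len(prefix):]
    # Exact-match keywords carry no argument.
    keywords = {
        DEFAULT_VALUE: "default",
        FILTER_VALUE: "filter",
        CONCAT_VALUE: "concat",
        UPPERCASE_VALUE: "uppercase",
        LOWERCASE_VALUE: "lowercase",
    }
    if key in keywords:
        return keywords[key], ""
    # Anything else is treated as a literal cell value to substitute.
    return "literal", key


for example_key in ("*", "FILTER", "REPLACE_usa", "REGEX_[0-9]+", "MALE"):
    print(example_key, "->", classify_key(example_key))
```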
JSON Transformation Mapping
Transformation mappings are specified using a JSON file. Example:
{
  "transformation_mapping": [
    {
      "column": "Name",
      "new_name": "Student Name",
      "value_mappings": [
        { "*": "Amit Singh" }
      ]
    },
    {
      "column": "Age_Column",
      "new_name": "Age",
      "value_mappings": [
        { "FILTER": "30" }
      ]
    },
    {
      "column": "Location",
      "new_name": "Country",
      "value_mappings": [
        { "REPLACE_usa": "United States of America" }
      ]
    },
    {
      "column": "Gender",
      "new_name": "Sex",
      "value_mappings": [
        { "MALE": "M", "FEMALE": "F" }
      ]
    },
    {
      "column": "Zipcode_Column",
      "new_name": "Processed_Text_regex",
      "value_mappings": [
        { "REPLACE_hello": "hi", "REGEX_[0-9]+": "NUMBER" }
      ]
    }
  ]
}
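To make the effect of one entry concrete, the sketch below reproduces the Gender entry with plain pandas: literal values are substituted, then the column is renamed. This approximates the behavior the mapping describes and is not the library's implementation; the sample rows are made up for illustration.

```python
import pandas as pd

# One mapping entry copied from the example above.
entry = {
    "column": "Gender",
    "new_name": "Sex",
    "value_mappings": [{"MALE": "M", "FEMALE": "F"}],
}

# Made-up sample data for illustration.
df = pd.DataFrame({"Gender": ["MALE", "FEMALE", "MALE"]})

# Merge the list of value-mapping dicts into one lookup table.
lookup = {}
for value_mapping in entry["value_mappings"]:
    lookup.update(value_mapping)

# Substitute matching cell values, then rename the column.
source_column = entry["column"]
result = df.assign(**{source_column: df[source_column].replace(lookup)})
result = result.rename(columns={source_column: entry["new_name"]})
print(result)
```

Keyword entries such as FILTER or REGEX_ would need their own handling, which depends on semantics this README does not define.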
Authors
-
Software Engineer
Contributing
Contributions are always welcome!
Please adhere to this project's code of conduct.
Suggest code changes by opening a PR/MR.
Used By
Intended Audience: Developers, Testers, BA
Hashes for dataforgetoolkit-1.0.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 920d0c61eab94ff914b9a42e803a39dbf36245f21af881c0fef4fb8465cea953
MD5 | 7251631577d4596427e17f76a01969e4
BLAKE2b-256 | 91bf327418abda1ccbb21bc4bf97a4f72c1c7cec04a9d38b969fbd949bf7d765