File Genie parses various file types and transforms them according to a provided configuration.
FileGenie SDK
FileGenie SDK is a Python library designed to simplify parsing files from AWS S3 in various formats (e.g., TEXT, CSV, EXCEL, ZIP, XML, PDF) and transforming the data into the desired output format using user-defined functions. Given file parsing configurations and custom transformation logic, the library processes each file and returns the output in the requested form.
Features
- Multi-format Support: Effortlessly parse files in formats such as TEXT, CSV, EXCEL, ZIP, XML, and PDF directly from AWS S3.
- Flexible Response Types: Generate responses tailored to user needs, including DATAFRAME, JSON, or FILE outputs.
- Password-Protected Files: Seamlessly parse files secured with passwords.
- Custom Edge Case Handling: Apply user-defined custom functions to manage specific parsing and transformation needs, including data sanitization, value conversions, or reformatting date fields for consistency.
- AWS S3 Integration: Fetch files directly from AWS S3 buckets using IAM roles for secure access.
- Streamlined Configuration: Set up easily with minimal configuration, eliminating the need to write a parser for each file type.
Installation
Install the SDK using pip:
pip install file_genie
Prerequisites
- Your application should be deployed on AWS EKS to enable the SDK to utilize AWS S3 credentials.
- Python: >= '3.6'
- Pandas: '2.0.0'
Getting Started
- Define Custom Edge Cases: If you need to sanitize columns during file parsing (e.g., standardise column values to a common format before applying custom logic), you can define custom functions for the SDK to use.
To implement this:
- Create an edgeCases folder in your project.
- Add a file named user_edge_cases.py.
- Define your custom functions in this file.
- Reference these functions in the edge_case section of the file_config.
- The SDK will automatically import and apply these functions during file parsing or transformation.
from edgeCases import user_edge_cases
self.edge_cases = user_edge_cases
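As an illustration of the steps above, here is a minimal sketch of what edgeCases/user_edge_cases.py might contain. The call signature (a DataFrame plus the parameter taken from the edge_case config) is an assumption, since the SDK's actual calling convention is not documented here. The "Currency" column and the RATES_TO_BASE table are hypothetical names used only for this example.

```python
import pandas as pd

# Hypothetical conversion rates, for illustration only.
RATES_TO_BASE = {"USD": 1.0, "EUR": 2.0}


def convert_amount_as_per_currency(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Convert the values in `column` to a base currency.

    Assumed signature: the SDK's real calling convention may differ.
    Assumes the frame has a "Currency" column (hypothetical).
    """
    out = df.copy()  # leave the caller's frame untouched
    # Amounts may have been read as strings (see file_dtype), so coerce first.
    amounts = pd.to_numeric(out[column], errors="coerce")
    # Unknown currency codes map to NaN rather than raising.
    rates = out["Currency"].map(RATES_TO_BASE)
    out[column] = amounts * rates
    return out
```

Returning a new frame rather than mutating the input keeps the function easy to test in isolation, independent of how the SDK invokes it.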
- Define the configuration required for file parsing logic and S3 bucket names
s3_config = {
    "upload_bucket": "reconciliation-live",
    "download_bucket": "reconciliation-live",
}
file_config = {
    "file_source_1": {
        "read_from_s3_func": "read_complete_excel_file",
        "parameters_for_read_s3": None,
        "file_dtype": {
            "Order_Number": str,
            "Added On": str,
            "Added By": str,
        },
        # Keys are column names in the file; values are the column names required in the output.
        "columns_mapping": {
            "Transaction Type": "TransactionType",
            "Cust Name": "CustomerName",
            "Cust ID": "CustomerId",
            "Transaction Amount": "Amount",
            "OrderNumber": "TransactionReference",
            "Reference ID": "CustomerReferenceId",
            "Target Date": "TargetDate",
            "TransactionDate": "TransactionDate",
            "FeeAmount": "ServiceCharge",
            "TaxAmount": "ServiceTax",
            "NetAmount": "NetAmount",
        },
        # Keys are edge case function names defined in user_edge_cases.py; values are the
        # parameters for that function, which can be of different types (e.g., dict, list, str).
        # Here convert_amount_as_per_currency is the edge case function applied while
        # transforming the entries, and "Amount" is the parameter on which the currency
        # conversion is applied.
        "edge_case": {
            "convert_amount_as_per_currency": "Amount",
        },
    },
}
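To make the intent of file_dtype and columns_mapping concrete, the following sketch shows roughly equivalent plain pandas operations. It is not the SDK's implementation. The helper name apply_column_config and the sample data are hypothetical, and only a subset of the mapping is used.

```python
import pandas as pd

file_dtype = {"Order_Number": str}
columns_mapping = {"Cust Name": "CustomerName", "Transaction Amount": "Amount"}


def apply_column_config(df, file_dtype, columns_mapping):
    """Cast configured columns, then rename file columns to output names.

    Hypothetical helper: the SDK's own processing order is not documented.
    """
    # Only cast columns that are actually present, to avoid a KeyError.
    present = {col: dtype for col, dtype in file_dtype.items() if col in df.columns}
    typed = df.astype(present)
    # rename() ignores mapping keys that do not match any column.
    return typed.rename(columns=columns_mapping)


raw = pd.DataFrame(
    {
        "Order_Number": [1001, 1002],
        "Cust Name": ["Asha", "Ravi"],
        "Transaction Amount": [250.0, 99.5],
    }
)
result = apply_column_config(raw, file_dtype, columns_mapping)
print(list(result.columns))
```

Reading identifier-like columns such as order numbers as str avoids losing leading zeros or picking up float formatting.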
- Define a ParsedDataResponseType enum
import enum

class ParsedDataResponseType(enum.Enum):
    DATAFRAME = "DATAFRAME"
    FILE = "FILE"
    JSON = "JSON"
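For readers who want a feel for what the three response types could mean in practice, here is a hedged sketch that converts a DataFrame by hand. The to_response helper, its default output path, and the choice of CSV for FILE output are assumptions for illustration. The SDK's own output formats may differ.

```python
import enum

import pandas as pd


class ParsedDataResponseType(enum.Enum):
    DATAFRAME = "DATAFRAME"
    FILE = "FILE"
    JSON = "JSON"


def to_response(df, response_type=ParsedDataResponseType.DATAFRAME.value,
                path="parsed_output.csv"):
    """Hypothetical helper: shape a DataFrame according to a response type."""
    # Enum lookup by value raises ValueError for an unknown response type.
    kind = ParsedDataResponseType(response_type)
    if kind is ParsedDataResponseType.JSON:
        # One JSON object per row.
        return df.to_json(orient="records")
    if kind is ParsedDataResponseType.FILE:
        # Assumption: FILE output is written as CSV to a local path.
        df.to_csv(path, index=False)
        return path
    return df
```

Passing the enum's .value (a plain string) mirrors how the response type is passed to parse in the next step.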
- Import and initialise the file genie
from file_genie import FileGenie
file_genie = FileGenie(config={"s3_config": s3_config, "file_config": file_config})
# file_source is assumed here to be the key of an entry in file_config
file_source = "file_source_1"
# The response type is optional; by default the SDK returns a DATAFRAME
parsed_data = file_genie.parse("s3://your-bucket-name/path/to/your/file.csv", file_source, ParsedDataResponseType.DATAFRAME.value)
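Because parse takes a full S3 URI, it can help to validate the URI before handing it to the SDK. The sketch below uses only the Python standard library. The helper name split_s3_uri is hypothetical, and whether the SDK performs a similar check internally is not stated.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key).

    Hypothetical pre-flight check, not part of the SDK.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # The path keeps its leading slash; S3 object keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://your-bucket-name/path/to/your/file.csv")
print(bucket, key)
```

A check like this surfaces a malformed path early, before any S3 call is attempted.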