File Genie parses various file types and transforms them according to a provided configuration.

FileGenie SDK

FileGenie SDK is a Python library that simplifies parsing files stored in AWS S3 in various formats (e.g., TEXT, CSV, EXCEL, ZIP, XML, PDF) and transforming the data with user-defined functions into the desired output format. Given a file parsing configuration and custom transformation logic, the library processes each file and returns the output in the requested form.

Features

  • Multi-format Support: Effortlessly parse files in formats such as TEXT, CSV, EXCEL, ZIP, XML, and PDF directly from AWS S3.
  • Flexible Response Types: Generate responses tailored to user needs, including DATAFRAME, JSON, or FILE outputs.
  • Password-Protected Files: Seamlessly parse files secured with passwords.
  • Custom Edge Case Handling: Apply user-defined functions to handle specific parsing and transformation needs, such as data sanitization, value conversions, or reformatting date fields for consistency.
  • AWS S3 Integration: Fetch files directly from AWS S3 buckets using IAM roles for secure access.
  • Streamlined Configuration: Set up with minimal configuration, eliminating the need to write a parser for each file type.

Installation

Install the SDK using pip:

pip install file_genie

Prerequisites

  • Your application should be deployed on AWS EKS to enable the SDK to utilize AWS S3 credentials.
  • Python: >= 3.6
  • Pandas: 2.0.0

Getting Started

  • Define Custom Edge Cases: If you need to sanitize columns during file parsing (e.g., standardise column values to a common format before applying custom logic), you can define custom functions for the SDK to use.

To implement this:

  • Create an edgeCases folder in your project.
  • Add a file named user_edge_cases.py.
  • Define your custom functions in this file.
  • Reference these functions in the edge_case section of the file_config.
  • The SDK will automatically import and apply these functions during file parsing or transformation.
from edgeCases import user_edge_cases
self.edge_cases = user_edge_cases
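As a rough illustration of the steps above, edgeCases/user_edge_cases.py could hold functions like the one below, and the lookup by name can be pictured as a getattr call on the imported module. The function signature (a list of row dictionaries plus the configured parameter), the fixed conversion factor, and the apply_edge_cases helper are all assumptions made for this sketch. The SDK's actual calling convention is not documented here and may differ (for example, it may pass a pandas DataFrame).

```python
import types

# --- contents of a hypothetical edgeCases/user_edge_cases.py ---
ASSUMED_RATE = 0.01  # made-up factor, e.g. minor currency units to major units


def convert_amount_as_per_currency(rows, column):
    """Return new rows with `column` scaled by ASSUMED_RATE (assumed signature)."""
    converted = []
    for row in rows:
        updated = dict(row)  # copy so the input rows are left untouched
        updated[column] = round(float(row[column]) * ASSUMED_RATE, 2)
        converted.append(updated)
    return converted


# --- illustration of name-based dispatch, not the SDK's real code ---
def apply_edge_cases(rows, edge_case_config, module):
    """Look up each configured function name on `module` and apply it in order."""
    for func_name, params in edge_case_config.items():
        func = getattr(module, func_name)  # AttributeError if the name is unknown
        rows = func(rows, params)
    return rows


# Stand-in for `from edgeCases import user_edge_cases`, so the sketch runs on its own.
user_edge_cases = types.SimpleNamespace(
    convert_amount_as_per_currency=convert_amount_as_per_currency
)

rows = [{"Amount": "12500"}, {"Amount": "999"}]
result = apply_edge_cases(
    rows, {"convert_amount_as_per_currency": "Amount"}, user_edge_cases
)
```

A misspelled function name in the edge_case section fails at the getattr lookup in this sketch, which is why the names in the configuration and in user_edge_cases.py need to match exactly.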
  • Define the S3 bucket names and the file parsing configuration:
    s3_config = {
        "upload_bucket": "reconciliation-live",
        "download_bucket": "reconciliation-live",
    }
    file_config = {
        "file_source_1": {
            "read_from_s3_func": "read_complete_excel_file",
            "parameters_for_read_s3": None,
            "file_dtype": {
                "Order_Number": str,
                "Added On": str,
                "Added By": str,
            },
            "columns_mapping": {
                # "Column name in file": "Column name required in output"
                "Transaction Type": "TransactionType",
                "Cust Name": "CustomerName",
                "Cust ID": "CustomerId",
                "Transaction Amount": "Amount",
                "OrderNumber": "TransactionReference",
                "Reference ID": "CustomerReferenceId",
                "Target Date": "TargetDate",
                "TransactionDate": "TransactionDate",
                "FeeAmount": "ServiceCharge",
                "TaxAmount": "ServiceTax",
                "NetAmount": "NetAmount",
            },
            "edge_case": {
                # Key: name of an edge case function defined in user_edge_cases.py.
                # Value: the parameters for that function (e.g. a dict, list, or str).
                # Here convert_amount_as_per_currency is applied while transforming
                # the entries, and "Amount" is the column it converts.
                "convert_amount_as_per_currency": "Amount",
            },
        },
    }
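To make the roles of file_dtype and columns_mapping concrete, the sketch below applies both to plain Python dictionaries. It only illustrates the intended effect and makes several assumptions: dtypes are applied to the source column names before renaming, unmapped columns are dropped, and the apply_file_config helper and the sample row are invented for the example. The SDK itself presumably works on a pandas DataFrame and may behave differently.

```python
def apply_file_config(rows, source_config):
    """Cast values per file_dtype, then rename and keep only the mapped columns."""
    dtypes = source_config.get("file_dtype") or {}
    mapping = source_config["columns_mapping"]
    output = []
    for row in rows:
        renamed = {}
        for source_name, target_name in mapping.items():
            if source_name not in row:
                continue  # column absent from this file; skipped in this sketch
            value = row[source_name]
            cast = dtypes.get(source_name)
            renamed[target_name] = cast(value) if cast else value
        output.append(renamed)
    return output


# Trimmed-down configuration for one file source (sample values are made up).
source_config = {
    "file_dtype": {"OrderNumber": str},
    "columns_mapping": {
        "Cust Name": "CustomerName",
        "Transaction Amount": "Amount",
        "OrderNumber": "TransactionReference",
    },
}
rows = [
    {"Cust Name": "A. Rao", "Transaction Amount": 250.0, "OrderNumber": 10045, "Notes": "n/a"}
]
mapped = apply_file_config(rows, source_config)
```

Casting identifier-like columns such as order numbers to str keeps leading zeros and avoids them being read as numbers.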
  • Define a ParsedDataResponseType enum
import enum
class ParsedDataResponseType(enum.Enum):
    DATAFRAME="DATAFRAME"
    FILE="FILE"
    JSON="JSON"
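Because each member's value is its own name as a string, standard enum lookups convert between the two forms. The snippet below repeats the enum so that it runs on its own. resolve_response_type is a hypothetical helper, not part of the SDK, and its DATAFRAME fallback mirrors the default described in this README.

```python
import enum


class ParsedDataResponseType(enum.Enum):
    DATAFRAME = "DATAFRAME"
    FILE = "FILE"
    JSON = "JSON"


def resolve_response_type(name=None):
    """Map a user-supplied string to an enum member, defaulting to DATAFRAME."""
    if name is None:
        return ParsedDataResponseType.DATAFRAME
    try:
        # Enum lookup by value; upper() makes the input case-insensitive.
        return ParsedDataResponseType(name.upper())
    except ValueError:
        allowed = ", ".join(member.value for member in ParsedDataResponseType)
        raise ValueError(
            f"Unsupported response type {name!r}; expected one of: {allowed}"
        ) from None
```

Validating the string up front gives a clear error message before any file is fetched.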
  • Import and initialise the file genie
from file_genie import FileGenie

file_genie = FileGenie(config={"s3_config": s3_config, "file_config": file_config})
file_source = "file_source_1"  # assumed to be the matching key in file_config
parsed_data = file_genie.parse("s3://your-bucket-name/path/to/your/file.csv", file_source, ParsedDataResponseType.DATAFRAME.value)
# By default, the SDK returns the response as a DATAFRAME.
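The first argument to parse is an S3 URI. If it helps to validate that value before handing it to the SDK, the standard library can split it into bucket and key. split_s3_uri is a hypothetical helper written for this example. It is not part of file_genie, whose own handling of the URI is not shown here.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Return (bucket, key) for an s3://bucket/key URI, or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # Note: keys containing '?' or '#' would need extra handling, because
    # urlparse treats those characters as query and fragment separators.
    key = parsed.path.lstrip("/")
    if not key:
        raise ValueError(f"S3 URI has no object key: {uri!r}")
    return parsed.netloc, key


bucket, key = split_s3_uri("s3://your-bucket-name/path/to/your/file.csv")
```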
