file-parser-sdk·PyPI

File Parser SDK, designed to parse various file types and transform them according to the provided configuration

Project description

File Parser SDK

File Parser SDK is a Python library designed to simplify parsing various file formats (e.g. TEXT, CSV, EXCEL, ZIP, XML, PDF) and transforming the parsed entries into customizable payloads. It offers straightforward integration, efficient file handling, and the flexibility to handle edge cases with user-defined logic applied while entries are transformed.

Features

Multi-format Support: Parse TEXT, CSV, EXCEL, ZIP, XML and PDF files from AWS S3.
Multi-format Response: Return parsed data in the form you need, e.g. DATAFRAME, JSON or FILE.
Password-Protected Support: Parse password-protected files.
Customizable Edge Case Handling: Define and apply custom functions for specific parsing requirements. Several edge cases may need handling while entries are transformed, such as sanitise_str_column, convert_amount_as_per_currency and convert_date_format.
S3 Integration: Fetch files directly from AWS S3 buckets using an IAM role.
Simple Configuration: Initialize with a straightforward configuration; no additional setup files are needed.

Installation

Install the SDK using pip:

pip install file_parser_sdk

Prerequisites

Your application should be deployed on AWS EKS to enable the SDK to utilize AWS S3 credentials.
Python: >= 3.6
Pandas: 2.0.0

Getting Started

Define Custom Edge Cases: When file parsing needs custom functions, the SDK imports them from your project as shown below. Create an edgeCases folder in your project and add a file named user_edge_cases.py. Define your custom functions in this file, then reference them by name in the edge_case section of file_config.

from edgeCases import user_edge_cases
self.edge_cases = user_edge_cases
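
As an illustration, a user_edge_cases.py file might look like the sketch below. The function names come from the feature list above, but the signatures are assumptions: the calling convention the SDK uses for edge case functions is not documented here, so this sketch assumes each function receives the parsed rows plus the parameter given in the edge_case config and returns the transformed rows. Plain lists of dicts stand in for DataFrames to keep the sketch dependency-free, and the "Currency" field and rate table are invented for illustration.

```python
# edgeCases/user_edge_cases.py (hypothetical contents, not shipped with the SDK)
from datetime import datetime

# Illustrative rates only; a real project would load these from a trusted source.
RATES_TO_BASE = {"USD": 1.0, "EUR": 1.1}


def sanitise_str_column(rows, column):
    """Strip surrounding whitespace from a string column."""
    return [{**row, column: str(row[column]).strip()} for row in rows]


def convert_amount_as_per_currency(rows, column):
    """Convert `column` to the base currency using each row's assumed "Currency" field."""
    converted = []
    for row in rows:
        rate = RATES_TO_BASE[row.get("Currency", "USD")]
        converted.append({**row, column: round(float(row[column]) * rate, 2)})
    return converted


def convert_date_format(rows, params):
    """Reformat a date column; `params` is a dict naming the column and both formats."""
    column, src, dst = params["column"], params["from"], params["to"]
    return [
        {**row, column: datetime.strptime(row[column], src).strftime(dst)}
        for row in rows
    ]
```

If the SDK passes a DataFrame rather than rows, the same functions would operate on the named column of that DataFrame instead.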

Define the configuration required for file parsing logic and S3 bucket names

    s3_config = {
        "upload_bucket": "reconciliation-live",
        "download_bucket": "reconciliation-live",
    }
    file_config = {
        "file_source_1": {
            "read_from_s3_func": "read_complete_excel_file",
            "parameters_for_read_s3": None,
            "file_dtype": {
                "Order_Number": str,
                "Added On": str,
                "Added By": str,
            },
            "columns_mapping": {
                # "Column name in file": "Column name required in output"
                "Transaction Type": "TransactionType",
                "Cust Name": "CustomerName",
                "Cust ID": "CustomerId",
                "Transaction Amount": "Amount",
                "OrderNumber": "TransactionReference",
                "Reference ID": "CustomerReferenceId",
                "Target Date": "TargetDate",
                "TransactionDate": "TransactionDate",
                "FeeAmount": "ServiceCharge",
                "TaxAmount": "ServiceTax",
                "NetAmount": "NetAmount",
            },
            "edge_case": {
                # Key: name of an edge case function defined in user_edge_cases.py.
                # Value: the params that function requires (e.g. a dict, list or str).
                # Here convert_amount_as_per_currency is applied while transforming
                # the entries, and "Amount" is the column it converts.
                "convert_amount_as_per_currency": "Amount",
            },
        },
    }
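
To make the roles of columns_mapping and edge_case concrete, here is a rough sketch of how such a configuration could be applied to parsed rows. It is not the SDK's actual implementation: apply_file_config, double_amount and the stand-in edge case module are hypothetical, rows are plain dicts rather than a DataFrame, and dropping unmapped columns is an assumption. Edge cases run after renaming here, on the guess that the example config refers to the output column name "Amount".

```python
from types import SimpleNamespace


def double_amount(rows, column):
    """Toy edge case function used only for this sketch."""
    return [{**row, column: row[column] * 2} for row in rows]


# Stand-in for the user_edge_cases module (hypothetical).
edge_cases = SimpleNamespace(double_amount=double_amount)


def apply_file_config(rows, source_config, edge_case_module):
    """Rename columns per columns_mapping, then run each configured edge case."""
    mapping = source_config.get("columns_mapping", {})
    # Keep only mapped columns, under their output names (assumed behaviour).
    renamed = [
        {new: row[old] for old, new in mapping.items() if old in row}
        for row in rows
    ]
    for func_name, params in (source_config.get("edge_case") or {}).items():
        # Look the function up by name; raises AttributeError if it is not defined.
        func = getattr(edge_case_module, func_name)
        renamed = func(renamed, params)
    return renamed


config = {
    "columns_mapping": {"Transaction Amount": "Amount", "Cust Name": "CustomerName"},
    "edge_case": {"double_amount": "Amount"},
}
rows = [{"Transaction Amount": 50, "Cust Name": "Ann", "Unmapped": "x"}]
result = apply_file_config(rows, config, edge_cases)
```

With pandas, the renaming step is usually expressed as DataFrame.rename(columns=mapping).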

Define a ParsedDataResponseType enum

import enum
class ParsedDataResponseType(enum.Enum):
    DATAFRAME="DATAFRAME"
    FILE="FILE"
    JSON="JSON"
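
Because the response type is passed as a plain string value, it can help to validate it before calling the parser. The sketch below is one way to do that; resolve_response_type is a hypothetical helper, not part of the SDK, and it falls back to DATAFRAME because that is the default the SDK is described as using.

```python
import enum


class ParsedDataResponseType(enum.Enum):
    DATAFRAME = "DATAFRAME"
    FILE = "FILE"
    JSON = "JSON"


def resolve_response_type(value=None):
    """Return the enum member for a string value, defaulting to DATAFRAME."""
    if value is None:
        return ParsedDataResponseType.DATAFRAME
    try:
        # Enum lookup by value raises ValueError for unknown strings.
        return ParsedDataResponseType(value)
    except ValueError:
        allowed = ", ".join(member.value for member in ParsedDataResponseType)
        raise ValueError(
            f"Unsupported response type {value!r}; expected one of: {allowed}"
        ) from None
```

For example, resolve_response_type("JSON").value gives the string to pass to the parser, and an unknown value fails early with a readable message.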

Import and initialise the file parser

from file_parser_sdk import FileParserSDK

parser = FileParserSDK(config={"s3_config": s3_config, "file_config": file_config})
# The second argument is the file source key defined in file_config, e.g. "file_source_1".
# The response type is optional; by default the SDK returns a DATAFRAME.
parsed_data = parser.parse("s3://your-bucket-name/path/to/your/file.csv", "file_source_1", ParsedDataResponseType.DATAFRAME.value)
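
The parse call takes an S3 location written as an s3:// URI. Whether the SDK validates that string itself is not stated here, so a small pre-check like the following sketch can catch malformed paths early. split_s3_uri is a hypothetical helper built only on the standard library; it is not an SDK function.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key), raising on malformed input."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # The path component starts with "/", which is not part of the object key.
    key = parsed.path.lstrip("/")
    if not key:
        raise ValueError(f"S3 URI has no object key: {uri!r}")
    return parsed.netloc, key


bucket, key = split_s3_uri("s3://your-bucket-name/path/to/your/file.csv")
```

Here bucket is "your-bucket-name" and key is "path/to/your/file.csv".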

Project details

Release history

Version	Release date
0.3.2 (this version)	Nov 19, 2024
0.3.1	Nov 19, 2024
0.3.0	Nov 19, 2024
0.2.9	Nov 18, 2024
0.2.8	Nov 13, 2024
0.1.8	Nov 13, 2024
0.1.7	Nov 12, 2024
0.1.6	Nov 12, 2024
0.1.5	Nov 12, 2024
0.1.4	Nov 12, 2024
0.1.3	Nov 11, 2024
0.1.2	Nov 11, 2024

Download files

Source Distribution

file_parser_sdk-0.3.2.tar.gz (14.5 kB)

Uploaded Nov 19, 2024 Source

Built Distribution

file_parser_sdk-0.3.2-py3-none-any.whl (15.6 kB)

Uploaded Nov 19, 2024 Python 3

File details

Details for the file file_parser_sdk-0.3.2.tar.gz.

File metadata

Download URL: file_parser_sdk-0.3.2.tar.gz
Upload date: Nov 19, 2024
Size: 14.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for file_parser_sdk-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`a3331e1a0c682d1988ea59bc7531e0972995134ee64c6e6bec6894935cb6af91`
MD5	`a759994d596940a649d728e3e6b3a338`
BLAKE2b-256	`43255f99d5fe85c8e4bcba34ad6880c5aca6c8a6f2184f6948db1060923622d4`
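
A downloaded archive can be checked against the published SHA256 digest before installing it. The sketch below uses only the standard library; sha256_of_file and matches_published_hash are hypothetical helper names, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

Pass the path of the downloaded file and the SHA256 value from the table above; a False result means the file should not be installed.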

File details

Details for the file file_parser_sdk-0.3.2-py3-none-any.whl.

File metadata

Download URL: file_parser_sdk-0.3.2-py3-none-any.whl
Upload date: Nov 19, 2024
Size: 15.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for file_parser_sdk-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b7830e78e8b1a5ca6898f02c37935a028dd22511204a82497df620cf6b16b85a`
MD5	`0d7ff8a324301cb05554198f8eeb74ab`
BLAKE2b-256	`659a77d9d9596c0c6c8e4e0a1b2796f29bbb84767cf699d5d0cf85e07fe2c9d8`

