
Data engineering & Data science Framework

Project description

py-analytics

This repo contains a Python framework with data engineering, data science and third-party integration tools.

Installation

py-analytics can be installed via pip:

pip install py-analytics

tdd_utility - module

class load_csv(csv, schema)

Contains methods to extract the equivalent JSON from a CSV file with nested and repeated record structures.

Args

  • csv
    • path and file name of the CSV
    • mandatory
    • nested fields must be separated by a dot "." (e.g. item.id, item.quantity)
    • example layout (the item rows under A0001 and A0002 belong to that order):
      order  item.id  item.quantity  delivery.address  delivery.postcode
      A0001  item1    5              address1          e13bp
             item2    1
             item3    3
      A0002  item4    4              address4          e13bp
             item1    4
             item3    2
  • schema
    • path and file name of the table schema
    • required if the CSV contains nested and repeated records
    • JSON format, e.g.:
[  
    {
      "mode": "NULLABLE", 
      "name": "order", 
      "type": "STRING"
    },  
    {
      "fields": [
        {
          "mode": "NULLABLE", 
          "name": "id", 
          "type": "STRING"
        },
        {
          "mode": "NULLABLE", 
          "name": "quantity", 
          "type": "STRING"
        }
      ], 
      "mode": "REPEATED", 
      "name": "item", 
      "type": "RECORD"
    }, 
    {
      "fields": [
        {
          "mode": "NULLABLE", 
          "name": "address", 
          "type": "STRING"
        }, 
        {
          "mode": "NULLABLE", 
          "name": "postcode", 
          "type": "STRING"
        }
      ], 
      "mode": "NULLABLE", 
      "name": "delivery", 
      "type": "RECORD"
    }
  ]
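The conversion described above (dotted column names become nested fields, and a REPEATED RECORD field in the schema collects one element per row) can be sketched with the standard library alone. This is a minimal illustration, not py-analytics' own implementation: `csv_to_nested`, `repeated_records`, `EXAMPLE_CSV` and `EXAMPLE_SCHEMA` are hypothetical names, and the sketch assumes one level of nesting, that the first column (`order`) identifies a record, and that rows with an empty first column add further repeated items to the record above, as the example table suggests.

```python
import csv
import io
import json

# Hypothetical sample data matching the example table above (comma-separated).
EXAMPLE_CSV = (
    "order,item.id,item.quantity,delivery.address,delivery.postcode\n"
    "A0001,item1,5,address1,e13bp\n"
    ",item2,1,,\n"
    ",item3,3,,\n"
    "A0002,item4,4,address4,e13bp\n"
    ",item1,4,,\n"
    ",item3,2,,\n"
)

# Top-level part of the example schema; nested "fields" omitted for brevity.
EXAMPLE_SCHEMA = [
    {"mode": "NULLABLE", "name": "order", "type": "STRING"},
    {"mode": "REPEATED", "name": "item", "type": "RECORD"},
    {"mode": "NULLABLE", "name": "delivery", "type": "RECORD"},
]


def repeated_records(schema):
    """Return the names of top-level RECORD fields whose mode is REPEATED."""
    return {
        field["name"]
        for field in schema
        if field.get("type") == "RECORD" and field.get("mode") == "REPEATED"
    }


def csv_to_nested(csv_text, schema):
    """Group flat CSV rows into nested records (one level of nesting)."""
    repeated = repeated_records(schema)
    reader = csv.DictReader(io.StringIO(csv_text))
    key = reader.fieldnames[0]  # assumption: the first column identifies a record
    records = []
    for row in reader:
        if row[key]:
            # A non-empty key starts a new record with empty repeated lists.
            records.append({name: [] for name in repeated})
        elif not records:
            raise ValueError("continuation row found before the first record")
        current = records[-1]
        # Split "parent.child" columns and gather this row's values per parent.
        nested = {}
        for column, value in row.items():
            if not value:
                continue  # empty cells contribute nothing
            parent, _, child = column.partition(".")
            if child:
                nested.setdefault(parent, {})[child] = value
            else:
                current[parent] = value
        for parent, fields in nested.items():
            if parent in repeated:
                current[parent].append(fields)  # one element per row
            else:
                current[parent] = fields  # plain nested RECORD
    return records


if __name__ == "__main__":
    print(json.dumps(csv_to_nested(EXAMPLE_CSV, EXAMPLE_SCHEMA), indent=2))
```

Run as a script, it prints two order objects, each with three `item` entries and a single `delivery` object. Values stay strings because every leaf field in the example schema has type STRING.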

Methods

  • to_json()

    • if successful, returns the JSON extracted from the CSV
  • to_new_line_delimiter_file(output_file_name)

    • returns 0 if successful
    • creates a newline-delimited JSON file named output_file_name
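A newline-delimited JSON file holds one complete JSON object per line instead of a single JSON array. The sketch below shows that output format using the standard `json` module; `write_newline_delimited` is a hypothetical stand-in for `to_new_line_delimiter_file`, assumed here to receive the already-extracted records and, like the documented method, to return 0 on success.

```python
import json


def write_newline_delimited(records, output_file_name):
    """Write each record as one JSON object per line; return 0 on success."""
    with open(output_file_name, "w", encoding="utf-8") as handle:
        for record in records:
            # json.dumps without indent emits a single line and escapes any
            # newline characters inside string values, so records never split.
            handle.write(json.dumps(record) + "\n")
    return 0


if __name__ == "__main__":
    orders = [{"order": "A0001"}, {"order": "A0002"}]
    status = write_newline_delimited(orders, "orders.json")
    print(status)  # 0
```

Reading such a file back is the mirror image: call `json.loads` on each line.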

Usage

>>> from data_prep import tdd_utility as tu
>>> mockdata_csv = tu.load_csv(
...     csv="path/to/filename/file.csv",
...     schema="path/to/schema/schema.json")  # initialise the object
>>> mockdata_json = mockdata_csv.to_json()  # return the equivalent JSON
>>> status = mockdata_csv.to_new_line_delimiter_file("path/output_file_name.json")  # write the newline-delimited file; 0 if successful


Download files


Source Distribution

py-data-framework-0.0.1.tar.gz (2.5 kB)

Built Distribution

py_data_framework-0.0.1-py3-none-any.whl (3.7 kB)

File details

Details for the file py-data-framework-0.0.1.tar.gz.

File metadata

  • Download URL: py-data-framework-0.0.1.tar.gz
  • Size: 2.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.1 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.7.3

File hashes

Hashes for py-data-framework-0.0.1.tar.gz
Algorithm Hash digest
SHA256 746712c63127986b7b3f3792c7e880e729a17e59ec38d0f876343b873bf2ea05
MD5 5f2bd6808083eae0f2c1c98ba7596ccc
BLAKE2b-256 fff282a3ba390f1bf138f9e14c6570470bdb683dd732611fb41aca9c4707435a
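The digests above let you confirm that a downloaded archive has not been altered. A minimal check with Python's `hashlib` might look like the following; `sha256_of` is a hypothetical helper, and the script assumes the source distribution was saved under its original file name in the current directory.

```python
import hashlib
import os

# SHA256 digest listed above for py-data-framework-0.0.1.tar.gz.
EXPECTED_SDIST_SHA256 = "746712c63127986b7b3f3792c7e880e729a17e59ec38d0f876343b873bf2ea05"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    path = "py-data-framework-0.0.1.tar.gz"  # assumed download location
    if os.path.exists(path):
        print("OK" if sha256_of(path) == EXPECTED_SDIST_SHA256 else "MISMATCH")
    else:
        print(f"{path} not found; download it first")
```

pip can apply the same check automatically in its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).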


File details

Details for the file py_data_framework-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: py_data_framework-0.0.1-py3-none-any.whl
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.1 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.7.3

File hashes

Hashes for py_data_framework-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 28ff1568a3e9a432e5328aeb54ced028b30c702afda6d66f596ed090d8ad3a44
MD5 5295a8746ea082cf0918b903e2bd043c
BLAKE2b-256 4f08de97105a47a00ef77300c903c648dca0861fe9416cf27a6ae4616d1e1cee

