Skip to main content

Data engineering & Data science Framework

Project description

py-analytics

This repo contains a python framework with Data Enginner, Data Scientist and 3rd party integration tools capabilities

Installation

py-test-utility can be installed via pip

pip install py-analytics

tdd_utility - module

class load_csv(csv,schema)

Contains methods to extract the equivalent json from csv with nested and repeated records structures

Args

  • csv
    • path and file name of the csv
    • mandatory
    • nested fields shall be separated by a dot "." (i.e. item.id, item.quantity)
order item.id item.quantity delivery.address delivey.postcode
A0001 item1 5 address1 e13bp
item2 1
item3 3
A0002 item4 4 address4 e13bp
item1 4
item3 2
  • schema
    • path and schema file name of the table schema
    • required if the CSV contain nested and repeated records
    • json format i.e.
[  
    {
      "mode": "NULLABLE", 
      "name": "order", 
      "type": "STRING"
    },  
    {
      "fields": [
        {
          "mode": "NULLABLE", 
          "name": "id", 
          "type": "STRING"
        },
        {
          "mode": "NULLABLE", 
          "name": "quantity", 
          "type": "STRING"
        }
      ], 
      "mode": "REPEATED", 
      "name": "item", 
      "type": "RECORD"
    }, 
    {
      "fields": [
        {
          "mode": "NULLABLE", 
          "name": "address", 
          "type": "STRING"
        }, 
        {
          "mode": "NULLABLE", 
          "name": "postcode", 
          "type": "STRING"
        }
      ], 
      "mode": "NULLABLE", 
      "name": "delivery", 
      "type": "RECORD"
    }
  ]

Methods

  • to_json()

    • if successfuls return the json extracted from the csv
  • to_new_line_delimiter_file(output_file_name)

    • return 0 if successfuls
    • create new line delimiter "output_file_name" file

Usage

>>> from data_prep import tdd_utility as  tu
>>> mockdata_csv = tu.load_csv(
...     csv="path/to/filename/file.csv", 
...     schema="path/to/schema/schema.json") # initialise the object
>>> mockdata_json = mockdata_csv.to_json() # return the equivalent json
>>> mockdata_json = mockdata_csv.to_new_line_delimiter_file(output="path/output_file_name.json") # return output_file_name

Project details


Release history Release notifications

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for py-data-framework, version 0.0.1
Filename, size File type Python version Upload date Hashes
Filename, size py_data_framework-0.0.1-py3-none-any.whl (3.7 kB) File type Wheel Python version py3 Upload date Hashes View hashes
Filename, size py-data-framework-0.0.1.tar.gz (2.5 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page