Data engineering & Data science Framework
py-analytics
This repo contains a Python framework with data engineering, data science and third-party integration tool capabilities.
Installation
py-analytics can be installed via pip:
pip install py-analytics
tdd_utility - module
class load_csv(csv,schema)
Contains methods to extract the equivalent json from csv with nested and repeated records structures
Args
- csv
  - path and file name of the CSV
  - mandatory
  - nested fields shall be separated by a dot "." (e.g. item.id, item.quantity)

| order | item.id | item.quantity | delivery.address | delivery.postcode |
|---|---|---|---|---|
| A0001 | item1 | 5 | address1 | e13bp |
|  | item2 | 1 |  |  |
|  | item3 | 3 |  |  |
| A0002 | item4 | 4 | address4 | e13bp |
|  | item1 | 4 |  |  |
|  | item3 | 2 |  |  |
- schema
  - path and file name of the table schema
  - required if the CSV contains nested and repeated records
  - JSON format, e.g.
[
  {
    "mode": "NULLABLE",
    "name": "order",
    "type": "STRING"
  },
  {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "id",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "quantity",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "item",
    "type": "RECORD"
  },
  {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "address",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "postcode",
        "type": "STRING"
      }
    ],
    "mode": "NULLABLE",
    "name": "delivery",
    "type": "RECORD"
  }
]
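To make the mapping from dotted column names to nested and repeated records concrete, here is a minimal standard-library sketch. It is not the implementation behind load_csv: the helpers `leaf`, `sub_record` and `nest_rows` are hypothetical, the sample uses a pipe delimiter only for readability, and the grouping rule (a value in the first column starts a new record, rows with an empty first column extend the previous one) is an assumption drawn from the sample table.

```python
import csv
import io
import json

# Pipe-separated copy of the sample table (delimiter chosen for this sketch only).
SAMPLE_CSV = """\
order|item.id|item.quantity|delivery.address|delivery.postcode
A0001|item1|5|address1|e13bp
|item2|1||
|item3|3||
A0002|item4|4|address4|e13bp
|item1|4||
|item3|2||
"""


def leaf(name):
    """Build a NULLABLE STRING schema entry (hypothetical shorthand)."""
    return {"mode": "NULLABLE", "name": name, "type": "STRING"}


# Same structure as the schema file shown above.
SCHEMA = [
    leaf("order"),
    {"mode": "REPEATED", "name": "item", "type": "RECORD",
     "fields": [leaf("id"), leaf("quantity")]},
    {"mode": "NULLABLE", "name": "delivery", "type": "RECORD",
     "fields": [leaf("address"), leaf("postcode")]},
]


def sub_record(row, field):
    """Collect the 'parent.child' columns of one RECORD field; empty cells become None."""
    return {sub["name"]: row.get(f"{field['name']}.{sub['name']}") or None
            for sub in field["fields"]}


def nest_rows(rows, schema):
    """Group flat CSV rows into nested records, guided by the schema."""
    records = []
    key = schema[0]["name"]  # assumption: a value in the first column starts a new record
    for row in rows:
        if row.get(key):
            record = {}
            for field in schema:
                if field["type"] != "RECORD":
                    record[field["name"]] = row.get(field["name"]) or None
                elif field["mode"] == "REPEATED":
                    record[field["name"]] = []
                else:
                    record[field["name"]] = sub_record(row, field)
            records.append(record)
        if not records:
            continue  # continuation row with no parent record: skip it
        # Every row may contribute one element to each REPEATED record.
        for field in schema:
            if field["type"] == "RECORD" and field["mode"] == "REPEATED":
                child = sub_record(row, field)
                if any(value is not None for value in child.values()):
                    records[-1][field["name"]].append(child)
    return records


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV), delimiter="|"))
records = nest_rows(rows, SCHEMA)
print(json.dumps(records[0], indent=2))
```

Under these assumptions the six sample rows collapse into two records, one per order, each holding a list of three items and a single delivery record.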
Methods
- to_json()
  - returns the JSON extracted from the CSV if successful
- to_new_line_delimiter_file(output_file_name)
  - returns 0 if successful
  - creates the newline-delimited file "output_file_name"
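For readers unfamiliar with the output format, the sketch below shows what a newline-delimited file presumably looks like here: one JSON object per line. The function `write_ndjson` is a hypothetical stand-in written with the standard library, not the library's own method; it only mirrors the documented contract of returning 0 on success.

```python
import json
import os
import tempfile


def write_ndjson(records, output_file_name):
    """Write one JSON object per line and return 0 on success (hypothetical helper)."""
    with open(output_file_name, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return 0


# Two small nested records, shaped like the orders in the sample table.
records = [
    {"order": "A0001", "item": [{"id": "item1", "quantity": "5"}]},
    {"order": "A0002", "item": [{"id": "item4", "quantity": "4"}]},
]

path = os.path.join(tempfile.mkdtemp(), "orders.json")
status = write_ndjson(records, path)

# Read the file back: each line parses independently as one record.
with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

Because each line is a complete JSON document, the file can be processed record by record without parsing it as a whole.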
Usage
>>> from data_prep import tdd_utility as tu
>>> mockdata_csv = tu.load_csv(
... csv="path/to/filename/file.csv",
... schema="path/to/schema/schema.json") # initialise the object
>>> mockdata_json = mockdata_csv.to_json() # return the equivalent json
>>> status = mockdata_csv.to_new_line_delimiter_file("path/output_file_name.json") # write the newline-delimited file; returns 0 if successful