Data engineering & Data science Framework
py-analytics
This repo contains a Python framework with data engineering, data science and third-party integration capabilities.
Installation
py-analytics can be installed via pip:
pip install py-analytics
tdd_utility - module
class load_csv(csv, schema)
Contains methods to extract the equivalent JSON from a CSV file with nested and repeated record structures.
Args
- csv
  - path and file name of the CSV
  - mandatory
  - nested fields shall be separated by a dot "." (e.g. item.id, item.quantity)
| order | item.id | item.quantity | delivery.address | delivery.postcode |
|---|---|---|---|---|
| A0001 | item1 | 5 | address1 | e13bp |
| | item2 | 1 | | |
| | item3 | 3 | | |
| A0002 | item4 | 4 | address4 | e13bp |
| | item1 | 4 | | |
| | item3 | 2 | | |
- schema
  - path and file name of the table schema
  - required if the CSV contains nested and repeated records
  - JSON format, e.g.
[
  {
    "mode": "NULLABLE",
    "name": "order",
    "type": "STRING"
  },
  {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "id",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "quantity",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "item",
    "type": "RECORD"
  },
  {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "address",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "postcode",
        "type": "STRING"
      }
    ],
    "mode": "NULLABLE",
    "name": "delivery",
    "type": "RECORD"
  }
]
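To illustrate how the dotted CSV columns and the schema relate, here is a minimal sketch of the grouping idea. It is not the py-analytics implementation: `rows_to_records` is a hypothetical helper, and it assumes that a row with a blank first column continues the previous record, as in the example table.

```python
import csv
import io
import json

# Schema and CSV taken from the example above (CSV written with commas).
SCHEMA = [
    {"mode": "NULLABLE", "name": "order", "type": "STRING"},
    {"mode": "REPEATED", "name": "item", "type": "RECORD", "fields": [
        {"mode": "NULLABLE", "name": "id", "type": "STRING"},
        {"mode": "NULLABLE", "name": "quantity", "type": "STRING"}]},
    {"mode": "NULLABLE", "name": "delivery", "type": "RECORD", "fields": [
        {"mode": "NULLABLE", "name": "address", "type": "STRING"},
        {"mode": "NULLABLE", "name": "postcode", "type": "STRING"}]},
]

CSV_TEXT = """order,item.id,item.quantity,delivery.address,delivery.postcode
A0001,item1,5,address1,e13bp
,item2,1,,
,item3,3,,
A0002,item4,4,address4,e13bp
,item1,4,,
,item3,2,,
"""


def nested_values(row, field):
    """Collect the 'parent.child' columns of one RECORD field from a flat row."""
    return {
        sub["name"]: row.get(f'{field["name"]}.{sub["name"]}') or None
        for sub in field["fields"]
    }


def rows_to_records(rows, schema):
    """Hypothetical helper: group flat CSV rows into nested records."""
    key = schema[0]["name"]  # assumption: the first column identifies a record
    records = []
    for row in rows:
        if row.get(key):
            # A non-blank key starts a new record.
            record = {}
            for field in schema:
                if field["type"] != "RECORD":
                    record[field["name"]] = row.get(field["name"]) or None
                elif field["mode"] == "REPEATED":
                    record[field["name"]] = []
                else:
                    record[field["name"]] = nested_values(row, field)
            records.append(record)
        elif not records:
            raise ValueError("first row must carry a value in the key column")
        # Every row may add one element to each REPEATED record.
        for field in schema:
            if field["type"] == "RECORD" and field["mode"] == "REPEATED":
                item = nested_values(row, field)
                if any(value is not None for value in item.values()):
                    records[-1][field["name"]].append(item)
    return records


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
records = rows_to_records(rows, SCHEMA)
print(json.dumps(records[0], indent=2))
```

Each order becomes one record: `item` is a list with one entry per CSV row, while `delivery` is a single nested object taken from the first row of the order.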
Methods
- to_json()
  - returns the JSON extracted from the CSV if successful
- to_new_line_delimiter_file(output_file_name)
  - returns 0 if successful
  - creates the newline-delimited file "output_file_name"
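For readers unfamiliar with the newline-delimited format, the sketch below shows what such a file looks like when written by hand: one JSON object per line. `write_ndjson` is a hypothetical stand-in, not the library's code, and the sample records are illustrative.

```python
import json
import os
import tempfile


def write_ndjson(records, output_file_name):
    """Hypothetical helper: write one JSON object per line."""
    with open(output_file_name, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return 0  # mirrors the "return 0 if successful" convention described above


# Illustrative records shaped like the nested output of the CSV example.
records = [
    {"order": "A0001", "item": [{"id": "item1", "quantity": "5"}]},
    {"order": "A0002", "item": [{"id": "item4", "quantity": "4"}]},
]

path = os.path.join(tempfile.mkdtemp(), "output_file_name.json")
status = write_ndjson(records, path)

# Read the file back to confirm that each line parses as a standalone JSON object.
with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

This layout is convenient for loaders that stream records one line at a time instead of parsing a single large JSON array.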
Usage
>>> from data_prep import tdd_utility as tu
>>> mockdata_csv = tu.load_csv(
... csv="path/to/filename/file.csv",
... schema="path/to/schema/schema.json") # initialise the object
>>> mockdata_json = mockdata_csv.to_json() # return the equivalent json
>>> status = mockdata_csv.to_new_line_delimiter_file("path/output_file_name.json") # write the newline-delimited file; returns 0 if successful