
A simple way to work with datasets on DSM.

Project description

DSM Library

DataNode

  1. init DataNode
from dsmlibrary.datanode import DataNode 

data = DataNode(
  token="<token>",
  apikey="<apikey>",
  dataplatform_api_uri="<dataplatform_api_uri>", 
  object_storage_uri="<object_storage_uri>",
  use_env=<True/False, default True>
)
  1. upload file
data.upload_file(directory_id=<directory_id>, file_path='<file_path>', description="<description(optional)>")
  1. download file
data.download_file(file_id=<file_id>, download_path="<download destination path (default ./dsm.tmp)>")
  1. get file
meta, file = data.get_file(file_id="<file_id>")
# meta -> dict of file metadata
# file -> file content as io bytes

# example: read a CSV file with pandas
import pandas as pd

meta, file = data.get_file(file_id="<file_id>")
df = pd.read_csv(file)
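Because `get_file` returns the content as a file-like bytes object, it can be consumed by any reader, not only pandas. A minimal stdlib-only sketch (the `io.BytesIO` below stands in for the object `get_file` would return; the sample CSV content is invented for illustration):

```python
import csv
import io

# Stand-in for the io-bytes object that data.get_file(...) returns
file = io.BytesIO(b"id,name\n1,alpha\n2,beta\n")

# Decode the bytes and parse with the stdlib csv module
rows = list(csv.DictReader(io.TextIOWrapper(file, encoding="utf-8")))
print(rows[0]["name"])  # -> alpha
```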
  1. read df
df = data.read_df(file_id="<file_id>")
# returns a pandas DataFrame
  1. read ddf
  • .parquet files must be read with this function
ddf = data.read_ddf(file_id="<file_id>")
# returns a Dask DataFrame
  1. write parquet file
df = ... # pandas dataframe or dask dataframe

data.write(
  df=df,
  directory=<directory_id>,
  name="<save_file_name>",
  description="<description>",
  replace=<True/False, default False>,        # replace if the file exists
  profiling=<True/False, default False>,
  lineage=<list of file ids, e.g. [1, 2, 3]>
)
  1. writeListDataNode
df = ... # pandas dataframe or dask dataframe
data.writeListDataNode(
  df=df,
  directory_id=<directory_id>,
  name="<save_file_name>",
  description="<description>",
  replace=<True/False, default False>,        # replace if the file exists
  profiling=<True/False, default False>,
  lineage=<list of file ids, e.g. [1, 2, 3]>
)
  1. get file id
file_id = data.get_file_id(name=<file name>, directory_id=<directory id>)
# returns the file ID as an int
  1. get directory id
directory_id = data.get_directory_id(parent_dir_id=<parent directory id>, name=<directory name>)
# returns the directory ID as an int
  1. get file version (used with writeListDataNode)
fileVersion = data.get_file_version(file_id=<file id>)
# returns a dict with `file_id` and `timestamp`
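Since `get_file_version` returns `file_id`/`timestamp` dicts, picking the most recent of several versions is a one-liner. The helper below is my own sketch, not part of dsmlibrary, and the sample timestamps are invented (ISO-8601 strings compare correctly as plain strings):

```python
# Hypothetical helper: pick the newest entry from a list of
# {"file_id": ..., "timestamp": ...} dicts like get_file_version returns.
def latest_version(versions):
    return max(versions, key=lambda v: v["timestamp"])

versions = [
    {"file_id": 101, "timestamp": "2023-01-05T10:00:00"},
    {"file_id": 102, "timestamp": "2023-03-01T08:30:00"},
]
print(latest_version(versions)["file_id"])  # -> 102
```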

Clickhouse

  1. import data to ClickHouse
from dsmlibrary.clickhouse import ClickHouse

ddf = ... # pandas dataframe or dask dataframe

## to warehouse
table_name = <your_table_name>
partition_by = <your_partition_by>

connection = {
  'host': '<host>',
  'port': <port>,
  'database': '<database>',
  'user': '<user>',
  'password': '<password>',
  'settings': {
     'use_numpy': True
  },
  'secure': False
}

warehouse = ClickHouse(connection=connection)

tableName = warehouse.get_or_createTable(ddf=ddf, tableName=table_name, partition_by=partition_by)
warehouse.write(ddf=ddf, tableName=tableName)
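The connection dict must be complete before `ClickHouse(connection=connection)` is constructed. A small guard (my own sketch, not part of dsmlibrary) can report missing keys early instead of failing mid-pipeline:

```python
# Sketch: validate a connection dict of the shape shown above before use.
REQUIRED_KEYS = {"host", "port", "database", "user", "password"}

def missing_connection_keys(connection):
    """Return the required keys absent from the connection dict, sorted."""
    return sorted(REQUIRED_KEYS - connection.keys())

connection = {"host": "localhost", "port": 9000, "database": "default"}
print(missing_connection_keys(connection))  # -> ['password', 'user']
```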
  1. query data from clickhouse
query = f""" 
    SELECT * FROM {tableName} LIMIT 10 
""" 
warehouse.read(sqlQuery=query)
  1. drop table
warehouse.dropTable(tableName=table_name)
  • optional: custom config for inserting data into ClickHouse
config = {
  'n_partition_per_block': 10,
  'n_row_per_loop': 1000
}
warehouse = ClickHouse(connection=connection, config=config)
  1. truncate table
warehouse.truncateTable(tableName=table_name)
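The config above suggests inserts are batched; the intent of a setting like `n_row_per_loop` can be illustrated with a plain chunking sketch (an illustration of batched inserts, not dsmlibrary internals):

```python
def row_batches(n_rows, n_row_per_loop):
    """Yield (start, stop) index pairs covering n_rows in fixed-size batches."""
    for start in range(0, n_rows, n_row_per_loop):
        yield start, min(start + n_row_per_loop, n_rows)

# 2500 rows inserted 1000 at a time -> three batches, last one partial
print(list(row_batches(2500, 1000)))  # -> [(0, 1000), (1000, 2000), (2000, 2500)]
```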

API

dsmlibrary

dsmlibrary.datanode.DataNode

  • upload_file
  • download_file
  • get_file
  • read_df
  • read_ddf
  • write
  • writeListDataNode
  • get_file_id
  • get_directory_id
  • get_file_version

dsmlibrary.clickhouse.ClickHouse

  • get_or_createTable
  • write
  • read
  • dropTable
  • truncateTable

Use for pipeline

data = DataNode(apikey="<APIKEY>")

Use an API key for authentication.
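In a pipeline the API key is usually injected through the environment rather than hard-coded. A minimal sketch (the `DSM_APIKEY` variable name is my assumption, not a dsmlibrary convention):

```python
import os

# Read the API key from the environment (DSM_APIKEY is an assumed name).
apikey = os.environ.get("DSM_APIKEY", "")
if not apikey:
    print("warning: DSM_APIKEY is not set")

# data = DataNode(apikey=apikey)  # authenticate with the API key
```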

MDM

  1. semantic similarity
pip install "dsmlibrary[mdm]"

see example here

Gendatadict PDF

from dsmlibrary.datadict import GenerateDatadict
gd = GenerateDatadict(
  token="<token>",
  apikey="<apikey>",
  dataplatform_api_uri="<dataplatform_api_uri>", 
  object_storage_uri="<object_storage_uri>"
)
gd.generate_datadict(name="<NAME>", directory_id=<DIR_ID for datadict file>, file_ids=[<FILE_ID>, <FILE_ID>, ...])
  • authenticate with either token or apikey

Download files

Source distribution: dsmlibrary-1.0.50.tar.gz (18.2 kB), uploaded via twine/4.0.2 (CPython/3.9.4), not using Trusted Publishing.

Hashes for dsmlibrary-1.0.50.tar.gz
Algorithm Hash digest
SHA256 c9aca174017acbfadfc94ee2f0a48927291a8491c35a63b685ad187c260f9e4b
MD5 9bd9947ebcfa95c8c764607b62e39acf
BLAKE2b-256 e77ca7ca5ccdc08c95d735f019bf9cfc2e701b4b0ce3508ffd0d2fa24dbe66f9