
A simple way to use datasets with DSM.


DSM Library

DataNode

  1. Initialize DataNode
from dsmlibrary.datanode import DataNode 

data = DataNode(token)
  1. Upload a file
data.upload_file(directory_id=<directory_id>, file_path='<file_path>', description="<description (optional)>")
  1. Download a file
data.download_file(file_id=<file_id>, download_path="<download_path>")  # download_path defaults to ./dsm.tmp
  1. Get a file
meta, file = data.get_file(file_id="<file_id>")
# meta -> dict
# file -> io bytes

# Example: read a CSV file with pandas
import pandas as pd

meta, file = data.get_file(file_id="<file_id>")
df = pd.read_csv(file)
...
  1. Write a Parquet file
df = ... # pandas DataFrame or Dask DataFrame

data.write(df=df, directory=<directory_id>, name=<save_file_name>, profiling=<True or False>)  # profiling defaults to False
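
The "get file" step above hands back a binary, file-like object. As a minimal sketch of working with such an object without pandas, the example below decodes a CSV buffer with the standard library only. The helper name `rows_from_bytes` is hypothetical and not part of dsmlibrary, and the `io.BytesIO` stand-in assumes that the returned `file` behaves like an in-memory bytes buffer, which the source describes only as "io bytes".

```python
import csv
import io


def rows_from_bytes(file, encoding="utf-8"):
    """Decode a binary file-like object as CSV and return a list of dict rows.

    Hypothetical helper for illustration; not part of dsmlibrary.
    """
    file.seek(0)  # rewind in case the buffer was already read
    text = io.TextIOWrapper(file, encoding=encoding, newline="")
    try:
        return list(csv.DictReader(text))
    finally:
        text.detach()  # keep the underlying buffer open for the caller


# Stand-in for the `file` object from the "get file" step
# (assumed here to be io.BytesIO-like).
file = io.BytesIO(b"id,name\n1,alpha\n2,beta\n")
rows = rows_from_bytes(file)
print(rows[0]["name"])
```

Because the helper rewinds the buffer and detaches the text wrapper instead of closing it, the same object can still be passed to `pd.read_csv` afterwards, provided it is rewound again with `file.seek(0)`.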

ClickHouse

  1. Import data into ClickHouse
from dsmlibrary.clickhouse import ClickHouse

ddf = ... # pandas dataframe or dask dataframe

## to warehouse
table_name = <your_table_name>
table_key = <your_table_key>

connection = { 
  'host': '', 
  'port': <port>, 
  'database': '', 
  'user': '', 
  'password': '', 
  'settings':{ 
     'use_numpy': True 
  }, 
  'secure': False 
}

warehouse = ClickHouse(connection=connection)

tableName = warehouse.get_or_createTable(ddf=ddf, tableName=table_name, key=table_key)
warehouse.write(ddf=ddf, tableName=tableName, key=table_key)
  1. Query data from ClickHouse
query = f""" 
    SELECT * FROM {tableName} LIMIT 10 
""" 
warehouse.read(sqlQuery=query)
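
The connection dictionary and the f-string query above can be prepared with two small helpers. This is a sketch under stated assumptions, not part of dsmlibrary: the function names and the `CLICKHOUSE_*` environment variable names are hypothetical, the fallback values are illustrative only (9000 is ClickHouse's default native-protocol port), and the identifier check assumes plain table names without a database prefix.

```python
import os
import re

# Plain identifiers only: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_connection(env=os.environ):
    """Assemble a connection dict with the same keys as the example above.

    The CLICKHOUSE_* variable names and the fallbacks are assumptions.
    """
    return {
        "host": env.get("CLICKHOUSE_HOST", "localhost"),
        "port": int(env.get("CLICKHOUSE_PORT", "9000")),
        "database": env.get("CLICKHOUSE_DATABASE", "default"),
        "user": env.get("CLICKHOUSE_USER", "default"),
        "password": env.get("CLICKHOUSE_PASSWORD", ""),
        "settings": {"use_numpy": True},
        "secure": env.get("CLICKHOUSE_SECURE", "false").lower() == "true",
    }


def preview_query(table_name, limit=10):
    """Build a SELECT ... LIMIT query, rejecting unsafe table names."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    return f"SELECT * FROM {table_name} LIMIT {int(limit)}"


connection = build_connection({"CLICKHOUSE_HOST": "ch.example", "CLICKHOUSE_PORT": "9440"})
query = preview_query("events")
print(connection["host"], connection["port"])
print(query)
```

Keeping credentials in the environment avoids committing them alongside the code, and validating the table name before interpolating it guards the f-string query against malformed or malicious input. The resulting `connection` and `query` values would then be passed to `ClickHouse(connection=connection)` and `warehouse.read(sqlQuery=query)` as shown above.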

Download files

Source Distribution

dsmlibrary-1.0.14.tar.gz (10.0 kB)
