HDL Delta Sharing Utilities
Preparing for Usage
Create Certificates
- Set up a PKI certificate server (preferably in the same subaccount, but this is not a requirement)
- Download the service key
Run the sapcert command:
sapcert -p service_keys/xpm-hdlfs-ds.json -d certificates -v 60 -t DAYS <CN>
This creates two files:
- <CN>.key - private key
- <CN>.pem - certificate chain
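A quick way to double-check the result is to inspect the certificate with the Python cryptography package. A minimal sketch, assuming the common name 'myapp' (the file name is illustrative, it derives from the <CN> argument):

from cryptography import x509

with open("certificates/myapp.pem", "rb") as f:  # file name derives from <CN>
    cert = x509.load_pem_x509_certificate(f.read())

print("Subject:  ", cert.subject.rfc4514_string())  # later used as HDLFS user subject
print("Not after:", cert.not_valid_after)           # should reflect '-v 60 -t DAYS'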
Create HDLFS Instance
HDL instances are created using the Service Manager API; the subaccount therefore needs a running Service Manager instance.
For creating and managing HDL instances I have developed the command line tool hdlinst. By default it uses the service-key file service_keys/sm-sk.txt.
List instances
hdlinst list
Access Parameters
hdlinst params xpm-mt1
Instance Details
hdlinst details xpm-mt1
Currently the HDLFS endpoint is not provided by the hdl-instance details but must be built as <instance-id>.files.hdl.<landscape>, e.g.
40e60329-e854-46e2-89d4-728093fb7576.files.hdl.prod-us30.hanacloud.ondemand.com
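As a small sketch, the endpoint can be assembled from the instance id and the landscape host (values taken from the example above):

# Assemble the HDLFS files endpoint: <instance-id>.files.hdl.<landscape>
instance_id = "40e60329-e854-46e2-89d4-728093fb7576"  # from 'hdlinst details'
landscape = "prod-us30.hanacloud.ondemand.com"
files_endpoint = f"{instance_id}.files.hdl.{landscape}"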
The following command adds a config section to the configuration file: $HOME/.hdlfscli.config.json
hdlinst add2config xpm-mt1 -C hdlmt1
To create a new hdl-instance
hdlinst create xpm-hdl-mt1 --CommonName hdlmt1 -c
This uses the template file './configs/hdlfs_template.json'. The argument '-c' is needed to use the subject of the certificate <certificates-folder>/<CommonName>.pem as the user subject for the HDLFS authentication.
Additional Requirements for HDL Instance Management
- An API for deleting an HDLFS instance
- An API for managing the users/parameters of an HDL instance. This is important when having many HDL instances to administer.
Delta Sharing with SAP HDL
Overview
Spark is required for big data processing that needs a cluster of compute nodes and cheap storage (delta lake). The results of processing big data need to be HANA tables that can be consumed by DataSphere. In order to share the results with other applications you need either a
- system-to-system integration, where the credentials of the target applications are shared, or
- "data product"-kind of integration, where you expose the data to external users with an external access management like "delta sharing".
Delta Sharing is used internally as a data sharing technology that separates the internals of the data producer from the data consumer and adds further options to govern access. For each HANA Cloud, Data Lake instance a Delta Sharing server can be activated. For the time being this is only enabled for internal use. Once we have thoroughly tested this API through productive usage within SAP, and if this service can be offered on the main hyperscalers, we might consider opening this service to SAP customers.
URLs
Catalog API
<instance-id>.files.hdl.<landscape>/catalog/v2
<instance-id>.files.hdl.<landscape>/policies/v2
Audience in JWT
<instance-id>.files.<landscape>
Delta Sharing
- Token Access: <instance-id>.sharing.hdl.<landscape>/shares/v1/
- Cert Access: <instance-id>.files.hdl.<landscape>/shares/v1/
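For token access a consumer typically receives these coordinates packaged in a Delta Sharing profile file. The format below is the one defined by the open Delta Sharing protocol; the endpoint and token values are placeholders:

{
  "shareCredentialsVersion": 1,
  "endpoint": "https://<instance-id>.sharing.hdl.<landscape>/shares/v1/",
  "bearerToken": "<token created with 'hdlpolicy token'>"
}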
Overview of the HDL Tools
Currently the main way to manage Delta Sharing is via its REST APIs. To test how best to use the Delta Sharing management I have developed command line tools.
hdlfscli
For managing the files on HDLFS the command line tool hdlfscli is used. It can be downloaded from the git repository bigdataservices/hdlfs-cli.
hdlfscli -h
hdlfscli manages interaction with HDLFiles cloud service.
Find more information at: https://help.sap.com/docs/hana-cloud-data-lake/client-interfaces/hdlfscli-data-lake-files-utility
Storage Commands (Cloud Storage):
ls or list List a file or directory
lsr or tree List a file or directory with recursive traversal
upload Upload a file or directory to remote file storage from local file storage, creating remote directories suitably
rm or delete Delete a file or directory. Use with the flag '-f' for directories e.g., 'rm -f <directory>' to delete a directory
mv Move a remote file from remote source path to remote destination path
cat Open a file
download Copy a file or directory from remote storage to local file storage
JWT Commands (Manage JSON Web Tokens):
jwt Manage JWT
Usage For Storage Commands:
hdlfscli [storage-options] storage-command [arguments]
Usage For JWT Commands:
hdlfscli jwt [jwt-command] [jwt-options]
Use "hdlfscli storage-options" for a list of global command-line options (applies to all storage commands).
Use "hdlfscli jwt-options" for a list of jwt command-line options (applies to all jwt commands).
hdlfscli uses a configuration file at $HOME/.hdlfscli.config.json. This config is also used by the command line apps "hdlshare" and "hdlpolicy".
For convenience I have created
- a 'default' config section in .hdlfscli.config.json, which is used when no config is given but one is required (a sketch of the file follows below), and
- an alias hdl='hdlfscli -config'
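The exact schema of .hdlfscli.config.json is owned by hdlfscli; purely as an illustration, a named section is assumed here to carry the endpoint and the certificate pair created with sapcert:

{
  "configs": {
    "default": {
      "endpoint": "https://<instance-id>.files.hdl.<landscape>",
      "cert": "certificates/myapp.pem",
      "key": "certificates/myapp.key"
    }
  }
}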
Example
% hdl default ls data/deltalake
DIRECTORY nobody nogroup 0 0 FR
DIRECTORY nobody nogroup 0 0 DE
DIRECTORY nobody nogroup 0 0 US
% hdl default upload data/US/customer data/deltalake/US/customer
Setup HDL
- sapcert - create signed certificates
- hdlinst - create an HDL instance using the created certificate
HDL Delta Sharing management
- hdlshare - manage HDL Delta Shares
- hdlpolicy - manage HDL Delta Share policies
- hdlclient - Delta Sharing client to test HDL Delta Sharing
Installation
- Clone the git repository
python -m build
pip install .
- Install via pip (not yet available)
pip install hdlshare
hdlshare
Creates and manages shares. The command line app uses the REST API (Swagger for the hdlfs-service).
% hdlshare -h
usage: Manage HDLFS shares [-h] [-r] [-m] [-C] [-p PATH] [-c CONFIG] {list,add,delete,get} [target ...]
positional arguments:
{list,add,delete,get}
Command for 'target'-argument
target share schema table (optional)
options:
-h, --help show this help message and exit
-r, --recursive List recursively
-m, --metadata Show metadata of table (action=list)
-C, --cascade Drop cascade when deleting share (action=delete)
-p PATH, --path PATH HDLFS data folder
-c CONFIG, --config CONFIG
HDLFs config in '.hdlfscli.config.json'
Examples
List data on hdlfs:
% hdl default ls data/deltalake/US
DIRECTORY nobody nogroup 0 0 persons
List all shares and tables
% hdlshare list -r
shares
├── sbm
├── hxm
└── crm
└── us
└── customer
Add new table to share
% hdlshare add hxm us employees --path data/deltalake/persons
Table successfully added: hxm: us.employees
% hdlshare list -r
shares
├── sbm
├── hxm
│ └── us
│ └── employees
└── crm
└── us
└── customer
Details of Share:schema:table:
% hdlshare list -rm
shares
├── sbm
├── hxm
│ └── us
│ └── employees
│ ├── data/deltalake/persons
│ ├── DELTA
│ └── cdf: True
└── crm
└── us
└── customer
├── data/deltalake/US/customer
├── DELTA
└── cdf: True
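Under the hood these commands call the catalog API shown in the URLs section. Below is a minimal sketch using Python requests with certificate authentication; note that the '/shares' resource path is an assumption here, the authoritative paths come from the Swagger spec:

import requests

# Endpoint built as <instance-id>.files.hdl.<landscape> (example values from above)
endpoint = "https://40e60329-e854-46e2-89d4-728093fb7576.files.hdl.prod-us30.hanacloud.ondemand.com"
resp = requests.get(
    endpoint + "/catalog/v2/shares",  # assumed resource path, see Swagger spec
    cert=("certificates/myapp.pem", "certificates/myapp.key"),  # sapcert output
)
resp.raise_for_status()
print(resp.json())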
hdlpolicy
Create and manage policies.
% hdlpolicy -h
usage: Manage HDLFS share policies [-h] [-p POLICY] [-s SUBJECT] [-R RESOURCE] [-P PRIVILEGE] [-C CONSTRAINT] [-D DAYS]
[-c CONFIG]
{list,add,delete,copy,token,showtoken} [policy_names ...]
positional arguments:
{list,add,delete,copy,token,showtoken}
Action
policy_names Policy name (for 'copy' arg 2 policies)
options:
-h, --help show this help message and exit
-p POLICY, --policy POLICY
Policy content (json)
-s SUBJECT, --subject SUBJECT
subject/user to add or delete from policy and for showing or generating tokens
-R RESOURCE, --resource RESOURCE
Resource to add or delete from policy
-P PRIVILEGE, --privilege PRIVILEGE
Privilege to add or delete from policy
-C CONSTRAINT, --constraint CONSTRAINT
Constraint to add or delete from policy
-D DAYS, --days DAYS Days before expiring from now on.
-c CONFIG, --config CONFIG
HDLFs config in '.hdlfscli.config.json'
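The -p option expects the policy content as JSON. The authoritative schema comes from the policies API; the sketch below merely mirrors the CLI options above (subject, resource, privilege, constraint), with placeholder values:

{
  "name": "nl_region",
  "subjects": ["user:hr_nl"],
  "resources": ["share:crm"],
  "privileges": ["<privilege>"],
  "constraints": []
}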
Examples
List all policies
% hdlpolicy list
Copy policy
% hdlpolicy copy de_region nl_region
Add resource to policy
% hdlpolicy add nl_region -R share:crm
Delete subject/user from policy
% hdlpolicy delete nl_region -s user:de_admin
Add subject/user to policy
% hdlpolicy add nl_region -s user:hr_nl
Create token for user
% hdlpolicy token -s hr_nl
hdlclient - Delta Sharing Client
hdlclient -h
usage: hdlclient [-h] [-r] [-p PATH] [-m] [-v VERSION] [-e END_VERSION] [-c CONFIG] [-H] [-s]
profile {list,download,metadata} [target ...]
positional arguments:
profile Profile of delta sharing
{list,download,metadata}
Action
target (optional) Target: <share> [<schema>] [<table>]].
options:
-h, --help show this help message and exit
-r, --recursive List recursively
-p PATH, --path PATH Directory to store data.
-m, --meta Download metadata as csn-file to edit before starting the replication.
-v VERSION, --version VERSION
Start version (Warning: overruled by metadata stored version)
-e END_VERSION, --end_version END_VERSION
Version end
-c CONFIG, --config CONFIG
Config-file for HANA access (yaml with url, user, pwd, port)
-H, --Hana Upload to hana
-s, --sync Sync files with hana
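The HANA config file passed with -c is a small YAML file; based on the fields listed above it could look like this (all values are placeholders):

url: <hana-host>
port: <port>
user: <hana-user>
pwd: <hana-password>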
Examples
List available shares
% hdlclient admin list -r
shares
├── sbm
├── hxm
│ └── us
│ └── employees
└── crm
└── us
└── customer
% hdlclient hxm_md list
shares
└── hxm
└── us
└── employees
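Since HDL Delta Sharing speaks the open Delta Sharing protocol, the shares should also be consumable with the open-source delta-sharing Python client. A sketch, assuming a profile file like the one outlined in the URLs section and the hxm share from the examples above:

import delta_sharing

profile = "hdl_share_profile.json"  # hypothetical profile file name

# List everything the token's policies expose
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table into a pandas DataFrame: <profile>#<share>.<schema>.<table>
df = delta_sharing.load_as_pandas(profile + "#hxm.us.employees")
print(df.head())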