ssh_jump_hive is a tool that connects to Hive through a jump host and loads the query results into a pandas DataFrame
Project description
DSTL
====
https://github.com/mullerhai/sshjumphive
Note: this repo is not supported. License is MIT.
.. contents::
Object types
------------
ssh_jump_hive is a tool that connects to Hive through a jump host and loads the query results into a pandas DataFrame. It provides three object types:
- 0: hive_client, for connecting directly to a Hive server with no jump server
- 1: Jump_Tunnel, for connecting to a Hive server that sits behind a separate jump server
- 2: SSH_Tunnel, for obtaining an SSH tunnel channel
General approach
----------------
To use it, you need to know a few connection details.
For the jump server and the tunnel, the parameters are [jumphost, jumpport, jumpuser, jumppwd, tunnelhost, tunnelAPPport, localhost, localbindport].
For the Hive server, you also need [localhost, hiveusername, hivepassword, localbindport, database, auth].
To query Hive data, you need [table, query_fileds_list, partions_param_dict, query_limit].
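The jump parameters listed above map naturally onto an SSH port forward. The following is a minimal sketch, assuming the third-party ``sshtunnel`` package is installed; it is not taken from the ssh_jump_hive source. The helper names ``tunnel_kwargs`` and ``open_tunnel`` are hypothetical.

```python
def tunnel_kwargs(jumphost, jumpport, jumpuser, jumppwd,
                  tunnelhost, tunnelAPPport, localhost, localbindport):
    """Map the jump parameters onto sshtunnel's keyword arguments (hypothetical helper)."""
    return {
        # Where the SSH connection is made: the jump server.
        "ssh_address_or_host": (jumphost, int(jumpport)),
        "ssh_username": jumpuser,
        "ssh_password": jumppwd,
        # The service behind the jump server, e.g. the Hive server port.
        "remote_bind_address": (tunnelhost, int(tunnelAPPport)),
        # The local endpoint that clients connect to instead.
        "local_bind_address": (localhost, int(localbindport)),
    }


def open_tunnel(**params):
    """Start the port forward; the caller should call .stop() when done."""
    # Imported lazily so the mapping above can be inspected without sshtunnel.
    from sshtunnel import SSHTunnelForwarder

    tunnel = SSHTunnelForwarder(**tunnel_kwargs(**params))
    tunnel.start()
    return tunnel
```

Once the tunnel is running, a Hive client would connect to ``localhost:localbindport`` rather than to the Hive server directly.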
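If your Hive server sits behind a separate jump server, use it like this::

    from ssh_jump_hive import Jump_Tunnel_HIVE
    import pandas as pd

    # ..... (define jumphost, jumpport, jumpuser, jumppwd, tunnelhost,
    #        tunnelhiveport, localhost and localbindport here)
    table = 'tab_client_label'
    partions_param_dict = {'client_nmbr': 'AA75', 'batch': 'p1'}
    query_fileds_list = ['gid', 'realname', 'card']
    querylimit = 1000

    # Open the tunnel through the jump server and connect to Hive.
    jump = Jump_Tunnel_HIVE(jumphost, jumpport, jumpuser, jumppwd,
                            tunnelhost, tunnelhiveport, localhost, localbindport)
    # Run the query and load the result into a pandas DataFrame.
    df2 = jump.get_JUMP_df(table, partions_param_dict, query_fileds_list, querylimit)
    print(df2.shape)
    print(df2.head(100))
    print(df2.columns)  # columns is an attribute, not a method
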
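The four query parameters suggest a simple SELECT with partition filters and a row limit. The library's actual query construction is not shown here, so the sketch below is only an assumption about how such a query could be assembled. ``build_query`` is a hypothetical helper, and it interpolates values naively, so it is only suitable for trusted input.

```python
def build_query(table, query_fileds_list, partions_param_dict, query_limit):
    """Compose a HiveQL SELECT from the four query parameters (hypothetical helper)."""
    # Select the requested fields, or every column if none are given.
    columns = ", ".join(query_fileds_list) if query_fileds_list else "*"
    sql = "SELECT {} FROM {}".format(columns, table)
    if partions_param_dict:
        # Partition filters are ANDed together; values are quoted as strings.
        conditions = " AND ".join(
            "{} = '{}'".format(key, value)
            for key, value in partions_param_dict.items()
        )
        sql += " WHERE " + conditions
    if query_limit:
        sql += " LIMIT {}".format(int(query_limit))
    return sql


print(build_query(
    table="tab_client_label",
    query_fileds_list=["gid", "realname", "card"],
    partions_param_dict={"client_nmbr": "AA75", "batch": "p1"},
    query_limit=1000,
))
# SELECT gid, realname, card FROM tab_client_label WHERE client_nmbr = 'AA75' AND batch = 'p1' LIMIT 1000
```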
The model is a UNet network with batch normalization added, trained with the Adam optimizer on
a loss that is a weighted sum of 0.1 cross-entropy and 0.9 dice loss.
Input for UNet was a 116 by 116 pixel patch, output was 64 by 64 pixels,
so there were 16 additional pixels on each side that just provided context for
the prediction.
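The weighted loss can be sketched in plain Python as follows. This assumes a per-pixel binary cross-entropy and a soft dice loss defined as ``1 - dice``; the exact dice formulation and the smoothing term ``eps`` are assumptions, since the text gives only the 0.1 and 0.9 weights.

```python
import math


def combined_loss(probs, targets, ce_weight=0.1, dice_weight=0.9, eps=1e-7):
    """Return ce_weight * cross-entropy + dice_weight * (1 - soft dice).

    probs and targets are flat lists of per-pixel probabilities and 0/1 labels.
    """
    # Mean binary cross-entropy, with probabilities clipped away from 0 and 1.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    cross_entropy = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(clipped, targets)
    ) / len(targets)
    # Soft dice coefficient: 2 * overlap / (sum of predictions + sum of targets).
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    return ce_weight * cross_entropy + dice_weight * (1 - dice)
```

A perfect prediction gives a loss close to zero, while a fully inverted prediction is penalised by both terms.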
Batch size was 128 and the learning rate was set to 0.0001
(but the loss was multiplied by the batch size).
The learning rate was divided by 5 on the 25th epoch
and then again by 5 on the 50th epoch;
most models were trained for 70-100 epochs.
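The step schedule described above can be written as a small function. This is a sketch that assumes each drop takes effect at the start of the named epoch; the function name and its parameters are illustrative.

```python
def learning_rate(epoch, base_lr=1e-4, drop_epochs=(25, 50), factor=5.0):
    """Return the learning rate for an epoch under the step schedule."""
    lr = base_lr
    for drop_epoch in drop_epochs:
        # Each drop that has already been reached divides the rate by `factor`.
        if epoch >= drop_epoch:
            lr /= factor
    return lr


for epoch in (1, 24, 25, 49, 50, 70):
    print(epoch, learning_rate(epoch))
```

Under these assumptions the rate is 0.0001 before epoch 25, one fifth of that from epoch 25, and one twenty-fifth from epoch 50 onward.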
The patches that formed a batch were selected completely at random across all images.
During one epoch, the network saw patches that covered about one half
of the whole training set area. The best results for individual classes
were achieved when training on related classes together, for example buildings
and structures, roads and tracks, or the two kinds of vehicles.
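One way to read "completely at random" is to pick an image uniformly and then a patch position uniformly within it. The sketch below follows that assumption. The ``image_shapes`` mapping and both function names are hypothetical, and the default sizes come from the 116-pixel input patch and batch size of 128 mentioned in the text.

```python
import random


def sample_patch(image_shapes, patch_size, rng):
    """Pick a random image, then a random top-left corner for a square patch."""
    image_id = rng.choice(sorted(image_shapes))
    height, width = image_shapes[image_id]
    # randint is inclusive, so the patch always fits inside the image.
    top = rng.randint(0, height - patch_size)
    left = rng.randint(0, width - patch_size)
    return image_id, top, left


def sample_batch(image_shapes, batch_size=128, patch_size=116, seed=None):
    """Draw batch_size independent patches across all images."""
    rng = random.Random(seed)
    return [sample_patch(image_shapes, patch_size, rng) for _ in range(batch_size)]


# Hypothetical image IDs and shapes (height, width), for illustration only.
shapes = {"image_a": (800, 900), "image_b": (600, 600)}
batch = sample_batch(shapes, seed=0)
print(len(batch), batch[0])
```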
Augmentations included small rotations for some classes
(±10-25 degrees for houses, structures and both vehicle classes),
and full rotations and vertical/horizontal flips
for the other classes. A small amount of dropout (0.1) was used in some cases.
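The per-class choice of augmentation can be sketched as a parameter sampler. The class labels in ``SMALL_ROTATION_CLASSES``, the single 25-degree bound, and the 50% flip probability are all assumptions made for illustration; the text gives only the ±10-25 degree range and the list of class groups.

```python
import random

# Hypothetical labels for the classes that received only small rotations.
SMALL_ROTATION_CLASSES = {"houses", "structures", "vehicle_large", "vehicle_small"}


def sample_augmentation(class_name, rng, max_small_angle=25.0):
    """Return rotation and flip parameters for one training patch."""
    if class_name in SMALL_ROTATION_CLASSES:
        # Small rotation only, no flips.
        angle = rng.uniform(-max_small_angle, max_small_angle)
        return {"angle": angle, "hflip": False, "vflip": False}
    # Other classes: any rotation plus independent random flips.
    return {
        "angle": rng.uniform(0.0, 360.0),
        "hflip": rng.random() < 0.5,
        "vflip": rng.random() < 0.5,
    }


rng = random.Random(0)
print(sample_augmentation("houses", rng))
print(sample_augmentation("roads", rng))
```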
Alignment between channels was fixed with the help of
``cv2.findTransformECC``, and lower-resolution layers were upscaled to
match RGB size. In most cases, 12 channels were used (RGB, P, M),
while in some cases just RGB and P or all 20 channels made results
slightly better.
Validation
----------
Validation was very hard, especially for the two water classes and the two vehicle
classes. In most cases, validation was performed on 5 images
(6140_3_1, 6110_1_2, 6160_2_1, 6170_0_4, 6100_2_2), while the other 20 were used
for training. Re-training the model with the same parameters on all 25 images
improved the LB score.
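The hold-out split can be expressed as a simple filter over image IDs. The five validation IDs are taken from the text above; the function name is hypothetical, and the training IDs in the usage example are placeholders because the other 20 IDs are not listed here.

```python
VALIDATION_IDS = ["6140_3_1", "6110_1_2", "6160_2_1", "6170_0_4", "6100_2_2"]


def split_images(all_ids, validation_ids=VALIDATION_IDS):
    """Split image IDs into (train, validation), preserving input order."""
    held_out = set(validation_ids)
    train = [image_id for image_id in all_ids if image_id not in held_out]
    validation = [image_id for image_id in all_ids if image_id in held_out]
    return train, validation


# Placeholder training IDs, for illustration only.
all_ids = VALIDATION_IDS + ["train_{:02d}".format(i) for i in range(20)]
train, validation = split_images(all_ids)
print(len(train), len(validation))  # 20 5
```

For the final model, the same training run would simply be given every ID, with the validation list left empty.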
Download files
Source Distribution
ssh_jump_hive-0.2.0.tar.gz (3.1 kB)
- Tags: Source
- Uploaded using Trusted Publishing? No
- SHA256: da06e9227b73dbd7010a39c45d1cfa6ce2a9b6087b03f268f62fc1836abde67f
- MD5: 251e39c4566ea193afe8805061c8ed58
- BLAKE2b-256: 3f6d7506fa9068aabe4456a3f80c95198544472a45fc02a44eead85c8de514a0
Built Distribution
ssh_jump_hive-0.2.0-py2.py3-none-any.whl (2.9 kB)
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- SHA256: 8b825cda83ba7a1343299800e2d9bde0c85a70aa5c933b2c99ed25b2f48a5eba
- MD5: 0742c3a16e200391d09678cfb8d420c5
- BLAKE2b-256: 0b84655c9ad2d3c3dbbe398e7e93fc67daca7ef73fcce4846d8f2a26be2e0397