offline_train_framework_in_diting_group

Documentation
Offline training framework for the Diting (谛听) group's recommender system

RSLib main features

Faster Deployment (sql-as-backbone)
State-of-the-art Recurrent Model (transformer-xl etc.)
Distributed DL (horovod etc.)
Deep Learning Accelerator (tvm etc.)
Utility Classes (file2hdfs etc.)

Design rationale: ten questions and answers

Install

To install the current release:

$ pip install rslib

Demo

dataframe2hive feature demo

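Description: uploads a local DataFrame to a Hive table (tab-delimited, '\t') through the HDFS upload interface provided by the Luoge (洛阁) group. Because Hive does not check types when data is imported (it does not support schema on write), we do not offer direct insertion into a partition of an existing table; a new table is created instead. Users need to manage the DataFrame's column names carefully. Because of an issue with the Luoge interface, uploads produce error messages; this interface retries on error, so uploads generally succeed. Uploading large files is not recommended, although it was fairly stable in testing: a 1.3 GB file uploads within 10 minutes.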
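The retry-on-error behaviour described above can be pictured with a minimal sketch. This is not rslib's actual implementation: `upload_with_retry`, its parameters, and the `upload_once` callable are hypothetical names introduced only for illustration, and exponential backoff is an assumed policy that the description does not specify.

```python
import time


def upload_with_retry(upload_once, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call upload_once() until it succeeds or max_attempts is reached.

    upload_once is a hypothetical zero-argument callable that performs one
    upload attempt and raises an exception on failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_once()
        except Exception as exc:  # real code should catch narrower exceptions
            last_error = exc
            if attempt < max_attempts:
                # Assumed policy: wait base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("upload failed after %d attempts" % max_attempts) from last_error
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a caller would normally leave it at the default.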

Environment requirements (on top of the user_profile/basic image)

$ apt-get update && apt-get install -y krb5-user krb5-config libkrb5-dev
$ pip install requests-kerberos==0.12.0 hdfs==2.5.8 kerberos==1.3.0
$ pip install rslib
$ pip install -r requirements.txt  # custom path
$ kinit -kt code/data/up_recommend.keytab up_recommend  #custom path

Example Python code

import pandas as pd
from rslib.utils import dataupload
df = pd.DataFrame({'bb': [1, 2, 3], 'c': [2, 2, 3], 'aa': ['4', '5', '6']})
table = 'up_nsh_tmp.diting_rslib_test_20191021'
dataupload.pandas2hive(df, table)  #no partition
dataupload.pandas2hive(df, table, partition='2019-10-21')  #add partition
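Since users are asked to manage the DataFrame's column names themselves, a small pre-upload check may help. The sketch below is an assumption-laden illustration, not part of rslib: `find_problem_columns` is a hypothetical helper, and it assumes the DataFrame's column names are reused as the new Hive table's column names and that Hive compares column names case-insensitively.

```python
import re

# Conservative pattern: letters, digits and underscores, not starting with a digit.
_SAFE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def find_problem_columns(columns):
    """Return (name, reason) pairs for column names likely to cause trouble."""
    problems = []
    seen = set()
    for name in columns:
        text = str(name)
        if not _SAFE_NAME.match(text):
            problems.append((text, "not a plain identifier"))
        key = text.lower()
        if key in seen:
            # Assumption: names differing only by case would collide in Hive.
            problems.append((text, "duplicate (case-insensitive)"))
        seen.add(key)
    return problems
```

With the example above, `find_problem_columns(df.columns)` returns an empty list for the columns 'bb', 'c' and 'aa', so the upload could proceed.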
