Offline training environment framework from the Diting group

Project description

Documentation
Offline training framework for the Diting (谛听) group's recommender system

RSLib main features

Faster Deployment (sql-as-backbone)
State-of-the-art Recurrent Model (transformer-xl etc.)
Distributed DL (horovod etc.)
Deep Learning Accelerator (tvm etc.)
Utility Classes (file2hdfs etc.)

Design rationale: ten questions and answers

Install

To install the current release:

$ pip install rslib

Demo

dataframe2hive demo

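Description: uploads a local DataFrame to a Hive table (fields separated by '\t') through the HDFS upload interface provided by the Luoge (洛阁) group. Because Hive does not type-check data on import (it does not support schema on write), we do not offer direct insertion into a partition of an existing table; a new table is created instead. Users need to manage the DataFrame's column names themselves.

Because of issues in the Luoge interface, an upload may print error messages. This interface retries on error, so uploads generally succeed. Uploading large files is not recommended, although it has been fairly stable in testing: a 1.3 GB file finished uploading within 10 minutes.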
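The retry-on-error behaviour described above can be sketched roughly as follows. This is not rslib's implementation: `retry_upload`, `flaky_upload`, the attempt limit, and the choice of `OSError` as the failure type are all hypothetical, chosen only to illustrate the idea.

```python
import time


def retry_upload(upload, max_attempts=3, delay_seconds=0.0):
    """Call `upload()` until it succeeds or `max_attempts` is exhausted."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except OSError as error:  # assumed failure type; the real interface may raise something else
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)  # wait before reconnecting
    raise RuntimeError(f"upload failed after {max_attempts} attempts") from last_error


# Hypothetical stand-in for a flaky upload call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("transient upload error")
    return "ok"


result = retry_upload(flaky_upload, max_attempts=5)
```

With a wrapper of this kind, transient errors are still raised by the underlying call, which may be why error messages appear even when the upload eventually succeeds.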

Environment requirements (on top of the user_profile/basic image)

$ apt-get update && apt-get install -y krb5-user krb5-config libkrb5-dev
$ pip install requests-kerberos==0.12.0 hdfs==2.5.8 kerberos==1.3.0
$ pip install rslib
$ pip install -r requirements.txt  #custom path
$ kinit -kt code/data/up_recommend.keytab up_recommend  #custom path

Example Python code

import pandas as pd
from rslib.utils import dataupload

# Build a small DataFrame; choose the column names deliberately.
df = pd.DataFrame({'bb': [1, 2, 3], 'c': [2, 2, 3], 'aa': ['4', '5', '6']})
table = 'up_nsh_tmp.diting_rslib_test_20191021'
dataupload.pandas2hive(df, table)  # no partition
dataupload.pandas2hive(df, table, partition='2019-10-21')  # add partition
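Since the table is loaded from '\t'-separated text without type checks, it can help to look at the rows a DataFrame would produce before uploading. The sketch below uses only pandas and does not call rslib. Whether `pandas2hive` reorders columns or writes a header is not stated here, so the explicit column order and `header=False` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'bb': [1, 2, 3], 'c': [2, 2, 3], 'aa': ['4', '5', '6']})

# Fix the column order explicitly rather than relying on dict insertion order
# (this order is an illustrative choice, not something rslib requires).
columns = ['aa', 'bb', 'c']

# Serialize to tab-separated text with no header and no index column.
tsv_text = df[columns].to_csv(sep='\t', header=False, index=False)
rows = tsv_text.splitlines()

for row in rows:
    print(row)
```

Checking the serialized rows this way makes a mismatch between column order and the intended table layout visible before any data reaches HDFS.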


Download files

Files for rslib, version 2.1.1
Filename: rslib-2.1.1.tar.gz (201.9 kB); file type: Source; Python version: None
