
Offline train_env framework of the Diting group

Project description

Documentation
Offline training framework for recommender systems, by the Diting group

Main features of RSLib

Faster Deployment (sql-as-backbone)
State-of-the-art Recurrent Model (transformer-xl etc.)
Distributed DL (horovod etc.)
Deep Learning Accelerator (tvm etc.)
Utility Classes (file2hdfs etc.)

Design rationale: ten questions, ten answers

Install

To install the current release:

$ pip install rslib

Demo

dataframe2hive demo

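Description: uploads a local DataFrame to a Hive table (fields separated by '\t') through the HDFS upload interface provided by the Luoge group. Because Hive does not type-check data when it is loaded (it does not support schema-on-write), we do not support inserting directly into a partition of an existing table; a new table is created instead. Users are responsible for managing the DataFrame's column names. Because of an issue in the Luoge interface, uploads may print error messages; this interface reconnects and retries on errors, so uploads generally succeed. Uploading large files is not recommended, although in our tests it was fairly stable: a 1.3 GB file uploaded in under 10 minutes.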
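The reconnect-and-retry behaviour described above can be pictured as a generic retry loop with exponential backoff. This is a minimal sketch, not rslib's actual code: the helper name `retry`, its parameters and the backoff policy are assumptions, and `flaky_upload` is a hypothetical stand-in for a real HDFS upload call.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(call: Callable[[], T], attempts: int = 3, delay: float = 1.0,
          backoff: float = 2.0,
          exceptions: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call `call()`; on one of `exceptions`, wait and try again.

    Hypothetical helper illustrating retry-with-backoff, not rslib's code.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(wait)  # pause before reconnecting
            wait *= backoff  # exponential backoff between attempts
    raise AssertionError("unreachable")


# Usage: a hypothetical upload that fails twice, then succeeds.
calls = {"n": 0}


def flaky_upload() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient HDFS error")
    return "uploaded"


print(retry(flaky_upload, attempts=5, delay=0.01))  # prints "uploaded"
```

Restricting `exceptions` to transient errors (for example `ConnectionError`) keeps genuine bugs, such as a malformed table name, from being retried.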

Environment requirements (on top of the user_profile/basic image)

$ apt-get update && apt-get install -y krb5-user krb5-config libkrb5-dev
$ pip install requests-kerberos==0.12.0 hdfs==2.5.8 kerberos==1.3.0
$ pip install rslib
$ pip install -r requirements.txt  # custom path
$ kinit -kt code/data/up_recommend.keytab up_recommend  # custom path
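Before calling the upload functions, it can help to confirm that `kinit` actually produced a usable ticket. A minimal sketch, assuming MIT Kerberos `klist` (installed by `krb5-user` above) is on the PATH; `klist -s` prints nothing and exits 0 only when it finds a readable, non-expired credentials cache. The helper name `has_valid_ticket` is hypothetical and not part of rslib.

```python
import shutil
import subprocess


def has_valid_ticket(klist_cmd: str = "klist") -> bool:
    """Return True if `klist -s` reports a valid Kerberos ticket cache.

    Hypothetical helper, not part of rslib.
    """
    if shutil.which(klist_cmd) is None:
        return False  # klist not installed (is krb5-user missing?)
    result = subprocess.run(
        [klist_cmd, "-s"],  # -s: silent, status-only check
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

If it returns False, rerun the `kinit -kt` command with your own keytab and principal before uploading.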

Example Python code

import pandas as pd
from rslib.utils import dataupload
df = pd.DataFrame({'bb': [1, 2, 3], 'c': [2, 2, 3], 'aa': ['4', '5', '6']})
table = 'up_nsh_tmp.diting_rslib_test_20191021'
dataupload.pandas2hive(df, table)  # no partition
dataupload.pandas2hive(df, table, partition='2019-10-21')  # add partition
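The payload `pandas2hive` actually builds is not documented here. As a minimal sketch under stated assumptions, the helpers below show how the example's columns could be serialized to headerless tab-separated rows and paired with a `CREATE TABLE` statement for a new tab-delimited table. The function names, the Python-to-Hive type mapping, the column-name rule and the partition column name `dt` are all hypothetical; a plain dict of columns stands in for the DataFrame so the sketch needs no third-party packages.

```python
import re
from typing import Dict, List, Optional

# Assumed rule for safe Hive column names (letters, digits, underscore).
_COLUMN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def hive_type(values: List[object]) -> str:
    """Map a column's Python values to a Hive type (assumed mapping)."""
    if values and all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "BIGINT"
    if values and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "DOUBLE"
    return "STRING"  # fallback: strings, mixed or empty columns


def build_create_table(table: str, columns: Dict[str, List[object]],
                       partition_col: Optional[str] = None) -> str:
    """Build DDL for a new tab-delimited Hive table (hypothetical helper)."""
    for name in columns:
        if not _COLUMN_RE.fullmatch(name):
            raise ValueError(f"invalid Hive column name: {name!r}")
    cols = ",\n  ".join(f"`{name}` {hive_type(vals)}" for name, vals in columns.items())
    ddl = f"CREATE TABLE {table} (\n  {cols}\n)"
    if partition_col is not None:
        ddl += f"\nPARTITIONED BY (`{partition_col}` STRING)"
    # Emits the two characters \t, which Hive reads as a tab delimiter.
    return ddl + "\nROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"


def to_tsv(columns: Dict[str, List[object]]) -> str:
    """Serialize columns to headerless, tab-separated rows (hypothetical)."""
    lengths = {len(vals) for vals in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop() if lengths else 0
    lines = []
    for i in range(n_rows):
        fields = [str(vals[i]) for vals in columns.values()]
        # A tab or newline inside a value would shift or split Hive columns.
        if any("\t" in f or "\n" in f for f in fields):
            raise ValueError(f"row {i} contains a tab or newline")
        lines.append("\t".join(fields))
    return "".join(line + "\n" for line in lines)


# Usage with the same columns as the DataFrame in the example above.
data = {'bb': [1, 2, 3], 'c': [2, 2, 3], 'aa': ['4', '5', '6']}
print(to_tsv(data), end="")
print(build_create_table('up_nsh_tmp.diting_rslib_test_20191021', data,
                         partition_col='dt'))
```

Because Hive will not reject badly typed rows on load, validating column names and delimiters before upload is the only place such mistakes can be caught cheaply.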



Download files


Source Distribution

rslib-2.4.2.tar.gz (206.5 kB view details)

Uploaded Source

File details

Details for the file rslib-2.4.2.tar.gz.

File metadata

  • Download URL: rslib-2.4.2.tar.gz
  • Upload date:
  • Size: 206.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.10.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.4

File hashes

Hashes for rslib-2.4.2.tar.gz
Algorithm Hash digest
SHA256 c104b14de1be402d271de183c1c9a3bfc2f870ea6a7a7855112f733b7df3efcd
MD5 af3a8e9ba7aa99baf9927db4b6460229
BLAKE2b-256 4d9af3e7fda0ad1a28f7b896e36522e15fa02fcc4aba8c7f924d9a9213eb6dd4

