
A new modular, scalable, configurable, easy-to-use and easy-to-extend infrastructure for deep learning based classification.

Project description

DeePray (深度祈祷): A new modular, scalable, configurable, easy-to-use and easy-to-extend infrastructure for deep learning based recommendation.


Introduction

The DeePray library offers state-of-the-art algorithms for deep learning based recommendation. DeePray is built on the latest TensorFlow 2 (https://tensorflow.org/) and designed with a modular structure, making it easy to discover patterns and answer questions about tabular-structured data.

The main goals of DeePray:

  • Easy to use: newcomers can get their hands dirty with deep learning quickly
  • Good performance with web-scale data
  • Easy to extend: the modular architecture lets you build your neural network like playing with LEGO!

Let's Get Started! Please refer to the official docs at https://deepray.readthedocs.io/en/latest/.

Installation

Install DeePray using PyPI:

To install the DeePray library from PyPI using pip, execute the following command:

pip install deepray

Install DeePray from GitHub source:

First, clone the DeePray repository using git:

git clone https://github.com/fuhailin/deepray.git

Then, cd to the deepray folder, and install the library by executing the following commands:

cd deepray
pip install .
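To verify the installation, try importing the package:

python -c "import deepray"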

Tutorial

Census Adult Data Set

Data preparation

In your tabular data, mark continuous features as NUMERICAL, categorical features as CATEGORY, variable-length features as VARIABLE, and the label column as LABEL. Then convert the data to TFRecord format in order to get good performance with large-scale datasets.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

from deepray.utils.converter import CSV2TFRecord


# http://archive.ics.uci.edu/ml/datasets/Adult
train_data = 'DeePray/examples/census/data/raw_data/adult_data.csv'
df = pd.read_csv(train_data)
df['income_label'] = (df["income_bracket"].apply(lambda x: ">50K" in x)).astype(int)
df.pop('income_bracket')

LABEL = ['income_label']
NUMERICAL_FEATURES = ['age', 'fnlwgt', 'hours_per_week', 'capital_gain', 'capital_loss', 'education_num']
CATEGORY_FEATURES = [col for col in df.columns if col not in LABEL and col not in NUMERICAL_FEATURES]

for feat in CATEGORY_FEATURES:
    lbe = LabelEncoder()
    df[feat] = lbe.fit_transform(df[feat])
# Feature normalization
mms = MinMaxScaler(feature_range=(0, 1))
df[NUMERICAL_FEATURES] = mms.fit_transform(df[NUMERICAL_FEATURES])


prebatch = 1  # flags.prebatch
converter = CSV2TFRecord(LABEL, NUMERICAL_FEATURES, CATEGORY_FEATURES, VARIABLE_FEATURES=[], gzip=False)
converter.write_feature_map(df, './data/feature_map.csv')

train_df, valid_df = train_test_split(df, test_size=0.2)
converter(train_df, out_file='./data/train.tfrecord', prebatch=prebatch)
converter(valid_df, out_file='./data/valid.tfrecord', prebatch=prebatch)

You will get a feature map file like this:

9,workclass,CATEGORICAL
16,education,CATEGORICAL
7,marital_status,CATEGORICAL
15,occupation,CATEGORICAL
6,relationship,CATEGORICAL
5,race,CATEGORICAL
2,gender,CATEGORICAL
42,native_country,CATEGORICAL
1,hours_per_week,NUMERICAL
1,capital_gain,NUMERICAL
1,age,NUMERICAL
1,fnlwgt,NUMERICAL
1,capital_loss,NUMERICAL
1,education_num,NUMERICAL
2,income_label,LABEL
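Each row appears to record a feature's cardinality (or dimension, for numerical features and the label), its name, and its type. If you want to inspect the generated file programmatically, here is a minimal sketch; the file has no header row, so the column names below are only illustrative:

import pandas as pd

# The feature map has no header row; these column names are illustrative only.
feature_map = pd.read_csv('./data/feature_map.csv', names=['size', 'name', 'type'])
print(feature_map)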

Then create two text files, train and valid, that record the file paths of the training-set and validation-set TFRecords, respectively.
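In this tutorial each split is written to a single TFRecord file, so a minimal sketch (assuming the expected format is one TFRecord path per line) is:

# Write the path-list files referenced later by --train_data and --valid_data.
# Assumption: DeePray expects one TFRecord file path per line in these files.
with open('./data/train', 'w') as f:
    f.write('./data/train.tfrecord\n')
with open('./data/valid', 'w') as f:
    f.write('./data/valid.tfrecord\n')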

Choose your model, then train and evaluate

"""
build and train model
"""

import sys

from absl import app, flags

import deepray as dp
from deepray.base.trainer import train
from deepray.model.build_model import BuildModel

FLAGS = flags.FLAGS


def main(argv=None):
    # Parse the command-line style flags; flags not defined by DeePray are ignored.
    FLAGS(argv, known_only=True)
    model = BuildModel(FLAGS)
    history = train(model)
    print(history)


argv = [
    sys.argv[0],
    '--model=lr',
    '--train_data=/Users/vincent/Projects/DeePray/examples/census/data/train',
    '--valid_data=/Users/vincent/Projects/DeePray/examples/census/data/valid',
    '--feature_map=/Users/vincent/Projects/DeePray/examples/census/data/feature_map.csv',
    '--learning_rate=0.01',
    '--epochs=10',
    '--batch_size=64',
]
main(argv)
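If you prefer to pass flags on the command line instead of hard-coding argv, a minimal variant (reusing the same flag names shown above) is to forward sys.argv from a script:

# Hypothetical entry point for a stand-alone script, invoked e.g. as:
#   python train_census.py --model=lr --train_data=./data/train --valid_data=./data/valid \
#       --feature_map=./data/feature_map.csv --learning_rate=0.01 --epochs=10 --batch_size=64
if __name__ == '__main__':
    main(sys.argv)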

Models List

Title Booktitle Resources
FM: Factorization Machines ICDM'2010 [pdf] [code]
FFM: Field-aware Factorization Machines for CTR Prediction RecSys'2016 [pdf] [code]
FNN: Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction ECIR'2016 [pdf][code]
PNN: Product-based Neural Networks for User Response Prediction ICDM'2016 [pdf][code]
Wide&Deep: Wide & Deep Learning for Recommender Systems DLRS'2016 [pdf][code]
AFM: Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks IJCAI'2017 [pdf][code]
NFM: Neural Factorization Machines for Sparse Predictive Analytics SIGIR'2017 [pdf][code]
DeepFM: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction IJCAI'2017 [pdf] [code]
DCN: Deep & Cross Network for Ad Click Predictions ADKDD'2017 [pdf] [code]
xDeepFM: xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems KDD'2018 [pdf] [code]
DIN: Deep Interest Network for Click-Through Rate Prediction KDD'2018 [pdf] [code]
DIEN: Deep Interest Evolution Network for Click-Through Rate Prediction AAAI'2019 [pdf] [code]
DSIN: Deep Session Interest Network for Click-Through Rate Prediction IJCAI'2019 [pdf][code]
AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks CIKM'2019 [pdf][code]
FLEN: Leveraging Field for Scalable CTR Prediction AAAI'2020 [pdf][code]
DFN: Deep Feedback Network for Recommendation IJCAI'2020 [pdf][code]

How to build your own model with DeePray

Inherit the BaseCTRModel class from deepray.model.model_ctr and implement your own build_network() method, as in the sketch below.
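Here is a minimal sketch of a custom model; the constructor signature and the exact contract of build_network() are assumptions, so check the BaseCTRModel source for the real interface:

import tensorflow as tf

from deepray.model.model_ctr import BaseCTRModel


class MyModel(BaseCTRModel):
    """A toy custom model; the layer choices are illustrative only."""

    def __init__(self, flags):
        # Assumption: the base class is constructed from the parsed flags.
        super().__init__(flags)
        self.hidden = tf.keras.layers.Dense(64, activation='relu')
        self.logit = tf.keras.layers.Dense(1)

    def build_network(self, features, is_training=None):
        # Assumed contract: map the input features to a single logit.
        x = self.hidden(features)
        return self.logit(x)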

Contribution

DeePray is still under development, and contributions are welcome!

* Hailin Fu (`Hailin <https://github.com/fuhailin>`)
* Call for contributions!

Your contributions are needed to make DeePray the new infrastructure for recommendation algorithms!

Citing

DeePray is designed, developed, and supported by Hailin. If you use any part of this library in your research, please cite it using the following BibTeX entry:

@misc{DeePray,
  author = {Hailin Fu},
  title = {DeePray: A new Modular, Scalable, Configurable, Easy-to-Use and Extend infrastructure for Deep Learning based Recommendation},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/fuhailin/deepray}},
}

License

Copyright © 2020 The DeePray Authors. All Rights Reserved.

Licensed under the Apache License.

Reference

https://github.com/shenweichen/DeepCTR

https://github.com/aimetrics/jarvis

https://github.com/shichence/AutoInt

Contact

If you are interested in cooperation or have any questions, please follow my WeChat official account:

WeChat official account ID: StateOfTheArt



Download files

Download the file for your platform.

Source Distribution

deepray-0.1.2.tar.gz (33.5 kB)


Built Distribution

deepray-0.1.2-py3-none-any.whl (64.8 kB)


File details

Details for the file deepray-0.1.2.tar.gz.

File metadata

  • Download URL: deepray-0.1.2.tar.gz
  • Upload date:
  • Size: 33.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for deepray-0.1.2.tar.gz

  • SHA256: b9ebdf0ff1d349d4a6e963a16caa0154211c11c6dd02c69a3ae4ecfdba7e03f6
  • MD5: 346e354be35cbdefdc21c65aa43aa56c
  • BLAKE2b-256: 217f305669f09b69b1d0868af25e7f8eeb3d542026234b65b608851b1edb7adc


File details

Details for the file deepray-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: deepray-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 64.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for deepray-0.1.2-py3-none-any.whl

  • SHA256: eaf3bafaac419e27696e4ff5d2999d0b692486a0badc77a238cff3f021331ba5
  • MD5: 71fa55a85f8510bd9f19224e9c7bd348
  • BLAKE2b-256: 25435c735cb57614230fa7701d97bbf65bebc0acfeae6f937a8f8e49d035a717

