
A new Modular, Scalable, Configurable, Easy-to-Use and Easy-to-Extend infrastructure for Deep Learning based Recommendation.


DeePray (深度祈祷, "Deep Prayer"): A new Modular, Scalable, Configurable, Easy-to-Use and Easy-to-Extend infrastructure for Deep Learning based Recommendation.


Introduction

The DeePray library offers state-of-the-art algorithms for deep learning recommendation. DeePray is built on the latest TensorFlow 2 (https://tensorflow.org/) and designed with a modular structure, making it easy to discover patterns and answer questions about tabular, structured data.

The main goals of DeePray:

  • Easy to use: newcomers can get their hands dirty with deep learning quickly
  • Good performance with web-scale data
  • Easy to extend: the modular architecture lets you build your neural network like playing with LEGO!

Let's Get Started! Please refer to the official docs at https://deepray.readthedocs.io/en/latest/.

Installation

Install DeePray using PyPI:

To install the DeePray library from PyPI with pip, run the following command:

pip install deepray

Install DeePray from the GitHub source:

First, clone the DeePray repository using git:

git clone https://github.com/fuhailin/deepray.git

Then cd into the deepray folder and install the library with the following commands:

cd deepray
pip install .

Tutorial

Census Adult Data Set

Data preparation

In your tabular data, specify NUMERICAL for continuous features, CATEGORY for categorical features, VARIABLE for variable-length features, and LABEL for the label column. Then convert the data to TFRecord format in order to get good performance on large-scale datasets.

import os

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

from deepray.utils.converter import CSV2TFRecord


# http://archive.ics.uci.edu/ml/datasets/Adult
train_data = 'DeePray/examples/census/data/raw_data/adult_data.csv'
df = pd.read_csv(train_data)
# Binary label: 1 if the income bracket is above 50K, else 0
df['income_label'] = (df["income_bracket"].apply(lambda x: ">50K" in x)).astype(int)
df.pop('income_bracket')

# LABEL must be defined before it is used to select the categorical columns
LABEL = ['income_label']
NUMERICAL_FEATURES = ['age', 'fnlwgt', 'hours_per_week', 'capital_gain', 'capital_loss', 'education_num']
# Every remaining column that is neither the label nor numerical is categorical
CATEGORY_FEATURES = [col for col in df.columns if col not in LABEL and col not in NUMERICAL_FEATURES]

# Encode each categorical column as integer ids 0..n-1
for feat in CATEGORY_FEATURES:
    lbe = LabelEncoder()
    df[feat] = lbe.fit_transform(df[feat])
# Feature normalization: scale numerical columns into [0, 1]
mms = MinMaxScaler(feature_range=(0, 1))
df[NUMERICAL_FEATURES] = mms.fit_transform(df[NUMERICAL_FEATURES])


prebatch = 1  # flags.prebatch
os.makedirs('./data', exist_ok=True)  # make sure the output directory exists
converter = CSV2TFRecord(LABEL, NUMERICAL_FEATURES, CATEGORY_FEATURES, VARIABLE_FEATURES=[], gzip=False)
converter.write_feature_map(df, './data/feature_map.csv')

# Hold out 20% of the rows for validation
train_df, valid_df = train_test_split(df, test_size=0.2)
converter(train_df, out_file='./data/train.tfrecord', prebatch=prebatch)
converter(valid_df, out_file='./data/valid.tfrecord', prebatch=prebatch)
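For readers new to the two scikit-learn preprocessors used above, the sketch below reproduces their core arithmetic in plain Python: LabelEncoder maps each category to its index among the sorted distinct values, and MinMaxScaler with feature_range=(0, 1) maps x to (x - min) / (max - min). The helper names label_encode and min_max_scale are hypothetical and only for illustration; the tutorial itself keeps using scikit-learn.

```python
def label_encode(values):
    """Map each value to its index among the sorted distinct values,
    mirroring what sklearn's LabelEncoder.fit_transform returns (hypothetical helper)."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def min_max_scale(values):
    """Rescale numbers into [0, 1] via (x - min) / (max - min) (hypothetical helper).
    A constant column maps to 0.0, matching sklearn's MinMaxScaler behavior."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


codes, classes = label_encode(['Private', 'State-gov', 'Private', 'Self-emp'])
print(codes, classes)  # [0, 2, 0, 1] ['Private', 'Self-emp', 'State-gov']
print(len(classes))    # number of distinct values: 3
print(min_max_scale([17, 45, 90]))
```

The number of distinct classes (len(classes)) is the vocabulary size a categorical feature needs in its embedding table.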

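You will get a feature map file like this one. Each line gives the feature's dimension (the number of distinct values for a categorical feature or the label, 1 for a numerical feature), its name, and its type: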

9,workclass,CATEGORICAL
16,education,CATEGORICAL
7,marital_status,CATEGORICAL
15,occupation,CATEGORICAL
6,relationship,CATEGORICAL
5,race,CATEGORICAL
2,gender,CATEGORICAL
42,native_country,CATEGORICAL
1,hours_per_week,NUMERICAL
1,capital_gain,NUMERICAL
1,age,NUMERICAL
1,fnlwgt,NUMERICAL
1,capital_loss,NUMERICAL
1,education_num,NUMERICAL
2,income_label,LABEL
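The feature map is a plain three-column CSV, so it can be inspected with Python's standard csv module. The sketch below groups feature names by type. It assumes the column order shown above (dimension, name, type); load_feature_map is a hypothetical helper, not part of DeePray.

```python
import csv
import io
from collections import defaultdict


def load_feature_map(fileobj):
    """Parse 'dimension,name,type' rows into {type: [(name, dimension), ...]} (hypothetical helper)."""
    groups = defaultdict(list)
    for row in csv.reader(fileobj):
        if not row:
            continue  # skip blank lines
        dim, name, ftype = row
        groups[ftype].append((name, int(dim)))
    return dict(groups)


# A few lines copied from the feature map above
sample = io.StringIO(
    "9,workclass,CATEGORICAL\n"
    "2,gender,CATEGORICAL\n"
    "1,age,NUMERICAL\n"
    "2,income_label,LABEL\n"
)
feature_map = load_feature_map(sample)
print(feature_map['CATEGORICAL'])  # [('workclass', 9), ('gender', 2)]
print(feature_map['LABEL'])        # [('income_label', 2)]
```

For the real file, open it with `open('./data/feature_map.csv', newline='')` and pass the file object to the same function.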

Then create two text files, train and valid, that list the file paths of the training-set and validation-set TFRecords respectively.
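A minimal sketch of creating these list files, assuming DeePray expects one TFRecord path per line (the tutorial does not spell out the exact format, so check the official docs). The names train and valid match the --train_data and --valid_data flags in the training script; write_file_list is a hypothetical helper, and a temporary directory stands in for ./data so the sketch runs anywhere.

```python
import os
import tempfile


def write_file_list(list_path, tfrecord_paths):
    """Write one absolute TFRecord path per line (assumed format, hypothetical helper)."""
    with open(list_path, 'w') as f:
        for p in tfrecord_paths:
            f.write(os.path.abspath(p) + '\n')


# In the tutorial this would be './data'; a temp dir keeps the demo self-contained
data_dir = tempfile.mkdtemp()
write_file_list(os.path.join(data_dir, 'train'), [os.path.join(data_dir, 'train.tfrecord')])
write_file_list(os.path.join(data_dir, 'valid'), [os.path.join(data_dir, 'valid.tfrecord')])

with open(os.path.join(data_dir, 'train')) as f:
    print(f.read().splitlines())
```

Writing absolute paths means the list files stay valid no matter which working directory the training script is launched from.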

Choose your model, then train and evaluate

"""
build and train model
"""

import sys

from absl import flags

from deepray.base.trainer import train
from deepray.model.build_model import BuildModel

FLAGS = flags.FLAGS


def main(argv=None):
    # Parse only the flags that are defined; unknown arguments are ignored
    FLAGS(argv, known_only=True)
    model = BuildModel(FLAGS)
    history = train(model)
    print(history)


if __name__ == '__main__':
    argv = [
        sys.argv[0],
        '--model=lr',
        # Paths to the train/valid list files and the feature map created above;
        # adjust them to wherever you wrote those files
        '--train_data=./data/train',
        '--valid_data=./data/valid',
        '--feature_map=./data/feature_map.csv',
        '--learning_rate=0.01',
        '--epochs=10',
        '--batch_size=64',
    ]
    main(argv=argv)
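To make the flags concrete: --model=lr presumably selects logistic regression, and --learning_rate, --epochs and --batch_size control mini-batch gradient descent. The pure-Python sketch below trains a logistic regression with those same hyperparameter names on a toy dataset. It only illustrates the technique; it is not DeePray's implementation, which runs on TensorFlow 2 and may use a different optimizer.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_lr(X, y, learning_rate=0.01, epochs=10, batch_size=64, seed=0):
    """Mini-batch gradient descent on the log loss of a logistic regression."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new example order each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                # d(log loss)/d(logit) = prediction - label
                err = predict(w, b, X[i]) - y[i]
                for j in range(n_features):
                    grad_w[j] += err * X[i][j]
                grad_b += err
            # Average the gradient over the batch, then take one step
            w = [wj - learning_rate * gj / len(batch) for wj, gj in zip(w, grad_w)]
            b -= learning_rate * grad_b / len(batch)
    return w, b


# Toy data: the label is 1 when the (already min-max scaled) feature exceeds 0.5
X = [[i / 99] for i in range(100)]
y = [1 if x[0] > 0.5 else 0 for x in X]
w, b = train_lr(X, y, learning_rate=0.5, epochs=200, batch_size=16)
print(predict(w, b, [0.9]), predict(w, b, [0.1]))
```

The demo uses a larger learning rate and more epochs than the tutorial's flags because the toy dataset has only 100 rows.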

Models List

Title | Venue | Resources
FM: Factorization Machines | ICDM'2010 | [pdf] [code]
FFM: Field-aware Factorization Machines for CTR Prediction | RecSys'2016 | [pdf] [code]
FNN: Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction | ECIR'2016 | [pdf] [code]
PNN: Product-based Neural Networks for User Response Prediction | ICDM'2016 | [pdf] [code]
Wide&Deep: Wide & Deep Learning for Recommender Systems | DLRS'2016 | [pdf] [code]
AFM: Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | IJCAI'2017 | [pdf] [code]
NFM: Neural Factorization Machines for Sparse Predictive Analytics | SIGIR'2017 | [pdf] [code]
DeepFM: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | IJCAI'2017 | [pdf] [code]
DCN: Deep & Cross Network for Ad Click Predictions | ADKDD'2017 | [pdf] [code]
xDeepFM: xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems | KDD'2018 | [pdf] [code]
DIN: DIN: Deep Interest Network for Click-Through Rate Prediction | KDD'2018 | [pdf] [code]
DIEN: DIEN: Deep Interest Evolution Network for Click-Through Rate Prediction | AAAI'2019 | [pdf] [code]
DSIN: Deep Session Interest Network for Click-Through Rate Prediction | IJCAI'2019 | [pdf] [code]
AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks | CIKM'2019 | [pdf] [code]
FLEN: Leveraging Field for Scalable CTR Prediction | AAAI'2020 | [pdf] [code]
DFN: Deep Feedback Network for Recommendation | IJCAI'2020 | [pdf] [code]
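As an example of what these models compute, FM (the first entry above) scores an input x with a bias, linear terms, and pairwise interactions weighted by inner products of per-feature latent vectors v_i: y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The FM paper shows the pairwise sum can be computed in O(kn) time as 1/2 sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2]. The plain-Python sketch below checks that identity against the naive O(kn^2) loop; it illustrates the formula and is not DeePray's FM layer.

```python
import random


def fm_pairwise_naive(x, v):
    """sum over i < j of <v_i, v_j> * x_i * x_j, computed directly in O(k n^2)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity in O(k n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


def fm_predict(x, w0, w, v):
    """FM score: global bias + linear terms + factorized pairwise interactions."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_fast(x, v)


rng = random.Random(42)
n, k = 6, 3
x = [rng.random() for _ in range(n)]
v = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
print(abs(fm_pairwise_naive(x, v) - fm_pairwise_fast(x, v)) < 1e-9)  # True
```

The O(kn) form is what makes FM practical on sparse, high-dimensional CTR inputs, since the cost grows linearly with the number of features.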

How to build your own model with DeePray

Inherit from the BaseCTRModel class in deepray.model.model_ctr and implement your own build_network() method!
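The constructor and build_network() signatures of BaseCTRModel are not documented here, so the sketch below uses a small hypothetical stand-in base class to show the pattern: the base class owns the shared plumbing and calls an abstract build_network() hook that each model overrides. Every name in it except build_network is illustrative; check deepray.model.model_ctr for the real interface before subclassing.

```python
import math
from abc import ABC, abstractmethod


class HypotheticalBaseCTRModel(ABC):
    """Hypothetical stand-in for deepray.model.model_ctr.BaseCTRModel (NOT the real class)."""

    def __init__(self, feature_map):
        self.feature_map = feature_map  # e.g. {feature name: dimension}

    @abstractmethod
    def build_network(self, features):
        """Subclasses return a logit for one example."""

    def predict_proba(self, features):
        # Shared plumbing stays in the base class; the model-specific part is delegated
        return 1.0 / (1.0 + math.exp(-self.build_network(features)))


class HypotheticalLinearModel(HypotheticalBaseCTRModel):
    """Illustrative model whose network is just a weighted sum of the inputs."""

    def __init__(self, feature_map, weights):
        super().__init__(feature_map)
        self.weights = weights

    def build_network(self, features):
        return sum(self.weights[name] * value for name, value in features.items())


model = HypotheticalLinearModel({'age': 1, 'hours_per_week': 1},
                                {'age': 2.0, 'hours_per_week': -1.0})
print(model.predict_proba({'age': 0.5, 'hours_per_week': 1.0}))  # 0.5, since the logit is 0
```

Because build_network is abstract, forgetting to implement it fails loudly at construction time instead of silently at training time.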

Contribution

DeePray is still under development, and contributions are welcome!

* Hailin Fu (https://github.com/fuhailin)
* Call for contributions!

Making DeePray the new infrastructure for recommendation algorithms needs your contributions!

Citing

DeePray is designed, developed and supported by Hailin. If you use any part of this library in your research, please cite it using the following BibTeX entry:

@misc{DeePray,
  author = {Hailin Fu},
  title = {DeePray: A new Modular, Scalable, Configurable, Easy-to-Use and Extend infrastructure for Deep Learning based Recommendation},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/fuhailin/deepray}},
}

License

Copyright © 2020 The DeePray Authors. All Rights Reserved.

Licensed under the Apache License.

Contact

For collaboration or questions, please follow my WeChat official account:

WeChat official account ID: StateOfTheArt




Download files


Source Distribution

deepray-0.1.1.tar.gz (32.7 kB)

Uploaded Source

Built Distribution


deepray-0.1.1-py3-none-any.whl (59.1 kB)

Uploaded Python 3

File details

Details for the file deepray-0.1.1.tar.gz.

File metadata

  • Download URL: deepray-0.1.1.tar.gz
  • Upload date:
  • Size: 32.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for deepray-0.1.1.tar.gz
Algorithm Hash digest
SHA256 dbee16937294e082ad1123f6399ed222ab1a41d7569a238c8a11d18827d9a05c
MD5 cbc345a9681e95d64381702002bc940f
BLAKE2b-256 be3d3ac53153925341f0dfaaf479d3a3d062b01b47c1d2bd1ab5bc4468471744


File details

Details for the file deepray-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: deepray-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 59.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for deepray-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 be84249010b57b0471bb2149d368ec6de68df07ce109db51beb9066a4c1302e6
MD5 047fc349c566076bb67680e71ab8c86b
BLAKE2b-256 0a19ea6fe7686463bdc4ffbc787aac66c1f124c791ebc039981d1cba388cd104

