Python wrapper for luban.

luban

Chinese documentation

Why Is It Needed?

The feature-processing module is used in different scenarios, such as model training and model inference. Models are usually trained in Python, so feature processing is also done in Python. At inference time, however, lower latency is better, so inference is usually implemented in C++, Java, Golang, or another language.

If the feature-processing module is written separately in several languages, the implementations may become inconsistent. We therefore developed the module in C++, so that it can be called from other languages. In addition, feature processing is driven by a JSON configuration, which makes it easier to use.

Supported libraries and tools

  1. C library: libluban
  2. Python library: pyluban

Configuration

{
    "transforms": [
        "B = log(A)",
        "C = A * log(A)",
        "E = D * C",
        "G = date_add(F, H)",
        "K = concat(date_add(F, I), \"100\")"
    ],
    "outputs": [
        {
            "name": "A",
            "group": 0
        },
        {
            "name": "G",
            "group": 1
        }
    ]
}
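As a minimal sketch, the same configuration can be built as a Python dict and written out with the standard json module, which takes care of escaping the inner quotes around "100". The file name config.json is only an assumption for illustration.

```python
import json

# The configuration from the section above, written as a Python dict.
config = {
    "transforms": [
        "B = log(A)",
        "C = A * log(A)",
        "E = D * C",
        "G = date_add(F, H)",
        'K = concat(date_add(F, I), "100")',
    ],
    "outputs": [
        {"name": "A", "group": 0},
        {"name": "G", "group": 1},
    ],
}


def dump_config(cfg, path):
    """Write the configuration as JSON; json.dump escapes the inner quotes."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cfg, fh, indent=4)


if __name__ == "__main__":
    dump_config(config, "config.json")  # hypothetical file name
```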

Transforms

These are the transform expressions. Their syntax is explained in the Parse Expression section.

Outputs

These are the final output features. Each feature has a group id. Group ids are numbered consecutively from 0 and must not repeat.
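The group id rule can be checked before the configuration is handed to the parser. The helper below is a hypothetical sketch, not part of pyluban; it assumes the rule means that the ids are unique and form the sequence 0, 1, ..., n-1.

```python
def check_output_groups(outputs):
    """Return the group ids in listed order, or raise ValueError.

    Assumed rule: ids are unique and cover 0, 1, ..., n-1.
    """
    groups = [item["group"] for item in outputs]
    if len(set(groups)) != len(groups):
        raise ValueError(f"duplicate group id in {groups}")
    if sorted(groups) != list(range(len(groups))):
        raise ValueError(
            f"group ids must run from 0 to {len(groups) - 1}, got {groups}"
        )
    return groups


outputs = [{"name": "A", "group": 0}, {"name": "G", "group": 1}]
print(check_output_groups(outputs))  # prints [0, 1]
```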

Parse Expression

After the Python package is installed, the luban_parser tool is installed in /usr/local/bin by default. It parses the JSON configuration above and generates the TOML configuration used by the C++ library. The expression syntax is similar to Python's. We originally considered defining a custom DSL with ANTLR, but decided to keep things as simple as possible and parse expressions with Python's built-in ast module instead.
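To illustrate why Python's ast module is enough for expressions of this shape, the sketch below parses each transform and reports the assigned feature, the features it reads, and the functions it calls. This is not luban_parser's actual implementation; the helper names analyze and raw_inputs are hypothetical.

```python
import ast


def analyze(expression):
    """Return (target, inputs, functions) for one 'X = expr' transform."""
    tree = ast.parse(expression)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        raise ValueError(f"expected a single assignment: {expression!r}")
    stmt = tree.body[0]
    if len(stmt.targets) != 1 or not isinstance(stmt.targets[0], ast.Name):
        raise ValueError(f"expected a plain name on the left: {expression!r}")
    # Calls such as log(A): the callee is a Name node, not an input feature.
    calls = [
        node for node in ast.walk(stmt.value)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    ]
    callee_nodes = {id(call.func) for call in calls}
    functions = sorted({call.func.id for call in calls})
    inputs = sorted({
        node.id for node in ast.walk(stmt.value)
        if isinstance(node, ast.Name) and id(node) not in callee_nodes
    })
    return stmt.targets[0].id, inputs, functions


def raw_inputs(transforms):
    """Features that are read by some transform but never assigned by one."""
    defined, used = set(), set()
    for expr in transforms:
        target, inputs, _ = analyze(expr)
        defined.add(target)
        used.update(inputs)
    return sorted(used - defined)


transforms = [
    "B = log(A)",
    "C = A * log(A)",
    "E = D * C",
    "G = date_add(F, H)",
    'K = concat(date_add(F, I), "100")',
]

if __name__ == "__main__":
    for expr in transforms:
        print(analyze(expr))
    print(raw_inputs(transforms))
```

For the configuration above, the features that must be supplied from outside are A, D, F, H and I; everything else is derived by a transform.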

Operators

  1. arithmetic operators: +, -, *, /, %, **
  2. math functions: round, floor, ceil, log, exp, log10, log2, sqrt, abs, sin, sinh, asin, asinh, cos, cosh, acos, acosh, tan, tanh, atan, atanh, sigmoid
  3. aggregate functions: min, max, variance, stddev, average, norm
  4. time functions: year, month, day, hour, minute, second, date, now, date_diff, date_add, date_sub, from_unixtime, unix_timestamp
  5. string functions: concat, substr, lower, upper, cross, reverse
  6. topk functions: topki, topkf, topks
  7. other functions: min_max, z_score, binarize, bucketize, box_cox, normalize
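The list gives only the operator names. As a rough guide to what a few of them conventionally compute, here is a pure-Python sketch using the usual textbook definitions. The argument names and their order are assumptions and may differ from luban's actual operators.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def min_max(x, lo, hi):
    """Min-max scaling of x into [0, 1] (assumed argument order)."""
    return (x - lo) / (hi - lo)


def z_score(x, mean, stddev):
    """Standardization: distance from the mean in standard deviations."""
    return (x - mean) / stddev


print(sigmoid(0.0))              # prints 0.5
print(min_max(5.0, 0.0, 10.0))   # prints 0.5
print(z_score(12.0, 10.0, 4.0))  # prints 0.5
```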

Usage

  1. Follow the instructions in the Install section to compile and install the tool

  2. Include the header files and add libluban to the dynamic link path

  3. Steps:

    1. Write the JSON configuration file
    2. Use luban_parser to convert the JSON configuration file into a configuration file in TOML format
    3. Pass the TOML configuration file as the configuration input for C/C++/Golang/Python

Install

install protobuf

The installation script for CentOS 7 is below; other systems are similar.

#!/bin/bash

yum install -y git wget
yum install -y openssl openssl-devel gcc-c++
yum install -y snappy snappy-devel autoconf automake libtool
yum install -y bzip2 bzip2-devel lz4-devel libzstd-devel
yum install -y epel-release gflags gflags-devel which 

# install cmake
# use cmake to compile this project for c/c++ lib
cd /tmp 
wget https://github.com/Kitware/CMake/releases/download/v3.18.2/cmake-3.18.2.tar.gz
tar -xvf cmake-3.18.2.tar.gz
cd cmake-3.18.2
./bootstrap
gmake && gmake install

# install libunwind
cd /tmp
wget http://download.savannah.gnu.org/releases/libunwind/libunwind-1.5.0.tar.gz
tar -xvf libunwind-1.5.0.tar.gz
cd libunwind-1.5.0
CFLAGS=-fPIC ./configure
make CFLAGS=-fPIC && make CFLAGS=-fPIC install 

# install gperftools and tcmalloc
cd /tmp
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7/gperftools-2.7.tar.gz
tar -xvf gperftools-2.7.tar.gz 
cd gperftools-2.7
./configure
make -j6 && make install 

# install protobuf
cd /tmp
git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf 
git checkout v3.8.0 && git submodule update --init --recursive
./autogen.sh
./configure 
make && make install
ldconfig

rm -rf /tmp/*

install pyluban

python setup.py install --install-scripts=/usr/local/bin

Q&A

Fatal Python error: type_traverse() called for non-heap type 'Entity'

This is a bug in Python. Please upgrade to Python 3.8 or later.
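A quick environment check can catch both this interpreter issue and a missing luban_parser before anything else fails. The helper below is a hypothetical sketch that uses only the standard library and does not import pyluban.

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, tool="luban_parser"):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if tuple(version_info[:2]) < (3, 8):
        problems.append(
            "Python 3.8 or later is needed to avoid the type_traverse() error"
        )
    # shutil.which searches PATH, which normally includes /usr/local/bin.
    if shutil.which(tool) is None:
        problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```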
