
Feature selection based on a user-selected algorithm, loss function, and validation method

Project description

This code performs general feature selection driven by a machine learning algorithm and an evaluation method of your choice.

You can define your own validation method and loss function.

How to run

The demo is based on the IJCAI-2018 data mining competition.

  • Import the selection module from FeatureSelection.py along with the other necessary libraries

from MLFeatureSelection import FeatureSelection as FS
from sklearn.metrics import log_loss
import lightgbm as lgbm
import pandas as pd
import numpy as np
  • Generate the dataset

def prepareData():
    df = pd.read_csv('IJCAI-2018/data/train/trainb.csv')
    df = df[~pd.isnull(df.is_trade)]  # keep only labeled rows
    # encode item_category_list as consecutive integer codes
    item_category_list_unique = list(np.unique(df.item_category_list))
    df.item_category_list.replace(item_category_list_unique, list(np.arange(len(item_category_list_unique))), inplace=True)
    return df
  • Define your loss function

def modelscore(y_test, y_pred):
    return log_loss(y_test, y_pred)
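As a quick sanity check with toy numbers (not from the competition data), log loss rewards confident, correct probability estimates:

```python
from sklearn.metrics import log_loss

# two samples, both predicted correctly with probability 0.9
y_test = [0, 1]
y_pred = [0.1, 0.9]  # predicted probability of class 1
score = log_loss(y_test, y_pred)  # -mean(log p(true class)), about 0.105
```

Since a lower log loss is better, the searcher is later told to minimize it via `direction = 'descend'`.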
  • Define the way to validate

def validation(X, y, features, clf, lossfunction):
    totaltest = 0
    days = [24]  # hold out each of these days in turn
    for D in days:
        T = (X.day != D)  # train on every other day
        X_train, X_test = X[T], X[~T]
        X_train, X_test = X_train[features], X_test[features]
        y_train, y_test = y[T], y[~T]
        # the fit call must match your selected algorithm
        clf.fit(X_train, y_train, eval_set = [(X_train, y_train), (X_test, y_test)],
                eval_metric='logloss', verbose=False, early_stopping_rounds=200)
        totaltest += lossfunction(y_test, clf.predict_proba(X_test)[:,1])
    totaltest /= len(days)  # average the loss over the held-out days
    return totaltest
  • Define the cross method (required when Cross = True)

def add(x, y):
    return x + y

def subtract(x, y):
    return x - y

def times(x, y):
    return x * y

def divide(x, y):
    return (x + 0.001) / (y + 0.001)  # small constant avoids division by zero

CrossMethod = {'+': add,
               '-': subtract,
               '*': times,
               '/': divide}
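To picture what a cross operator produces, here is the divide operator applied to two demo-dataset columns; the pairing and the derived column name are an illustration of the idea, not the library's exact naming scheme:

```python
import pandas as pd

def divide(x, y):
    return (x + 0.001) / (y + 0.001)  # small constant avoids division by zero

df = pd.DataFrame({'item_price_level': [2.0, 4.0],
                   'item_sales_level': [1.0, 2.0]})
# the crossed feature combines the operator with its two operand columns
df['item_price_level/item_sales_level'] = divide(df['item_price_level'],
                                                 df['item_sales_level'])
```

The searcher can then evaluate such derived columns alongside the original features.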
  • Initialize the searcher with a customized procedure (any combination of sequence, random, and cross)

sf = FS.Select(Sequence = False, Random = True, Cross = False) #choose which search procedures to run
  • Import loss function

sf.ImportLossFunction(modelscore, direction = 'descend') #'descend': improvement means a lower score (e.g. log loss)
  • Import dataset

sf.ImportDF(prepareData(),label = 'is_trade')
  • Import cross method (required when Cross = True)

sf.ImportCrossMethod(CrossMethod)
  • Define non-trainable features

sf.InitialNonTrainableFeatures(['used','instance_id', 'item_property_list', 'context_id', 'context_timestamp', 'predict_category_property', 'is_trade'])
  • Define the initial feature combination

sf.InitialFeatures(['item_category_list', 'item_price_level','item_sales_level','item_collected_level', 'item_pv_level','day'])
  • Define potential features that can be added later

sf.AddPotentialFeatures(['user_age_level'])
  • Define algorithm

sf.clf = lgbm.LGBMClassifier(random_state=1, num_leaves = 6, n_estimators=5000, max_depth=3, learning_rate = 0.05, n_jobs=8)
  • Define log file name

sf.SetLogFile('record.log')
  • Set maximum features quantity

sf.SetFeaturesLimit(40) #maximum number of features
  • Set maximum time limit (in minutes)

sf.SetTimeLimit(100) #maximum running time in minutes
  • Set the sample ratio of the total dataset. When samplemode equals 0, the same subset is used on every run; when samplemode equals 1, a different subset is drawn each time

sf.SetSample(0.1, samplemode = 0)
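The two sample modes can be pictured with plain pandas sampling; this is an illustration of the idea, not the library's internal code. A fixed random state reproduces the same subset, while omitting it draws a fresh one each call:

```python
import pandas as pd

df = pd.DataFrame({'x': range(100)})

# samplemode = 0: fixed seed, so every run sees the same 10% subset
subset_a = df.sample(frac=0.1, random_state=1)
subset_b = df.sample(frac=0.1, random_state=1)
# subset_a and subset_b contain exactly the same rows

# samplemode = 1: no fixed seed, so each run generally draws a different subset
subset_c = df.sample(frac=0.1)
```

Using a fixed subset keeps scores comparable across iterations; varying the subset trades that for lower overfitting to one sample.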
  • Generate the feature library; you can specify a key word and a selection step

sf.GenerateCol(key = 'mean', selectstep = 2) #iterate over different feature sets
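Conceptually, the key word restricts the candidate pool to columns whose names contain it; the column names below are hypothetical, used only to show the filtering idea:

```python
# hypothetical pool of engineered columns
columns = ['user_age_level', 'price_mean_by_shop', 'sales_mean_by_user', 'day']
key = 'mean'
# keep only candidates whose names contain the key word
candidates = [c for c in columns if key in c]
```

This lets you focus a search pass on one family of engineered features (here, the mean-aggregation columns).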
  • Run with the self-defined validation method

sf.run(validation)
  • This code takes a while to run; you can stop it at any time and restart by placing the best feature combination found so far into sf.InitialFeatures()

This feature selection method achieved:

  • 1st in Rong360

https://github.com/duxuhao/rong360-season2

  • Temporarily top 10 in JData-2018

  • 12th in IJCAI-2018, 1st round

Algorithm details

Procedure

