MLFeatureSelection

General feature selection based on a selected algorithm


Feature Selection
=====================================

This code performs general feature selection driven by a machine learning algorithm and an evaluation method of your choice.

How to run (see demo.py)
------------------------------------------------

The demo is based on the IJCAI-2018 data mining competition.

- Import the library from FeatureSelection.py along with the other required libraries

.. code-block:: python

    import MLFeaturesSelection as FS
    from sklearn.metrics import log_loss
    import lightgbm as lgbm
    import pandas as pd
    import numpy as np

- Prepare the dataset

.. code-block:: python

    def prepareData():
        df = pd.read_csv('IJCAI-2018/data/train/trainb.csv')
        # Drop rows that have no label
        df = df[~pd.isnull(df.is_trade)].copy()
        # Encode each distinct category string as an integer code
        item_category_list_unique = list(np.unique(df.item_category_list))
        df['item_category_list'] = df['item_category_list'].replace(item_category_list_unique, list(np.arange(len(item_category_list_unique))))
        return df

- Define your loss function

.. code-block:: python

    def modelscore(y_test, y_pred):
        return log_loss(y_test, y_pred)
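
For reference, the log loss that ``modelscore`` delegates to is the mean negative log-likelihood of the true labels under the predicted probabilities. The following is a minimal pure-Python sketch of that formula for binary labels. ``binary_log_loss`` and the clipping constant ``eps`` are illustrative names, not part of this package or of scikit-learn.

.. code-block:: python

    import math

    def binary_log_loss(y_true, y_prob, eps=1e-15):
        """Mean negative log-likelihood for binary labels (lower is better)."""
        total = 0.0
        for y, p in zip(y_true, y_prob):
            # Clip probabilities so that log(0) never occurs
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        return total / len(y_true)

    # Confident, correct predictions give a small loss
    good = binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8])
    # Confident, wrong predictions are penalized heavily
    bad = binary_log_loss([1, 0, 1], [0.1, 0.9, 0.2])

Because lower values are better, a search driven by this metric should keep a feature only when it reduces the score.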

- Define the validation method

.. code-block:: python

    def validation(X, y, clf, lossfunction):
        days = [24]  # each listed day is held out once as the test fold
        totaltest = 0
        for D in days:
            T = (X.day != D)
            X_train, X_test = X[T], X[~T]
            y_train, y_test = y[T], y[~T]
            # The fit call must match your selected algorithm. These keyword
            # arguments follow the LightGBM release used by the demo; newer
            # LightGBM releases may require callbacks for early stopping instead.
            clf.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)], eval_metric='logloss', verbose=False, early_stopping_rounds=200)
            totaltest += lossfunction(y_test, clf.predict_proba(X_test)[:, 1])
        totaltest /= len(days)  # average the loss over the held-out days
        return totaltest
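
The validation function above holds out one day at a time and averages the loss over the folds. The same splitting idea can be sketched without pandas or LightGBM. In this sketch ``holdout_by_day``, ``mean_predictor`` and ``mean_abs_error`` are hypothetical helpers introduced only for illustration, and the toy "model" simply predicts the positive rate of the training fold.

.. code-block:: python

    def mean_predictor(train_labels):
        """Toy stand-in for a classifier: always predicts the training positive rate."""
        rate = sum(train_labels) / len(train_labels)
        return lambda rows: [rate] * len(rows)

    def holdout_by_day(rows, labels, holdout_days, loss):
        """Average the loss over folds, each holding out one day for testing."""
        total = 0.0
        for day in holdout_days:
            train = [(r, y) for r, y in zip(rows, labels) if r["day"] != day]
            test = [(r, y) for r, y in zip(rows, labels) if r["day"] == day]
            predict = mean_predictor([y for _, y in train])
            y_test = [y for _, y in test]
            y_pred = predict([r for r, _ in test])
            total += loss(y_test, y_pred)
        return total / len(holdout_days)

    def mean_abs_error(y_true, y_pred):
        return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

    rows = [{"day": 22}, {"day": 22}, {"day": 23}, {"day": 23}, {"day": 24}, {"day": 24}]
    labels = [0, 1, 0, 0, 1, 0]
    score = holdout_by_day(rows, labels, holdout_days=[23, 24], loss=mean_abs_error)

Splitting by day rather than at random keeps every test row from a day the model never saw during training, which is usually the safer choice for time-ordered data.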

- Define the cross methods (required when *Cross = True*)

.. code-block:: python

    def add(x, y):
        return x + y

    def subtract(x, y):
        return x - y

    def times(x, y):
        return x * y

    def divide(x, y):
        # The small offset avoids division by zero
        return (x + 0.001) / (y + 0.001)

    CrossMethod = {'+': add,
                   '-': subtract,
                   '*': times,
                   '/': divide}
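
To make the role of the cross methods concrete, the sketch below applies such a dictionary of operators to every pair of feature columns and builds one new column per pair and operator. ``cross_features`` and the ``(a+b)`` naming scheme are assumptions made for this illustration; the package may name and generate crossed features differently.

.. code-block:: python

    from itertools import combinations

    def divide(x, y):
        # The small offset avoids division by zero
        return (x + 0.001) / (y + 0.001)

    cross_method = {
        '+': lambda x, y: x + y,
        '-': lambda x, y: x - y,
        '*': lambda x, y: x * y,
        '/': divide,
    }

    def cross_features(columns, methods):
        """Build one new column per (feature pair, operator) combination."""
        crossed = {}
        for a, b in combinations(sorted(columns), 2):
            for symbol, func in methods.items():
                name = f"({a}{symbol}{b})"
                crossed[name] = [func(x, y) for x, y in zip(columns[a], columns[b])]
        return crossed

    columns = {
        'item_price_level': [3, 5, 0],
        'item_sales_level': [2, 0, 4],
    }
    crossed = cross_features(columns, cross_method)

The number of crossed columns grows with the square of the number of features, so crossing is best limited to a feature set that has already been narrowed down.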

- Initialize the searcher with a customized procedure (sequence + random + cross)

.. code:: python

    sf = FS.Select(Sequence = False, Random = True, Cross = False) # choose which search procedures to enable

- Import the dataset and specify the label column

.. code:: python

    sf.ImportDF(prepareData(), label = 'is_trade')

- Import the cross methods (required when *Cross = True*)

.. code:: python

    sf.ImportCrossMethod(CrossMethod)

- Define non-trainable features

.. code:: python

    sf.NonTrainableFeatures = ['used', 'instance_id', 'item_property_list', 'context_id', 'context_timestamp', 'predict_category_property', 'is_trade']

- Define the initial feature combination

.. code:: python

    sf.InitialFeatures(['item_category_list', 'item_price_level', 'item_sales_level', 'item_collected_level', 'item_pv_level'])

- Define the algorithm

.. code:: python

    sf.clf = lgbm.LGBMClassifier(random_state=1, num_leaves=6, n_estimators=5000, max_depth=3, learning_rate=0.05, n_jobs=8)

- Define the log file name

.. code:: python

    sf.logfile = 'record.log'

- Run with the self-defined validation method

.. code:: python

    sf.run(validation)

- The search takes a while to run. You can stop it at any time and restart it by passing the best feature combination found so far to ``sf.InitialFeatures()``.

This feature selection method achieved
------------------------------------------------------------------------------

- **1st** in Rong360

- https://github.com/duxuhao/rong360-season2

- **12th** in IJCAI-2018 1st round

Algorithm details
----------------------------------

.. image:: https://github.com/duxuhao/Feature-Selection/blob/master/Procedure.png
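
The figure is the authoritative description of the procedure. As rough orientation only, the sketch below shows a generic greedy forward search, a common technique for sequence-style feature selection. It is not taken from this package's source: ``forward_select``, ``toy_score`` and the ``USEFUL`` set are hypothetical names, and the toy score stands in for a real validation loss.

.. code-block:: python

    def forward_select(candidates, initial, score):
        """Greedy forward selection: keep a candidate only if adding it lowers the score."""
        selected = list(initial)
        best = score(selected)
        improved = True
        while improved:  # repeat passes until no candidate helps
            improved = False
            for feature in candidates:
                if feature in selected:
                    continue
                trial = score(selected + [feature])
                if trial < best:  # lower is better, as with log loss
                    best, selected = trial, selected + [feature]
                    improved = True
        return selected, best

    # Toy score: one point per missing useful feature, 0.1 per useless one
    USEFUL = {'item_price_level', 'item_sales_level'}

    def toy_score(features):
        missing = len(USEFUL - set(features))
        noise = len(set(features) - USEFUL)
        return missing + 0.1 * noise

    candidates = ['item_pv_level', 'item_price_level', 'item_collected_level', 'item_sales_level']
    selected, best = forward_select(candidates, initial=['item_pv_level'], score=toy_score)

Because the search starts from ``initial``, an interrupted run can be resumed by passing the best combination found so far as the new starting point.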

