Combination Dependent Learning to Rank
# cdl2r
Combination Dependent Learning to Rank (組合せ依存型ランキング学習).
## requirements
- Python 3.6.x or 3.7.x
## dependencies
- NumPy
- Pandas
## installation
```shell
$ pip install cdl2r
```
## usage
### 1. prepare your dataset
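The dataset format is similar to the SVM-rank format; the difference is that each line must also specify an `eid`.
Each line is defined below, where the `|` symbol means `OR` (so `<str>|<int>` means the value must be either a str or an int).
`<features>` is a space-separated sequence of `<dim>:<value>` pairs, and the `#<comments>` part is optional.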
```txt
<line> .=. <label> qid:<qid> eid:<eid> <features>#<comments>
<label> .=. <float>|<str as a class>
<qid> .=. <str>|<int>
<eid> .=. <str>|<int>
<features> .=. <dim>:<value>
<dim> .=. <0 or Natural Number>
<value> .=. <float>
<comments> .=. <Any text will do>
```
Let me show you an example.
```txt
0.5 qid:1 eid:x 1:0.1 2:-0.2 3:0.3#comment A
0.0 qid:1 eid:y 1:-0.1 2:0.2 4:0.4
-0.5 qid:1 eid:z 2:-0.2 3:0.3 4:-0.4#comment C
0.5 qid:2 eid:y 1:0.1 2:-0.2 3:0.3
0.0 qid:2 eid:z 1:-0.1 2:0.2 4:0.4
-0.5 qid:2 eid:w 2:-0.2 3:0.3 4:-0.4#comment E
```
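To make the line format concrete, here is a minimal parser sketch for a single line. It only illustrates the grammar above and is not cdl2r's own `load_data`; the `Record` and `parse_line` names are hypothetical, and the sketch keeps `qid` and `eid` as strings rather than guessing when they should be ints.

```python
from typing import Dict, NamedTuple, Optional, Union


class Record(NamedTuple):
    """One parsed dataset line (hypothetical container, not part of cdl2r)."""
    label: Union[float, str]
    qid: str
    eid: str
    features: Dict[int, float]
    comment: Optional[str]


def parse_line(line: str) -> Record:
    """Parse `<label> qid:<qid> eid:<eid> <dim>:<value> ...#<comments>`."""
    # Split off the optional trailing comment at the first '#'.
    body, sep, comment = line.rstrip('\n').partition('#')
    raw_label, qid_tok, eid_tok, *feature_toks = body.split()
    if not qid_tok.startswith('qid:') or not eid_tok.startswith('eid:'):
        raise ValueError(f'expected qid:<qid> eid:<eid>, got {qid_tok!r} {eid_tok!r}')
    # A label is a float, or otherwise a class name kept as a string.
    try:
        label: Union[float, str] = float(raw_label)
    except ValueError:
        label = raw_label
    # Each remaining token is a sparse `<dim>:<value>` pair.
    features = {}
    for tok in feature_toks:
        dim, value = tok.split(':', 1)
        features[int(dim)] = float(value)
    return Record(label, qid_tok[len('qid:'):], eid_tok[len('eid:'):],
                  features, comment if sep else None)
```

For example, `parse_line('0.5 qid:1 eid:x 1:0.1 2:-0.2 3:0.3#comment A')` yields label `0.5`, qid `'1'`, eid `'x'`, three features, and the comment `'comment A'`.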
### 2. load your dataset
```python
from cdl2r.dataset import load_data
# loading dataset as a DataFrame object
data_path = '/path/to/dataset'
n_dimensions = 10
train = load_data(data_path, n_dimensions)
# train.columns
# >>> Index(['label', 'qid', 'eid', 'features'], dtype='object')
```
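This README does not say exactly how `load_data` uses `n_dimensions`. Assuming it is the length of a dense feature vector indexed by the 0-based `<dim>` values, expanding one line's sparse features would look roughly like the sketch below; `to_dense` is a hypothetical helper, not part of cdl2r.

```python
from typing import Dict, List


def to_dense(features: Dict[int, float], n_dimensions: int) -> List[float]:
    """Expand sparse {dim: value} pairs into a dense vector of length n_dimensions.

    Assumption: dims are 0-based indices, so every dim must be < n_dimensions.
    """
    vector = [0.0] * n_dimensions
    for dim, value in features.items():
        if not 0 <= dim < n_dimensions:
            raise ValueError(f'dim {dim} is out of range for n_dimensions={n_dimensions}')
        vector[dim] = value
    return vector
```

Under this assumption, `n_dimensions = 10` leaves room for every `<dim>` from 0 to 9, which covers the example dataset above.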
### 3. fit the model
```python
from cdl2r.models import CDFMRegressor
# define your model
model = CDFMRegressor(n_factors=8, n_iterations=300, init_eta=1e-2)
# fitting, printing out epoch losses if verbose is True
model.fit(train, verbose=True)
```
### 4. save the model
```python
import pickle
with open('/path/to/file.pkl', mode='wb') as fp:
    pickle.dump(model, fp)
```
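To reuse a saved model later, read it back with `pickle.load` in binary mode. The sketch below pairs the save step with a matching load step; `DummyModel`, `save_model`, and `load_model` are hypothetical stand-ins so that it runs without cdl2r installed. With a real `CDFMRegressor`, cdl2r must be importable when loading, because pickle looks the class up by its module path.

```python
import os
import pickle
import tempfile


class DummyModel:
    """Hypothetical stand-in for a fitted CDFMRegressor."""

    def __init__(self, n_factors: int) -> None:
        self.n_factors = n_factors


def save_model(model, path: str) -> None:
    # Binary mode is required: pickle writes bytes.
    with open(path, mode='wb') as fp:
        pickle.dump(model, fp)


def load_model(path: str):
    # Only unpickle files you trust; unpickling can execute arbitrary code.
    with open(path, mode='rb') as fp:
        return pickle.load(fp)


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
    save_model(DummyModel(n_factors=8), path)
    print(load_model(path).n_factors)
```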
### 5. make predictions
```python
# load the test dataset (in the same format as the training data)
test_path = '/path/to/test_dataset'
test = load_data(test_path, n_dimensions)
pred = model.predict(test)
# pred.columns
# >>> Index(['pred_label', 'qid', 'eid', 'features'], dtype='object')
```
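For ranking, you will usually want the `eid`s within each `qid` ordered by predicted score. Here is a minimal plain-Python sketch, assuming a higher `pred_label` means a higher rank (as with SVM-rank targets); `rank_by_query` is a hypothetical helper, not part of cdl2r.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def rank_by_query(rows: Iterable[Tuple[Hashable, Hashable, float]]) -> Dict[Hashable, List[Hashable]]:
    """Group (qid, eid, score) rows by qid and order each group's eids by descending score."""
    groups: Dict[Hashable, List[Tuple[float, Hashable]]] = defaultdict(list)
    for qid, eid, score in rows:
        groups[qid].append((score, eid))
    # sorted() is stable even with reverse=True, so tied eids keep their input order.
    return {
        qid: [eid for _, eid in sorted(pairs, key=lambda p: p[0], reverse=True)]
        for qid, pairs in groups.items()
    }
```

With the `pred` DataFrame from step 5, it could be called as `rank_by_query(pred[['qid', 'eid', 'pred_label']].itertuples(index=False))`.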
## development
### build Cython modules
```shell
$ python setup.py build_ext --inplace
```
### profiling
```shell
# Decorate each function you want to profile with `@profile`, then run:
$ kernprof -l -v <script>.py
```
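`kernprof -l` makes a `profile` decorator available to the script at run time, so running the same script with plain `python` would fail with a `NameError`. A common workaround, sketched below with a hypothetical toy function, falls back to a no-op decorator when `profile` is not defined.

```python
import builtins

# kernprof -l injects `profile` into builtins; fall back to a no-op decorator
# so the script also runs under plain `python`.
profile = getattr(builtins, 'profile', lambda func: func)


@profile
def pairwise_sum(values):
    """Toy function to profile: sum of all pairwise products."""
    total = 0.0
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            total += a * b
    return total


if __name__ == '__main__':
    print(pairwise_sum([1.0, 2.0, 3.0]))
```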
### pylint
- max-line-length: 130
- disable the snake_case naming check (`invalid-name`)
### release
```shell
# build
$ python setup.py bdist_wheel
# testing upload
$ twine upload --repository testpypi dist/<cdl2r-version-pkg>
$ pip install --index-url https://test.pypi.org/simple/ cdl2r==<version>
# upload
$ twine upload --repository pypi dist/<cdl2r-version-pkg>
```
Source Distributions
No source distribution files are available for this release.
Built Distribution
File details
Details for the file cdl2r-0.1.2-cp36-cp36m-macosx_10_14_x86_64.whl.
File metadata
- Download URL: cdl2r-0.1.2-cp36-cp36m-macosx_10_14_x86_64.whl
- Upload date:
- Size: 35.4 kB
- Tags: CPython 3.6m, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 54ca7c0cab90838bb89a5344de16a41f3c6d760b591d07a1cc0ff51a2ce7e391 |
| MD5 | e5b42973da349b38225fa569f30ae794 |
| BLAKE2b-256 | 96be9c399c83d3667bd618d6f814911d2be229ca56f69ae7d77ea5ff9466d535 |