Machine Learning libraries for Information Retrieval

These details have been verified by PyPI

Maintainers

darshs gbalikas jake.mannix lastmansleeping mzahran salesforce Ullimague

Project description

ml4ir Python Quickstart

For more detailed usage documentation check ml4ir.readthedocs.io

Installation
Usage
Running Tests

Installation

Using ml4ir as a library

Requirements

python3.{6,7} (tf2.0.3 is not available for python3.8)
pip3
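
The interpreter requirement above can be checked before installing. The snippet below is a minimal sketch and not part of ml4ir: the helper name is_supported_python is hypothetical, and the supported range is simply the python3.{6,7} requirement stated above.

```python
import sys

# Supported (major, minor) pairs, taken from the requirement stated above.
SUPPORTED_VERSIONS = {(3, 6), (3, 7)}


def is_supported_python(version_info=None):
    """Return True if the given (or running) interpreter is Python 3.6 or 3.7."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    if not is_supported_python():
        print("Python %d.%d is outside the range listed above" % sys.version_info[:2])
```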

ml4ir can be installed as a pip package with the following command:

pip3 install ml4ir

This will install the latest release of ml4ir from PyPI (0.1.16 as of Mar 7, 2023).

To install optional dependencies like pygraphviz, use the following command:

pip3 install "ml4ir[visualization]"

To use the pre-built pipelines that come with ml4ir, install it as follows (this also installs pyspark and pygraphviz):

pip3 install "ml4ir[all]"
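
To see whether the optional dependencies ended up in your environment, a small check like the one below may help. It is a sketch rather than an ml4ir utility: the helper name missing_modules is hypothetical, and it assumes the import names match the package names mentioned above (pygraphviz, pyspark).

```python
import importlib.util


def missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pygraphviz and pyspark are the optional dependencies named above.
    missing = missing_modules(["pygraphviz", "pyspark"])
    if missing:
        print("Optional modules not installed:", ", ".join(missing))
```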

Using ml4ir as a toolkit or contributing to ml4ir

First, clone ml4ir:

git clone https://github.com/salesforce/ml4ir

You can use and develop ml4ir with either Docker or virtualenv.

Docker (Recommended)

Requirements

docker (18.09+ tested)
docker-compose

We have set up a docker-compose.yml file for building and using docker containers to train models.

Change the working directory to the python package

cd path/to/ml4ir/python/

To build the docker image and run unit tests

docker-compose up --build

To only build the ml4ir docker image without running tests

docker-compose build

Virtual Environment

Requirements

python3.{6,7} (tf2.0.3 is not available for python3.8)
pip3

Change the working directory to the python package

cd path/to/ml4ir/python/

Install virtualenv

pip3 install virtualenv

Create new python3 virtual environment inside your git repo (it's .gitignored, don't worry)

python3 -m venv env/.ml4ir_venv3

Activate virtualenv

source env/.ml4ir_venv3/bin/activate
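
If you are unsure whether the activation took effect, you can ask the interpreter itself. This is a minimal sketch using only the standard library; the function name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # Older virtualenv releases set sys.real_prefix; the venv module makes
    # sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```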

Install all dependencies

pip3 install --upgrade setuptools
pip3 install --upgrade pip
pip3 install -r requirements.txt

Set the PYTHONPATH environment variable to point to the python package

export PYTHONPATH=$PYTHONPATH:`pwd`
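
When you launch ml4ir from a Python script rather than a shell, the same path setup can be done in code. The sketch below is illustrative only: extend_pythonpath is a hypothetical helper, and unlike the shell one-liner it skips empty entries and avoids adding the same directory twice.

```python
import os


def extend_pythonpath(current, new_dir):
    """Return a PYTHONPATH value with new_dir appended to current.

    current may be None or empty (variable unset); in that case the
    result is just new_dir, with no leading separator.
    """
    entries = [entry for entry in (current or "").split(os.pathsep) if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # Similar in spirit to: export PYTHONPATH=$PYTHONPATH:`pwd`
    # This affects child processes started from here; the running
    # interpreter's sys.path is not changed.
    os.environ["PYTHONPATH"] = extend_pythonpath(os.environ.get("PYTHONPATH"), os.getcwd())
```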

Contributing to ml4ir

Install the Python dependencies from requirements.txt and dev-requirements.txt to set up everything the pre-commit hooks need.
The pre-commit hooks are required for contributing to ml4ir and are installed along with these requirements. If you get an error saying the hooks were not installed, run pre-commit install to install the git hooks in your .git/ directory.

Usage

ml4ir as a toolkit

The entry point into the training and evaluation functionality of ml4ir is ml4ir/base/pipeline.py. For application-specific overrides, look at ml4ir/applications/<eg: ranking>/pipeline.py

Pipelines currently supported:

ml4ir/applications/ranking/pipeline.py
ml4ir/applications/classification/pipeline.py

To run the ml4ir ranking pipeline to train, evaluate and/or test, use

docker-compose run ml4ir \
    python3 ml4ir/applications/ranking/pipeline.py \
    <args>

An example ranking pipeline run that trains, predicts, and evaluates

docker-compose run ml4ir \
    python3 ml4ir/applications/ranking/pipeline.py \
    --data_dir ml4ir/applications/ranking/tests/data/tfrecord \
    --feature_config ml4ir/applications/ranking/tests/data/configs/feature_config.yaml \
    --run_id test \
    --data_format tfrecord \
    --execution_mode train_inference_evaluate
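
If you start runs from a script, for example to sweep over several configurations, the command above can be assembled programmatically. This is a sketch under assumptions: build_pipeline_command is a hypothetical helper, not an ml4ir API, and it only reuses the flags shown in the example above.

```python
import shlex


def build_pipeline_command(application, options):
    """Build the docker-compose argument list for an ml4ir pipeline run.

    application: "ranking" or "classification" (the pipelines listed above).
    options: mapping of flag name (without leading dashes) to value.
    """
    command = [
        "docker-compose", "run", "ml4ir",
        "python3", "ml4ir/applications/%s/pipeline.py" % application,
    ]
    for flag, value in options.items():
        command.extend(["--" + flag, str(value)])
    return command


if __name__ == "__main__":
    # Flags and values copied from the example command above.
    command = build_pipeline_command("ranking", {
        "data_dir": "ml4ir/applications/ranking/tests/data/tfrecord",
        "feature_config": "ml4ir/applications/ranking/tests/data/configs/feature_config.yaml",
        "run_id": "test",
        "data_format": "tfrecord",
        "execution_mode": "train_inference_evaluate",
    })
    # Print a shell-quoted form; pass the list to subprocess.run to execute it.
    print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list rather than a single string avoids shell quoting problems when a path or run id contains spaces.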

For more examples of usage, check the documentation at ml4ir.readthedocs.io

ml4ir as a library

To use ml4ir as a deep learning library to build relevance models, look at the following walkthroughs under notebooks/

Learning to Rank: The PointwiseRankingDemo notebook walks you through the entire life cycle of a RelevanceModel from the bottom up, including building, training, and saving it. It also covers the architecture of ml4ir.
Text Classification: The EntityPredictionDemo notebook walks you through training a model to predict entity type given a user context and query.
Ranking Explanations: The Ranking_Explanations notebook walks you through per-query explanations for a trained ml4ir model.

Run the following commands to start Jupyter Notebook in your browser and open the notebooks above

cd path/to/ml4ir/python/
source env/.ml4ir_venv3/bin/activate
pip3 install notebook
jupyter-notebook

Running Tests

To run all the python based tests under ml4ir

Using docker

docker-compose up

Using virtualenv

python3 -m pytest

To run specific tests,

python3 -m pytest /path/to/test/module
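
The same invocations can be built from Python, which is handy when a script needs to run the tests with the interpreter of the active virtual environment. A minimal sketch follows; pytest_command is a hypothetical helper name.

```python
import sys


def pytest_command(target=None):
    """Return the argument list for running pytest with the current interpreter."""
    command = [sys.executable, "-m", "pytest"]
    if target is not None:
        # Restrict the run to one test module or directory.
        command.append(str(target))
    return command


# Example: subprocess.run(pytest_command("/path/to/test/module"))
```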

Build

We use CircleCI for the build process. For Python code coverage, we use coverage. Python coverage scores for each PR are calculated by the build and are available in the "Artifacts" section of the build_test_coverage job.

Release history

0.1.16: Mar 7, 2023 (this version)
0.1.15: Feb 6, 2023
0.1.14: Nov 21, 2022
0.1.13: Oct 18, 2022
0.1.12: Apr 26, 2022
0.1.11: Jan 21, 2022
0.1.10: Dec 30, 2021
0.1.9: Dec 29, 2021
0.1.8: Oct 22, 2021
0.1.6: Jul 16, 2021
0.1.4: Jul 1, 2021
0.1.3: Jun 24, 2021
0.1.2: Jun 17, 2021
0.1.0: Mar 4, 2021
0.0.5: Feb 17, 2021
0.0.4: Feb 17, 2021
0.0.3: Oct 8, 2020
0.0.2: Sep 23, 2020
0.0.1: Jun 17, 2020

Download files

Source Distribution

ml4ir-0.1.16.tar.gz (4.8 MB), uploaded Mar 7, 2023

Built Distribution

ml4ir-0.1.16-py3-none-any.whl (4.9 MB), uploaded Mar 7, 2023, Python 3

Hashes for ml4ir-0.1.16.tar.gz

Algorithm	Hash digest
SHA256	`bc73a045baa74be7fd7a8c2bb4a050e83403160496c1bc12ffd855802addea1b`
MD5	`280392c0ac88740b384790c71bc7baf0`
BLAKE2b-256	`663c52a55f7dd871076c26560135a86f164868c1677fbcb3406ce15d7b67c0d9`

Hashes for ml4ir-0.1.16-py3-none-any.whl

Algorithm	Hash digest
SHA256	`6f07ef846e3d1e23533e7023e35928a0b15612965d86d01c0a7e87388b429875`
MD5	`4a296054c70bcc1a0577b492f53fc091`
BLAKE2b-256	`fcb7b3bee647b97668c2ffcc9112fbc7cf5d01e6f6372b5eac038b0822596f34`
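
A downloaded file can be checked against the SHA256 digests listed above. The following is a sketch using only the standard library; the helper names sha256_of_file and matches_expected are hypothetical, and the digests are copied from the tables above.

```python
import hashlib

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "ml4ir-0.1.16.tar.gz": "bc73a045baa74be7fd7a8c2bb4a050e83403160496c1bc12ffd855802addea1b",
    "ml4ir-0.1.16-py3-none-any.whl": "6f07ef846e3d1e23533e7023e35928a0b15612965d86d01c0a7e87388b429875",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, file_name):
    """Return True if the file at path has the digest listed for file_name."""
    return sha256_of_file(path) == EXPECTED_SHA256[file_name]
```

For example, matches_expected("downloads/ml4ir-0.1.16.tar.gz", "ml4ir-0.1.16.tar.gz") returns True only if the downloaded archive is byte-for-byte the file whose digest is listed above; the downloads/ path here is a placeholder.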