PECOS - Predictions for Enormous and Correlated Output Spaces

PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval. PECOS' design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.

Given an input, PECOS identifies a small set (10-100) of relevant outputs from an extremely large candidate set (~100 million) and ranks these outputs by relevance.

Features

Extreme Multi-label Ranking and Classification

  • XR-Linear (pecos.xmc.xlinear): recursive linear models that learn to route an input from the root of a hierarchical label tree to a few leaf-node clusters and return the top-k relevant labels within those clusters as predictions. See the PECOS paper (Yu et al., 2020) for details.

    • fast real-time inference in C++
    • can handle an output space of about 100 million labels
  • XR-Transformer (pecos.xmc.xtransformer): a Transformer-based XMC framework that fine-tunes pre-trained Transformers recursively on multi-resolution objectives. It can be used to generate the top-k relevant labels for a given instance, or simply as a fine-tuning engine for task-aware embeddings. See the XR-Transformer paper (Zhang et al., 2021) for technical details.

    • easy to extend with many pre-trained Transformer models from Hugging Face Transformers
    • achieved state-of-the-art results on public XMC benchmarks (Zhang et al., 2021)
  • ANN Search with HNSW (pecos.ann.hnsw): a PECOS Approximate Nearest Neighbor (ANN) search module that implements the Hierarchical Navigable Small World (HNSW) graph algorithm (Malkov et al., TPAMI 2018).

    • Supports both sparse and dense input features
    • SIMD optimization for both dense/sparse distance computation
    • Supports thread-safe graph construction in parallel on multi-core shared memory machines
    • Supports thread-safe Searchers to do inference in parallel, which reduces inference overhead
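To make the tree-traversal idea behind XR-Linear concrete, here is a minimal pure-Python sketch of beam search over a label tree. It is an illustration under simplifying assumptions, not the PECOS implementation: the tree, the per-node scores, and the names `beam_search`, `tree`, and `toy_scores` are all hypothetical, and leaves stand directly for labels rather than for clusters of labels.

```python
import heapq


def beam_search(tree, score, root, beam_size, topk):
    """Walk a label tree level by level, keeping only the best `beam_size` nodes.

    tree:  dict mapping a node to its list of children (leaves are absent).
    score: function(node) -> relevance in (0, 1] for the current input.
    Returns up to `topk` (path_score, leaf) pairs, best first.
    """
    beam = [(1.0, root)]
    leaves = []
    while beam:
        candidates = []
        for parent_score, node in beam:
            children = tree.get(node, [])
            if not children:
                # Reached a leaf: keep it as a candidate prediction.
                leaves.append((parent_score, node))
                continue
            for child in children:
                # Path score is the product of scores along the path.
                candidates.append((parent_score * score(child), child))
        # Prune: only the highest-scoring nodes survive to the next level.
        beam = heapq.nlargest(beam_size, candidates)
    return heapq.nlargest(topk, leaves)


# Toy tree with two clusters and four labels (made-up scores).
tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
toy_scores = {"A": 0.9, "B": 0.2, "a1": 0.8, "a2": 0.3, "b1": 0.95, "b2": 0.1}

narrow = beam_search(tree, toy_scores.get, "root", beam_size=1, topk=1)
wide = beam_search(tree, toy_scores.get, "root", beam_size=2, topk=2)
print([label for _, label in narrow])  # ['a1']
print([label for _, label in wide])    # ['a1', 'a2']
```

Because only `beam_size` nodes are expanded per level, the number of scored nodes grows with the tree depth rather than with the total number of labels, which is what makes very large output spaces tractable.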

Requirements and Installation

  • Python (3.9, 3.10, 3.11, 3.12)
  • Pip (>=19.3)
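As a quick preflight, the following sketch checks whether the running interpreter is one of the Python versions listed above. The helper name `python_supported` is hypothetical and not part of PECOS.

```python
import sys

# Python versions listed in the requirements above.
SUPPORTED_PYTHONS = {(3, 9), (3, 10), (3, 11), (3, 12)}


def python_supported(version_info=None):
    """Return True if the (major, minor) version is in the supported set."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHONS


print("Supported interpreter:", python_supported())
```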

See setup.py for other dependencies. We recommend installing PECOS in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

Supported Platforms

  • Ubuntu 20.04 and 22.04
  • Amazon Linux 2

Installation from Wheel

PECOS can be installed using pip as follows:

python3 -m pip install libpecos
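Note that the distribution is named libpecos, while the Quick Tour below imports the package as pecos. A small sketch for checking that the import name resolves after installation (the helper `is_importable` is hypothetical, built on the standard library only):

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# The wheel is installed as "libpecos" but imported as "pecos".
print("pecos importable:", is_importable("pecos"))
```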

Installation from Source

Prerequisite builder tools

  • For Ubuntu (20.04, 22.04):
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
  • For Amazon Linux 2:
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'

Install and develop locally

git clone https://github.com/amzn/pecos
cd pecos
python3 -m pip install --editable ./

Quick Tour

For a glimpse of how PECOS works, here is a quick tour of the PECOS API for the XMR problem.

Toy Example

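The eXtreme Multi-label Ranking (XMR) problem is defined by two matrices:

  • an instance-to-feature matrix X, of shape N by D
  • an instance-to-label matrix Y, of shape N by L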

Some toy data matrices are available in the tst-data folder.
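If you would rather build tiny inputs by hand, the sketch below constructs a feature matrix and a label matrix as SciPy sparse matrices. All values are made up for illustration, and the choice of CSR storage with float32 values is an assumption about what PECOS expects; check the PECOS documentation for the exact requirements.

```python
import numpy as np
import scipy.sparse as smat

# Made-up toy problem: N=4 instances, D=5 features, L=3 labels.
# X[i, j] is the value of feature j for instance i.
x_rows = [0, 0, 1, 2, 2, 3]
x_cols = [0, 3, 1, 2, 4, 0]
x_vals = [1.0, 0.5, 2.0, 1.5, 0.7, 0.3]
X = smat.csr_matrix((x_vals, (x_rows, x_cols)), shape=(4, 5), dtype=np.float32)

# Y[i, l] = 1 means instance i is tagged with label l.
y_rows = [0, 1, 1, 2, 3]
y_cols = [0, 1, 2, 2, 0]
Y = smat.csr_matrix(([1.0] * 5, (y_rows, y_cols)), shape=(4, 3), dtype=np.float32)

print(X.shape, Y.shape)  # (4, 5) (4, 3)
```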

PECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):

>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory

# Build a hierarchical label tree and train an XR-Linear model
>>> label_feat = LabelEmbeddingFactory.create(Y, X)
>>> cluster_chain = Indexer.gen(label_feat)
>>> model = XLinearModel.train(X, Y, C=cluster_chain)
>>> model.save("./save-models")
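The first step above turns each label into a feature vector so that similar labels can be clustered into a tree. One scheme described in the PECOS paper is PIFA (Positive Instance Feature Aggregation): sum the feature vectors of the instances tagged with a label, then normalize. The pure-Python sketch below illustrates that idea only; the function name and the dict-based sparse rows are hypothetical, and whether `LabelEmbeddingFactory.create` applies exactly this computation and normalization is an assumption.

```python
import math
from collections import defaultdict


def pifa_label_embeddings(X_rows, Y_rows):
    """Aggregate features of positive instances into unit-norm label vectors.

    X_rows[i]: dict feature_id -> value for instance i (sparse row).
    Y_rows[i]: iterable of label ids assigned to instance i.
    Returns dict label_id -> dict feature_id -> value.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for features, labels in zip(X_rows, Y_rows):
        for label in labels:
            for feat, val in features.items():
                sums[label][feat] += val

    embeddings = {}
    for label, vec in sums.items():
        norm = math.sqrt(sum(v * v for v in vec.values()))
        # L2-normalize so labels with many instances do not dominate.
        embeddings[label] = {f: (v / norm if norm > 0 else v) for f, v in vec.items()}
    return embeddings


# Two instances share label 0; their features are summed, then normalized.
emb = pifa_label_embeddings([{0: 3.0}, {1: 4.0}], [[0], [0]])
print(sorted(emb[0].items()))
```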

After training the model, we run prediction and evaluation:

>>> from pecos.utils import smat_util
>>> Yt_pred = model.predict(Xt)
# print precision and recall at k=10
>>> print(smat_util.Metrics.generate(Yt, Yt_pred))
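For reference, precision and recall at k can be computed by hand as in the sketch below. It uses plain Python lists and sets rather than sparse matrices, and the function name is hypothetical; it illustrates the metric definitions and is not the code behind `smat_util.Metrics`.

```python
def precision_recall_at_k(true_labels, ranked_preds, k):
    """Average precision@k and recall@k over all instances.

    true_labels[i]:  collection of relevant label ids for instance i.
    ranked_preds[i]: predicted label ids for instance i, best first.
    """
    precisions, recalls = [], []
    for truth, ranked in zip(true_labels, ranked_preds):
        truth = set(truth)
        hits = sum(1 for label in ranked[:k] if label in truth)
        precisions.append(hits / k)
        recalls.append(hits / len(truth) if truth else 0.0)
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n


# Instance 0: one of two relevant labels is in the top 2.
# Instance 1: its single relevant label is in the top 2.
p, r = precision_recall_at_k([{1, 2}, {3}], [[1, 5, 2], [4, 3, 6]], k=2)
print(p, r)  # 0.5 0.75
```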

PECOS also offers an optimized C++ implementation for fast real-time inference:

>>> model = XLinearModel.load("./save-models", is_predict_only=True)
>>> for i in range(Xt.shape[0]):
...     y_tst_pred = model.predict(Xt[i], threads=1)
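When predicting one instance at a time like this, per-request latency is usually the number of interest. The sketch below shows one way to measure it with the standard library. `fake_predict` is a hypothetical stand-in so the example runs without a trained model; in practice you would pass a function that wraps your own predict call.

```python
import statistics
import time


def measure_latencies_ms(predict_one, instances):
    """Time each single-instance prediction and return latencies in ms."""
    latencies = []
    for x in instances:
        start = time.perf_counter()
        predict_one(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_predict(x):
    """Hypothetical stand-in for a real model: return the 3 largest values."""
    return sorted(x, reverse=True)[:3]


lat = measure_latencies_ms(fake_predict, [[0.1, 0.9, 0.4, 0.7]] * 100)
print(f"median={statistics.median(lat):.4f} ms  max={max(lat):.4f} ms")
```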

Citation

If you find PECOS useful, please consider citing the PECOS paper (Yu et al., 2020).

Other work from the PECOS team cited above includes the XR-Transformer paper (Zhang et al., 2021).

License

Copyright (2021) Amazon.com, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
