PECOS - Predictions for Enormous and Correlated Output Spaces

PECOS is a versatile and modular machine learning (ML) framework for fast learning and inference on problems with large output spaces, such as extreme multi-label ranking (XMR) and large-scale retrieval. PECOS' design is intentionally agnostic to the specific nature of the inputs and outputs as it is envisioned to be a general-purpose framework for multiple distinct applications.

Given an input, PECOS identifies a small set (10-100) of relevant outputs from amongst an extremely large (~100MM) candidate set and ranks these outputs in terms of relevance.

Features

Extreme Multi-label Ranking and Classification

  • XR-Linear (pecos.xmc.xlinear): recursive linear models that learn to route an input from the root of a hierarchical label tree to a few leaf clusters, and return the top-k relevant labels within those clusters as predictions. See the PECOS paper (Yu et al., 2020) for details.

    • fast real-time inference in C++
    • can handle 100MM output space
  • XR-Transformer (pecos.xmc.xtransformer): a Transformer-based XMC framework that fine-tunes pre-trained transformers recursively on multi-resolution objectives. It can be used to generate the top-k relevant labels for a given instance, or simply as a fine-tuning engine for task-aware embeddings. See the XR-Transformer paper (Zhang et al., 2021) for technical details.

    • easy to extend with many pre-trained Transformer models from Hugging Face Transformers
    • establishes state-of-the-art results on public XMC benchmarks
  • ANN Search with HNSW (pecos.ann.hnsw): a PECOS Approximate Nearest Neighbor (ANN) search module that implements the Hierarchical Navigable Small World (HNSW) graph algorithm (Malkov et al., TPAMI 2018).

    • Supports both sparse and dense input features
    • SIMD optimization for both dense/sparse distance computation
    • Supports thread-safe graph construction in parallel on multi-core shared memory machines
    • Supports thread-safe Searchers to do inference in parallel, which reduces inference overhead
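
The tree traversal described in the pecos.xmc.xlinear bullet above can be pictured as a beam search: at each level, only the best-scoring clusters are expanded, so most of the label space is never scored. The following is a minimal plain-Python sketch of that idea, not PECOS's C++ implementation. The tree layout, the `beam_search` helper, the sigmoid-product scoring, and the toy weights are all assumptions made for illustration.

```python
import heapq
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def beam_search(x, tree, weights, beam_size=2, topk=3):
    """Return the topk (label, score) pairs reachable through the beam.

    tree:    dict mapping an internal node to its children; leaves (labels)
             do not appear as keys.
    weights: dict mapping every non-root node to a linear scorer (hypothetical).
    """
    beam = [("root", 1.0)]
    leaves = []
    while beam:
        candidates = []
        for node, score in beam:
            for child in tree.get(node, []):
                # Score of a path = product of per-node sigmoid scores (assumed).
                child_score = score * sigmoid(dot(weights[child], x))
                if child in tree:
                    candidates.append((child, child_score))
                else:
                    leaves.append((child, child_score))
        # Keep only the best internal nodes for the next level.
        beam = heapq.nlargest(beam_size, candidates, key=lambda t: t[1])
    return heapq.nlargest(topk, leaves, key=lambda t: t[1])


# Toy tree: two clusters with two labels each.
tree = {"root": ["A", "B"], "A": ["l0", "l1"], "B": ["l2", "l3"]}
weights = {
    "A": [2.0, 0.0], "B": [-2.0, 0.0],
    "l0": [1.0, 0.0], "l1": [-1.0, 0.0],
    "l2": [3.0, 0.0], "l3": [0.0, 0.0],
}
x = [1.0, 0.0]

narrow = beam_search(x, tree, weights, beam_size=1, topk=2)
wide = beam_search(x, tree, weights, beam_size=2, topk=3)
print([label for label, _ in narrow])
print([label for label, _ in wide])
```

With `beam_size=1` only cluster A is expanded, so labels under B are never scored. A wider beam trades extra computation for a larger set of candidate labels.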

Requirements and Installation

  • Python (3.9, 3.10, 3.11, 3.12)
  • Pip (>=19.3)

See other dependencies in setup.py. You should install PECOS in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

Supported Platforms

  • Ubuntu 20.04 and 22.04
  • Amazon Linux 2

Installation from Wheel

PECOS can be installed using pip as follows:

python3 -m pip install libpecos
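
After installing, it can be useful to confirm that the interpreter falls in the supported range and that the distribution is visible to Python. The sketch below uses only the standard library. The helper names `python_is_supported` and `installed_version` are made up for this example, and the version set is copied from the requirements list above.

```python
import sys
from importlib import metadata

# Python versions listed under "Requirements and Installation".
SUPPORTED_PYTHONS = {(3, 9), (3, 10), (3, 11), (3, 12)}


def python_is_supported(version_info=sys.version_info):
    """True if the (major, minor) pair is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHONS


def installed_version(dist_name="libpecos"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("libpecos version:", installed_version() or "not installed")
```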

Installation from Source

Prerequisite build tools

  • For Ubuntu (20.04, 22.04):
sudo apt-get update && sudo apt-get install -y build-essential git python3 python3-distutils python3-venv
  • For Amazon Linux 2:
sudo yum -y install python3 python3-devel python3-distutils python3-venv && sudo yum -y groupinstall 'Development Tools'

Install and develop locally

git clone https://github.com/amzn/pecos
cd pecos
python3 -m pip install --editable ./

Quick Tour

To get a glimpse of how PECOS works, here is a quick tour of the PECOS API for the XMR problem.

Toy Example

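The eXtreme Multi-label Ranking (XMR) problem is defined by two matrices: an instance-to-feature matrix X of shape N by D, and an instance-to-label matrix Y of shape N by L, both in SciPy CSR sparse format.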

Some toy data matrices are available in the tst-data folder.

PECOS constructs a hierarchical label tree and learns linear models recursively (e.g., XR-Linear):

>>> from pecos.xmc.xlinear.model import XLinearModel
>>> from pecos.xmc import Indexer, LabelEmbeddingFactory

# Build hierarchical label tree and train an XR-Linear model
>>> label_feat = LabelEmbeddingFactory.create(Y, X)
>>> cluster_chain = Indexer.gen(label_feat)
>>> model = XLinearModel.train(X, Y, C=cluster_chain)
>>> model.save("./save-models")
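
To make the label-embedding step more concrete, here is a minimal sketch of one common construction: each label's embedding is the L2-normalized sum of the feature vectors of its positive instances (often called PIFA, positive instance feature aggregation). Whether this matches what LabelEmbeddingFactory.create does by default is an assumption. The `pifa_label_embeddings` helper is hypothetical, and it uses dense Python lists rather than sparse matrices purely for readability.

```python
import math


def pifa_label_embeddings(X, Y, num_labels):
    """Aggregate instance features into one unit-norm vector per label.

    X: list of dense feature vectors, one per instance.
    Y: list of label-id lists, one per instance.
    """
    dim = len(X[0])
    emb = [[0.0] * dim for _ in range(num_labels)]
    # Sum the features of every instance that carries the label.
    for x, labels in zip(X, Y):
        for label in labels:
            row = emb[label]
            for j, value in enumerate(x):
                row[j] += value
    # L2-normalize each non-empty row so labels are comparable by direction.
    for row in emb:
        norm = math.sqrt(sum(v * v for v in row))
        if norm > 0.0:
            for j in range(dim):
                row[j] /= norm
    return emb


X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y_toy = [[0], [1], [0, 1]]
label_emb = pifa_label_embeddings(X_toy, Y_toy, num_labels=3)
for label_id, vec in enumerate(label_emb):
    print(label_id, [round(v, 3) for v in vec])
```

Labels that share many positive instances end up with similar embeddings, which is what lets a clustering step group them under the same tree node. Label 2 has no positive instance here, so its embedding stays all zeros.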

After training the model, run prediction and evaluation on the test data:

>>> from pecos.utils import smat_util
>>> Yt_pred = model.predict(Xt)
# print precision and recall at k=10
>>> print(smat_util.Metrics.generate(Yt, Yt_pred))
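
For readers unfamiliar with the metrics named in the comment above, the sketch below computes precision@k and recall@k from their standard definitions. It is not the smat_util implementation. The `precision_recall_at_k` helper and the toy labels are made up for illustration.

```python
def precision_recall_at_k(true_labels, ranked_preds, k):
    """Average precision@k and recall@k over all instances.

    true_labels:  list of sets of relevant label ids, one per instance.
    ranked_preds: list of label-id lists sorted by decreasing score.
    """
    prec_sum = 0.0
    recall_sum = 0.0
    for truth, ranked in zip(true_labels, ranked_preds):
        hits = sum(1 for label in ranked[:k] if label in truth)
        prec_sum += hits / k
        # Instances without any relevant label contribute zero recall.
        recall_sum += hits / len(truth) if truth else 0.0
    n = len(true_labels)
    return prec_sum / n, recall_sum / n


truth = [{1, 2}, {3}]
preds = [[1, 5, 2], [4, 3, 9]]
prec, rec = precision_recall_at_k(truth, preds, k=2)
print(f"precision@2={prec:.2f} recall@2={rec:.2f}")
```

Precision@k divides by k regardless of how many relevant labels exist, while recall@k divides by the number of relevant labels, so the two can differ substantially when instances have few labels.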

PECOS also offers an optimized C++ implementation for fast real-time inference:

>>> model = XLinearModel.load("./save-models", is_predict_only=True)
# predict one test instance at a time, as in an online serving loop
>>> for i in range(Xt.shape[0]):
...     y_tst_pred = model.predict(Xt[i], threads=1)
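
When evaluating real-time inference, per-query latency is usually more informative than total runtime. The following standard-library sketch times any single-instance prediction callable. The `measure_latency_ms` and `summarize` helpers are hypothetical, and the stand-in predictor below only sums a list so that the example runs without a trained model.

```python
import statistics
import time


def measure_latency_ms(predict_one, queries):
    """Time each call to predict_one and return latencies in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        predict_one(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Mean, median, and an approximate 95th percentile of the latencies."""
    ordered = sorted(latencies)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
    }


# Stand-in for a real single-instance predictor.
toy_queries = [[1.0, 2.0], [3.0, 4.0], [5.0]]
toy_latencies = measure_latency_ms(lambda q: sum(q), toy_queries)
print(summarize(toy_latencies))
```

In practice, `predict_one` would wrap the per-instance model.predict call shown above, and the queries would be individual rows of the test matrix.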

Citation

If you find PECOS useful, please consider citing the following paper:

Some papers from PECOS team:

License

Copyright (2021) Amazon.com, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.



