A package for training and inference of the InterFusion Encoder model

Project description

InterFusion Encoder

InterFusion Encoder is a Python package for training and running inference with a cross-encoder model that matches users with movies using textual data and, optionally, sparse features. It builds on pretrained transformer models and adds an attention mechanism and interaction layers intended to improve matching performance.

Features
Installation
Usage
- Training
- Inference
Data Preparation
Configuration
Contributing
License

Features

Supports user and movie feature vectors of different lengths (for example, three user features and four movie features).
Incorporates both bi-encoder and cross-encoder architectures.
Utilizes hard negative sampling and random negatives for robust training.
Includes attention mechanisms and interaction layers for improved performance.
Supports training continuation from saved checkpoints.
Integrated with Weights & Biases (W&B) for experiment tracking.
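
The combination of hard and random negatives listed above can be illustrated with a minimal sketch. It is not the package's implementation: the function name sample_negatives, its parameters, and the dot-product scoring over precomputed embeddings are assumptions chosen for illustration. The idea is that hard negatives are the non-matching movies that a retrieval model scores highest for a user, while random negatives are drawn uniformly from the remaining movies.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sample_negatives(user_vec, movie_vecs, positive_ids, num_hard, num_random, seed=0):
    """Pick hard and random negative movie IDs for one user.

    movie_vecs maps movie_id -> embedding (a hypothetical bi-encoder output).
    """
    rng = random.Random(seed)
    # Candidates are all movies the user is not positively matched with.
    candidates = [m for m in movie_vecs if m not in positive_ids]
    # Hard negatives: the highest-scoring non-matches.
    ranked = sorted(candidates, key=lambda m: dot(user_vec, movie_vecs[m]), reverse=True)
    hard = ranked[:num_hard]
    # Random negatives: a uniform sample from whatever is left.
    remaining = ranked[num_hard:]
    rand = rng.sample(remaining, min(num_random, len(remaining)))
    return hard, rand


movie_vecs = {
    "movie_001": [1.0, 0.0],
    "movie_002": [0.9, 0.1],
    "movie_003": [0.0, 1.0],
    "movie_004": [0.1, 0.9],
}
hard, rand = sample_negatives([1.0, 0.0], movie_vecs, {"movie_001"}, num_hard=1, num_random=1)
print(hard, rand)
```

Hard negatives push the model to separate near-misses from true matches, while random negatives keep it calibrated on easy cases.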

Installation

Install the package using pip:

pip install interfusion_encoder
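
Note that the distribution is installed as interfusion_encoder, while the usage examples import from interfusion. One way to confirm that the installation succeeded is to query the installed version with the standard library. The helper name installed_version is only for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("interfusion_encoder"))
```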

Usage

Training

from interfusion import train_model

# Prepare your data
users = [
    {
        "user_id": "user_001",
        "user_text": "Avid movie enthusiast with a passion for indie films...",
        "user_features": [0.8, 0.7, 0.9]
    },
    # Add more users
]

movies = [
    {
        "movie_id": "movie_001",
        "movie_text": "An engaging drama exploring human relationships...",
        "movie_features": [0.85, 0.75, 0.9, 0.95]
    },
    # Add more movies
]

positive_matches = [
    {
        "user_id": "user_001",
        "movie_id": "movie_001"
    },
    # Add more positive matches
]

# Define your configuration (optional)
user_config = {
    'use_sparse': True,
    'num_epochs': 5,
    'learning_rate': 3e-5,
    'cross_encoder_model_name': 'bert-base-uncased',
    'bi_encoder_model_name': 'bert-base-uncased',
    'wandb_project': 'interfusion_project',
    'wandb_run_name': 'experiment_1',
    # Add or override other configurations as needed
}

# Start training
train_model(users, movies, positive_matches, user_config=user_config)

Inference

from interfusion import InterFusionInference

# Initialize inference model
config = {
    'use_sparse': True,
    'cross_encoder_model_name': 'bert-base-uncased',
    'saved_model_path': 'saved_models/interfusion_final.pt',
    'user_feature_size': 3,  # Set according to your data
    'movie_feature_size': 4  # Set according to your data
}
inference_model = InterFusionInference(config=config)

# Prepare user and movie texts and features
user_texts = [
    "Avid movie enthusiast with a passion for indie films...",
    # Add more user texts
]

movie_texts = [
    "An engaging drama exploring human relationships...",
    # Add more movie texts
]

user_features_list = [
    [0.8, 0.7, 0.9],
    # Add more user features
]

movie_features_list = [
    [0.85, 0.75, 0.9, 0.95],
    # Add more movie features
]

# Predict match scores
scores = inference_model.predict(user_texts, movie_texts, user_features_list, movie_features_list)

# Print the results
for user, movie, score in zip(user_texts, movie_texts, scores):
    print(f"User: {user}")
    print(f"Movie: {movie}")
    print(f"Match Score: {score:.4f}\n")
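
The example above zips users, movies, and scores together, which suggests that predict expects aligned lists (the i-th user is scored against the i-th movie). Assuming that is the case, ranking several movies for one user means repeating that user's text and features once per movie. The helper below is a sketch under that assumption; rank_movies_for_user and predict_fn are hypothetical names, and predict_fn stands in for inference_model.predict.

```python
def rank_movies_for_user(predict_fn, user_text, user_features, movie_texts, movie_features_list):
    """Score one user against every movie and return (movie_text, score) pairs, best first."""
    n = len(movie_texts)
    # Repeat the user so that every list passed to predict_fn has one entry per pair.
    scores = predict_fn(
        [user_text] * n,
        movie_texts,
        [user_features] * n,
        movie_features_list,
    )
    pairs = [(text, float(score)) for text, score in zip(movie_texts, scores)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Stand-in scorer used only to show the call shape; replace it with
# inference_model.predict once a trained model is loaded.
def fake_predict(user_texts, movie_texts, user_feats, movie_feats):
    return [sum(features) for features in movie_feats]


ranking = rank_movies_for_user(
    fake_predict,
    "Casual viewer with a love for comedies",
    [0.7, 0.8, 0.75],
    ["Drama", "Comedy"],
    [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.5, 0.5]],
)
for movie, score in ranking:
    print(f"{movie}: {score:.4f}")
```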

Data Preparation

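Provide your data as lists of dictionaries with the following structure. The values are illustrative; note that the movies in this section carry three features each, whereas the Training and Inference examples use four.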

Users:

[
  {
    "user_id": "user_001",
    "user_text": "Avid movie enthusiast with a passion for indie films and a deep knowledge of film history.",
    "user_features": [0.8, 0.7, 0.9]
  },
  {
    "user_id": "user_002",
    "user_text": "Film critic with a focus on evaluating cinematic techniques and storytelling.",
    "user_features": [0.9, 0.6, 0.85]
  },
  {
    "user_id": "user_003",
    "user_text": "Casual viewer with a love for comedies and light-hearted movies.",
    "user_features": [0.7, 0.8, 0.75]
  }
]

Movies:

[
  {
    "movie_id": "movie_001",
    "movie_text": "An engaging drama exploring complex human emotions and relationships.",
    "movie_features": [0.85, 0.75, 0.9]
  },
  {
    "movie_id": "movie_002",
    "movie_text": "A thought-provoking documentary that delves into social issues with nuance.",
    "movie_features": [0.9, 0.65, 0.8]
  },
  {
    "movie_id": "movie_003",
    "movie_text": "A light-hearted comedy perfect for a relaxed evening with friends.",
    "movie_features": [0.7, 0.85, 0.8]
  }
]

Positive Matches:

[
  {
    "user_id": "user_001",
    "movie_id": "movie_001"
  },
  {
    "user_id": "user_002",
    "movie_id": "movie_002"
  },
  {
    "user_id": "user_003",
    "movie_id": "movie_003"
  }
]
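
Before training, it can help to check that the three lists agree with each other. The sketch below is not part of the package; check_dataset is a hypothetical helper. It assumes that the ID and text keys are required, that the feature keys are optional, and that all feature vectors within one list should share a length (which is consistent with the single user_feature_size and movie_feature_size values in the inference configuration). It returns the two feature sizes so they can be reused there.

```python
def check_dataset(users, movies, positive_matches):
    """Validate the three input lists and return (user_feature_size, movie_feature_size)."""

    def feature_size(records, id_key, text_key, feat_key):
        sizes = set()
        for rec in records:
            for key in (id_key, text_key):
                if key not in rec:
                    raise ValueError(f"record is missing '{key}': {rec}")
            # Features are treated as optional; only present ones are checked.
            if feat_key in rec:
                sizes.add(len(rec[feat_key]))
        if len(sizes) > 1:
            raise ValueError(f"inconsistent '{feat_key}' lengths: {sorted(sizes)}")
        return sizes.pop() if sizes else 0

    user_size = feature_size(users, "user_id", "user_text", "user_features")
    movie_size = feature_size(movies, "movie_id", "movie_text", "movie_features")

    # Every positive match must point at a known user and a known movie.
    user_ids = {u["user_id"] for u in users}
    movie_ids = {m["movie_id"] for m in movies}
    for match in positive_matches:
        if match["user_id"] not in user_ids or match["movie_id"] not in movie_ids:
            raise ValueError(f"match refers to an unknown ID: {match}")
    return user_size, movie_size


users = [{"user_id": "user_001", "user_text": "Film critic", "user_features": [0.9, 0.6, 0.85]}]
movies = [{"movie_id": "movie_001", "movie_text": "A drama", "movie_features": [0.85, 0.75, 0.9, 0.95]}]
positive_matches = [{"user_id": "user_001", "movie_id": "movie_001"}]

print(check_dataset(users, movies, positive_matches))
```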

Configuration

You can customize the model and training parameters by passing a user_config dictionary to the train_model function. Here are some of the configurable parameters:

random_seed: Random seed for reproducibility.
max_length: Maximum sequence length for tokenization.
use_sparse: Whether to use sparse features.
bi_encoder_model_name: Pre-trained model name for the bi-encoder.
cross_encoder_model_name: Pre-trained model name for the cross-encoder.
learning_rate: Learning rate for the optimizer.
num_epochs: Number of training epochs.
train_batch_size: Batch size for training.
wandb_project: W&B project name for logging.
saved_model_path: Path to save or load the trained model.
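
A common way to implement this kind of override is to merge user_config over a dictionary of defaults. The sketch below illustrates that pattern only: merge_config is a hypothetical helper, and the default values shown are placeholders rather than the package's actual defaults.

```python
# Placeholder defaults for illustration; the package's real defaults may differ.
EXAMPLE_DEFAULTS = {
    "random_seed": 42,
    "max_length": 256,
    "use_sparse": False,
    "learning_rate": 2e-5,
    "num_epochs": 3,
    "train_batch_size": 16,
}


def merge_config(defaults, user_config=None):
    """Return a new dict in which user-supplied values override the defaults."""
    merged = dict(defaults)
    merged.update(user_config or {})
    return merged


config = merge_config(EXAMPLE_DEFAULTS, {"use_sparse": True, "num_epochs": 5})
print(config["use_sparse"], config["num_epochs"], config["max_length"])
```

Because the merge copies the defaults first, keys that are omitted from user_config keep their default values and the defaults dictionary itself is left unchanged.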

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Release history

3.1: Jun 15, 2025
3.0.2: Mar 11, 2025
3.0.1: Mar 10, 2025
3.0.0: Mar 10, 2025
2.0.13: Mar 10, 2025
2.0.12: Mar 10, 2025
2.0.11: Mar 3, 2025
2.0.10: Feb 28, 2025
2.0.9: Feb 28, 2025
2.0.8: Feb 27, 2025
2.0.7: Feb 27, 2025
2.0.6: Feb 26, 2025
2.0.3: Feb 20, 2025 (this version)
2.0.2: Feb 4, 2025
2.0.1: Jan 29, 2025
2.0.0: Jan 28, 2025
1.0.22: Nov 15, 2024
1.0.21: Nov 11, 2024
1.0.20: Nov 10, 2024
1.0.19: Nov 2, 2024
1.0.18: Nov 2, 2024
1.0.17: Nov 2, 2024
1.0.16: Nov 2, 2024
1.0.15: Nov 2, 2024
1.0.14: Nov 2, 2024
1.0.13: Nov 2, 2024
1.0.12: Nov 2, 2024
1.0.11: Nov 2, 2024
1.0.10: Nov 2, 2024
1.0.9: Nov 2, 2024
1.0.8: Nov 2, 2024
1.0.7: Nov 2, 2024
1.0.6: Nov 2, 2024
1.0.5: Nov 2, 2024
1.0.4: Nov 1, 2024
1.0.3: Nov 1, 2024
1.0.2: Nov 1, 2024
1.0.1: Oct 29, 2024
1.0.0: Oct 29, 2024
0.3.9: Oct 29, 2024
0.3.8: Oct 29, 2024
0.3.7: Oct 29, 2024
0.3.6: Oct 29, 2024
0.3.5: Oct 27, 2024
0.3.4: Oct 27, 2024
0.3.3: Oct 27, 2024
0.2.0: Oct 26, 2024
0.1.0: Oct 26, 2024

Download files

Source Distribution

interfusion_encoder-2.0.3.tar.gz (24.1 kB)

Uploaded Feb 20, 2025 Source

Built Distribution

interfusion_encoder-2.0.3-py3-none-any.whl (24.3 kB)

Uploaded Feb 20, 2025 Python 3

File details

Details for the file interfusion_encoder-2.0.3.tar.gz.

File metadata

Download URL: interfusion_encoder-2.0.3.tar.gz
Upload date: Feb 20, 2025
Size: 24.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for interfusion_encoder-2.0.3.tar.gz
Algorithm	Hash digest
SHA256	`9ed0f82ca33c4394fda9449211eae644c24c950835a0e4569ba9f01ff1e2b479`
MD5	`fd5e8c3cfe8a80c5c8d4c6bd2b03cf2e`
BLAKE2b-256	`9856c654da64248a471db73baff2b8171ce400bb3e3ae8e6f595cee7848a01c9`

File details

Details for the file interfusion_encoder-2.0.3-py3-none-any.whl.

File metadata

Download URL: interfusion_encoder-2.0.3-py3-none-any.whl
Upload date: Feb 20, 2025
Size: 24.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for interfusion_encoder-2.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`816734a71fcceaf8f52d7f98cf3c94c54fb157da665bf61b34c871425e71540b`
MD5	`862dae4e0769f42a73ccabc785211a2d`
BLAKE2b-256	`4acfb2fbd111485d54a5b229aabe43f446010773c927d88b91a9e97aa5795536`
