
Value-based word embeddings that incorporate external continuous values.


ValueVec

ValueVec is a framework for learning word embeddings driven by external continuous values, such as similarity labels based on behavior, attributes, or measurements. Unlike traditional word2vec models that rely solely on linguistic context, ValueVec uses numeric supervision to capture more targeted relationships between terms.


Architecture Overview

ValueVec supports two training paradigms:

| Model | Description | Use Case |
|---|---|---|
| manual_model/ | Custom update logic based on cosine gradient approximations | For learning & debugging |
| nn_model/ | PyTorch-based training using nn.Embedding + MSE loss | For real-world applications |

A detailed explanation is available in docs/architecture.md.
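
To give a feel for the manual paradigm, here is a minimal sketch of one hand-written update on a single word pair. It is not ValueVec's actual update rule: the description above only says "cosine gradient approximations", so this sketch substitutes the exact analytic gradient of cosine similarity under a squared-error loss. The names `cosine` and `manual_step`, the learning rate, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def manual_step(u, v, target, lr=0.1):
    """One gradient step on (cos(u, v) - target)^2; returns updated (u, v)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    c = cosine(u, v)
    err = c - target
    # d cos / d u = v / (|u||v|) - cos * u / |u|^2, and symmetrically for v.
    grad_u = [b / (norm_u * norm_v) - c * a / norm_u ** 2 for a, b in zip(u, v)]
    grad_v = [a / (norm_u * norm_v) - c * b / norm_v ** 2 for a, b in zip(u, v)]
    new_u = [a - lr * 2 * err * g for a, g in zip(u, grad_u)]
    new_v = [b - lr * 2 * err * g for b, g in zip(v, grad_v)]
    return new_u, new_v

# Hypothetical pair: start orthogonal, pull toward a target similarity of 0.8.
u, v = [1.0, 0.0], [0.0, 1.0]
for _ in range(200):
    u, v = manual_step(u, v, target=0.8)
final_similarity = cosine(u, v)
```

Writing the gradient out by hand like this makes each update easy to inspect, which is the trade-off the table describes: more visibility, less convenience than an autograd framework.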


Key Features

  • Continuous Supervision: Uses numeric similarity scores between words.
  • Cosine-Based Optimization: Directly optimizes cosine similarity between embeddings.
  • Manual + Neural Versions: Choose between interpretability and performance.
  • Custom Datasets: Generate value-supervised datasets from colors, fruits, animals, etc.
  • Visualizable: Easily inspect the embedding space with built-in PCA projection.
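
The first two features can be illustrated with a short PyTorch sketch that fits embeddings so their cosine similarity matches numeric targets under an MSE loss. This is an assumption-laden illustration, not the code in nn_model/: the word pairs, target scores, embedding size, optimizer, and step count are all made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical supervision: (word_a, word_b, target cosine similarity).
pairs = [
    ("cheap", "budget", 0.9),
    ("cheap", "luxury", -0.6),
    ("budget", "luxury", -0.5),
]
words = sorted({w for a, b, _ in pairs for w in (a, b)})
vocab = {w: i for i, w in enumerate(words)}

idx_a = torch.tensor([vocab[a] for a, _, _ in pairs])
idx_b = torch.tensor([vocab[b] for _, b, _ in pairs])
targets = torch.tensor([s for _, _, s in pairs])

embedding = nn.Embedding(len(vocab), 8)  # 8 dimensions is an arbitrary choice
optimizer = torch.optim.Adam(embedding.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for _ in range(300):
    optimizer.zero_grad()
    # Cosine similarity of each pair's embeddings, compared to its target.
    predicted = F.cosine_similarity(embedding(idx_a), embedding(idx_b), dim=1)
    loss = loss_fn(predicted, targets)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
```

After training, `embedding.weight` holds one row per word in `vocab`, and the pairwise cosine similarities of those rows should sit close to the supplied targets.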

Installation

# Option 1: From PyPI
pip install valuevec

# Option 2: From source
git clone https://github.com/rdoku/valuevec.git
cd valuevec
pip install -e .

Quick Start

# Use an example script to train a value-driven embedding model
python examples/basic_usage.py

For custom training data, see docs/usage.md.
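
The data format ValueVec expects is documented in docs/usage.md and is not reproduced here. As a rough idea of what a value-supervised dataset could look like, the sketch below derives similarity scores for color words from RGB distance. The `COLORS` table, the `(word_a, word_b, score)` tuple layout, and the `color_pairs` helper are assumptions for illustration, not part of the package's API.

```python
import itertools
import math

# Hypothetical external values: one RGB triple per color word.
COLORS = {
    "red": (255, 0, 0),
    "orange": (255, 165, 0),
    "blue": (0, 0, 255),
    "navy": (0, 0, 128),
}

# Largest possible RGB distance (black to white), used to scale scores.
MAX_DIST = math.dist((0, 0, 0), (255, 255, 255))

def color_pairs(colors):
    """Return (word_a, word_b, score) rows with score in [0, 1]."""
    rows = []
    for (a, rgb_a), (b, rgb_b) in itertools.combinations(colors.items(), 2):
        score = 1.0 - math.dist(rgb_a, rgb_b) / MAX_DIST
        rows.append((a, b, round(score, 3)))
    return rows

dataset = color_pairs(COLORS)
```

Each row pairs two words with a continuous score, so words whose colors are close in RGB space (such as "blue" and "navy") receive a higher target than distant ones (such as "red" and "blue").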

Example Applications

  • E-commerce – Group keywords with similar price influence
  • Finance – Cluster terms by correlation with financial metrics
  • Customer Modeling – Link descriptors to user value or conversion likelihood
  • Sentiment Analysis – Model emotional intensity beyond polarity

Project Layout

valuevec/
├── manual_model/    # Manual gradient updates
├── nn_model/        # PyTorch-based implementation
├── training_data/   # Data generation utilities
├── examples/        # Ready-to-run training and analysis
├── tests/           # Unit tests
└── docs/            # Markdown documentation

Documentation

  • docs/architecture.md – Neural vs. manual training
  • docs/usage.md – Training, inference, visualization
  • docs/CONTRIBUTING.md – Guidelines for contributing

Contributing

We welcome contributions! Get started with:

git checkout -b feature/your-feature

Then open a Pull Request. For details, see docs/CONTRIBUTING.md.

License

MIT License. See the LICENSE file for details.

Citation

If you use ValueVec in your work, please cite it as:

@software{valuevec2025,
  author = {Ronald Doku},
  title = {ValueVec: Value-Driven Word Embeddings},
  year = {2025},
  url = {https://github.com/rdoku/valuevec}
}
