Skip to main content

Language of Vectors (LangVec) is a simple Python library designed for transforming numerical vector data into a language-like structure using a predefined set of words (lexicon).

Project description

LangVec Logo

Language of Vectors (LangVec) is a simple Python library designed for transforming numerical vector data into a language-like structure using a predefined set of words (lexicon).

Approach

LangVec package leverages the concept of percentile-based mapping to assign words from a lexicon to numerical values, facilitating intuitive and human-readable representations of numerical data.

LangVec Simplified schema Simplified schema of how LangVec works

Where to use LangVec

The main application is in semantic search and similarity-based systems, where understanding the proximity between vectors is crucial.
By transforming complex numerical vectors into a lexicon-based representation, LangVec facilitates an intuitive understanding of these similarities for humans.

In fields like machine learning and natural language processing, LangVec can assist in tasks such as clustering or categorizing data, where a human-readable format is preferable for quick insights and decision-making.

Installation

pip install langvec

Usage

Example 1

import numpy as np

from langvec import LangVec

# Random seed
np.random.seed(42)

# Initialize LangVec
lv = LangVec()
NUM_VECTORS = 1000
DIMENSIONS = 10

# Generate some random data
vectors = [np.random.uniform(0, 1, DIMENSIONS) for _ in range(NUM_VECTORS)]

# Fit to this data (getting know to distribution)
lv.fit(vectors)

# Save current model
lv.save("model.lv")

# Example vector for prediction
input_vector = np.random.uniform(0, 1, DIMENSIONS)

# Make prediction on unseen vector embedding
print(lv.predict(input_vector))

Example 2

import string

import numpy as np

from langvec import LangVec

np.random.seed(42)

# Define a new lexicon with lowercase and uppercase letters
LEXICON = list(string.ascii_letters)

# Initialize LangVec with the new lexicon
lv = LangVec(lexicon=LEXICON)

NUM_VECTORS = 10000
DIMENSIONS = 256

# Generate some random data
vectors = [np.random.uniform(0, 1, DIMENSIONS) for _ in range(NUM_VECTORS)]

# Fit to this data
lv.fit(vectors)

# Example vector for prediction
input_vector = np.random.uniform(0, 1, DIMENSIONS)

# Make prediction on the unseen vector embedding
predicted_string = "".join(lv.predict(input_vector))
print(predicted_string)
if len(predicted_string) > 6:
    summarized_string = (
        "".join(predicted_string[:3]) + "..." + "".join(predicted_string[-3:])
    )
else:
    summarized_string = "".join(predicted_string)

print(summarized_string)

Save and load model from disk

LangVec allows you to save and load percentiles as model artifacts. This is useful for preserving the learned distribution without needing to retrain the model. You can use the following methods:

Save model

from langvec import LangVec

# Initialize LangVec
lv = LangVec()

# Save the model to file
lv.save("model.lv")

Load model

from langvec import LangVec

# Initialize LangVec
lv = LangVec()

# Load the model from file
lv.load("model.lv")

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langvec-0.0.2.tar.gz (7.1 kB view details)

Uploaded Source

File details

Details for the file langvec-0.0.2.tar.gz.

File metadata

  • Download URL: langvec-0.0.2.tar.gz
  • Upload date:
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.19

File hashes

Hashes for langvec-0.0.2.tar.gz
Algorithm Hash digest
SHA256 bbb34ea7c4d8a0944d1945ada4d97be9a27a885ad3dd2125817b34e1092300e5
MD5 ee61dc821674ba73f852f4e4472a8720
BLAKE2b-256 7c16c71c0c2d11c85f52414cfb639dcc761aad0971b12770e913cc399854d58f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page