Access and operations with word2vec data
idiom
To install: pip install idiom
Overview
The idiom package provides access to word vector data, along with functions to manipulate and analyze it. It includes functionality for finding the closest words to a given word, calculating word frequencies, and working with various word vector models.
Features
- Closest Words: Find the closest words to a given word based on cosine similarity.
- Word Frequencies: Access and manipulate word frequency data.
- Word Vector Models: Work with pre-trained word vector models such as FastText.
- IDF Calculations: Compute different types of Inverse Document Frequency (IDF) values.
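The Closest Words feature ranks words by cosine similarity. For two d-dimensional word vectors u and v, the standard definition is below. Values range from -1 (opposite directions) to 1 (same direction). A common derived distance is 1 minus this value, although the README does not say which distance idiom reports.

```latex
\operatorname{cos\_sim}(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^2} \, \sqrt{\sum_{i=1}^{d} v_i^2}}
```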
Usage
Finding Closest Words
You can find the closest words to a given word using the closest_words function:
from idiom import closest_words
# Example: find the 10 words closest to 'mad' among words starting with 'l'
def starts_with_l(word):
    return word.startswith('l')
print(closest_words('mad', k=10, search_words=starts_with_l))
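The README does not say how closest_words searches the vocabulary. The following is a minimal sketch of the idea, assuming an in-memory dict of word vectors and brute-force scoring: compute cosine similarity to every candidate, apply the optional word filter, and keep the top k. The names closest_words_sketch and cosine_similarity, and the toy vectors, are hypothetical and are not part of idiom.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_words_sketch(
    word: str,
    vectors: Dict[str, Vector],
    k: int = 10,
    search_words: Optional[Callable[[str], bool]] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical analogue of closest_words: top-k candidates by cosine similarity."""
    query = vectors[word]
    scored = [
        (candidate, cosine_similarity(query, vec))
        for candidate, vec in vectors.items()
        # Skip the query word itself and any word the filter rejects.
        if candidate != word and (search_words is None or search_words(candidate))
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Toy 2-d vectors, invented for illustration only.
toy_vectors = {
    "mad": [1.0, 0.0],
    "angry": [0.95, 0.05],
    "livid": [0.9, 0.1],
    "lazy": [0.0, 1.0],
}
result = closest_words_sketch("mad", toy_vectors, k=2, search_words=lambda w: w.startswith("l"))
print([w for w, _ in result])  # → ['livid', 'lazy']
```

With a real vocabulary of hundreds of thousands of words, you would normally replace the Python loop with a vectorized matrix computation or an approximate nearest-neighbor index.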
Accessing Word Frequencies
You can access the most frequent words using the most_frequent_words function:
from idiom import most_frequent_words
# Get the top 100,000 most frequent words
frequent_words = most_frequent_words(max_n_words=100000)
print(frequent_words)
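The README does not describe where most_frequent_words gets its counts or what type it returns. As a minimal sketch, assuming the frequencies come from counting a token stream, Python's collections.Counter performs the same top-N selection. most_frequent_words_sketch is a hypothetical name and is not part of idiom.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_frequent_words_sketch(tokens: Iterable[str], max_n_words: int) -> List[Tuple[str, int]]:
    """Hypothetical stand-in for most_frequent_words: count tokens, keep the top N."""
    counts = Counter(tokens)
    # most_common returns (word, count) pairs by descending count;
    # ties keep the order in which words were first seen.
    return counts.most_common(max_n_words)


tokens = "the cat sat on the mat and the dog sat too".split()
print(most_frequent_words_sketch(tokens, max_n_words=2))  # → [('the', 3), ('sat', 2)]
```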
Working with Word Vectors
You can load and work with pre-trained word vectors using the WordVec class:
from idiom import WordVec
# Initialize WordVec with default word vectors
word_vec = WordVec()
# Calculate the distance between two queries
distance = word_vec.dist('france capital', 'paris')
print(distance)
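The README does not explain how WordVec.dist turns a multi-word query such as 'france capital' into a vector, or which distance it uses. A common approach averages the word vectors of each query and then takes the cosine distance (1 minus cosine similarity). The sketch below shows that approach under exactly those assumptions. The names query_vector, cosine_distance and query_dist are hypothetical, and the toy vectors are invented.

```python
import math
from typing import Dict, List, Sequence


def query_vector(query: str, vectors: Dict[str, Sequence[float]]) -> List[float]:
    """Average the vectors of the known words in a whitespace-separated query (assumption)."""
    known = [vectors[w] for w in query.lower().split() if w in vectors]
    if not known:
        raise ValueError(f"no known words in query: {query!r}")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, up to 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norms


def query_dist(q1: str, q2: str, vectors: Dict[str, Sequence[float]]) -> float:
    """Hypothetical analogue of WordVec.dist: distance between two averaged query vectors."""
    return cosine_distance(query_vector(q1, vectors), query_vector(q2, vectors))


# Toy 2-d vectors, invented for illustration only.
vectors = {"france": [1.0, 0.0], "capital": [0.0, 1.0], "paris": [0.5, 0.5], "banana": [-1.0, 0.0]}
print(query_dist("france capital", "paris", vectors))   # close to 0: same direction
print(query_dist("france capital", "banana", vectors))  # about 1.71: pointing away
```

Averaging ignores word order and weights every word equally. Implementations often weight words (for example by IDF) instead, and idiom may do either.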
IDF Calculations
You can access different types of IDF values through the idf object provided by the _IDF class:
from idiom import idf
# Access logarithmic IDF values
log_idf = idf.logarithmic
print(log_idf)
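The README shows only the logarithmic attribute and gives no formulas. The following sketch assumes the textbook definition idf(t) = log(N / df(t)) with the natural log, where N is the number of documents and df(t) is the number of documents containing t. For comparison, it adds one common smoothed variant, the form scikit-learn's TfidfVectorizer uses with smooth_idf=True. The function names are hypothetical, and idiom's own variants and log base may differ.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def document_frequencies(docs: Iterable[List[str]]) -> Counter:
    """Number of documents each term appears in (a term counts once per document)."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def logarithmic_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Classic IDF: log(N / df(t)); 0 for a term found in every document."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    return {term: math.log(n_docs / count) for term, count in df.items()}


def smoothed_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed variant: log((1 + N) / (1 + df(t))) + 1, always at least 1."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


docs = [["the", "cat"], ["the", "dog"], ["a", "cat"]]
print(logarithmic_idf(docs)["dog"])  # log(3), about 1.0986
```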
Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue on GitHub.
License
This project is licensed under the MIT License.
Download files
File details
Details for the file idiom-0.1.6.tar.gz
File metadata
- Download URL: idiom-0.1.6.tar.gz
- Upload date:
- Size: 2.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 572c0dbb2a082f4957e2152a4598858e7b4ae8851c09049a80f77ef86564240f
MD5 | 16158c33c90b657dce7fd4924ac5308c
BLAKE2b-256 | bd3144428393183593fb08c930066c4bd9a6d27d604f3c71d4ded8d29cf5730b
File details
Details for the file idiom-0.1.6-py3-none-any.whl
File metadata
- Download URL: idiom-0.1.6-py3-none-any.whl
- Upload date:
- Size: 2.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0563cba757fe52f2f7ee9478b4a0a85571b2a191caee84ac04a0dd46ccdb44fe
MD5 | daba44cf8bcab5d8382583ff9743a0d2
BLAKE2b-256 | c5061673e34edc7a19c2a3cc3678ea48da75963d3b03375f860ab8fb8e2ed6c2
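You can use the digests listed for each file to check a downloaded archive before installing it. The following is a minimal sketch using only Python's standard hashlib. The names file_digest_hex and verify are illustrative. The "blake2b-256" label is mapped to hashlib.blake2b with a 32-byte (256-bit) digest, which is what that algorithm name denotes.

```python
import hashlib
from pathlib import Path


def file_digest_hex(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    if algorithm.lower() == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # BLAKE2b truncated to a 256-bit digest
    else:
        h = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True only if the file's digest matches the published hex digest (case-insensitive)."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()
```

For example, to check the source archive, call verify(Path("idiom-0.1.6.tar.gz"), expected) with expected set to that file's SHA256 digest. The call returns True only if the local file matches byte for byte.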