idiom

Access and operations with word2vec data

To install: pip install idiom

Overview

The idiom package provides access to word vector data, along with functions to manipulate and analyze it. You can find the words closest to a given word, look up word frequencies, and work with pre-trained word vector models.

Features

  • Closest Words: Find the closest words to a given word based on cosine similarity.
  • Word Frequencies: Access and manipulate word frequency data.
  • Word Vector Models: Work with pre-trained word vector models such as FastText.
  • IDF Calculations: Compute different types of Inverse Document Frequency (IDF) values.

Usage

Finding Closest Words

You can find the closest words to a given word using the closest_words function:

from idiom import closest_words

# Example: Find the closest words to 'mad' that start with 'l'
starts_with_L = lambda x: x.startswith('l')
print(closest_words('mad', k=10, search_words=starts_with_L))
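
To make the idea of "closest by cosine similarity" concrete, here is a minimal standalone sketch that does not use idiom at all. The function names (cosine_similarity, nearest_words), the keep parameter, and the three-dimensional toy vectors are all hypothetical and chosen for illustration; the package's own implementation may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(query, vectors, k=3, keep=None):
    """Return the k words most similar to `query`, excluding `query` itself.

    `keep` is an optional predicate used to restrict the candidate words.
    """
    query_vec = vectors[query]
    candidates = (
        (word, cosine_similarity(query_vec, vec))
        for word, vec in vectors.items()
        if word != query and (keep is None or keep(word))
    )
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Toy vectors, invented for this example (real models use hundreds of dimensions)
toy_vectors = {
    "mad": [0.9, 0.1, 0.0],
    "livid": [0.8, 0.2, 0.1],
    "lucid": [0.1, 0.9, 0.2],
    "angry": [0.85, 0.15, 0.05],
    "lamp": [0.0, 0.1, 0.9],
}

# Closest words to 'mad' among those starting with 'l'
print(nearest_words("mad", toy_vectors, k=2, keep=lambda w: w.startswith("l")))
```

Filtering candidates before ranking, as the keep predicate does here, avoids scoring words that would be discarded anyway.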

Accessing Word Frequencies

You can access the most frequent words using the most_frequent_words function:

from idiom import most_frequent_words

# Get the top 100,000 most frequent words
frequent_words = most_frequent_words(max_n_words=100000)
print(frequent_words)
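
The shape of the data returned by most_frequent_words is not described here. As a rough illustration of what "most frequent words" means, the sketch below counts words in a tiny corpus with the standard library; top_words and the sample sentence are hypothetical and unrelated to the data idiom ships with.

```python
from collections import Counter


def top_words(text, max_n_words=3):
    """Map each of the `max_n_words` most common words to its relative frequency."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.most_common(max_n_words)}


corpus = "the cat saw the dog and the dog saw the bird"
print(top_words(corpus, max_n_words=3))
```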

Working with Word Vectors

You can load and work with pre-trained word vectors using the WordVec class:

from idiom import WordVec

# Initialize WordVec with default word vectors
word_vec = WordVec()

# Calculate the distance between two queries
distance = word_vec.dist('france capital', 'paris')
print(distance)
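
How WordVec turns a multi-word query such as 'france capital' into a vector is not stated here. One common approach, shown below purely as an assumption, is to average the vectors of the individual words and then take the cosine distance. The helper names and the two-dimensional toy vectors are hypothetical.

```python
import math


def mean_vector(words, vectors):
    """Average the vectors of `words`, component by component."""
    rows = [vectors[w] for w in words]
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine_distance(u, v):
    """1 minus cosine similarity: 0 for same direction, 2 for opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def query_distance(query_a, query_b, vectors):
    """Distance between two whitespace-separated queries (assumed averaging scheme)."""
    vec_a = mean_vector(query_a.split(), vectors)
    vec_b = mean_vector(query_b.split(), vectors)
    return cosine_distance(vec_a, vec_b)


# Toy vectors, invented so that 'france' + 'capital' points toward 'paris'
toy_vectors = {
    "france": [1.0, 0.0],
    "capital": [0.0, 1.0],
    "paris": [1.0, 1.0],
    "banana": [-1.0, 0.2],
}

print(query_distance("france capital", "paris", toy_vectors))
print(query_distance("france capital", "banana", toy_vectors))
```

With these toy vectors the first distance is close to zero and the second is much larger, which is the behavior you would hope for from a real model.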

IDF Calculations

You can access different types of IDF values through the idf object (backed by the _IDF class):

from idiom import idf

# Access logarithmic IDF values
log_idf = idf.logarithmic
print(log_idf)
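
For readers unfamiliar with IDF, the sketch below computes two variants from scratch on a three-document toy corpus. It assumes that "logarithmic" refers to the textbook form log(N / df), which this README does not confirm; the smoothed form shown is one of several in use. All function names here are hypothetical and not part of idiom.

```python
import math


def document_frequencies(documents):
    """Count, for each word, how many documents contain it at least once."""
    df = {}
    for doc in documents:
        for word in set(doc.lower().split()):
            df[word] = df.get(word, 0) + 1
    return df


def logarithmic_idf(df, n_docs):
    """Textbook IDF: log(N / df). A word present in every document scores 0."""
    return {word: math.log(n_docs / count) for word, count in df.items()}


def smooth_idf(df, n_docs):
    """A smoothed variant: log(1 + N / df), which stays strictly positive."""
    return {word: math.log(1 + n_docs / count) for word, count in df.items()}


documents = ["the cat sat", "the dog barked", "the cat purred"]
df = document_frequencies(documents)

print(logarithmic_idf(df, len(documents)))
print(smooth_idf(df, len(documents)))
```

Rare words such as 'dog' receive a higher weight than words that appear in every document, such as 'the'.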

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue on GitHub.

License

This project is licensed under the MIT License.
