
Implementing my own skipgram model

Project description

py_skipgram_24


📄 About

py_skipgram_24 is a toolkit for Skip-gram modeling and evaluation. It provides functions for each stage of a Skip-gram workflow: preprocessing the data, creating input pairs, training the model, and extracting word vectors. The aim is to simplify that workflow by supplying the essential functionality for data manipulation, model training, and evaluation.

📦 Functions

This package provides six components: two classes and four functions. Five are described below; the sixth, train_model, appears in the Usage example.

  • SkipgramModel(vocab_size, embedding_dim): This class initializes the Skipgram model with the vocabulary size and embedding dimension, and defines the forward pass.
  • MyPreprocessor(texts, stopwords, strip_puncts=True): This class preprocesses the given texts by tokenizing the sentences, converting to lower case, and removing stopwords and punctuation.
  • create_input_pairs(pp_corpus, word2idx, context_size=2): This function creates input pairs for the Skipgram model from the preprocessed corpus, word-to-index mapping, and context size.
  • get_vocab(tokenized_corpus): This function gets the vocabulary from the tokenized corpus.
  • get_word_vectors(model, word2idx): This function gets the word vectors from the trained model and word-to-index mapping.
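
To make the data flow between these components concrete, here is a minimal pure-Python sketch of the preprocessing, vocabulary, and pair-creation steps. It is not the package's source: the sketch_* names are hypothetical stand-ins, each text is treated as a single token list (the real MyPreprocessor tokenizes into sentences), and the sorted vocabulary order and symmetric context window are assumptions.

```python
import string


def sketch_preprocess(texts, stopwords=(), strip_puncts=True):
    """Lowercase, split on whitespace, and drop stopwords and punctuation."""
    stop = set(stopwords)
    table = str.maketrans("", "", string.punctuation)
    corpus = []
    for text in texts:
        tokens = []
        for token in text.lower().split():
            if strip_puncts:
                token = token.translate(table)
            # Skip tokens that became empty or that are stopwords
            if token and token not in stop:
                tokens.append(token)
        corpus.append(tokens)
    return corpus


def sketch_get_vocab(tokenized_corpus):
    """Return the unique tokens in sorted order (ordering is an assumption)."""
    return sorted({tok for sent in tokenized_corpus for tok in sent})


def sketch_create_input_pairs(pp_corpus, word2idx, context_size=2):
    """Pair each center word index with every index inside the window."""
    pairs = []
    for sent in pp_corpus:
        for i, center in enumerate(sent):
            start = max(0, i - context_size)
            stop = min(len(sent), i + context_size + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((word2idx[center], word2idx[sent[j]]))
    return pairs


corpus = sketch_preprocess(
    ["The sky is blue.", "Remember the sky."], stopwords=["the", "is"]
)
vocab = sketch_get_vocab(corpus)
word2idx = {word: idx for idx, word in enumerate(vocab)}
pairs = sketch_create_input_pairs(corpus, word2idx, context_size=1)
print(corpus)  # [['sky', 'blue'], ['remember', 'sky']]
print(pairs)   # [(2, 0), (0, 2), (1, 2), (2, 1)]
```

Each pair is (center index, context index), which is the usual input format for Skip-gram training; the package's own pair layout may differ.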

🛠️ Installation

Option 1 (For Users)

The package is published on PyPI and can be installed with pip.

Create and activate a virtual environment using conda:

$ conda create --name <env_name> pip -y
$ conda activate <env_name>

Install the package using the command below:

$ pip install py_skipgram_24

Option 2 (For Developers)

The installation commands below require conda and poetry; see their installation guides (conda, poetry).

Clone this repository

$ git clone git@github.com:<your_username>/py_skipgram_24.git

Navigate to the root of the repository, then create and activate a Conda virtual environment with Python by running the following commands in a terminal:

$ conda create --name py_skipgram_24 python=3.11 -y
$ conda activate py_skipgram_24

Install the package with poetry by running the following command:

$ poetry install

✅ Testing

To test this package, please run the following command from the root directory of the repository:

$ pytest tests/

Branch coverage can be viewed with the following command:

$ pytest --cov-branch --cov=py_skipgram_24

Usage

To train the Skip-gram model and obtain word vectors, first complete the installation steps above, then run the following code in a Python notebook. An example notebook is also available in the doc folder.

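# train_model is called below, so it is imported here too (assumed to be exported by the package)
from py_skipgram_24 import SkipgramModel, create_input_pairs, get_vocab, MyPreprocessor, get_word_vectors, train_model

corpus = ["It was a great day. I loved the movie and spending time with you. I wish we had more time.",
          "The sky is always blue underneath. Remember that."]

# Preprocess the raw texts (tokenize, lowercase, strip punctuation)
sentences = MyPreprocessor(corpus)
pp_corpus = list(sentences)
# Build the vocabulary and the word-to-index mapping
vocab = get_vocab(pp_corpus)
word2idx = {word: idx for idx, word in enumerate(vocab)}
# Create the input index pairs for the Skip-gram model
idx_pairs = create_input_pairs(pp_corpus, word2idx, context_size=2)
# Train a model with 10-dimensional embeddings, then extract the word vectors
model = SkipgramModel(len(vocab), 10)
train_model(model, idx_pairs, epochs=250, learning_rate=0.025)
word_vectors = get_word_vectors(model, word2idx)
print(word_vectors)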
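
Once word vectors are available, a common next step is to compare them. The sketch below ranks words by cosine similarity. It assumes the vectors are held in a dict mapping each word to a sequence of floats; the return type of get_word_vectors is not documented here, so a conversion (for example calling .tolist() on tensors) may be needed first. The helper names and toy vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=3):
    """Return the top_n (word, similarity) pairs closest to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy vectors standing in for the output of a trained model
toy_vectors = {"sky": [1.0, 0.0], "blue": [0.9, 0.1], "movie": [0.0, 1.0]}
neighbours = most_similar("sky", toy_vectors, top_n=2)
print(neighbours)
```

With a corpus as small as the one above, the learned similarities will be noisy; the ranking becomes meaningful only with considerably more text.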

📚 Package Integration within the Python Ecosystem

PyTorch’s nn.Module is a robust, general-purpose foundation for deep learning models. py_skipgram_24 is instead a lightweight, focused toolkit tailored to Skip-gram tasks: preprocessing, creating input pairs, training the model, and extracting word vectors. PyTorch covers a much broader range of deep learning algorithms; py_skipgram_24 may appeal to users who prefer a concise package built around a Skip-gram workflow.
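
As a rough illustration of what a Skip-gram forward pass computes, the sketch below uses plain Python lists rather than PyTorch tensors: it looks up the embedding of the center word, scores every vocabulary word against it, and applies a softmax. This is a generic full-softmax formulation with hypothetical names and random weights, not the package's SkipgramModel, whose internals are not described here.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Create a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def skipgram_forward(center_idx, embeddings, output_weights):
    """Return a probability for each vocabulary word being a context word."""
    hidden = embeddings[center_idx]  # embedding lookup for the center word
    logits = [
        sum(w * h for w, h in zip(row, hidden))  # dot product per vocab word
        for row in output_weights
    ]
    return softmax(logits)


rng = random.Random(0)
vocab_size, embedding_dim = 5, 3
embeddings = init_matrix(vocab_size, embedding_dim, rng)
output_weights = init_matrix(vocab_size, embedding_dim, rng)
probs = skipgram_forward(2, embeddings, output_weights)
print(len(probs), round(sum(probs), 6))
```

Training would adjust both weight matrices so that observed context words receive higher probability; after training, the rows of the embedding matrix serve as the word vectors.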

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

py_skipgram_24 was created by Bill. It is licensed under the terms of the MIT license.

Credits

py_skipgram_24 was created with cookiecutter and the py-pkgs-cookiecutter template.


Source Distribution: py_skipgram_24-0.1.1.tar.gz (5.7 kB)

Built Distribution: py_skipgram_24-0.1.1-py3-none-any.whl (7.0 kB, Python 3)

File details

py_skipgram_24-0.1.1.tar.gz

  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2
  • SHA256: dcf99923d058954dcdbf0e2432c236ff587944bcc838ccf12e57366a4cfe2fbb
  • MD5: 542efad2fc99c295a6f7b49b583acfb8
  • BLAKE2b-256: aff073240184a3d1cbaab9b9f9f45ddbb54b0f2323611970258f6bdbe1604561

py_skipgram_24-0.1.1-py3-none-any.whl

  • SHA256: 96f92dbec786ea2ef7943bae736ee6255e19f99d3a8479add7e9720018d39ee0
  • MD5: c185d018e33b498e10f0c77c8aa8ecb6
  • BLAKE2b-256: 139656f523cae77aa50d234d3a73c17aa22d5f5c7a9ffc993626b0bc17c2e85b
