Skip to main content

Implementing my own skipgram model

Project description

py_skipgram_24

ci-cd PyPI version

📄 About

This package, named “py_skipgram_24”, is a comprehensive toolkit for Skip-gram modeling and evaluation. It offers a set of functions designed to facilitate various aspects of working with Skip-gram algorithms, from preprocessing the data, creating input pairs, training the model to getting word vectors. We aim to simplify the process by providing essential functionalities for data manipulation, model training, and evaluation.

📦 Functions

This package consists of six functions and explained as below:

  • SkipgramModel(vocab_size, embedding_dim): This class initializes the Skipgram model with the vocabulary size and embedding dimension, and defines the forward pass.
  • MyPreprocessor(texts, stopwords, strip_puncts=True): This class preprocesses the given texts by tokenizing the sentences, converting to lower case, and removing stopwords and punctuation.
  • create_input_pairs(pp_corpus, word2idx, context_size=2): This function creates input pairs for the Skipgram model from the preprocessed corpus, word-to-index mapping, and context size.
  • get_vocab(tokenized_corpus): This function gets the vocabulary from the tokenized corpus.
  • get_word_vectors(model, word2idx): This function gets the word vectors from the trained model and word-to-index mapping.

🛠️ Installation

Option 1 (For Users)

The package has been published to PYPI, we could use pip install

Create and activate a virtual environment using conda

$ conda create --name <env_name> pip -y
$ conda activate <env_name>

Install the package using the command below

$ pip install py_skipgram_24

Option 2 (For Developers)

To successfully run the following commands of installation, we would need conda and poetry, guide included in the link (conda, poetry)

Clone this repository

$ git clone git@github.com:<your_username>/py_skipgram_24.git

Direct to the root of this repository Create a virtual environment in Conda with Python by the following commands at terminal and activate it:

$ conda create --name py_skipgram_24 python=3.11 -y
$ conda activate py_skipgram_24

Install this package via poetry, run the following command.

$ poetry install

✅ Testing

To test this package, please run the following command from the root directory of the repository:

$ pytest tests/

Branch coverage could be viewed with the following command:

$ pytest --cov-branch --cov=py_skipgram_24

Usage

To successfully use our Skipgram model to predict the target, please first ensure you have followed the instruction of installation, and then run the following line in a python notebook. Or you can look at the doc folder, with an example notebook.

from py_skipgram_24 import SkipgramModel, create_input_pairs, get_vocab, MyPreprocessor, get_word_vectors
corpus = ["It was a great day. I loved the movie and spending time with you. I wish we had more time.", 
          "The sky is always blue underneath. Remember that."]
sentences = MyPreprocessor(corpus)
pp_corpus = list(sentences)
vocab = get_vocab(pp_corpus)
word2idx = {word: idx for idx, word in enumerate(vocab)}
idx_pairs = create_input_pairs(pp_corpus, word2idx, context_size=2)
model = SkipgramModel(len(vocab), 10)
train_model(model, idx_pairs, epochs=250, learning_rate=0.025)
word_vectors = get_word_vectors(model, word2idx)
print(word_vectors)

📚 Package Integration within the Python Ecosystem

py_skipgram_24, while acknowledging the robustness and the capabilities of PyTorch’s nn.Module, aims to offer a specialized and streamlined toolkit tailored explicitly for Skip-gram tasks. As a lightweight and focused alternative, py_skipgram_24 serves users who seek a concise package that offers preprocessing, creating input pairs, training the model, and getting word vectors functions. While PyTorch covers a broader spectrum of deep learning algorithms, py_skipgram_24 provides a more specialized package, potentially appealing to those who prefer a tailored implementation of their Skip-gram workflows.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

py_skipgram_24 was created by Bill. It is licensed under the terms of the MIT license.

Credits

py_skipgram_24 was created with cookiecutter and the py-pkgs-cookiecutter template.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

py_skipgram_24-0.2.0.tar.gz (5.7 kB view details)

Uploaded Source

Built Distribution

py_skipgram_24-0.2.0-py3-none-any.whl (7.0 kB view details)

Uploaded Python 3

File details

Details for the file py_skipgram_24-0.2.0.tar.gz.

File metadata

  • Download URL: py_skipgram_24-0.2.0.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for py_skipgram_24-0.2.0.tar.gz
Algorithm Hash digest
SHA256 4fd1ec28f71ae6eb6e3e2dc869fc28b5e90f296610ade6eb9b048a6c1d5b0f18
MD5 649d94aafb435a0e2371d540a3eb5ec6
BLAKE2b-256 364b020cc255e14021f36f947e3c115eb6b51e799a52335a8581ef32cef4e97c

See more details on using hashes here.

File details

Details for the file py_skipgram_24-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for py_skipgram_24-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8a885d73f68cd0de52a5734a6a38f42e8c48a4e4350aece133dc0e05d038ba31
MD5 66db50b80607425a6eba27793281818b
BLAKE2b-256 83f09ba341b3cf1005313311c897833554470dd8650df5007052e9f0340c7828

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page