(N)Grams

bigrams

Simply create (N)grams: N ~ Bi | Tri ...

Welcome to bigrams, a Python project that provides a non-intrusive way to merge tokens in tokenized sentences into (N)grams. It focuses on a single task: efficiently merging tokens from a list of tokenized sentences.

It is non-intrusive because it leaves tokenisation, stopword removal and other text preprocessing out of its flow; you run those steps yourself and pass in the resulting token lists.
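Because preprocessing stays outside the library, the input has to be prepared beforehand. The following is a minimal sketch of one way to do that in plain Python. The `tokenize` and `preprocess` helpers and the `STOPWORDS` set are hypothetical names chosen for illustration; they are not part of bigrams, and any tokenizer that yields lists of string tokens should serve the same purpose.

```python
import re
from typing import Iterable, List, Set

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS: Set[str] = {"a", "an", "the"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def preprocess(texts: Iterable[str], stopwords: Set[str] = STOPWORDS) -> List[List[str]]:
    """Return one list of tokens per input text, with stopwords removed."""
    return [[tok for tok in tokenize(text) if tok not in stopwords] for text in texts]


in_sentences = preprocess(["This is the New York baby!", "New York and a baby"])
print(in_sentences)
```

The resulting list of token lists has the shape that the usage example expects as input.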


Source Code: https://github.com/proteusiq/bigrams

PyPI: https://pypi.org/project/bigrams/


Installation

pip install -U bigrams

Usage

To use bigrams, import it into your Python script and use its scikit-learn-style API to transform your tokens.

from bigrams import Grams

# expects tokenised sentences: a list of token lists
in_sentences = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
g = Grams(window_size=2, threshold=2)

out_sentences = g.fit_transform(in_sentences)
print(out_sentences)
# [["this", "is", "new_york", "baby_again!"],
#  ["new_york", "and", "baby_again!"],
# ]
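To build intuition for the output above, here is a minimal pure-Python sketch of count-based bigram merging. It is not the library's implementation: the function name `merge_frequent_bigrams`, the `sep` parameter, and the reading of `threshold` as a minimum count of an adjacent token pair across all sentences are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def merge_frequent_bigrams(
    sentences: List[List[str]], threshold: int = 2, sep: str = "_"
) -> List[List[str]]:
    """Join adjacent token pairs that occur at least `threshold` times overall."""
    # Count every adjacent pair of tokens across all sentences.
    counts = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))

    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            # Merge greedily from left to right when the pair is frequent enough.
            if i + 1 < len(sent) and counts[(sent[i], sent[i + 1])] >= threshold:
                out.append(sent[i] + sep + sent[i + 1])
                i += 2  # skip both tokens of the merged pair
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


sentences = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
print(merge_frequent_bigrams(sentences, threshold=2))
# [['this', 'is', 'new_york', 'baby_again!'], ['new_york', 'and', 'baby_again!']]
```

In this sketch only ("new", "york") and ("baby", "again!") appear twice, so they are the only pairs joined; every other pair appears once and is left untouched.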

Development

  • Clone this repository
  • Requirements:
  • Create a virtual environment and install the dependencies
      poetry install
  • Activate the virtual environment
      poetry shell

Testing

pytest

Pre-commit

Pre-commit hooks run all the auto-formatters (e.g. black, isort), linters (e.g. mypy, flake8), and other quality checks to make sure the changeset is in good shape before a commit/push happens.

To install the hooks so that they run on each commit:

pre-commit install

Or if you want them to run only for each push:

pre-commit install -t pre-push

Or if you want to run all checks manually for all files:

pre-commit run --all-files

Contributions are welcome.

ToDo:

  • create a save & load function
  • compare it with gensim Phrases
  • write replacer in Rust - PyO3

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bigrams-0.1.2.tar.gz (5.2 kB)

Uploaded Source

Built Distribution

bigrams-0.1.2-py3-none-any.whl (4.5 kB)

Uploaded Python 3
