Simply create (N)grams

(N)Grams

bigrams

Simply create (N)grams: N ~ Bi | Tri ...

Welcome to bigrams, a Python project that provides a non-intrusive way to join tokens from tokenized sentences into (N)grams. It focuses on a single task: efficiently merging tokens across a list of tokenized sentences.

It is non-intrusive in that it leaves tokenisation, stopword removal, and other text preprocessing out of its flow.
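Because preprocessing is left to the caller, the input has to be tokenised before it reaches bigrams. A minimal sketch of one way to do that with the standard library follows; the `tokenize` and `preprocess` helpers and the `STOPWORDS` set are hypothetical names for illustration and are not part of bigrams.

```python
import re

# Hypothetical stopword list for illustration; use whatever suits your corpus.
STOPWORDS = {"the", "a", "an", "of"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def preprocess(docs: list[str]) -> list[list[str]]:
    """Tokenise each document and drop stopwords."""
    return [[tok for tok in tokenize(doc) if tok not in STOPWORDS] for doc in docs]


docs = ["The streets of New York", "A weekend in New York"]
tokenized = preprocess(docs)
print(tokenized)
# [['streets', 'new', 'york'], ['weekend', 'in', 'new', 'york']]
```

The resulting list of token lists is the shape of input the usage example expects.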


Source Code: https://github.com/proteusiq/bigrams

PyPI: https://pypi.org/project/bigrams/


Installation

pip install -U bigrams

Usage

To use bigrams, import it into your Python script and use its scikit-learn-style API to transform your tokens.

from bigrams import Grams

# expects tokenised sentences
in_sentences = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
g = Grams(window_size=2, threshold=2)

out_sentences = g.fit_transform(in_sentences)
print(out_sentences)
# [["this", "is", "new_york", "baby_again!"],
#  ["new_york", "and", "baby_again!"]]
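To make the idea behind the example concrete, here is a minimal pure-Python sketch of frequency-based pair merging. It is not the library's implementation: the function name `merge_frequent_pairs`, the `sep` parameter, and the reading of `threshold` as "the pair occurs at least this many times in the corpus" are all assumptions made for illustration, and bigrams may score pairs differently.

```python
from collections import Counter


def merge_frequent_pairs(sentences, threshold=2, sep="_"):
    """Join adjacent token pairs seen at least `threshold` times (assumed rule)."""
    # Count every adjacent pair across all sentences.
    counts = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))

    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            pair = tuple(sent[i:i + 2])
            if len(pair) == 2 and counts[pair] >= threshold:
                out.append(sep.join(pair))
                i += 2  # skip both tokens of the merged pair
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


sentences = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
result = merge_frequent_pairs(sentences, threshold=2)
print(result)
# [['this', 'is', 'new_york', 'baby_again!'], ['new_york', 'and', 'baby_again!']]
```

On this input the sketch happens to reproduce the output shown above, since only "new york" and "baby again!" occur twice. The greedy left-to-right scan is a design choice of the sketch, not a documented behaviour of bigrams.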

Development

  • Clone this repository
  • Requirements: Poetry
  • Create a virtual environment and install the dependencies:
    poetry install
  • Activate the virtual environment:
    poetry shell

Testing

pytest

Pre-commit

Pre-commit hooks run all the auto-formatters (e.g. black, isort), linters (e.g. mypy, flake8), and other quality checks to make sure the changeset is in good shape before a commit/push happens.

You can install the hooks so that they run on each commit:

pre-commit install

Or if you want them to run only for each push:

pre-commit install -t pre-push

Or if you want to run all checks manually on all files:

pre-commit run --all-files

Contributions are welcome.

ToDo:

  • create a save & load function
  • compare it with gensim Phrases
  • write replacer in Rust - PyO3
