Simply create (N)grams
(N)Grams
Simply create (N)grams: N ~ Bi | Tri ...
Welcome to bigrams, a Python project that provides a non-intrusive way to connect tokenised sentences into (N)grams. It works on sentences that are already tokenised and focuses on a single task: efficiently merging tokens from a list of tokenised sentences.
It is non-intrusive in that it leaves tokenisation, stopword removal and other text preprocessing out of its flow; those steps stay under your control.
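Since preprocessing is left to the caller, the snippet below is a minimal sketch of what that upstream step could look like. It does not use bigrams at all; the `STOPWORDS` set, the regex and the `tokenise`/`preprocess` helper names are illustrative assumptions, not part of this project. The result is a list of token lists, which is the input shape the library expects.

```python
import re
from typing import List, Set

# Hypothetical stopword list, for illustration only.
STOPWORDS: Set[str] = {"the", "a", "of"}


def tokenise(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def preprocess(docs: List[str], stopwords: Set[str] = STOPWORDS) -> List[List[str]]:
    """Tokenise each document and drop stopwords."""
    return [[tok for tok in tokenise(doc) if tok not in stopwords] for doc in docs]


sentences = preprocess(["The city of New York", "A trip to New York"])
```

Any tokeniser or stopword list can be swapped in here, as long as the output stays a list of token lists.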
Source Code: https://github.com/proteusiq/bigrams
PyPI: https://pypi.org/project/bigrams/
Installation
pip install -U bigrams
Usage
To use bigrams, import it into your Python script and use its scikit-learn-ish API to transform your tokens.
from bigrams import Grams
# expects tokenised sentences
in_sentences = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
g = Grams(window_size=2, threshold=2)
out_sentences = g.fit_transform(in_sentences)
print(out_sentences)
# [["this", "is", "new_york", "baby_again!"],
# ["new_york", "and", "baby_again!"],
# ]
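To make the idea behind the example concrete, here is a rough pure-Python sketch of frequency-based pair merging. It is not the library's implementation: it assumes that `threshold` means "minimum number of times an adjacent pair must occur", and the helper names `count_pairs` and `merge_frequent_pairs` are made up for illustration. It only handles pairs (a window of two).

```python
from collections import Counter
from typing import List, Tuple


def count_pairs(sentences: List[List[str]]) -> Counter:
    """Count how often each adjacent token pair occurs across all sentences."""
    counts: Counter = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def merge_frequent_pairs(
    sentences: List[List[str]], threshold: int = 2, sep: str = "_"
) -> List[List[str]]:
    """Join adjacent pairs seen at least `threshold` times (assumed meaning)."""
    counts = count_pairs(sentences)
    merged = []
    for tokens in sentences:
        out: List[str] = []
        i = 0
        while i < len(tokens):
            pair: Tuple[str, ...] = tuple(tokens[i : i + 2])
            if len(pair) == 2 and counts[pair] >= threshold:
                out.append(sep.join(pair))
                i += 2  # both tokens are consumed by the merge
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged


example = [
    ["this", "is", "new", "york", "baby", "again!"],
    ["new", "york", "and", "baby", "again!"],
]
result = merge_frequent_pairs(example, threshold=2)
```

Under these assumptions, "new york" and "baby again!" each occur twice and are merged, while pairs seen only once are left alone.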
Development
- Clone this repository
- Requirements:
- Poetry
- Python 3.7+
- Create a virtual environment and install the dependencies
poetry install
- Activate the virtual environment
poetry shell
Testing
pytest
Pre-commit
Pre-commit hooks run all the auto-formatters (e.g. black, isort), linters (e.g. mypy, flake8), and other quality checks to make sure the changeset is in good shape before a commit/push happens.
You can install the hooks with (runs for each commit):
pre-commit install
Or if you want them to run only for each push:
pre-commit install -t pre-push
Or if you want to run all checks manually for all files:
pre-commit run --all-files
Contributions are welcome
ToDo:
- create a save & load function - compare it with gensim Phrases
- write replacer in Rust - PyO3
File details
Details for the file bigrams-0.1.2.tar.gz.
File metadata
- Download URL: bigrams-0.1.2.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.1 CPython/3.10.6 Linux/5.15.0-58-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | d05f4c48bfb9e53e29cec1f7cc39b8b8713c0937eba1b86623d59d98e1652d2e
MD5 | c67e833432b5e4b2d989d374f22c7089
BLAKE2b-256 | 2d561509e8754aef4014a37041c2ab5f4070fd4cc1f07e65f32cb2785cfb9903
File details
Details for the file bigrams-0.1.2-py3-none-any.whl.
File metadata
- Download URL: bigrams-0.1.2-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.1 CPython/3.10.6 Linux/5.15.0-58-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | b8859c06af9bd7afaefcee72c5c1db830173dbb26df76a6389d52a0801fdfc45
MD5 | 8b8c91f51b89ba534fa330f19b91d66b
BLAKE2b-256 | ca5dda902072a88be792a359253d4fa47016a113ec50da126b024f57da1368f1