
Textmentations

Textmentations is a Python library for augmenting Korean text. It is inspired by albumentations and uses albumentations as a dependency.

Installation

pip install textmentations

A simple example

Textmentations implements each text augmentation technique as a TextTransform, which inherits from the albumentations BasicTransform.

This lets textmentations reuse existing albumentations functionality instead of reimplementing it.
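To make the inherited calling convention concrete: in albumentations, a transform is called with keyword arguments, returns a dict, and is applied with probability `p`. That is why calls such as `rd(text=text)` are read back with `["text"]`. The plain-Python mock below imitates this pattern without depending on either library. All class names (`MockTextTransform`, `ReverseWords`, `MockCompose`) are hypothetical, and the mock omits most of what the real BasicTransform and Compose do.

```python
import random
from typing import Any, Callable, Dict, List, Optional


class MockTextTransform:
    """Hypothetical stand-in for an albumentations-style text transform.

    Called with keyword arguments and returns a dict. The transform is
    applied with probability `p`; otherwise the input passes through unchanged.
    """

    def __init__(self, p: float = 0.5, rng: Optional[random.Random] = None) -> None:
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def apply(self, text: str) -> str:
        raise NotImplementedError  # subclasses define the actual augmentation

    def __call__(self, **data: Any) -> Dict[str, Any]:
        result = dict(data)  # copy so the caller's keyword arguments are untouched
        if self.rng.random() < self.p:
            result["text"] = self.apply(data["text"])
        return result


class ReverseWords(MockTextTransform):
    """Hypothetical example transform: reverses the word order."""

    def apply(self, text: str) -> str:
        return " ".join(reversed(text.split()))


class MockCompose:
    """Hypothetical pipeline: feeds each transform's output dict into the next one."""

    def __init__(self, transforms: List[Callable[..., Dict[str, Any]]]) -> None:
        self.transforms = transforms

    def __call__(self, **data: Any) -> Dict[str, Any]:
        for transform in self.transforms:
            data = transform(**data)
        return data


print(ReverseWords(p=1.0)(text="a b c")["text"])  # c b a
print(MockCompose([ReverseWords(p=1.0), ReverseWords(p=1.0)])(text="a b c")["text"])  # a b c
```

Keeping every transform on the same "dict in, dict out" interface is what allows a composition object to chain arbitrary transforms without knowing what each one does.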

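import textmentations as T

# Each augmentation is random, so the commented outputs below are sample results that vary between runs.
# The input means: "Yesterday I went to a restaurant. I was very thirsty. First I drank a glass of water. Then I enjoyed some sweet and sour pork."
text = "어제 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다."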
rd = T.RandomDeletion(deletion_prob=0.1, min_words_per_sentence=0.8)
ri = T.RandomInsertion(insertion_prob=0.2, n_times=1)
rs = T.RandomSwap(alpha=1)
sr = T.SynonymReplacement(replacement_prob=0.2)
eda = T.Compose([rd, ri, rs, sr])

print(rd(text=text)["text"])
# 식당에 갔다. 목이 너무 말랐다. 먼저 물 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.

print(ri(text=text)["text"])
# 어제 최근 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다 음료수. 그리고 탕수육을 맛있게 먹었다.

print(rs(text=text)["text"])
# 어제 갔다 식당에. 목이 너무 말랐다. 물 먼저 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.

print(sr(text=text)["text"])
# 과거 식당에 갔다. 목이 너무 말랐다. 먼저 소주 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.

print(eda(text=text)["text"])
# 식당에 어제 과거 갔다. 너무 말랐다. 먼저 상수 한 잔을 마셨다 맹물. 그리고 맛있게 먹었다.
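`RandomDeletion`, `RandomInsertion`, `RandomSwap`, and `SynonymReplacement` are the four operations of Easy Data Augmentation (EDA), hence, presumably, the name `eda` for the composed pipeline. As a rough illustration of the two operations that need no synonym dictionary, here is a pure-Python sketch of word-level random deletion and random swap on a single sentence. It follows the description in the EDA paper, not the textmentations source: the function names are hypothetical, and the library's own parameters (for example `min_words_per_sentence` or `alpha`) may behave differently from `prob` and `n_swaps` here.

```python
import random
from typing import List


def random_deletion(words: List[str], prob: float, rng: random.Random) -> List[str]:
    """Drop each word independently with probability `prob`."""
    if len(words) <= 1:
        return list(words)  # a one-word sentence is left as-is
    kept = [w for w in words if rng.random() >= prob]
    # EDA never returns an empty sentence: fall back to one randomly chosen word.
    return kept if kept else [rng.choice(words)]


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap the words at two randomly chosen positions, `n_swaps` times."""
    result = list(words)  # work on a copy so the input list is unchanged
    if len(result) < 2:
        return result
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(result)), 2)  # two distinct positions
        result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is reproducible
    words = "먼저 물 한 잔을 마셨다".split()
    print(" ".join(random_deletion(words, prob=0.1, rng=rng)))
    print(" ".join(random_swap(words, n_swaps=1, rng=rng)))
```

Passing an explicit `random.Random` instance keeps the sketch deterministic under a fixed seed, which is handy when comparing augmented outputs across experiments.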


Download files

Source distribution: textmentations-1.4.0.tar.gz (49.8 MB)

Built distribution: textmentations-1.4.0-py3-none-any.whl (49.8 MB, Python 3)

File details: textmentations-1.4.0.tar.gz

  • Size: 49.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Algorithm     Hash digest
SHA256        77b1816077c08cc2956f698d8a9825804f09b369a89e4ab50ca36e27839930b1
MD5           932501f77344f0a0d755ea5fa3166b7a
BLAKE2b-256   2b94eb13efb0b226fab8d120c69ed4560b8d39100f30e7e49283e339d46d3a1d

File details: textmentations-1.4.0-py3-none-any.whl

File hashes

Algorithm     Hash digest
SHA256        0cde4484974ac184c88cc96cb17f304625c7357ff6bbb4d73c3cf341b156e11d
MD5           a13a8d39b558ce80d8cadf886168fa24
BLAKE2b-256   11f31953d24f57ebaf7997b65e81df8fbca4bfa6540bae88b8a6680d595f2749
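To check a downloaded distribution against the published SHA256 digests, one option is a short script using Python's standard `hashlib`. This is a sketch: the local file name is an assumption about where the download was saved, and the helper names are hypothetical. pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed local path of the downloaded wheel; adjust to where the file was saved.
    wheel = Path("textmentations-1.4.0-py3-none-any.whl")
    expected = "0cde4484974ac184c88cc96cb17f304625c7357ff6bbb4d73c3cf341b156e11d"
    if wheel.exists():
        print("OK" if verify_sha256(wheel, expected) else "MISMATCH")
```

Reading in chunks keeps memory use constant, which matters little for a 49.8 MB file but makes the helper safe for larger downloads.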
