A tool for sibylvariant transformations

Sibyl

Sibyl is a tool for generating new data from the data you already have. It offers more than 35 transformations, which you can apply by selecting one or by sampling n at a time.

There are two primary kinds of transformations:

  • Invariant (INV) : transform the input, but the expected output remains the same.
    • ex. "I love NY" + Emojify = "I 💗 NY", which has an INV effect on the sentiment.
  • Sibylvariant (SIB) : transform the input and the expected output may change in some way.
    • ex. "I love NY" + ChangeAntonym = "I hate NY", which has a SIB effect on the sentiment (label inverted from positive (1) to negative (0)).
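
As a minimal sketch of the distinction above, the two toy functions below return each transformed input together with its expected label. The names `emojify_inv` and `change_antonym_sib` and their lookup tables are hypothetical stand-ins for illustration, not Sibyl's implementation.

```python
# Hypothetical toy transforms for illustration; not Sibyl's actual code.
EMOJI_MAP = {"love": "💗"}                      # assumed word-to-emoji lookup
ANTONYM_MAP = {"love": "hate", "hate": "love"}  # assumed antonym lookup


def emojify_inv(text, label):
    """INV: change the input, keep the expected label."""
    words = [EMOJI_MAP.get(w, w) for w in text.split()]
    return " ".join(words), label


def change_antonym_sib(text, label):
    """SIB: change the input; flip the binary sentiment label if a word was swapped."""
    words = text.split()
    swapped = [ANTONYM_MAP.get(w, w) for w in words]
    new_label = 1 - label if swapped != words else label
    return " ".join(swapped), new_label


inv_text, inv_label = emojify_inv("I love NY", 1)
sib_text, sib_label = change_antonym_sib("I love NY", 1)
print(inv_text, inv_label)  # I 💗 NY 1
print(sib_text, sib_label)  # I hate NY 0
```

The only difference between the two kinds is what happens to the label: the INV function passes it through untouched, while the SIB function is responsible for computing the new one.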

Some transformations produce soft labels; these are called SIB-mix transformations. For example, in topic classification you could use SentMix to randomly combine two inputs from different classes into one input and then shuffle the sentences around. The new input gets a new label with probabilities weighted according to the relative length of text contributed by each of the two inputs. To illustrate with AG_NEWS: if you mix an article about sports with one about business, the result might be a soft label like [0, 0.5, 0.5, 0]. The intuition is that humans can recognize that a document covers two topics, and we should expect our models to behave similarly (i.e. the model predictions should be very close to 50/50 on the expected topics).
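
The length-weighted soft label described above can be sketched as follows. This is an illustration under stated assumptions, not SentMix itself: the helper name `mix_soft_label` is hypothetical, "length" is taken to mean character count, and the sentence shuffling step is omitted.

```python
def mix_soft_label(text_a, label_a, text_b, label_b, num_classes):
    """Return a soft label weighted by each input's share of the combined length.

    Assumption: length is measured in characters; the real tool may use
    another unit (for example words or sentences).
    """
    len_a, len_b = len(text_a), len(text_b)
    total = len_a + len_b
    if total == 0:
        raise ValueError("cannot mix two empty inputs")
    soft = [0.0] * num_classes
    soft[label_a] += len_a / total
    soft[label_b] += len_b / total
    return soft


# AG_NEWS class indices: 0 World, 1 Sports, 2 Business, 3 Sci/Tech.
# Two placeholder texts of equal length give an even split.
print(mix_soft_label("ab cd", 1, "ef gh", 2, num_classes=4))  # [0.0, 0.5, 0.5, 0.0]
```

If one input contributes more text than the other, its class receives a proportionally larger share of the probability mass, and the entries always sum to 1.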

Examples

Here's a quick example of using a single transform:

from transforms import HomoglyphSwap

transform = HomoglyphSwap(change=0.75)
string_in = "The quick brown fox jumps over the lazy dog"
string_out = transform(string_in)
print(string_out) 

>> Tհe quіc𝒌 Ьⲅоԝn 𝚏о× ϳumрѕ оѵеⲅ 𝚝հе ⅼɑzу ԁoɡ
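
To give a rough idea of what a homoglyph swap does, here is a self-contained sketch. It is not the library's code: the function name `homoglyph_swap`, the tiny lookup table, the `seed` argument, and the reading of `change` as a per-character swap probability are all assumptions made for illustration.

```python
import random

# Tiny illustrative table of Latin letters and look-alike code points.
# Assumption: the real transform presumably uses a much larger table.
HOMOGLYPHS = {
    "a": "\u0251",  # Latin small letter alpha
    "e": "\u0435",  # Cyrillic small letter ie
    "o": "\u043e",  # Cyrillic small letter o
    "p": "\u0440",  # Cyrillic small letter er
    "c": "\u0441",  # Cyrillic small letter es
    "y": "\u0443",  # Cyrillic small letter u
}


def homoglyph_swap(text, change=0.75, seed=None):
    """Replace each mappable character with probability `change` (assumed meaning)."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < change:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


swapped = homoglyph_swap("The quick brown fox jumps over the lazy dog", seed=0)
print(swapped)
```

The output looks similar to the input to a human reader but differs at the code-point level, which is why such a transform is expected to leave the label unchanged.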

Here's a quick example using transform_dataset, which uniformly samples from the taxonomized transformations so that the new data is relevant to your particular task.

from datasets import load_dataset
from transforms import *
from utils import *

dataset = load_dataset('glue', 'sst2')
train_data = dataset['train']
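# rename_column returns a new dataset; the in-place rename_column_ was removed in newer `datasets` releases
train_data = train_data.rename_column('sentence', 'text')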

task = 'sentiment'
tran = 'SIB'
n = 2

out = transform_dataset(
    train_data[:5], 
    num_transforms=n, 
    task=task, 
    tran=tran
)

new_text, new_label, trans_applied = out
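
The idea of uniformly sampling n transformations that match a task and a transformation type can be sketched as below. This is a hedged illustration only: the `TAXONOMY` table, its task assignments, and the `sample_transforms` helper are hypothetical, and sampling without replacement is an assumption about how the selection works.

```python
import random

# Hypothetical taxonomy of (transform name, task, type); not Sibyl's real registry.
TAXONOMY = [
    ("Emojify", "sentiment", "INV"),
    ("HomoglyphSwap", "sentiment", "INV"),
    ("ChangeAntonym", "sentiment", "SIB"),
    ("InsertPositivePhrase", "sentiment", "SIB"),
    ("AddPositiveEmoji", "sentiment", "SIB"),
    ("SentMix", "topic", "SIB"),
]


def sample_transforms(task, tran, n, seed=None):
    """Uniformly pick n distinct transform names matching the task and type."""
    rng = random.Random(seed)
    candidates = [name for name, t, kind in TAXONOMY if t == task and kind == tran]
    if n > len(candidates):
        raise ValueError(f"only {len(candidates)} transforms match {task}/{tran}")
    return rng.sample(candidates, n)


picked = sample_transforms("sentiment", "SIB", n=2, seed=0)
print(picked)
```

Filtering before sampling is what keeps the generated data relevant to the task: a topic-only transform is never drawn for a sentiment dataset.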

Here are some examples we've already prepared:

import pandas as pd  # used below as pd
from utils import *

test_suites = pkl_load('assets/SST2/test_suites.pkl')
INV_test_suites = pkl_load('assets/SST2/INV_test_suites.pkl')
SIB_test_suites = pkl_load('assets/SST2/SIB_test_suites.pkl')

n = 3
df_orig = pd.DataFrame.from_dict(test_suites[0]).head(n)
df_INV  = pd.DataFrame.from_dict(INV_test_suites[0]).head(n)
df_SIB  = pd.DataFrame.from_dict({'data': SIB_test_suites[0]['data'], 
                                  'target': SIB_test_suites[0]['target'],
                                  'ts': SIB_test_suites[0]['ts']}).head(n)

df_orig.rename(columns={'data': 'original'}, inplace=True)
df_INV.rename(columns={'data': 'INV_transformed', 'ts' : 'transforms_applied'}, inplace=True)
df_SIB.rename(columns={'data': 'SIB_transformed', 'ts' : 'transforms_applied'}, inplace=True)

df = pd.concat([df_orig, df_INV, df_SIB], axis=1)

df
# | original | target | INV_transformed | target | transforms_applied | SIB_transformed | target | transforms_applied
0 | boisterous and utterly charming | 1 | boisterous robust+ious and utterly charming | 1 | ['RandomInsertion', 'RandomCharInsert'] | boisterous and utterly charming That being said, I loved it. 💁🏽‍♂ | 1 | ['InsertPositivePhrase', 'AddPositiveEmoji']
1 | pathos-filled but ultimately life-affirming finale | 1 | рɑthos-fіlled but սⅼtimatеly li𝚏e-/ffirmiոɡ finɑlе | 1 | ['RandomCharSubst', 'HomoglyphSwap'] | pathos-filled but ultimately life-affirming finale https://www.dictionary.com/browse/clunky 🙋 | 1 | ['AddNegativeLink', 'AddPositiveEmoji']
2 | with a lower i.q. than when i had entered | 0 | with a gloomy i.q. than when i had immerse | 0 | ['ChangeSynonym', 'ChangeHyponym'] | with a lower i.q. than when i had entered 👨‍❤‍💋‍👨 That being said, I liked it. | 1 | ['AddPositiveEmoji', 'InsertPositivePhrase']

Download files

Source Distribution

sibyl_tool-0.1.2.tar.gz (160.8 kB)

Uploaded Source

Built Distribution

sibyl_tool-0.1.2-py3-none-any.whl (189.6 kB)

Uploaded Python 3
