A tool for sibylvariant transformations

Sibyl

Sibyl is a tool for generating new data from the data you already have. It offers more than 35 transformations, which you can apply either by selecting one directly or by sampling n transformations at a time.

There are two primary kinds of transformations:

  • Invariant (INV) : transform the input, but the expected output remains the same.
    • ex. "I love NY" + Emojify = "I 💗 NY", which has an INV effect on the sentiment.
  • Sibylvariant (SIB) : transform the input and the expected output may change in some way.
    • ex. "I love NY" + ChangeAntonym = "I hate NY", which has a SIB effect on the sentiment (label inverted from positive (1) to negative (0)).
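
To make the distinction concrete, here is a minimal sketch of the two kinds of transformation as plain functions that take and return a (text, label) pair. The functions toy_emojify and toy_change_antonym are hypothetical stand-ins written for this illustration; they are not Sibyl's Emojify or ChangeAntonym implementations, and the one-word lookup tables are assumptions.

```python
from typing import Tuple

def toy_emojify(text: str, label: int) -> Tuple[str, int]:
    """INV-style toy transform: rewrites the text, leaves the label untouched."""
    return text.replace("love", "💗"), label

def toy_change_antonym(text: str, label: int) -> Tuple[str, int]:
    """SIB-style toy transform: rewrites the text and flips a binary label."""
    antonyms = {"love": "hate", "hate": "love"}  # hypothetical lookup table
    words = text.split()
    swapped = [antonyms.get(word, word) for word in words]
    if swapped == words:
        return text, label  # nothing was swapped, so the label stays as-is
    return " ".join(swapped), 1 - label

inv_text, inv_label = toy_emojify("I love NY", 1)
sib_text, sib_label = toy_change_antonym("I love NY", 1)
print(inv_text, inv_label)  # I 💗 NY 1
print(sib_text, sib_label)  # I hate NY 0
```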

Some transformations produce soft labels; these are called SIB-mix transformations. For example, in topic classification you could use SentMix to randomly combine two inputs from different classes into one input and then shuffle the sentences around. The new input gets a new label whose probabilities are weighted according to the relative length of text contributed by each of the two inputs. Illustrating with AG_NEWS, if you mix an article about sports with one about business, the result might be a soft label like [0, 0.5, 0.5, 0]. The intuition is that humans would recognize that a document covers two topics, and we should expect our models to behave similarly (i.e. the model predictions should be very close to 50/50 on the expected topics).
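
The length-weighted soft label described above can be sketched as follows. This is an illustration only, not SentMix itself: the helper mix_soft_label is hypothetical, it skips the sentence shuffling, and it assumes that "relative length" is measured in characters (the library may count words or tokens instead).

```python
from typing import List

def mix_soft_label(text_a: str, label_a: int,
                   text_b: str, label_b: int,
                   num_classes: int) -> List[float]:
    """Weight each source label by the share of characters its text contributes."""
    len_a, len_b = len(text_a), len(text_b)
    total = len_a + len_b
    if total == 0:
        raise ValueError("both inputs are empty")
    soft = [0.0] * num_classes
    soft[label_a] += len_a / total
    soft[label_b] += len_b / total
    return soft

# AG_NEWS class indices: 0=World, 1=Sports, 2=Business, 3=Sci/Tech
sports_text = "s" * 120    # placeholder for a 120-character sports article
business_text = "b" * 120  # placeholder for a 120-character business article
even_mix = mix_soft_label(sports_text, 1, business_text, 2, num_classes=4)
print(even_mix)  # [0.0, 0.5, 0.5, 0.0]

# Unequal lengths shift the weights: 90 vs. 30 characters gives 0.75 / 0.25.
uneven_mix = mix_soft_label("s" * 90, 1, "b" * 30, 2, num_classes=4)
print(uneven_mix)  # [0.0, 0.75, 0.25, 0.0]
```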

Examples

Here's a quick example of using a single transform:

from transforms import HomoglyphSwap

transform = HomoglyphSwap(change=0.75)
string_in = "The quick brown fox jumps over the lazy dog"
string_out = transform(string_in)
print(string_out) 

>> Tհe quіc𝒌 Ьⲅоԝn 𝚏о× ϳumрѕ оѵеⲅ 𝚝հе ⅼɑzу ԁoɡ
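
For intuition about what a homoglyph swap does, here is a small self-contained sketch. It is not HomoglyphSwap's implementation: toy_homoglyph_swap, its six-entry mapping, and the seed argument are made up for this illustration, and treating change as a per-character swap probability is an assumption about what the parameter means.

```python
import random
from typing import Optional

# Tiny illustrative mapping from Latin letters to look-alike code points.
HOMOGLYPHS = {
    "a": "\u0251",  # Latin small letter alpha
    "e": "\u0435",  # Cyrillic small letter ie
    "o": "\u043e",  # Cyrillic small letter o
    "p": "\u0440",  # Cyrillic small letter er
    "c": "\u0441",  # Cyrillic small letter es
    "y": "\u0443",  # Cyrillic small letter u
}

def toy_homoglyph_swap(text: str, change: float = 0.25,
                       seed: Optional[int] = None) -> str:
    """Replace each mappable character with its look-alike with probability `change`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < change:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)

original = "The quick brown fox jumps over the lazy dog"
swapped = toy_homoglyph_swap(original, change=0.75, seed=0)
print(swapped)              # looks similar on screen
print(swapped == original)  # but compares unequal whenever any character was swapped
```

The transformed string keeps its length and its visual appearance while the underlying code points differ, which is why this kind of perturbation is expected to leave the label unchanged.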

Here's a quick example using transform_dataset, which uniformly samples from the taxonomized transformations so that the new data it generates is relevant to your particular task.

from datasets import load_dataset
from transforms import *
from utils import *

dataset = load_dataset('glue', 'sst2')
train_data = dataset['train']
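# Recent versions of datasets no longer provide the in-place rename_column_;
# rename_column returns a new dataset instead.
train_data = train_data.rename_column('sentence', 'text')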

task = 'sentiment'
tran = 'SIB'
n = 2

out = transform_dataset(
    train_data[:5], 
    num_transforms=n, 
    task=task, 
    tran=tran
)

new_text, new_label, trans_applied = out
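
The sampling step can be pictured roughly as below. This is a sketch under stated assumptions, not the body of transform_dataset: the TAXONOMY dictionary and sample_transforms are hypothetical, the grouping of transformation names is inferred from the examples on this page, and sampling without replacement is an assumption.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical taxonomy: (task, transformation kind) -> applicable transform names.
TAXONOMY: Dict[Tuple[str, str], List[str]] = {
    ("sentiment", "INV"): ["RandomInsertion", "RandomCharInsert",
                           "HomoglyphSwap", "ChangeSynonym"],
    ("sentiment", "SIB"): ["InsertPositivePhrase", "AddPositiveEmoji",
                           "AddNegativeLink", "ChangeAntonym"],
}

def sample_transforms(task: str, tran: str, n: int,
                      seed: Optional[int] = None) -> List[str]:
    """Uniformly sample n distinct transform names valid for this task and kind."""
    candidates = TAXONOMY[(task, tran)]
    if n > len(candidates):
        raise ValueError(f"only {len(candidates)} transforms available, asked for {n}")
    return random.Random(seed).sample(candidates, n)

picked = sample_transforms("sentiment", "SIB", n=2, seed=0)
print(picked)  # two names drawn from the ("sentiment", "SIB") list
```

Restricting the candidate pool by task and kind before sampling is what keeps the generated data relevant: a transform that flips sentiment labels would never be drawn for an INV request.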

Here are some examples we've already prepared:

import pandas as pd  # used below to build and concatenate the DataFrames

from utils import *

test_suites = pkl_load('assets/SST2/test_suites.pkl')
INV_test_suites = pkl_load('assets/SST2/INV_test_suites.pkl')
SIB_test_suites = pkl_load('assets/SST2/SIB_test_suites.pkl')

n = 3
df_orig = pd.DataFrame.from_dict(test_suites[0]).head(n)
df_INV  = pd.DataFrame.from_dict(INV_test_suites[0]).head(n)
df_SIB  = pd.DataFrame.from_dict({'data': SIB_test_suites[0]['data'], 
                                  'target': SIB_test_suites[0]['target'],
                                  'ts': SIB_test_suites[0]['ts']}).head(n)

df_orig.rename(columns={'data': 'original'}, inplace=True)
df_INV.rename(columns={'data': 'INV_transformed', 'ts' : 'transforms_applied'}, inplace=True)
df_SIB.rename(columns={'data': 'SIB_transformed', 'ts' : 'transforms_applied'}, inplace=True)

df = pd.concat([df_orig, df_INV, df_SIB], axis=1)

df
| # | original | target | INV_transformed | target | transforms_applied | SIB_transformed | target | transforms_applied |
|---|---|---|---|---|---|---|---|---|
| 0 | boisterous and utterly charming | 1 | boisterous robust+ious and utterly charming | 1 | ['RandomInsertion', 'RandomCharInsert'] | boisterous and utterly charming That being said, I loved it. 💁🏽‍♂ | 1 | ['InsertPositivePhrase', 'AddPositiveEmoji'] |
| 1 | pathos-filled but ultimately life-affirming finale | 1 | рɑthos-fіlled but սⅼtimatеly li𝚏e-/ffirmiոɡ finɑlе | 1 | ['RandomCharSubst', 'HomoglyphSwap'] | pathos-filled but ultimately life-affirming finale https://www.dictionary.com/browse/clunky 🙋 | 1 | ['AddNegativeLink', 'AddPositiveEmoji'] |
| 2 | with a lower i.q. than when i had entered | 0 | with a gloomy i.q. than when i had immerse | 0 | ['ChangeSynonym', 'ChangeHyponym'] | with a lower i.q. than when i had entered 👨‍❤‍💋‍👨 That being said, I liked it. | 1 | ['AddPositiveEmoji', 'InsertPositivePhrase'] |
