🎨 diversity
diversity is a package that checks for and scores repeated structures and patterns in the output of language models.
Installation
pip install diversity
Command-line
python examples/summarization.py <DATASET CSV>
Library
This library supports various scoring methods for evaluating the homogeneity and diversity of outputs.
from diversity import compression_ratio, homogenization_score, ngram_diversity_score
data_example = [
    "I enjoy walking with my cute dog for the rest of the day, but this time it was hard for me to figure out what to do with it. When I finally looked at this for a few moments, I immediately thought.",
    "I enjoy walking with my cute dog. The only time I felt like walking was when I was working, so it was awesome for me. I didn't want to walk for days. I am really curious how she can walk with me",
    "I enjoy walking with my cute dog (Chama-I-I-I-I-I), and I really enjoy running. I play in a little game I play with my brother in which I take pictures of our houses."
]
cr = compression_ratio(data_example, 'gzip')
hs = homogenization_score(data_example, 'rougel')
# hs = homogenization_score(data_example, 'bertscore')
self_bleu = homogenization_score(data_example, 'bleu')
nds = ngram_diversity_score(data_example, 4)
print(cr, hs, nds)
1.641 0.222 3.315
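To give a rough sense of what two of these scores measure, here is a minimal sketch. It is not the library's implementation. It assumes the compression ratio is the raw byte size divided by the gzip-compressed size, and that the n-gram diversity score sums the fraction of unique n-grams for each n from 1 up to the given maximum, with plain whitespace tokenization. The function names are hypothetical.

```python
import gzip


def compression_ratio_sketch(texts):
    """Raw size divided by gzip-compressed size (assumed definition).

    More repetitive text compresses better, so the ratio goes up.
    """
    raw = "\n".join(texts).encode("utf-8")
    return len(raw) / len(gzip.compress(raw))


def ngram_diversity_sketch(texts, max_n):
    """Sum over n = 1..max_n of unique n-grams / total n-grams (assumed definition).

    Each term is at most 1, so the score is at most max_n; lower means more repetition.
    """
    tokens = " ".join(texts).split()  # whitespace tokenization is an assumption
    score = 0.0
    for n in range(1, max_n + 1):
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        if ngrams:  # skip n values longer than the text
            score += len(set(ngrams)) / len(ngrams)
    return score


if __name__ == "__main__":
    repetitive = ["I enjoy walking with my cute dog."] * 20
    print(compression_ratio_sketch(repetitive))
    print(ngram_diversity_sketch(repetitive, 4))
```

Under these assumptions, a highly repetitive set of outputs gives a high compression ratio and a low n-gram diversity score, which matches how the two scores are used together above.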
You can also visualize various n-gram patterns using this library:
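# Assumption: these helpers are importable from the top-level package, like the scoring functions above
from diversity import token_patterns, get_pos, pos_patterns

outputs = data_example  # any list of generated texts
n = 6
# get the token-level patterns
patterns_token = token_patterns(outputs, n)
# get the POS patterns
joined_pos, tuples = get_pos(outputs)
ngrams_pos = token_patterns(joined_pos, n)
# for the top n-gram patterns, cycle through and get the matching text
text_matches = {}
for pattern, _ in ngrams_pos:
    # key by the pattern itself so each pattern keeps its own matches
    text_matches[pattern] = pos_patterns(tuples, pattern)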
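The idea of pulling out the most frequent token-level n-grams can be sketched with the standard library alone. This is an illustration, not the code behind token_patterns: the function name top_ngram_patterns, the top_k parameter, and whitespace tokenization are all assumptions. The (pattern, count) return shape is inferred from the loop above.

```python
from collections import Counter


def top_ngram_patterns(texts, n, top_k=10):
    """Return the top_k most frequent n-grams as (pattern, count) pairs."""
    counts = Counter()
    for text in texts:
        tokens = text.split()  # whitespace tokenization is an assumption
        # count every window of n consecutive tokens within this text
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(top_k)


if __name__ == "__main__":
    texts = [
        "I enjoy walking with my cute dog",
        "I enjoy walking with my brother",
    ]
    for pattern, count in top_ngram_patterns(texts, 3, top_k=3):
        print(count, pattern)
```

Counting within each text separately avoids creating n-grams that span the boundary between two outputs.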