A package to create and work with zipf distributions
What does it do?
The zipf package simplifies creating and operating on zipf distributions. It supports sums, subtractions, multiplications, divisions, and statistical operations such as mean and variance, among others.
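To make the idea of arithmetic on distributions concrete, here is a minimal sketch that uses plain dictionaries rather than the package itself. The helper `add_distributions` is hypothetical, and treating a missing key as 0 is an assumption; the package's own operators may behave differently.

```python
from statistics import mean, pvariance


def add_distributions(a, b):
    """Key-wise sum of two word -> frequency dicts (hypothetical helper).

    A key missing from one side is assumed to contribute 0.
    """
    return {key: a.get(key, 0.0) + b.get(key, 0.0) for key in set(a) | set(b)}


first = {"one": 0.5, "two": 0.5}
second = {"two": 0.25, "three": 0.75}

total = add_distributions(first, second)
print(total["two"])                # 0.75
print(mean(first.values()))        # 0.5
print(pvariance(first.values()))   # 0.0
```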
How do I get it?
Just type into your terminal:
pip install zipf
Calculating distances and divergence
I wrote another package, dictances, which calculates various distances and divergences between discrete distributions such as zipf. Here’s an example:
from zipf import Zipf
from dictances import *
a = Zipf.load("my_first_zipf.json")
b = Zipf.load("my_second_zipf.json")
euclidean(a, b)
chebyshev(a, b)
hamming(a, b)
kullback_leibler(a, b)
jensen_shannon(a, b)
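For readers who want to see what such measures compute, the following sketch implements three of them over plain dictionaries. It is not the dictances implementation: the `*_sketch` names are hypothetical, and the choices to treat missing keys as 0 and to restrict the Kullback-Leibler sum to keys present in both distributions are assumptions made for illustration.

```python
import math


def euclidean_sketch(a, b):
    """Square root of the summed squared differences over all keys."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def chebyshev_sketch(a, b):
    """Largest absolute difference over all keys."""
    keys = set(a) | set(b)
    return max(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def kullback_leibler_sketch(a, b):
    """Sum of p * log(p / q), restricted to keys with positive mass in both."""
    return sum(
        p * math.log(p / b[k])
        for k, p in a.items()
        if p > 0 and b.get(k, 0.0) > 0
    )


a = {"x": 0.5, "y": 0.5}
b = {"x": 0.5, "z": 0.5}

print(euclidean_sketch(a, b))
print(chebyshev_sketch(a, b))  # 0.5
```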
Creating a zipf using a zipf_factory
Here are a few examples:
Zipf from a list
from zipf.factories import ZipfFromList
my_factory = ZipfFromList()
my_zipf = my_factory.run(["one", "one", "two", "my", "oh", "my", 1, 2, 3])
print(my_zipf)
'''
{
"one": 0.22222222222222215,
"my": 0.22222222222222215,
"two": 0.11111111111111108,
"oh": 0.11111111111111108,
"1": 0.11111111111111108,
"2": 0.11111111111111108,
"3": 0.11111111111111108
}
'''
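The output above is consistent with counting each item and dividing by the total. A minimal sketch of that computation with the standard library follows; `frequencies_from_list` is a hypothetical helper, not part of the package, and converting items to strings is inferred from the `"1"`, `"2"`, `"3"` keys in the printed result. Values may differ from the package's in the last decimal places because of floating-point rounding.

```python
from collections import Counter


def frequencies_from_list(items):
    """Map each item (as a string) to its relative frequency."""
    counts = Counter(str(item) for item in items)
    total = sum(counts.values())
    # most_common() orders keys from most to least frequent.
    return {key: count / total for key, count in counts.most_common()}


frequencies = frequencies_from_list(
    ["one", "one", "two", "my", "oh", "my", 1, 2, 3])
print(frequencies)
```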
Zipf from a text
from zipf.factories import ZipfFromText
my_factory = ZipfFromText()
my_factory.set_word_filter(lambda w: len(w) > 3)
my_zipf = my_factory.run(
"""You've got to find what you love.
And that is as true for your work as it is for your lovers.
Keep looking. Don't settle.""")
print(my_zipf)
'''
{
"your": 0.16666666666666666,
"find": 0.08333333333333333,
"what": 0.08333333333333333,
"love": 0.08333333333333333,
"that": 0.08333333333333333,
"true": 0.08333333333333333,
"work": 0.08333333333333333,
"lovers": 0.08333333333333333,
"Keep": 0.08333333333333333,
"looking": 0.08333333333333333,
"settle": 0.08333333333333333
}
'''
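The role of the word filter can be sketched without the package. In this hypothetical `frequencies_from_text` helper, words are extracted with the regular expression `\w+` (an assumption that happens to reproduce the result above, since "You've" and "Don't" are split into short pieces that the filter discards), and the filter is applied before counting.

```python
import re
from collections import Counter


def frequencies_from_text(text, word_filter=None):
    """Split text into words, keep those passing word_filter, normalize counts."""
    words = re.findall(r"\w+", text)
    if word_filter is not None:
        words = [word for word in words if word_filter(word)]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.most_common()}


quote = """You've got to find what you love.
And that is as true for your work as it is for your lovers.
Keep looking. Don't settle."""

frequencies = frequencies_from_text(quote, lambda w: len(w) > 3)
print(frequencies)
```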
Zipf from a k-sequence
from zipf.factories import ZipfFromKSequence
sequence_fraction_len = 5
my_factory = ZipfFromKSequence(sequence_fraction_len)
my_zipf = my_factory.run(
"ACTGGAAATGATGGDTGATDGATGAGTDGATGGGGGAAAGDTGATDGATDGATGDTGGGGADDDGATAGDTAGTDGAGAGAGDTGATDGAAAGDTG")
print(my_zipf)
'''
{
"TGGGG": 0.1,
"ACTGG": 0.05,
"AAATG": 0.05,
"ATGGD": 0.05,
"TGATD": 0.05,
"GATGA": 0.05,
"GTDGA": 0.05,
"GAAAG": 0.05,
"DTGAT": 0.05,
"DGATD": 0.05,
"GATGD": 0.05,
"ADDDG": 0.05,
"ATAGD": 0.05,
"TAGTD": 0.05,
"GAGAG": 0.05,
"AGDTG": 0.05,
"ATDGA": 0.05,
"AAGDT": 0.05,
"G": 0.05
}
'''
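Judging from the printed result, the sequence is cut into consecutive, non-overlapping chunks of the given length, with a shorter final chunk when the length is not a multiple of it. The sketch below shows that chunking on a short string; `frequencies_from_k_sequence` is a hypothetical helper and the non-overlapping behaviour is inferred from the output, not taken from the package's source.

```python
from collections import Counter


def frequencies_from_k_sequence(sequence, k):
    """Cut sequence into consecutive chunks of length k and normalize counts."""
    chunks = [sequence[i:i + k] for i in range(0, len(sequence), k)]
    counts = Counter(chunks)
    total = sum(counts.values())
    return {chunk: count / total for chunk, count in counts.most_common()}


# "ACTGACTGAC" with k=4 is cut into "ACTG", "ACTG", "AC".
frequencies = frequencies_from_k_sequence("ACTGACTGAC", 4)
print(frequencies)
```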
Zipf from a text file
from zipf.factories import ZipfFromFile
my_factory = ZipfFromFile()
my_factory.set_word_filter(lambda w: w != "brown")
my_zipf = my_factory.run("path/to/my/file.txt")  # placeholder path to a text file
print(my_zipf)
'''
{
"The": 0.125,
"quick": 0.125,
"fox": 0.125,
"jumps": 0.125,
"over": 0.125,
"the": 0.125,
"lazy": 0.125,
"dog": 0.125
}
'''
Zipf from a webpage
from zipf.factories import ZipfFromUrl
import json
my_factory = ZipfFromUrl()
my_factory.set_word_filter(lambda w: int(w) > 100)
my_factory.set_interface(lambda r: json.loads(r.text)["ip"])
my_zipf = my_factory.run("https://api.ipify.org/?format=json")
print(my_zipf)
'''
{
"134": 0.5,
"165": 0.5
}
'''
Zipf from a directory
from zipf.factories import ZipfFromDir
my_factory = ZipfFromDir(use_cli=True)
my_factory.set_word_filter(lambda w: len(w) > 4)
my_zipf = my_factory.run("path/to/my/directory", ["txt"])
'''
My directory contains 2 files with the following texts:
- You must not lose faith in humanity.
Humanity is an ocean; if a few drops of the ocean are dirty,
the ocean does not become dirty.
- Try not to become a man of success,
but rather try to become a man of value.
'''
print(my_zipf)
'''
{
"ocean": 0.20000000000000004,
"become": 0.20000000000000004,
"dirty": 0.13333333333333336,
"faith": 0.06666666666666668,
"humanity": 0.06666666666666668,
"Humanity": 0.06666666666666668,
"drops": 0.06666666666666668,
"success": 0.06666666666666668,
"rather": 0.06666666666666668,
"value": 0.06666666666666668
}
'''
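The frequencies above match pooling the word counts of all files before normalizing (for example, "ocean" appears 3 times among 15 kept words). A rough standard-library sketch of that idea follows. The helper `frequencies_from_dir` is hypothetical, it only looks at files directly inside the directory, and the file names used in the demonstration are invented.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def frequencies_from_dir(directory, extensions, word_filter=None):
    """Pool word counts from matching files in a directory, then normalize."""
    counts = Counter()
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lstrip(".") in extensions:
            words = re.findall(r"\w+", path.read_text(encoding="utf-8"))
            counts.update(
                w for w in words if word_filter is None or word_filter(w))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.most_common()}


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "first.txt").write_text(
        "the ocean does not become dirty", encoding="utf-8")
    Path(tmp, "second.txt").write_text(
        "try to become a man of value", encoding="utf-8")
    # Skipped because its extension is not in the requested list.
    Path(tmp, "ignored.md").write_text("become become", encoding="utf-8")
    frequencies = frequencies_from_dir(tmp, ["txt"], lambda w: len(w) > 4)

print(frequencies)
```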
Options in creating a zipf
Some built-in options are available. You can read the options of any factory object by printing it:
from zipf.factories import ZipfFromList
print(ZipfFromList())
'''
{
"remove_stop_words": false, # Removes stop words (currently Italian only)
"minimum_count": 0, # Removes words that appear fewer than 'minimum_count' times
"chain_min_len": 1, # Chains up words, starting from a minimum of 'chain_min_len'
"chain_max_len": 1, # and ending at a maximum of 'chain_max_len'
"chaining_character": " ", # The character to interpose between words
"chain_after_filter": false, # The chaining is done after filtering
"chain_after_clean": false # The chaining is done after cleaning
}
'''
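One plausible reading of the chaining options is that consecutive words are joined into groups of every length from 'chain_min_len' to 'chain_max_len', separated by 'chaining_character'. The sketch below illustrates that reading only; `chain_words` is a hypothetical function, and the package's actual chaining behaviour may differ.

```python
def chain_words(words, chain_min_len=1, chain_max_len=1, chaining_character=" "):
    """Join consecutive words into chains of each length in the given range."""
    chains = []
    for length in range(chain_min_len, chain_max_len + 1):
        # Slide a window of `length` words over the list.
        for start in range(len(words) - length + 1):
            chains.append(chaining_character.join(words[start:start + length]))
    return chains


chains = chain_words(["keep", "looking", "ahead"], chain_min_len=1, chain_max_len=2)
print(chains)
# ['keep', 'looking', 'ahead', 'keep looking', 'looking ahead']
```

With the defaults (both lengths set to 1), this sketch returns the words unchanged.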
License
This library is released under the MIT license.