
Hash-based machine learning

Project description

HashedML

A machine learning library that uses a different approach: string hashing (think hash tables) for classifying sequences.
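To make the hash-table idea concrete, here is a minimal sketch of classification by exact string lookup. It is an illustration only and is not HashedML's actual implementation: the class name `ToyHashClassifier`, its `_key` helper, and the choice to return `None` for unseen input are all hypothetical.

```python
from collections import Counter, defaultdict


class ToyHashClassifier:
    """Hypothetical illustration; not HashedML's implementation."""

    def __init__(self):
        # Maps a stringified X to a count of the labels seen with it.
        self.table = defaultdict(Counter)

    @staticmethod
    def _key(X):
        # Convert every feature to a string and join into one hashable key.
        return '\x1f'.join(str(x) for x in X)

    def fit(self, X, y):
        self.table[self._key(X)][str(y)] += 1

    def predict(self, X):
        counts = self.table.get(self._key(X))
        if not counts:
            return None  # unseen input: this toy version has no fallback
        return counts.most_common(1)[0][0]


clf = ToyHashClassifier()
clf.fit(('5.1', '3.5', '1.4', '0.2'), 'Iris-setosa')
clf.fit(('7.0', '3.2', '4.7', '1.4'), 'Iris-versicolor')
print(clf.predict(('5.1', '3.5', '1.4', '0.2')))  # Iris-setosa
print(clf.predict(('9.9', '9.9', '9.9', '9.9')))  # None
```

A practical library needs some strategy for inputs it has never seen exactly; this sketch simply gives up on them.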

Installation

PyPI (not available yet):

pip install -U hashedml

setup.py:

python setup.py build
python setup.py install

Classification

HashedML takes the simple fit(X, y) / predict(X) approach.

Example:

from hashedml import HashedML  # assumed import path

model = HashedML()
iris_data = open('iris.data').read().split('\n')
for i in iris_data:
    if not i:
        continue  # skip blank lines (e.g. a trailing newline)
    i = i.split(',')
    X = i[:-1]
    y = i[-1]
    model.fit(X, y)

iris_test = open('iris.test').read().split('\n')
for i in iris_test:
    if not i:
        continue  # skip blank lines
    i = i.split(',')
    X = i[:-1]
    y = i[-1]
    # use test() to get accuracy
    prediction = model.test(X, y)
    # -or- use predict(): normally you don't have 'y'
    prediction = model.predict(X)

print('accuracy: {}%'.format(model.accuracy()*100))

Generative

HashedML can also generate data after learning.

Example:

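from collections import deque

from textblob import TextBlob  # third-party: pip install textblob
from hashedml import HashedML  # assumed import path

model = HashedML(nback=4, stm=True)
token_q = deque(maxlen=model.nback)

tokens = TextBlob(open('training.text').read()).tokens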

# Learn
for i in tokens:
    token_q.append(i)
    if len(token_q) != model.nback:
        continue
    tq = list(token_q)
    X = tq[:-1]
    y = tq[-1]
    model.fit(X, y)

# Generate
output = model.generate(
    ('What', 'is'),
    nwords=500,
    seperator=' '
)
print(output)
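The sliding-window idea behind the learn and generate steps can be sketched without the library. The following is a toy n-back table built on a plain dict, not HashedML's algorithm. The `train` and `generate` functions are hypothetical stand-ins, and the sketch always picks the first continuation it saw rather than sampling.

```python
from collections import deque


def train(tokens, nback=4):
    # Map each window of (nback - 1) tokens to the tokens that followed it.
    table = {}
    window = deque(maxlen=nback)
    for tok in tokens:
        window.append(tok)
        if len(window) != nback:
            continue  # window not full yet
        *context, nxt = window
        table.setdefault(tuple(context), []).append(nxt)
    return table


def generate(table, seed, nwords=10, separator=' '):
    out = list(seed)
    context = deque(seed, maxlen=len(seed))
    for _ in range(nwords):
        followers = table.get(tuple(context))
        if not followers:
            break  # no known continuation for this context
        nxt = followers[0]  # deterministic choice keeps the sketch simple
        out.append(nxt)
        context.append(nxt)  # slide the context window forward
    return separator.join(out)


tokens = 'the cat sat on the mat and the cat sat down'.split()
table = train(tokens, nback=3)
print(generate(table, ('the', 'cat'), nwords=4))  # the cat sat on the mat
```

Because the context length is fixed by the seed in this sketch, the seed must have `nback - 1` tokens to match any learned window.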

Variable X Input & Non-numerical X or Y

The X value can be of varying length/dimensions. For example, this is valid:

X = (
    (1, 2, 3),
    (1, 2),
    (1, 2, 3, 4),
)
# y can be of different data types
y  = (
    'y1',
    2.0,
    'foostring'
)

All data is converted to strings. This is counterintuitive and differs from most machine learning libraries, but it makes it easier to work with variable-length and mixed-type X/y data.
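The effect of converting everything to strings can be sketched with a plain dictionary. This assumes nothing about HashedML's internals; `to_key` is a hypothetical helper used only for illustration.

```python
def to_key(X):
    # Hypothetical helper: stringify each element so rows of any length
    # and any element type become comparable, hashable keys.
    return tuple(str(x) for x in X)


X = ((1, 2, 3), (1, 2), (1, 2, 3, 4))
y = ('y1', 2.0, 'foostring')

# Labels are stringified too, so mixed label types share one representation.
table = {to_key(row): str(label) for row, label in zip(X, y)}

print(table[to_key((1, 2))])      # 2.0
print(table[to_key(('1', '2'))])  # 2.0
```

One consequence of this approach is that the integer `1` and the string `'1'` become indistinguishable once stringified.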

Examples

% for i in test-data/*.test; do echo; echo -en "$i: "; data_file=$(echo "$i" | sed 's/\.test$/.data/'); hashedml classify "$data_file" "$i"; done

test-data/abalone.test: accuracy: 100.0%

test-data/allhypo.test: accuracy: 89.61%

test-data/anneal.test: accuracy: 82.0%

test-data/arrhythmia.test: accuracy: 100.0%

test-data/breast-cancer.test: accuracy: 100.0%

test-data/bupa.test: accuracy: 100.0%

test-data/glass.test: accuracy: 100.0%

test-data/iris.test: accuracy: 100.0%

test-data/long.test: accuracy: 100.0%

test-data/parkinsons_updrs.test: accuracy: 100.0%

test-data/soybean-large.test: accuracy: 97.87%

test-data/tic-tac-toe.test: accuracy: 100.0%

Download files


Source Distribution

hashedml-0.0.1.tar.gz (5.7 kB)

Built Distribution

hashedml-0.0.1-py3.8.egg (4.9 kB)
