milk · PyPI

Machine Learning Toolkit

Project description

Machine Learning in Python

Milk is a machine learning toolkit in Python.

Its focus is on supervised classification, with several classifiers available: SVMs (based on libsvm), k-NN, random forests, and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems.

For unsupervised learning, milk supports k-means clustering and affinity propagation.
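As a rough illustration of what k-means clustering computes, here is a minimal sketch of the standard alternating assign/update procedure in plain Python. It is not milk's implementation; the function name, its parameters, and the sample points are assumptions made only for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(points, k, n_iter=20, seed=0):
    """Illustrative k-means; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / float(len(members)) for col in zip(*members)]
    return assignments, centroids


# Hypothetical data: two well-separated groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
assignments, centroids = kmeans_sketch(points, k=2)
print(assignments)
```

The cluster numbering depends on the random starting points, so only the grouping (not the specific ids) is meaningful.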

Milk is flexible about its inputs. It is optimised for numpy arrays, but can often handle anything (for example, for SVMs, you can use any data type and any kernel and it does the right thing).

There is a strong emphasis on speed and low memory usage. Therefore, most of the performance-sensitive code is written in C++ and exposed through Python interfaces for convenience.

To learn more, check the docs at http://packages.python.org/milk/ or the code demos included with the source at milk/demos/.

Examples

Here is how to test how well you can classify a set of features and labels, as measured by cross-validation:

import numpy as np
import milk
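features = np.random.rand(100,10) # 2d array of features: 100 examples of 10 features each
labels = np.zeros(100)
# Shift the second half of the examples and give them label 1,
# so that the two classes are separable:
features[50:] += .5
labels[50:] = 1
confusion_matrix, names = milk.nfoldcrossvalidation(features, labels)
# Accuracy is the fraction of examples on the diagonal.
# (Written so that it works with or without the print statement.)
print('Accuracy: %s' % (confusion_matrix.trace()/float(confusion_matrix.sum())))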
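The accuracy computation above (trace divided by sum) can be illustrated without milk. The following is a small sketch in plain Python; the helper names and the labels are hypothetical, and the row/column orientation of the matrix (rows are true labels, columns are predictions) is an assumption that does not affect the accuracy value.

```python
def confusion_matrix_sketch(true_labels, predicted_labels, n_classes):
    """matrix[t][p] counts examples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of examples on the diagonal (trace divided by total)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / float(total)


# Hypothetical labels, for illustration only.
true_labels = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 1, 0, 1, 0]
cm = confusion_matrix_sketch(true_labels, predicted, n_classes=2)
print(cm)            # [[3, 1], [1, 3]]
print(accuracy(cm))  # 0.75
```

Six of the eight examples fall on the diagonal, which gives the accuracy of 0.75.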

If you want to use a classifier, you instantiate a learner object and call its train() method:

import numpy as np
import milk
features = np.random.rand(100,10)
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
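# milk.defaultlearner() is the newer, preferred name for the same thing (added in 0.3.10)
learner = milk.defaultclassifier()
model = learner.train(features, labels)

# Now you can use the model on new examples:
example = np.random.rand(10)
print(model.apply(example))
# Shift a second example the same way as the class-1 training data:
example2 = np.random.rand(10)
example2 += .5
print(model.apply(example2))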

There are several classification methods in the package, but they all use the same interface: train() returns a model object, which has an apply() method to execute on new instances.
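To make the shared interface concrete, here is a minimal sketch of a learner that follows the same train()/apply() convention. The classes CentroidLearner and CentroidModel are hypothetical and are not part of milk; they implement a simple nearest-centroid rule in plain Python purely for illustration.

```python
class CentroidModel(object):
    """Model returned by train(): stores one mean vector per label."""

    def __init__(self, centroids):
        self.centroids = centroids  # dict: label -> mean feature vector

    def apply(self, example):
        """Return the label whose centroid is closest to the example."""
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(example, centroid))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


class CentroidLearner(object):
    """Hypothetical learner (not part of milk) with a train() method."""

    def train(self, features, labels):
        sums, counts = {}, {}
        for row, lab in zip(features, labels):
            acc = sums.setdefault(lab, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[lab] = counts.get(lab, 0) + 1
        centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}
        return CentroidModel(centroids)


# Toy data, for illustration only.
features = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]]
labels = [0, 0, 1, 1]
model = CentroidLearner().train(features, labels)
print(model.apply([0.0, 0.3]))    # 0
print(model.apply([0.95, 0.9]))   # 1
```

Because the learner holds the configuration and the model holds only the fitted state, any code that accepts one learner (for example, a cross-validation loop) can in principle accept another that follows the same convention.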

Details

License: MIT

Author: Luis Pedro Coelho (with code from LibSVM and scikits.learn)

API Documentation: http://packages.python.org/milk/

Mailing List: http://groups.google.com/group/milk-users

Features

SVMs. Using the libsvm solver with a pythonesque wrapper around it.
K-means using as little memory as possible. It can cluster millions of instances efficiently.
Random forests
Self organising maps
Stepwise Discriminant Analysis for feature selection.
Non-negative matrix factorisation
Affinity propagation

Recent History

The ChangeLog file contains a more complete history.

New in 0.4.1

Fix important bug in multi-process gridsearch

New in 0.4.0

Use multiprocessing to take advantage of multi core machines (off by default).
Add perceptron learner
Set random seed in random forest learner
Add warning to milk/__init__.py if import fails
Add return value to gridminimise
Set random seed in precluster_learner
Implemented Error-Correcting Output Codes for reduction of multi-class to binary (including probability estimation)
Add multi_strategy argument to defaultlearner()
Make the dot kernel in svm much, much, faster
Make sigmoidal fitting for SVM probability estimates faster
Fix bug in randomforest (patch by Wei on milk-users mailing list)

New in 0.3.10

Add ext.jugparallel for integration with jug
parallel nfold crossvalidation using jug
parallel multiple kmeans runs using jug
cluster_agreement for non-ndarrays
Add histogram & normali(z|s)e options to milk.kmeans.assign_centroid
Fix bug in sda when features were constant for a class
Add select_best_kmeans
Added defaultlearner as a better name than defaultclassifier
Add measures.curves.precision_recall
Add unsupervised.parzen.parzen

New in 0.3.9

Add folds argument to nfoldcrossvalidation
Add assign_centroid function in milk.unsupervised.nfoldcrossvalidation
Improve speed of k-nearest neighbour (10x on scikits-learn benchmark)
Improve kmeans on newer numpy (works for larger datasets too)
Faster kmeans by coding centroid recalculation in C++
Fix gridminimise for low count labels
Fix bug with non-integer labels for tree learning

New in 0.3.8

Fix compilation on Windows

New in 0.3.7

Logistic regression
Source demos included (in source and documentation)
Add cluster agreement metrics
Fix nfoldcrossvalidation bug when using origins

New in 0.3.6

Unsupervised (1-class) kernel density modeling
Fix for when SDA returns empty
weights option to some learners
stump learner
Adaboost (result of above changes)

Project details

Release history

0.6.1

May 11, 2015

0.6

Apr 27, 2015

0.5.3

Jun 19, 2013

0.5.2

Apr 8, 2013

0.5.1

Jan 11, 2013

0.5

Nov 5, 2012

0.4.3

Sep 18, 2012

0.4.2

Jan 16, 2012

This version

0.4.1

Aug 26, 2011

0.4.0

Aug 24, 2011

0.3.10

May 10, 2011

0.3.9

Mar 15, 2011

0.3.8

Feb 12, 2011

0.3.7

Feb 10, 2011

0.3.6

Dec 17, 2010

0.3.5

Nov 4, 2010

0.3.4

Nov 1, 2010

0.3.3

Oct 22, 2010

0.3.2

Oct 19, 2010

0.3.1

Sep 26, 2010

0.3

Sep 23, 2010

0.2.1

Sep 13, 2010

0.2

May 19, 2010

0.1

Apr 26, 2010

0.1-beta-0 pre-release

Jan 30, 2010

0.1-alpha-1 pre-release

Nov 30, 2009

0.1-alpha-0 pre-release

Nov 23, 2009

