A general purpose framework for training and testing classification algorithms.

Project description

Overview

There are tons of amazing algorithms and machine learning tools for detecting patterns in data. However, what most of these lack is a useful framework and UI for managing the often complicated setup of the data flow and predictions.

This package provides several tools for utilizing Django’s admin interface and ORM to help organize and manage machine learning setups.

The framework revolves around two basic objects:

  1. A problem, which organizes solutions to achieve some prediction goal. This is mainly implemented as a genetic algorithm.

  2. A predictor, which organizes a specific solution to either guess a numeric value (i.e. regression) or a label (i.e. classification).

I made this separation to help myself with maintenance over the lifetime of an application. Often, I’d want to monitor the accuracy of a solution, but also evaluate other potential solutions without interrupting the solution used for production predictions. Once a superior solution was found, I’d want to push it into production use with as little effort as possible. By explicitly representing different solutions as different records in the database, I found I could easily monitor them and slip them in and out of use as needed.

Problem

The problem represents a domain where we’re attempting to solve some prediction task, by either guessing a number or guessing a label. In the code, this is referred to as the Genome. A record in the Genome table represents a distinct problem domain and stores all the parameters used to control and manage the search for solutions.

From the Genome you define Genes, which are parameters available for use when attempting to solve the problem.

Specific solutions to the problem are represented by the Genotype model, which contains a list of genes and their associated values as key/value pairs.
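
To make the relationship between the three models concrete, here is a minimal sketch using plain Python dataclasses rather than the package’s actual Django models. The class names mirror the ones above, but the fields (choices, evaluator, max_iterations, values) and the validate() helper are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Gene:
    """A parameter available to solutions, plus the values it may take (hypothetical fields)."""
    name: str
    choices: List[Any]


@dataclass
class Genome:
    """A problem domain: its genes and the settings that control the search."""
    name: str
    genes: List[Gene] = field(default_factory=list)
    evaluator: Optional[Callable[["Genotype"], float]] = None
    max_iterations: int = 100

    def gene(self, name: str) -> Gene:
        # Look up a gene definition by name.
        for g in self.genes:
            if g.name == name:
                return g
        raise KeyError(name)


@dataclass
class Genotype:
    """One candidate solution: gene names mapped to chosen values."""
    genome: Genome
    values: Dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        # Reject any value that is not among the gene's allowed choices.
        for key, value in self.values.items():
            if value not in self.genome.gene(key).choices:
                raise ValueError(f"{value!r} is not a valid value for gene {key!r}")
```

For instance, with a genome whose algorithm gene allows only ‘Bayesian’ and ‘LinearSVC’, Genotype(genome, {"algorithm": "KNN"}).validate() raises ValueError.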

To search for the best solution to a problem, you first implement a custom evaluation function, which takes a genotype as an argument and returns a non-negative number, called the fitness, representing its overall suitability for solving the problem. By default, a value of 0 is interpreted as the worst possible fitness, with increasing values representing increasing levels of suitability. Personally, I find it convenient and intuitive to bound fitness between 0 and 1, but this is not strictly enforced.
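
As a hedged illustration of the evaluator contract described above, the function below scores a genotype by the accuracy of a one-parameter threshold classifier on a tiny made-up dataset, so the fitness naturally falls between 0 and 1. The genotype is modeled as a plain dict, and the threshold gene and the SAMPLES data are assumptions, not part of the package.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dataset: (feature value, label) pairs.
SAMPLES: List[Tuple[float, str]] = [
    (0.1, "ham"), (0.3, "ham"), (0.4, "ham"),
    (0.6, "spam"), (0.8, "spam"), (0.9, "spam"),
]


def evaluate(genotype: Dict[str, float]) -> float:
    """Return a fitness in [0, 1]: the accuracy of a threshold classifier.

    0 is the worst possible fitness and 1 is a perfect score.
    """
    threshold = genotype["threshold"]  # hypothetical gene name
    correct = sum(
        1
        for x, label in SAMPLES
        if ("spam" if x >= threshold else "ham") == label
    )
    return correct / len(SAMPLES)
```

With this data, evaluate({"threshold": 0.5}) returns 1.0, while a threshold of 0.95 labels every sample as ham and scores 0.5.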

You then set this function in your Genome's evaluator field and run the management command:

python manage.py evolve_population --genome=<genome_id>

Depending on the other settings in the genome, this will run for a predetermined maximum number of iterations or until improvement in fitness has stalled. From the genome’s admin change page, you can browse the list of generated genotypes, inspect their fitness, and possibly select one for production use.
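
The internals of evolve_population are not described here, so the following is only a minimal genetic-algorithm sketch of the stopping behavior described above: it runs for at most max_iterations generations and stops early once the best fitness has failed to improve for patience consecutive generations. The function name, its parameters, and the selection, crossover and mutation scheme are all assumptions for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a genotype is a dict of gene name -> value.
Genotype = Dict[str, object]


def evolve(
    gene_choices: Dict[str, List[object]],
    evaluate: Callable[[Genotype], float],
    population_size: int = 10,
    max_iterations: int = 50,
    patience: int = 5,
    seed: int = 0,
) -> Tuple[Optional[Genotype], float, int]:
    """Return (best genotype, best fitness, generations run)."""
    rng = random.Random(seed)

    def random_genotype() -> Genotype:
        return {name: rng.choice(values) for name, values in gene_choices.items()}

    def crossover(a: Genotype, b: Genotype) -> Genotype:
        # Each gene is inherited from one parent at random.
        return {name: rng.choice((a[name], b[name])) for name in gene_choices}

    def mutate(parent: Genotype) -> Genotype:
        # Re-draw the value of one randomly chosen gene.
        child = dict(parent)
        name = rng.choice(list(gene_choices))
        child[name] = rng.choice(gene_choices[name])
        return child

    population = [random_genotype() for _ in range(population_size)]
    best: Optional[Genotype] = None
    best_fitness = float("-inf")
    stalled = 0
    generation = 0
    for generation in range(1, max_iterations + 1):
        # Score every genotype once, fittest first.
        scored = sorted(
            ((evaluate(g), g) for g in population),
            key=lambda pair: pair[0],
            reverse=True,
        )
        top_fitness, top = scored[0]
        if top_fitness > best_fitness:
            best, best_fitness, stalled = top, top_fitness, 0
        else:
            stalled += 1
            if stalled >= patience:
                break  # improvement has stalled
        # Elitism: keep the fitter half unchanged and breed the rest from it.
        parents = [g for _, g in scored[: max(2, population_size // 2)]]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return best, best_fitness, generation
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases, so the stall counter only has to detect the absence of improvement.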

For example, a simple genome might consist of a single gene called algorithm, which contains one of several algorithm names (e.g. ‘Bayesian’, ‘LinearSVC’, ‘RandomForest’, etc.). You would write your evaluation function to read this string and instantiate the class associated with that name. You could then add further genes representing parameters common to multiple algorithms or unique to only a few. The Genotype model will generate a unique hash based on which genes it contains, and uses this to avoid creating duplicate genotypes.
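
The duplicate check can be sketched by hashing a canonical serialization of a genotype’s gene/value pairs. This sketch assumes the hash covers both gene names and their values and uses SHA-1 over key-sorted JSON; the package’s actual hashing scheme may differ, and both helper names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List


def genotype_hash(values: Dict[str, Any]) -> str:
    """Return a stable hex digest of a genotype's gene/value pairs."""
    # sort_keys makes the digest independent of insertion order;
    # default=str stringifies values JSON cannot encode natively.
    canonical = json.dumps(values, sort_keys=True, default=str)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def unique_genotypes(candidates: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the first candidate for each distinct hash."""
    seen = set()
    result = []
    for values in candidates:
        digest = genotype_hash(values)
        if digest not in seen:
            seen.add(digest)
            result.append(values)
    return result
```

Sorting the keys means {"algorithm": "LinearSVC", "C": 1.0} and {"C": 1.0, "algorithm": "LinearSVC"} hash identically, so the second is treated as a duplicate of the first.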

Predictor

todo

Usage

todo

Download files

Source Distribution

django-analyze-0.4.23.tar.gz (85.9 kB)

File details

Hashes for django-analyze-0.4.23.tar.gz:

SHA256: d5f470adcbd1dd143dc591f651134e6d41fbade4e01982a8bb5275b18b273ed5
MD5: ddde24f14e59301002dd883ed9c7ec0c
BLAKE2b-256: 42653524a7368f92f1ebcb0ecda693e51902ffea34d4b8e1f38388794a1a7290