Skip to main content

a massively parallel implementation of self-organizing maps

Project description

Somoclu is a massively parallel implementation of self-organizing maps. It relies on OpenMP for multicore execution, MPI for distributing the workload, and it can be accelerated by CUDA on a GPU cluster. A sparse kernel is also included, which is useful for training maps on vector spaces generated in text mining processes. The topology of the grid is rectangular. Currently a subset of the command line version is supported with this package.

Key features of the Python interface:

  • Fast execution by parallelization: OpenMP and CUDA are supported.

  • Multi-platform: Linux, OS X, and Windows are supported.

  • Planar and toroid maps.

  • Both dense and sparse input data are supported.

  • Large maps of several hundred thousand neurons are feasible.

Usage

The following file uses rgbs.txt, which can be found at https://github.com/peterwittek/somoclu/tree/master/data

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import somoclu
import numpy as np

data = np.loadtxt('rgbs.txt')
print(data)
data = np.float32(data)
nSomX = 50
nSomY = 50
nVectors = data.shape[0]
nDimensions = data.shape[1]
data1D = np.ndarray.flatten(data)
nEpoch = 10
radius0 = 0
radiusN = 0
radiusCooling = "linear"
scale0 = 0
scaleN = 0.01
scaleCooling = "linear"
kernelType = 0
mapType = "planar"
snapshots = 0
initialCodebookFilename = ''
codebook_size = nSomY * nSomX * nDimensions
codebook = np.zeros(codebook_size, dtype=np.float32)
globalBmus_size = int(nVectors * int(np.ceil(nVectors/nVectors))*2)
globalBmus = np.zeros(globalBmus_size, dtype=np.intc)
uMatrix_size = nSomX * nSomY
uMatrix = np.zeros(uMatrix_size, dtype=np.float32)
somoclu.trainWrapper(data1D, nEpoch, nSomX, nSomY,
                     nDimensions, nVectors,
                     radius0, radiusN,
                     radiusCooling, scale0, scaleN,
                     scaleCooling, snapshots,
                     kernelType, mapType,
                     initialCodebookFilename,
                     codebook, globalBmus, uMatrix)
print codebook
print globalBmus
print uMatrix

Installation

The code is available on PyPI, hence it can be installed by

$ sudo pip install somoclu

If you want the latest git version, follow the standard procedure for installing Python modules:

$ sudo python setup.py install

Build on Mac OS X

Before installing using pip, gcc should be installed first. As of OS X 10.9, gcc is just symlink to clang. To build somoclu and this extension correctly, it is recommended to install gcc using something like:

$ brew install gcc48

and set environment using:

export CC=/usr/local/bin/gcc
export CXX=/usr/local/bin/g++
export CPP=/usr/local/bin/cpp
export LD=/usr/local/bin/gcc
alias c++=/usr/local/bin/c++
alias g++=/usr/local/bin/g++
alias gcc=/usr/local/bin/gcc
alias cpp=/usr/local/bin/cpp
alias ld=/usr/local/bin/gcc
alias cc=/usr/local/bin/gcc

Then you can issue

$ sudo pip install somoclu

Build with CUDA support on Linux and OS X:

You need to clone the GitHub repo or download the latest release tarball, and use the following script:

$ cd src/Python
$ bash makepy.sh

Then if your CUDA installation is located at /opt/cuda and 64bit, you can do the following to install:

$ sudo python2 setup_with_CUDA.py install

Otherwise, you should modify the setup_with_CUDA.py , change the path to CUDA installation accordingly:

call(["./configure", "--without-mpi","--with-cuda=/opt/cuda/"])

and

library_dirs=['/opt/cuda/lib64']

Then run the install command

$ sudo python2 setup_with_CUDA.py install

Then you can use the python interface like before, with CUDA support.

Build with CUDA support on Windows:

You should first follow the instructions to build the windows binary at https://github.com/peterwittek/somoclu with MPI disabled with the same version Visual Studio as your Python is built.(Since currently Python is built by VS2008 by default and CUDA v6.5 removed VS2008 support, you may use CUDA 6.0 with VS2008 or find a Python prebuilt with VS2010. And remember to install VS2010 or Windows SDK7.1 to get the option in Platform Toolset if you use VS2013.) Then you should copy the .obj files generated in the release build path to the Python/src/src folder.

Then modify the library_dirs in setup_with_CUDA_Win.py to your CUDA path.

Then run the install command

$ sudo python2 setup_with_CUDA_Win.py install

Then it should be able to build and install the extension.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

somoclu-1.4.1.1.tar.gz (64.7 kB view details)

Uploaded Source

File details

Details for the file somoclu-1.4.1.1.tar.gz.

File metadata

  • Download URL: somoclu-1.4.1.1.tar.gz
  • Upload date:
  • Size: 64.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for somoclu-1.4.1.1.tar.gz
Algorithm Hash digest
SHA256 9ac6157721770179564a8b763d739c9d5ff1cc7ed00c3209dd841673c3f949c9
MD5 e9fa8eef7b8beaeb309c22ce500fbc68
BLAKE2b-256 fb43b4de92f469994806279508cbedcbbf1f08724c252e325264e7df3679ff34

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page