
A Python Dirichlet Multinomial Mixture Model


pyDIMM

A Python Dirichlet Multinomial Mixture Model.

Ready

Importing pyDIMM normally compiles clibs automatically, so you can usually skip this part.

If the automatic compilation fails, compile the files in clibs manually. A makefile is provided.

cd clibs && make

You can also compile it yourself with gcc. Compile ./clibs/pyDIMM_libs.c following the instructions at the top of that file to produce ./clibs/pyDIMM_libs.so.

cd clibs && gcc -shared -fPIC -o pyDIMM_libs.so pyDIMM_libs.c -lm

Check that the files are now laid out as follows.

└───pyDIMM
    |
    ├───pyDIMM
    |   ├───__init__.py
    |   ├───class_DIMM.py
    |   └───clibs
    |       ├───makefile
    |       ├───pyDIMM_libs.c
    |       └───pyDIMM_libs.so
    |
    └───Some other files...
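
The check above can also be scripted. The following is a minimal sketch, not part of pyDIMM: the helper name compiled_lib_status is hypothetical, and it only looks at the two file names shown in the tree (pyDIMM_libs.c and pyDIMM_libs.so) using modification times.

```python
from pathlib import Path


def compiled_lib_status(clibs_dir):
    """Return 'missing', 'stale' or 'ok' for pyDIMM_libs.so in clibs_dir.

    Hypothetical helper for illustration; pyDIMM does not provide it.
    """
    clibs = Path(clibs_dir)
    source = clibs / "pyDIMM_libs.c"
    shared = clibs / "pyDIMM_libs.so"
    if not shared.is_file():
        # The shared library has not been built yet.
        return "missing"
    if source.is_file() and shared.stat().st_mtime < source.stat().st_mtime:
        # The C source was modified after the last build.
        return "stale"
    return "ok"


if __name__ == "__main__":
    # Path taken from the directory tree above, relative to the project root.
    print(compiled_lib_status("pyDIMM/clibs"))
```

A result of "missing" or "stale" suggests running make in clibs again.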

How to use

All methods belong to the class DIMM, so you need an instance of pyDIMM.DIMM to get started.

import pyDIMM
  • Example 1
    dimm_0 = pyDIMM.DIMM(observe_data=your_data, n_components=3, alpha_init='kmeans')
    
  • Example 2
    dimm_1 = pyDIMM.DIMM(observe_data=your_data, n_components=5, alpha_init='manual', prior_label=your_label, print_log=True)
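
If you have no data at hand, toy data can be simulated from the model itself. The sketch below uses only the standard library and rests on assumptions this README does not state: that observe_data is a 2-D table of non-negative counts with one row per sample, and that prior_label holds one integer label per row. The function names and the alpha values are made up for illustration; you would probably convert the lists to NumPy arrays before passing them to pyDIMM.DIMM.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_counts(alpha, n_trials, rng):
    """Draw one Dirichlet-multinomial count vector with n_trials total counts."""
    p = sample_dirichlet(alpha, rng)
    counts = [0] * len(alpha)
    for j in rng.choices(range(len(alpha)), weights=p, k=n_trials):
        counts[j] += 1
    return counts


def make_toy_data(alphas, rows_per_component, n_trials, seed=0):
    """Stack samples from several components; return (count rows, component labels)."""
    rng = random.Random(seed)
    data, labels = [], []
    for k, alpha in enumerate(alphas):
        for _ in range(rows_per_component):
            data.append(sample_counts(alpha, n_trials, rng))
            labels.append(k)
    return data, labels


# Three well-separated components over three categories (illustrative values).
toy_data, toy_labels = make_toy_data(
    alphas=[[8.0, 1.0, 1.0], [1.0, 8.0, 1.0], [1.0, 1.0, 8.0]],
    rows_per_component=20,
    n_trials=50,
)
```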
    

Train (by EM algorithm)

Use the EM algorithm to train the model. The EM algorithm is written in C (the code in clibs/pyDIMM_libs.c) and is called from Python through ctypes.

  • Example
    dimm_0.EM(max_iter=100, max_loglik_tol=1e-3, max_alpha_tol=1)
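
For readers who want to see what one E-step computes, here is a pure-Python sketch of the textbook Dirichlet-multinomial log-likelihood and the resulting component responsibilities. It is not the C code shipped in clibs; the function names are hypothetical, and whether pyDIMM keeps the multinomial coefficient in its likelihood is an assumption.

```python
from math import exp, lgamma, log


def dm_log_pmf(x, alpha):
    """Log-probability of count vector x under Dirichlet-multinomial(alpha)."""
    n = sum(x)
    a = sum(alpha)
    # log N! + log Gamma(A) - log Gamma(N + A)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for xj, aj in zip(x, alpha):
        # log Gamma(x_j + alpha_j) - log Gamma(alpha_j) - log x_j!
        out += lgamma(xj + aj) - lgamma(aj) - lgamma(xj + 1)
    return out


def e_step(x, alphas, weights):
    """Return (responsibilities, log-likelihood) of one sample under the mixture."""
    log_terms = [log(w) + dm_log_pmf(x, a) for w, a in zip(weights, alphas)]
    # Log-sum-exp keeps the normalization numerically stable.
    m = max(log_terms)
    log_norm = m + log(sum(exp(t - m) for t in log_terms))
    return [exp(t - log_norm) for t in log_terms], log_norm


# Illustrative two-component mixture with equal mixing weights.
resp, sample_loglik = e_step(
    [9, 1, 0], alphas=[[8.0, 1.0, 1.0], [1.0, 8.0, 1.0]], weights=[0.5, 0.5]
)
print(resp, sample_loglik)
```

Summing the per-sample log-likelihoods over all rows gives the quantity whose change between iterations a tolerance such as max_loglik_tol would presumably be compared against.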
    

The DIMM is now trained. To retrieve the result, call get_model(), which returns all the result information in one dictionary.

  • Example
    result_0 = dimm_0.get_model()
    print(result_0)
    
    You should see something like:
    {
    'alpha': array([...]),
    'pie': array([...]),
    'delta': array([...]),
    'loglik': ...,
    'AIC': ...,
    'BIC': ...
    }
    

That's all! You get it.

Note:
Once the DIMM is trained, the resulting parameters are stored in the DIMM instance, so you can call get_model() whenever you want to retrieve them. This holds only as long as you do not modify the instance in the meantime: calling EM() again, for example, overwrites the stored result.
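
The 'AIC' and 'BIC' entries of the result dictionary are model-selection scores. The sketch below shows the standard definitions, which are handy for comparing fits with different n_components. The parameter count used here (one alpha per component and category, plus the free mixing weights) is an assumption; pyDIMM may count parameters differently, and both function names are hypothetical.

```python
from math import log


def information_criteria(loglik, n_params, n_samples):
    """Standard AIC and BIC from a maximized log-likelihood (lower is better)."""
    aic = 2 * n_params - 2 * loglik
    bic = n_params * log(n_samples) - 2 * loglik
    return aic, bic


def dmm_param_count(n_components, n_features):
    """Assumed free-parameter count: K*D alphas plus K-1 mixing weights."""
    return n_components * n_features + (n_components - 1)


# Illustrative numbers only, not output from pyDIMM.
k = dmm_param_count(n_components=3, n_features=4)
aic, bic = information_criteria(loglik=-100.0, n_params=k, n_samples=60)
print(k, aic, bic)
```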

Predict

Sometimes you want not only to fit a DIMM but also to use the fitted model to predict other data (if you don't, skip this section). For that, there is the method predict().

  • Example
    predict_res = dimm_0.predict(another_data)
    

The returned predict_res dictionary contains the predicted label and delta. See the docstrings in the source code for a detailed explanation.
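
As a rough illustration of how label and delta might relate, the sketch below assumes that delta holds one row of component membership probabilities per sample and that the label is the index of the largest entry. The README does not confirm this; check the docstrings before relying on it. The helper name hard_labels is hypothetical.

```python
def hard_labels(delta):
    """Pick, for each row of membership probabilities, the most likely component."""
    return [max(range(len(row)), key=row.__getitem__) for row in delta]


# Illustrative probabilities, not output from pyDIMM.
example_delta = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
example_labels = hard_labels(example_delta)
```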

Save & Load

All the information in your DIMM instance can be saved to a .npy file and loaded again later, on any machine.

  • Example
    dimm_0.save('./dimm_0_file')
    
    After this, a new file called dimm_0_file.npy (the .npy suffix is added automatically) will appear in your current folder. You can load the instance from that file later.
    dimm_load = pyDIMM.DIMM.load('./dimm_0_file.npy')
    

Contact

Ziqi Rong rongziqi@sjtu.edu.cn
