
Maximum entropy modelling code previously available as scipy.maxentropy

Project description

scipy-maxentropy: maximum entropy models

This is the former scipy.maxentropy package that was available in SciPy up to version 0.10.1. It was under-maintained and later removed in SciPy 0.11. It is now available as this separate package for backward compatibility.

For new projects, consider the maxentropy package instead, which offers a more modern scikit-learn compatible API.
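
The package can be installed from PyPI in the usual way (the distribution name matches the title of this page):

pip install scipy-maxentropy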

Purpose

This package fits "exponential family" models, including models of maximum entropy and of minimum KL divergence to other models, subject to linear constraints on the expectations of arbitrary feature statistics. Applications include language modelling for natural language processing and understanding, machine translation, environmental species modelling, and image reconstruction.

Quickstart

Here is a quick usage example based on the trivial machine translation example from the paper 'A maximum entropy approach to natural language processing' by Berger et al., Computational Linguistics, 1996.

Consider the translation of the English word 'in' into French. Suppose we observe the following facts in a corpus of parallel texts:

(1)    p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
(2)    p(dans) + p(en) = 3/10
(3)    p(dans) + p(à)  = 1/2

This code finds the probability distribution with maximal entropy subject to these constraints.

from scipy_maxentropy import Model    # previously scipy.maxentropy

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']

def f0(x):
    return x in samplespace    # constraint (1): E f0(X) = 1

def f1(x):
    return x == 'dans' or x == 'en'    # constraint (2): E f1(X) = 3/10

def f2(x):
    return x == 'dans' or x == 'à'    # constraint (3): E f2(X) = 1/2

f = [f0, f1, f2]

model = Model(f, samplespace)

# Now set the desired feature expectations
b = [1.0, 0.3, 0.5]

model.verbose = False    # set to True to show optimization progress

# Fit the model
model.fit(b)

# Output the distribution
print()
print("Fitted model parameters are:\n" + str(model.params))
print()
print("Fitted distribution is:")
p = model.probdist()
for j in range(len(model.samplespace)):
    x = model.samplespace[j]
    print(f"    x = {x + ':':15s} p(x) = {p[j]:.3f}")

# Now show how well the constraints are satisfied:
print()
print("Desired constraints:")
print("    sum(p(x))           = 1.0")
print("    p['dans'] + p['en'] = 0.3")
print("    p['dans'] + p['à']  = 0.5")
print()
print("Actual expectations under the fitted model:")
print(f"    sum(p(x))           = {p.sum():.3f}")
print(f"    p['dans'] + p['en'] = {p[0] + p[1]:.3f}")
print(f"    p['dans'] + p['à']  = {p[0] + p[2]:.3f}")
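
For reference, the maximum entropy solution can be derived analytically for this small problem: by symmetry, p(au cours de) = p(pendant), and solving the constraints gives approximately p(dans) ≈ 0.186, p(en) ≈ 0.114, p(à) ≈ 0.314, and p(au cours de) = p(pendant) ≈ 0.193. The fitted values printed above should agree with these to within the optimizer's tolerance.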

Models available

These model classes are available:

  • scipy_maxentropy.Model: for models on discrete, enumerable sample spaces
  • scipy_maxentropy.ConditionalModel: for conditional models on discrete, enumerable sample spaces
  • scipy_maxentropy.BigModel: for models on sample spaces that are either continuous (and perhaps high-dimensional) or discrete but too large to enumerate, such as all possible sentences in a natural language. This model uses conditional Monte Carlo methods (primarily importance sampling); the sketch after this list illustrates the idea.
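
The following is a minimal, self-contained sketch of the self-normalized importance-sampling estimator that such sampling-based models rely on. It uses assumed toy choices (a N(0, 1) prior p0, which doubles as the auxiliary sampler q, and the feature f(x) = sin x) and illustrates the estimator only, not BigModel's actual API:

import numpy as np

# Toy model: p(x) ∝ p0(x) exp(theta * f(x)) on a continuous sample space.
# The prior p0 = N(0, 1) is also used as the auxiliary sampler q, so the
# log importance weights reduce to log p~(x) - log q(x) = theta * f(x).
rng = np.random.default_rng(0)
theta = 0.5

def f(x):
    return np.sin(x)    # an arbitrary feature statistic

xs = rng.normal(size=100_000)      # draws from q = p0
log_w = theta * f(xs)              # log importance weights (up to log Z)
w = np.exp(log_w - log_w.max())    # subtract the max for numerical stability
w /= w.sum()                       # self-normalize, since Z is unknown

# Estimate the feature expectation E_p[f(X)] without ever computing Z
estimate = np.sum(w * f(xs))
print(f"Estimated E_p[f(X)] = {estimate:.4f}")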

Background

This package fits probabilistic models of the following exponential form:

$$ p(x) = p_0(x) \exp(\theta^T f(x)) / Z(\theta; p_0) $$

with a real parameter vector $\theta$ of the same length $n$ as the feature statistics $f(x) = \left(f_1(x), \ldots, f_n(x)\right)$.
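
Here $Z(\theta; p_0)$ is the normalization term (partition function) that makes $p$ sum to one:

$$ Z(\theta; p_0) = \sum_x p_0(x) \exp(\theta^T f(x)) $$

(with an integral replacing the sum for continuous sample spaces).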

This is the "closest" model (in the sense of minimizing KL divergence or "relative entropy") to the prior model $p_0$ subject to the following additional constraints on the expectations of the features:

$$ E\, f_i(X) = b_i, \qquad i = 1, \ldots, n $$

for some constants $b_i$, such as statistics estimated from a dataset.

In the special case where $p_0$ is the uniform distribution, this is the "flattest" model subject to the constraints, in the sense of having maximum entropy.
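
As a concrete check, the distribution fitted in the Quickstart can be rebuilt directly from this exponential form. The sketch below assumes the Quickstart objects model, f, and samplespace are still in scope, and that model.params is the flat parameter vector printed earlier; p_manual should then match model.probdist():

import numpy as np

# Rebuild p(x) = exp(theta . f(x)) / Z(theta) with a uniform prior p0,
# as used by Model on a discrete sample space
theta = np.asarray(model.params)
F = np.array([[f_i(x) for f_i in f] for x in samplespace], dtype=float)
p_manual = np.exp(F @ theta)    # unnormalized weights exp(theta . f(x))
p_manual /= p_manual.sum()      # divide by Z(theta)
print(p_manual)                 # should match model.probdist()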

For more background, see, for example, Cover and Thomas (1991), Elements of Information Theory.
