Context tree estimation using the Smallest Maximizer Criterion (SMC)

Smallest Maximizer Criterion

We introduce a new criterion for consistently selecting the probabilistic context tree that generated a sample. The basic idea is to construct a totally ordered set of candidate trees. This set is composed of the "champion trees": the trees that maximize the likelihood of the sample for each number of degrees of freedom. The smallest maximizer criterion selects the infimum of the subset of champion trees whose gain in likelihood is negligible. In addition, we propose a new algorithm based on resampling to implement this criterion.
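The selection step described above can be illustrated with a toy sketch. It is not this package's API: the `Champion` type, the `smallest_maximizer` function, and the fixed `tolerance` threshold are hypothetical names introduced only for illustration, and the fixed threshold stands in for the resampling-based test that the criterion actually uses to decide whether a gain is negligible.

```python
from typing import List, NamedTuple


class Champion(NamedTuple):
    """Hypothetical record for one champion tree (illustration only)."""
    name: str
    degrees_of_freedom: int
    log_likelihood: float


def smallest_maximizer(champions: List[Champion], tolerance: float) -> Champion:
    """Return the smallest champion after which no larger champion
    improves the log-likelihood by more than `tolerance`.

    Assumption: a fixed tolerance replaces the resampling-based
    decision rule, so this is only a simplified stand-in.
    """
    if not champions:
        raise ValueError("at least one champion tree is required")
    # Champion trees are totally ordered; here we order them by size.
    ordered = sorted(champions, key=lambda c: c.degrees_of_freedom)
    for i, candidate in enumerate(ordered):
        gains = (larger.log_likelihood - candidate.log_likelihood
                 for larger in ordered[i + 1:])
        if all(gain <= tolerance for gain in gains):
            return candidate
    # Not reached: the largest champion always satisfies the condition.
    return ordered[-1]


# Made-up log-likelihood values, chosen only to show the behaviour.
champions = [
    Champion("tau_1", 2, -1000.0),
    Champion("tau_2", 4, -700.0),
    Champion("tau_3", 8, -695.0),
    Champion("tau_4", 16, -694.0),
]
selected = smallest_maximizer(champions, tolerance=10.0)
```

With these made-up numbers, moving from `tau_1` to `tau_2` gains 300 units of log-likelihood, while every larger tree gains at most 6 more, so the sketch settles on `tau_2`.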

This study was motivated by the linguistic challenge of retrieving rhythmic features from written texts. Applied to a data set of texts extracted from daily newspapers, our algorithm identifies different context trees for European Portuguese and Brazilian Portuguese. This is compatible with the long-standing conjecture that European Portuguese and Brazilian Portuguese belong to different rhythmic classes. Moreover, these context trees have several linguistically meaningful properties.

Requirements

Python 3.8

Installation

`pip install -r requirements.txt`

Examples

Estimation by pruning

Run `python3 examples/estimation_by_pruning.py`

Citing

Please cite the following publication when using this algorithm:

Galves, Antonio & Galves, Charlotte & Garcia, Jesus & Garcia, Nancy & Leonardi, Florencia. (2012). Context tree selection and linguistic rhythm retrieval from written texts. The Annals of Applied Statistics, 6. doi:10.1214/11-AOAS511.

BibTeX version:

@article{article,
author = {Galves, Antonio and Galves, Charlotte and Garcia,
          Jesus and Garcia, Nancy and Leonardi, Florencia},
year = {2012},
title = {Context tree selection and linguistic rhythm retrieval from written texts},
volume = {6},
journal = {The Annals of Applied Statistics},
doi = {10.1214/11-AOAS511}
}

Running tests

Run `pytest -s`

License

Acknowledgement

This implementation was produced as part of the activities of the FAPESP Research, Innovation and Dissemination Center for Neuromathematics (grant #2020/04807-0, São Paulo Research Foundation).

Universidade de São Paulo

Instituto de Matemática e Estatística

Research, Innovation and Dissemination Center for Neuromathematics - NeuroMat

2020
