
FreqProb


A modern, high-performance Python library for probability smoothing and frequency-based language modeling.

FreqProb provides state-of-the-art smoothing techniques for converting frequency counts into probability estimates, with applications in natural language processing, information retrieval, and statistical modeling.

Comprehensive & Accurate

  • 10+ smoothing methods: From basic Laplace to advanced Kneser-Ney and Simple Good-Turing
  • Mathematically rigorous: Implementations validated against reference sources (NLTK, SciPy)
  • Production-ready: Extensive testing with 400+ test cases and property-based validation

High Performance

  • Vectorized operations: Batch processing with NumPy acceleration
  • Memory efficient: Compressed representations and streaming algorithms
  • Lazy evaluation: Compute probabilities only when needed
  • Caching system: Intelligent memoization for expensive operations
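The vectorized idea can be sketched in plain NumPy (this is an illustration of batch scoring, not freqprob's own batch API, whose method names may differ):

```python
import numpy as np

# Plain-NumPy sketch of batch Laplace scoring: score a whole query at once
# instead of looping over words in Python.
counts = {'the': 100, 'cat': 50, 'dog': 30, 'bird': 10}
bins = 10000                     # assumed vocabulary size
total = sum(counts.values())     # 190 observed tokens

query = ['cat', 'dog', 'elephant']               # 'elephant' is unseen
c = np.array([counts.get(w, 0) for w in query])  # per-word counts: [50, 30, 0]
probs = (c + 1) / (total + bins)                 # add-one smoothing, one vector op
```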

Developer Experience

  • Type safety: Full type hints with mypy validation
  • Modern Python: Requires Python 3.10+ and uses the latest language features
  • Rich documentation: Mathematical background, tutorials, and API reference
  • Easy integration: Clean, intuitive API design

Quick Start

Installation

pip install freqprob

For additional features:

pip install freqprob[all]  # All optional dependencies

Basic Usage

import freqprob

# Create a frequency distribution
word_counts = {'the': 100, 'cat': 50, 'dog': 30, 'bird': 10}

# Basic smoothing - handles zero probabilities
laplace = freqprob.Laplace(word_counts, bins=10000)
print(f"P(cat) = {laplace('cat'):.4f}")      # 0.0053
print(f"P(elephant) = {laplace('elephant'):.6f}")  # 0.000105 (unseen word)

# Advanced smoothing for n-gram models
bigrams = {('the', 'cat'): 25, ('the', 'dog'): 20, ('a', 'cat'): 15}
kneser_ney = freqprob.KneserNey(bigrams, discount=0.75)

# Model evaluation
test_data = ['cat', 'dog', 'bird'] * 10
perplexity = freqprob.perplexity(laplace, test_data)
print(f"Perplexity: {perplexity:.2f}")
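Perplexity is the exponentiated average negative log-probability of the test data; a minimal reference computation using the textbook formula (freqprob's own helper may differ in interface details):

```python
import math

def perplexity(prob_of, tokens):
    """Textbook perplexity: exp of the mean negative log-probability."""
    log_sum = sum(math.log(prob_of(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))

# A uniform model over four outcomes has perplexity 4.
uniform = lambda w: 0.25
perplexity(uniform, ['a', 'b', 'c', 'd'])  # ≈ 4.0
```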

Smoothing Methods

Basic Methods

  • MLE (Maximum Likelihood): Unsmoothed relative frequencies
  • Laplace (Add-One): Classic add-one smoothing
  • Lidstone (Add-k): Generalized additive smoothing
  • ELE (Expected Likelihood): Lidstone with γ=0.5
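The additive methods all share one formula: with count c(w), total token count N, assumed vocabulary size B, and smoothing parameter γ, the estimate is P(w) = (c(w) + γ) / (N + γB). Laplace is γ = 1 and ELE is γ = 0.5. A minimal plain-Python sketch (freqprob's classes may normalize slightly differently):

```python
def lidstone(counts, word, bins, gamma):
    """Additive (Lidstone) smoothing: gamma=1 is Laplace, gamma=0.5 is ELE."""
    total = sum(counts.values())
    return (counts.get(word, 0) + gamma) / (total + gamma * bins)

counts = {'the': 100, 'cat': 50, 'dog': 30, 'bird': 10}
lidstone(counts, 'cat', bins=10000, gamma=1.0)       # Laplace: 51 / 10190
lidstone(counts, 'elephant', bins=10000, gamma=0.5)  # ELE estimate for an unseen word
```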

Advanced Methods

  • Simple Good-Turing: Frequency-of-frequency based smoothing
  • Kneser-Ney: State-of-the-art for n-gram language models
  • Modified Kneser-Ney: Improved version with automatic parameter estimation
  • Bayesian: Dirichlet prior-based smoothing
  • Interpolated: Linear combination of multiple models
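The interpolation idea is simple enough to sketch directly: mix a sparse higher-order model with a denser fallback (plain Python with made-up component models, not necessarily freqprob's Interpolated signature):

```python
def interpolate(p_hi, p_lo, lam):
    """Linear combination of two probability functions with weight lam."""
    return lambda w: lam * p_hi(w) + (1.0 - lam) * p_lo(w)

# Hypothetical component models: a sparse model and a uniform fallback.
p_sparse = lambda w: {'cat': 0.5, 'dog': 0.5}.get(w, 0.0)
p_uniform = lambda w: 0.25
mixed = interpolate(p_sparse, p_uniform, lam=0.7)

mixed('cat')   # 0.7 * 0.5 + 0.3 * 0.25 = 0.425
mixed('bird')  # unseen in the sparse model, still gets 0.3 * 0.25 = 0.075
```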

Specialized Features

  • Streaming algorithms: Real-time updates for large datasets
  • Memory optimization: Compressed and sparse representations
  • Performance profiling: Built-in benchmarking and validation tools
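The streaming idea, at its simplest, is that counts (and therefore probabilities) are updated incrementally as tokens arrive, without re-reading earlier data. A toy sketch with an unsmoothed MLE estimate (freqprob's streaming classes add smoothing and memory bounds on top of this):

```python
from collections import Counter

class StreamingCounts:
    """Toy streaming estimator: probabilities reflect everything seen so far."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, token):
        self.counts[token] += 1
        self.total += 1

    def mle(self, token):
        return self.counts[token] / self.total if self.total else 0.0

stream = StreamingCounts()
for token in ['cat', 'cat', 'dog']:
    stream.update(token)
stream.mle('cat')  # 2/3 after three observations
```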

Use Cases

Natural Language Processing

# Language modeling (tokens: a list of word strings)
bigrams = freqprob.ngram_frequency(tokens, n=2)
lm = freqprob.KneserNey(bigrams, discount=0.75)

# Text classification with smoothed features
# (document_tokens: a tokenized document; vocab_size: total vocabulary size)
doc_features = freqprob.word_frequency(document_tokens)
classifier_probs = freqprob.Laplace(doc_features, bins=vocab_size)

Information Retrieval

import math

# Document scoring with term frequency smoothing
# (compute_term_frequencies is a user-supplied helper returning a count dict)
term_counts = compute_term_frequencies(document)
smoothed_tf = freqprob.BayesianSmoothing(term_counts, alpha=0.5)

# Query likelihood with unseen term handling: sum log-probabilities,
# i.e. the log of the product of per-term probabilities
query_logprob = sum(math.log(smoothed_tf(term)) for term in query_terms)

Data Science & Analytics

# Probability estimation for sparse categorical data
# (data: a pandas Series of categorical values)
category_counts = {cat: count for cat, count in data.value_counts().items()}
estimator = freqprob.SimpleGoodTuring(category_counts)

# Handle zero frequencies in statistical analysis
smoothed_dist = freqprob.ELE(observed_frequencies, bins=total_categories)

Quality & Reliability

Rigorous Testing

  • 400+ test cases covering edge cases and normal operations
  • Property-based testing with Hypothesis for mathematical correctness
  • Regression testing against reference implementations (NLTK, SciPy)
  • Numerical stability validation for extreme inputs

Performance Validated

  • Benchmarking framework for performance regression detection
  • Memory profiling to ensure efficient resource usage
  • Scaling analysis from small to large vocabulary sizes
  • Cross-platform testing on Linux, Windows, and macOS

Mathematical Accuracy

  • Formula verification against academic literature
  • Statistical correctness validation with known distributions
  • Precision testing for floating-point edge cases
  • Reference compatibility with established libraries

Documentation & Learning

Learn FreqProb through comprehensive, executable tutorials with visualizations, written in the Nhandu literate programming format.

  1. Basic Smoothing Methods

    • Introduction to probability smoothing
    • MLE, Laplace, Lidstone, and ELE methods
    • Model evaluation with perplexity
  2. Advanced Methods

    • Simple Good-Turing smoothing
    • Kneser-Ney and Modified Kneser-Ney
    • Bayesian and interpolated methods
  3. Efficiency & Memory

    • Vectorized batch processing
    • Streaming algorithms
    • Memory-efficient representations
  4. Real-World Applications

    • Language modeling
    • Text classification
    • Information retrieval

Citation

If you use FreqProb in academic research, please cite:

@software{tresoldi_freqprob_2025,
  author = {Tresoldi, Tiago},
  title = {FreqProb: A Python library for probability smoothing and frequency-based language modeling},
  url = {https://github.com/tresoldi/freqprob},
  version = {0.4.0},
  publisher = {Department of Linguistics and Philology, Uppsala University},
  address = {Uppsala},
  year = {2025}
}
