A Python library for obtaining asymmetric measures of association between categorical variables in data exploration and description
ASymCat: Asymmetric Categorical Association Analysis
ASymCat is a Python library for analyzing asymmetric associations between categorical variables. Unlike traditional symmetric measures, which treat a relationship as identical in both directions, ASymCat provides directional measures that show how well each variable predicts the other. This makes it useful for exploring predictive dependencies and information flow in categorical data, and for generating hypotheses about causal direction.
Key Features
- 17+ Association Measures: From basic MLE to advanced information-theoretic measures
- Directional Analysis: X→Y vs Y→X asymmetric relationship quantification
- Robust Smoothing: FreqProb integration for numerical stability
- Multiple Data Formats: Sequences, presence-absence matrices, n-grams
- Scalable Architecture: Optimized for large datasets with efficient algorithms
Why Asymmetric Measures Matter
Traditional measures like Pearson's χ² or Cramér's V treat associations as symmetric: the relationship between X and Y is the same as between Y and X. However, many real-world relationships are inherently directional:
- Linguistics: Phoneme transitions may be predictable in one direction but not the other
- Ecology: Species presence may predict other species asymmetrically
- Market Research: Product purchases may show directional dependencies
- Medical Analysis: Symptoms may predict conditions more reliably than vice versa
ASymCat quantifies these directional relationships, revealing hidden patterns that symmetric measures miss.
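To make the idea of direction concrete, here is a minimal sketch in plain Python that does not use ASymCat at all. The observations are invented for illustration, and `conditional` is a hypothetical helper; it only shows that P(y|x) and P(x|y), estimated from the same counts, can differ.

```python
from collections import Counter

# Invented (symptom, condition) observations, for illustration only.
pairs = (
    [("fever", "flu")] * 3
    + [("cough", "flu")] * 3
    + [("cough", "cold")] * 2
)

joint = Counter(pairs)
x_totals = Counter(x for x, _ in pairs)
y_totals = Counter(y for _, y in pairs)

def conditional(x, y):
    """Return (P(y|x), P(x|y)) estimated from raw counts."""
    n_xy = joint[(x, y)]
    return n_xy / x_totals[x], n_xy / y_totals[y]

p_y_given_x, p_x_given_y = conditional("fever", "flu")
print(f"P(flu|fever) = {p_y_given_x:.2f}")  # every fever case is flu
print(f"P(fever|flu) = {p_x_given_y:.2f}")  # only half of flu cases show fever
```

In this toy data, fever predicts flu perfectly, while flu predicts fever only half of the time. A symmetric measure would summarize both directions with a single number.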
Quick Example
import asymcat
# Load your categorical data
data = asymcat.read_sequences("data.tsv") # or read_pa_matrix() for binary data
# Collect co-occurrences
cooccs = asymcat.collect_cooccs(data)
# Create scorer and analyze
scorer = asymcat.CatScorer(cooccs)
# Get asymmetric measures
mle_scores = scorer.mle() # Maximum likelihood estimation
pmi_scores = scorer.pmi() # Pointwise mutual information
chi2_scores = scorer.chi2() # Chi-square with smoothing
fisher_scores = scorer.fisher() # Fisher exact test
# Each returns {(x, y): (x→y_score, y→x_score)}
print(f"A→B: {mle_scores[('A', 'B')][0]:.3f}")
print(f"B→A: {mle_scores[('A', 'B')][1]:.3f}")
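Once scores are available in the shape described in the comment above, `{(x, y): (x→y_score, y→x_score)}`, a common next step is to rank pairs by how strongly the two directions differ. The sketch below works on a hand-written dictionary with invented values rather than real scorer output, and `rank_by_asymmetry` is a hypothetical helper, not part of ASymCat.

```python
# Invented scores in the documented shape: {(x, y): (x_to_y, y_to_x)}.
scores = {
    ("A", "B"): (0.90, 0.30),
    ("A", "C"): (0.50, 0.45),
    ("B", "C"): (0.10, 0.80),
}

def rank_by_asymmetry(scores):
    """Sort pairs by the absolute gap between the two directional scores."""
    return sorted(
        ((pair, xy - yx) for pair, (xy, yx) in scores.items()),
        key=lambda item: abs(item[1]),
        reverse=True,
    )

ranked = rank_by_asymmetry(scores)
for (x, y), gap in ranked:
    # A positive gap means the x -> y direction has the higher score.
    stronger = f"{x}->{y}" if gap > 0 else f"{y}->{x}"
    print(f"{x}/{y}: stronger direction {stronger} (gap {abs(gap):.2f})")
```

Whether a raw difference is a meaningful comparison depends on the measure; for unbounded scores a ratio or a rank-based comparison may be more appropriate.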
Installation
From PyPI (Recommended)
pip install asymcat
From Source
git clone https://github.com/tresoldi/asymcat.git
cd asymcat
pip install -e ".[dev]" # Editable install with the "dev" extras
Documentation & Resources
ASymCat provides comprehensive documentation organized for different needs:
Core Documentation
| Document | Purpose | Audience |
|---|---|---|
| User Guide | Conceptual foundations, theory, best practices | Everyone - start here |
| API Reference | Complete technical API documentation | Developers |
| LLM Documentation | Quick integration and code patterns | AI agents, rapid development |
Progressive Interactive Tutorials
Learn ASymCat through hands-on Nhandu tutorials with executable code and visualizations:
Tutorial 1: Basics
Foundation - Get started with asymmetric analysis 📄 Python source | 🌐 View HTML
- What are asymmetric associations and why they matter
- Basic workflow: load → collect → score
- Simple measures (MLE, PMI, Jaccard)
- Working with sequences and presence-absence data
Tutorial 2: Advanced Measures
Depth - Master all 17+ association measures 📄 Python source | 🌐 View HTML
- Information-theoretic measures (PMI, NPMI, Theil's U)
- Statistical measures (Chi-square, Cramér's V, Fisher)
- Smoothing methods and their effects
- Measure selection decision tree
Tutorial 3: Visualization
Communication - Create publication-quality figures 📄 Python source | 🌐 View HTML
- Heatmap visualizations of association matrices
- Score distribution and asymmetry plots
- Matrix transformations (scaling, inversion)
- Multi-measure comparison panels
Tutorial 4: Real-World Applications
Application - Complete analysis workflows 📄 Python source | 🌐 View HTML
- Linguistics: Grapheme-phoneme correspondence analysis
- Ecology: Galápagos finch species co-occurrence patterns
- Machine Learning: Feature selection with asymmetric measures
- Interpretation best practices and reporting strategies
All tutorials are fully executed with committed outputs. View the HTML files online via the links above, or run the Python source files locally to explore and modify them. Generate fresh documentation with make docs.
Additional Resources
- Documentation Index: Complete navigation guide
- CHANGELOG: Version history and migration guides
Association Measures
ASymCat implements 17+ association measures organized by type:
Probabilistic Measures
- MLE: Maximum Likelihood Estimation - P(X|Y) and P(Y|X)
- Jaccard Index: Set overlap with asymmetric interpretation
Information-Theoretic Measures
- PMI: Pointwise Mutual Information, log [P(X,Y) / (P(X) P(Y))]
- PMI Smoothed: Numerically stable PMI with FreqProb smoothing
- NPMI: Normalized PMI, bounded to the range [-1, 1]
- Mutual Information: Average information shared
- Conditional Entropy: Information remaining after observing condition
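The information-theoretic measures listed above can be sketched from their textbook definitions. The code below is plain Python on invented counts and is not ASymCat's implementation; the function names are hypothetical and no smoothing is applied. Note that textbook PMI and NPMI are symmetric in x and y, whereas conditional entropy depends on which variable is treated as given.

```python
import math
from collections import Counter

# Invented co-occurrence counts for pairs (x, y).
counts = Counter({("a", "1"): 6, ("a", "2"): 2, ("b", "1"): 1, ("b", "2"): 7})
total = sum(counts.values())

# Joint and marginal probabilities estimated from the counts.
p_xy = {pair: n / total for pair, n in counts.items()}
p_x, p_y = Counter(), Counter()
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

def pmi(x, y):
    """log2 of P(x,y) / (P(x) * P(y))."""
    return math.log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))

def npmi(x, y):
    """PMI divided by -log2 P(x,y), which bounds it to [-1, 1]."""
    return pmi(x, y) / -math.log2(p_xy[(x, y)])

def conditional_entropy(given):
    """H(Y|X) if given == 'x', H(X|Y) if given == 'y' (in bits)."""
    marginal = p_x if given == "x" else p_y
    index = 0 if given == "x" else 1
    return -sum(p * math.log2(p / marginal[pair[index]])
                for pair, p in p_xy.items())

h_y_given_x = conditional_entropy("x")
h_x_given_y = conditional_entropy("y")
print(f"PMI(a,1) = {pmi('a', '1'):.3f}, NPMI(a,1) = {npmi('a', '1'):.3f}")
print(f"H(Y|X) = {h_y_given_x:.3f}, H(X|Y) = {h_x_given_y:.3f}")
```

The unsmoothed versions fail on zero counts (the logarithm of zero is undefined), which is the kind of numerical problem smoothing is meant to address.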
Statistical Measures
- Chi-Square: Pearson's χ² with optional smoothing
- Cramér's V: Normalized chi-square association
- Fisher Exact: Exact odds ratios for small samples
- Log-Likelihood Ratio: G² statistic
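For reference, the classical statistics behind this group can be computed from a contingency table as follows. This is a sketch of the standard formulas on an invented 2x2 table, without continuity correction or smoothing; it is not ASymCat's code, and it does not show how the library turns these statistics into directional scores.

```python
import math

# Invented contingency table: rows are values of X, columns are values of Y.
observed = [[30, 10],
            [5, 25]]

def expected_counts(table):
    """Expected counts under independence, plus the grand total n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals], n

def chi_square(table):
    """Pearson's chi-square: sum of (observed - expected)^2 / expected."""
    expected, _ = expected_counts(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))

def cramers_v(table):
    """sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    _, n = expected_counts(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))

def g_squared(table):
    """Log-likelihood ratio: 2 * sum of observed * ln(observed / expected)."""
    expected, _ = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for o_row, e_row in zip(table, expected)
                   for o, e in zip(o_row, e_row) if o > 0)

print(f"chi2 = {chi_square(observed):.3f}")
print(f"V    = {cramers_v(observed):.3f}")
print(f"G2   = {g_squared(observed):.3f}")
```

Transposing the table leaves all three values unchanged, which illustrates why these statistics are symmetric when computed on the whole table.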
Specialized Measures
- Theil's U: Uncertainty coefficient (entropy-based)
- Tresoldi: Custom measure designed for sequence alignment
- Goodman-Kruskal λ: Proportional reduction in error
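Theil's U and Goodman-Kruskal λ are asymmetric by definition, so they make a compact illustration of directionality. The sketch below follows the textbook formulas on invented counts; `theils_u` and `gk_lambda` are hypothetical names and are not claimed to match ASymCat's API or its handling of edge cases.

```python
import math
from collections import Counter

# Invented joint counts n(x, y).
counts = {("a", "1"): 8, ("a", "2"): 0, ("b", "1"): 4, ("b", "2"): 4}

def marginal(counts, axis):
    """Sum the joint counts over the other variable (axis 0 = X, 1 = Y)."""
    totals = Counter()
    for pair, n in counts.items():
        totals[pair[axis]] += n
    return totals

def entropy(totals):
    n = sum(totals.values())
    return -sum(c / n * math.log2(c / n) for c in totals.values() if c)

def theils_u(counts, target=1):
    """U(target | other) = (H(target) - H(target | other)) / H(target).

    Undefined (division by zero) if the target variable is constant.
    """
    given = 1 - target
    n = sum(counts.values())
    given_totals = marginal(counts, given)
    h_cond = -sum(c / n * math.log2(c / given_totals[pair[given]])
                  for pair, c in counts.items() if c)
    h_target = entropy(marginal(counts, target))
    return (h_target - h_cond) / h_target

def gk_lambda(counts, target=1):
    """Proportional reduction in error when guessing target from the other.

    Undefined (division by zero) if the target variable is constant.
    """
    given = 1 - target
    n = sum(counts.values())
    best_overall = max(marginal(counts, target).values())
    best_within = Counter()
    for pair, c in counts.items():
        best_within[pair[given]] = max(best_within[pair[given]], c)
    return (sum(best_within.values()) - best_overall) / (n - best_overall)

print(f"U(Y|X) = {theils_u(counts, target=1):.3f}")
print(f"U(X|Y) = {theils_u(counts, target=0):.3f}")
print(f"lambda(Y|X) = {gk_lambda(counts, target=1):.3f}")
print(f"lambda(X|Y) = {gk_lambda(counts, target=0):.3f}")
```

In this toy table, knowing X does not reduce the number of errors made when guessing Y (λ is 0), while knowing Y halves the errors made when guessing X (λ is 0.5).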
Data Formats
Sequence Data (TSV)
# linguistic_data.tsv
sound_from sound_to
p a t a B A T A
k a t a G A T A
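As an illustration of how rows like these could be turned into co-occurrence counts, the sketch below parses the same two rows from an in-memory string. It assumes a tab between the two columns and spaces between symbols, and it pairs every symbol on the left with every symbol on the right. Both the delimiters and the pairing strategy are assumptions of this sketch, and `parse_sequence_rows` is a hypothetical helper, not ASymCat's `read_sequences`.

```python
import csv
import io
from collections import Counter
from itertools import product

# Inline stand-in for a file such as linguistic_data.tsv.
raw = "sound_from\tsound_to\np a t a\tB A T A\nk a t a\tG A T A\n"

def parse_sequence_rows(handle):
    """Yield (left_symbols, right_symbols) for each row after the header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    for left, right in reader:
        yield left.split(), right.split()

# Count every (left symbol, right symbol) combination within each row.
cooccs = Counter()
for left, right in parse_sequence_rows(io.StringIO(raw)):
    cooccs.update(product(left, right))

for pair, n in cooccs.most_common(3):
    print(pair, n)
```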
Presence-Absence Matrix (TSV)
# species_data.tsv
site species_A species_B species_C
island_1 1 0 1
island_2 1 1 0
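A presence-absence matrix of this shape can be read with the standard library alone. The sketch below assumes tab-separated columns and a first column named `site`, as in the example above; `read_presence` is a hypothetical helper and is not ASymCat's `read_pa_matrix`.

```python
import csv
import io
from collections import Counter
from itertools import permutations

# Inline stand-in for a file such as species_data.tsv.
raw = (
    "site\tspecies_A\tspecies_B\tspecies_C\n"
    "island_1\t1\t0\t1\n"
    "island_2\t1\t1\t0\n"
)

def read_presence(handle):
    """Map each site to the set of taxa marked 1 in its row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["site"]: {name for name, flag in row.items()
                      if name != "site" and flag == "1"}
        for row in reader
    }

presence = read_presence(io.StringIO(raw))

# Count ordered pairs of taxa observed together at the same site.
pair_counts = Counter()
for taxa in presence.values():
    pair_counts.update(permutations(sorted(taxa), 2))

print(presence)
print(pair_counts)
```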
N-gram Support
# Automatic n-gram extraction
bigrams = asymcat.collect_cooccs(data, order=2, pad="#")
trigrams = asymcat.collect_cooccs(data, order=3, pad="#")
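For readers unfamiliar with padded n-grams, here is a minimal sketch of the general technique in plain Python. The `ngrams` function is hypothetical, and padding both ends with `order - 1` symbols is an assumption; ASymCat's own padding and pairing behavior may differ.

```python
def ngrams(sequence, order, pad="#"):
    """Return all n-grams of size `order` after padding both ends with `pad`."""
    padded = [pad] * (order - 1) + list(sequence) + [pad] * (order - 1)
    return [tuple(padded[i:i + order])
            for i in range(len(padded) - order + 1)]

bigrams = ngrams("pata", 2)
trigrams = ngrams("pata", 3)
print(bigrams)
print(trigrams[:2])
```

Padding lets the first and last symbols appear in as many n-grams as the interior symbols, so boundary contexts are also counted.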
Citation
ASymCat is developed by Tiago Tresoldi (tiago.tresoldi@lingfil.uu.se) in the context of the Cultural Evolution of Texts project, with funding from Riksbankens Jubileumsfond (grant agreement ID: MXM19-1087:1).
During the first stages of development, the author received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 715618, Computer-Assisted Language Comparison).
If you use ASymCat in your research, please cite:
@software{tresoldi_asymcat_2025,
title = {ASymCat: Asymmetric Categorical Association Analysis},
author = {Tresoldi, Tiago},
address = {Uppsala},
publisher = {Department of Linguistics and Philology, Uppsala University},
year = {2025},
url = {https://github.com/tresoldi/asymcat},
version = {0.4.0}
}
🔮 Roadmap
- Statistical Significance: P-value calculations for all measures
- Confidence Intervals: Uncertainty quantification
- GPU Acceleration: CUDA support for massive datasets
- Interactive Dashboards: Web-based exploration tools
- Extended Measures: Additional domain-specific association metrics
- Nhandu Documentation: Migration to modern documentation system
File details
Details for the file asymcat-0.4.0.tar.gz.
File metadata
- Download URL: asymcat-0.4.0.tar.gz
- Upload date:
- Size: 112.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 20f28c82bbf5a95b4895b6d2ef8c3cd29d3e9d04707d957797ed9c8120cfe86f |
| MD5 | e0a8b44c1a544a4c98032eb5f0e3a246 |
| BLAKE2b-256 | 4ea6af4d563c8f7b800afba8dfb1dc9b68519d3f4fa8ea3a7b36f5f11dc68590 |
File details
Details for the file asymcat-0.4.0-py3-none-any.whl.
File metadata
- Download URL: asymcat-0.4.0-py3-none-any.whl
- Upload date:
- Size: 22.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f152cecd25d7e891d3fc12e422073c6f5f3d0b6c6ef09b29ff41d989be515cc |
| MD5 | 962bee5929a28044b2f87982d30b75b5 |
| BLAKE2b-256 | bc8cab4f1ec7403c8b5fda4a76b82b25dc957b4201ebb76939d9018cc5fc2c9a |