A Python library for obtaining asymmetric measures of association between categorical variables in data exploration and description

ASymCat: Asymmetric Categorical Association Analysis

ASymCat is a Python library for analyzing asymmetric associations between categorical variables. Unlike traditional symmetric measures, which treat a relationship as identical in both directions, ASymCat provides directional measures that show how well each variable predicts the other. This makes it useful for exploring dependencies, candidate causal relationships, and information flow in categorical data.

Key Features

  • 17+ Association Measures: From basic MLE to advanced information-theoretic measures
  • Directional Analysis: X→Y vs Y→X asymmetric relationship quantification
  • Robust Smoothing: FreqProb integration for numerical stability
  • Multiple Data Formats: Sequences, presence-absence matrices, n-grams
  • Scalable Architecture: Optimized for large datasets with efficient algorithms

Why Asymmetric Measures Matter

Traditional measures like Pearson's χ² or Cramér's V treat associations as symmetric: the relationship between X and Y is the same as between Y and X. However, many real-world relationships are inherently directional:

  • Linguistics: Phoneme transitions may be predictable in one direction but not the other
  • Ecology: Species presence may predict other species asymmetrically
  • Market Research: Product purchases may show directional dependencies
  • Medical Analysis: Symptoms may predict conditions more reliably than vice versa

ASymCat quantifies these directional relationships, revealing hidden patterns that symmetric measures miss.
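
The directionality described above can be seen with nothing more than relative frequencies. The following is a minimal sketch in plain Python that does not use ASymCat itself; the toy observations and the helper name `conditional_probs` are assumptions made for illustration.

```python
from collections import Counter

# Toy (x, y) co-occurrence observations; hypothetical data for illustration.
pairs = [("a", "A"), ("a", "A"), ("a", "B"), ("b", "B"), ("c", "B")]

pair_counts = Counter(pairs)
x_counts = Counter(x for x, _ in pairs)
y_counts = Counter(y for _, y in pairs)


def conditional_probs(x, y):
    """Return (P(y|x), P(x|y)) estimated by relative frequency."""
    joint = pair_counts[(x, y)]
    return joint / x_counts[x], joint / y_counts[y]


p_y_given_x, p_x_given_y = conditional_probs("b", "B")
print(f"P(B|b) = {p_y_given_x:.3f}")  # b always co-occurs with B
print(f"P(b|B) = {p_x_given_y:.3f}")  # B co-occurs with a, b and c
```

Here observing `b` fully determines `B`, while observing `B` leaves three candidates for x, which is the kind of gap a symmetric statistic collapses into a single number.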

Quick Example

import asymcat

# Load your categorical data
data = asymcat.read_sequences("data.tsv")  # or read_pa_matrix() for binary data

# Collect co-occurrences  
cooccs = asymcat.collect_cooccs(data)

# Create scorer and analyze
scorer = asymcat.CatScorer(cooccs)

# Get asymmetric measures
mle_scores = scorer.mle()           # Maximum likelihood estimation
pmi_scores = scorer.pmi()           # Pointwise mutual information  
chi2_scores = scorer.chi2()         # Chi-square with smoothing
fisher_scores = scorer.fisher()     # Fisher exact test

# Each returns {(x, y): (x→y_score, y→x_score)}
print(f"A→B: {mle_scores[('A', 'B')][0]:.3f}")
print(f"B→A: {mle_scores[('A', 'B')][1]:.3f}")
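
Given a result in the `{(x, y): (x→y_score, y→x_score)}` shape, one possible next step is to rank pairs by how far apart the two directions are. The sketch below works on a hand-written dictionary rather than real ASymCat output; the scores and the helper `rank_by_asymmetry` are hypothetical.

```python
# Hypothetical scores in the {(x, y): (x→y, y→x)} shape described above.
scores = {
    ("A", "B"): (0.90, 0.30),
    ("A", "C"): (0.50, 0.45),
    ("B", "C"): (0.10, 0.80),
}


def rank_by_asymmetry(scores):
    """Sort pairs by the absolute gap between the two directional scores."""
    return sorted(
        ((pair, xy - yx) for pair, (xy, yx) in scores.items()),
        key=lambda item: abs(item[1]),
        reverse=True,
    )


ranked = rank_by_asymmetry(scores)
for (x, y), gap in ranked:
    direction = f"{x}→{y}" if gap > 0 else f"{y}→{x}"
    print(f"{x}/{y}: stronger direction {direction} (gap {abs(gap):.2f})")
```

A raw difference like this is only meaningful for measures whose two directional values share a scale, such as conditional probabilities.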

Installation

From PyPI (Recommended)

pip install asymcat

From Source

git clone https://github.com/tresoldi/asymcat.git
cd asymcat
pip install -e ".[dev]"  # Install with all optional dependencies

Documentation & Resources

ASymCat provides comprehensive documentation organized for different needs:

Core Documentation

Document | Purpose | Audience
User Guide | Conceptual foundations, theory, best practices | Everyone - start here
API Reference | Complete technical API documentation | Developers
LLM Documentation | Quick integration and code patterns | AI agents, rapid development

Progressive Interactive Tutorials

Learn ASymCat through hands-on Nhandu tutorials with executable code and visualizations:

Tutorial 1: Basics

Foundation - Get started with asymmetric analysis 📄 Python source | 🌐 View HTML

  • What are asymmetric associations and why they matter
  • Basic workflow: load → collect → score
  • Simple measures (MLE, PMI, Jaccard)
  • Working with sequences and presence-absence data

Tutorial 2: Advanced Measures

Depth - Master all 17+ association measures 📄 Python source | 🌐 View HTML

  • Information-theoretic measures (PMI, NPMI, Theil's U)
  • Statistical measures (Chi-square, Cramér's V, Fisher)
  • Smoothing methods and their effects
  • Measure selection decision tree

Tutorial 3: Visualization

Communication - Create publication-quality figures 📄 Python source | 🌐 View HTML

  • Heatmap visualizations of association matrices
  • Score distribution and asymmetry plots
  • Matrix transformations (scaling, inversion)
  • Multi-measure comparison panels

Tutorial 4: Real-World Applications

Application - Complete analysis workflows 📄 Python source | 🌐 View HTML

  • Linguistics: Grapheme-phoneme correspondence analysis
  • Ecology: Galápagos finch species co-occurrence patterns
  • Machine Learning: Feature selection with asymmetric measures
  • Interpretation best practices and reporting strategies

All tutorials are fully executed with committed outputs. View the HTML files online via the links above, or run the Python source files locally to explore and modify them. Generate fresh documentation with make docs.

Association Measures

ASymCat implements 17+ association measures organized by type:

Probabilistic Measures

  • MLE: Maximum Likelihood Estimation - P(X|Y) and P(Y|X)
  • Jaccard Index: Set overlap with asymmetric interpretation

Information-Theoretic Measures

  • PMI: Pointwise Mutual Information, log [P(X,Y) / (P(X) P(Y))]
  • PMI Smoothed: Numerically stable PMI with FreqProb smoothing
  • NPMI: Normalized PMI, bounded to the [-1, 1] range
  • Mutual Information: Average information shared
  • Conditional Entropy: Information remaining after observing condition
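
As a worked illustration of the PMI and NPMI definitions listed above, the following sketch computes both from explicit probabilities. It uses the textbook formulas in plain Python and is not ASymCat's implementation; the function names and the input probabilities are assumptions.

```python
import math


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information: log(P(x,y) / (P(x) * P(y)))."""
    return math.log(p_xy / (p_x * p_y))


def npmi(p_xy, p_x, p_y):
    """PMI divided by -log P(x,y), which bounds the value to [-1, 1]."""
    return pmi(p_xy, p_x, p_y) / -math.log(p_xy)


# Hypothetical probabilities for two boundary cases.
independent = pmi(0.25, 0.5, 0.5)  # P(x,y) equals P(x) * P(y)
perfect = npmi(0.5, 0.5, 0.5)      # x and y always occur together

print(f"PMI under independence: {independent:.3f}")
print(f"NPMI under perfect co-occurrence: {perfect:.3f}")
```

Both functions require P(x,y) > 0, since the logarithm is undefined at zero; unseen pairs are the case that smoothing is meant to handle.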

Statistical Measures

  • Chi-Square: Pearson's χ² with optional smoothing
  • Cramér's V: Normalized chi-square association
  • Fisher Exact: Exact odds ratios for small samples
  • Log-Likelihood Ratio: G² statistic
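
For reference, Pearson's χ² and Cramér's V can be computed from a contingency table as sketched below. This is a plain-Python illustration of the standard, unsmoothed formulas, not the library's code; the function names and tables are assumptions, and every row and column is assumed to have a non-zero total.

```python
import math


def chi_square(table):
    """Pearson's chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))


# Hypothetical 2x2 tables: perfect association and full independence.
perfect = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]

print(chi_square(perfect), cramers_v(perfect))
print(chi_square(independent), cramers_v(independent))
```

Transposing a table leaves both statistics unchanged, which is the symmetry discussed earlier in this document.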

Specialized Measures

  • Theil's U: Uncertainty coefficient (entropy-based)
  • Tresoldi: Custom measure designed for sequence alignment
  • Goodman-Kruskal λ: Proportional reduction in error
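
Theil's U is a convenient example of a measure that is asymmetric by construction: U(Y|X) = (H(Y) - H(Y|X)) / H(Y). The sketch below implements that textbook definition in plain Python. It is not taken from ASymCat, and the helper names, the toy data, and the choice to return 1.0 when Y has no uncertainty are assumptions.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a collection of counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c)


def theil_u(pairs):
    """U(Y|X): fraction of the uncertainty in Y removed by knowing X."""
    h_x = entropy(Counter(x for x, _ in pairs).values())
    h_y = entropy(Counter(y for _, y in pairs).values())
    h_xy = entropy(Counter(pairs).values())
    if h_y == 0:
        return 1.0  # assumed convention: nothing left to explain
    h_y_given_x = h_xy - h_x  # chain rule: H(Y|X) = H(X,Y) - H(X)
    return (h_y - h_y_given_x) / h_y


# Hypothetical data: x determines y, but y only narrows x to two options.
pairs = [("a", "A"), ("b", "A"), ("c", "B"), ("d", "B")]
u_y_given_x = theil_u(pairs)
u_x_given_y = theil_u([(y, x) for x, y in pairs])

print(f"U(Y|X) = {u_y_given_x:.3f}")
print(f"U(X|Y) = {u_x_given_y:.3f}")
```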

Data Formats

Sequence Data (TSV)

# linguistic_data.tsv
sound_from	sound_to
p a t a	B A T A
k a t a	G A T A
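
To make the layout concrete, the sketch below parses the sample above with the standard library only. It assumes a header row, tab-separated columns, and space-separated tokens within each column; `parse_sequence_tsv` is a hypothetical helper, not an ASymCat function.

```python
import csv
import io

# The sample rows from above, with explicit tab characters.
sample = "sound_from\tsound_to\np a t a\tB A T A\nk a t a\tG A T A\n"


def parse_sequence_tsv(handle):
    """Return a list of (tokens_a, tokens_b) pairs, one per data row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return [(a.split(), b.split()) for a, b in reader]


pairs = parse_sequence_tsv(io.StringIO(sample))
for source, target in pairs:
    print(source, target)
```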

Presence-Absence Matrix (TSV)

# species_data.tsv
site	species_A	species_B	species_C
island_1	1	0	1
island_2	1	1	0
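
One way to read such a matrix, sketched here with the standard library, is to collect the set of taxa present at each site and count how often two taxa share a site. The helper `present_taxa` and the pair-counting step are assumptions for illustration; how ASymCat converts a matrix into co-occurrences is not described here.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# The sample rows from above, with explicit tab characters.
sample = (
    "site\tspecies_A\tspecies_B\tspecies_C\n"
    "island_1\t1\t0\t1\n"
    "island_2\t1\t1\t0\n"
)


def present_taxa(handle, key="site"):
    """Map each site to the set of column names marked with '1'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row[key]: {name for name, value in row.items() if name != key and value == "1"}
        for row in reader
    }


sites = present_taxa(io.StringIO(sample))

# Count unordered pairs of taxa that are present at the same site.
shared = Counter(
    pair for taxa in sites.values() for pair in combinations(sorted(taxa), 2)
)
print(sites)
print(shared)
```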

N-gram Support

# Automatic n-gram extraction
bigrams = asymcat.collect_cooccs(data, order=2, pad="#")
trigrams = asymcat.collect_cooccs(data, order=3, pad="#")
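
For readers unfamiliar with padded n-grams, the following sketch shows one common windowing scheme in plain Python. The function `padded_ngrams` is hypothetical, and the exact padding and windowing that `collect_cooccs` applies may differ.

```python
def padded_ngrams(tokens, order, pad="#"):
    """Return all n-grams of the sequence after padding both ends."""
    padded = [pad] * (order - 1) + list(tokens) + [pad] * (order - 1)
    return [tuple(padded[i:i + order]) for i in range(len(padded) - order + 1)]


bigrams_example = padded_ngrams("p a t a".split(), order=2)
print(bigrams_example)
```

Padding with `order - 1` symbols on each side lets every token appear in every window position, including at the sequence edges.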

Citation

The library is developed by Tiago Tresoldi (tiago.tresoldi@lingfil.uu.se) in the context of the Cultural Evolution of Texts project, with funding from the Riksbankens Jubileumsfond (grant agreement ID: MXM19-1087:1).

During the first stages of development, the author received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 715618, Computer-Assisted Language Comparison).

If you use ASymCat in your research, please cite:

@software{tresoldi_asymcat_2025,
  title = {ASymCat: Asymmetric Categorical Association Analysis},
  author = {Tresoldi, Tiago},
  address = {Uppsala},
  publisher = {Department of Linguistics and Philology, Uppsala University},
  year = {2025},
  url = {https://github.com/tresoldi/asymcat},
  version = {0.4.0}
}

🔮 Roadmap

  • Statistical Significance: P-value calculations for all measures
  • Confidence Intervals: Uncertainty quantification
  • GPU Acceleration: CUDA support for massive datasets
  • Interactive Dashboards: Web-based exploration tools
  • Extended Measures: Additional domain-specific association metrics
  • Nhandu Documentation: Migration to modern documentation system
