Correlation Dimension for LLMs - A library for computing correlation dimension of autoregressive large language models

CorrDim: Correlation Dimension for Language Models

CorrDim is a Python library for computing the correlation dimension of autoregressive language models from next-token log-probability vectors, based on the paper "Correlation Dimension of Auto-Regressive Large Language Models" (NeurIPS 2025).

Documentation

Full documentation is available at corrdim.readthedocs.io.

Use the docs site for:

  • installation details and backend notes
  • the full Python API reference
  • CLI documentation
  • examples and usage patterns

What CorrDim measures

Given a text and an autoregressive language model, CorrDim measures the text's global structural complexity as perceived by that model.

In practice:

  • repetitive or degenerate text tends to have a lower correlation dimension
  • ordinary fluent text tends to have a higher dimension
  • richer long-range structure can produce an even higher dimension

CorrDim is complementary to local metrics such as perplexity: it focuses on sequence-level geometry, not just token-level prediction quality.

How it works

At a high level, CorrDim:

  1. converts text into a sequence of next-token log-probability vectors
  2. optionally reduces the vocabulary dimension
  3. computes a correlation-integral curve over epsilon thresholds
  4. estimates the correlation dimension by fitting a line in log-log space

For the mathematical details, see the paper.

Installation

CorrDim requires Python 3.10 or newer.

pip install corrdim

To install without Triton:

pip install "corrdim[no-triton]"

For local development:

pip install "corrdim[dev,docs]"

To compile the CUDA extension when installing from a source checkout:

CORRDIM_BUILD_CUDA=1 pip install .

Quick start

import torch
import corrdim

result = corrdim.measure_text(
    "Your text here...",
    model="Qwen/Qwen2.5-1.5B",
    precision=torch.float16,
)

print("corrdim:", result.corrdim)
print("fit_r2:", result.fit_r2)
print("linear_region_bounds:", result.linear_region_bounds)

For batched input:

import torch
import corrdim

results = corrdim.measure_texts(
    [
        "Short sample A...",
        "Short sample B...",
    ],
    model="Qwen/Qwen2.5-1.5B",
    precision=torch.float16,
)

for result in results:
    print(result.corrdim, result.fit_r2)
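Because the dimension comes from a line fit, it can be useful to drop results whose fit is poor before comparing dimensions. The sketch below assumes only that each result exposes the `corrdim` and `fit_r2` attributes shown above and that `fit_r2` is a float. The helper `keep_well_fitted`, the 0.95 threshold, and the stand-in values are all hypothetical and do not come from the library.

```python
from types import SimpleNamespace


def keep_well_fitted(results, min_r2=0.95):
    """Return only the results whose log-log fit reaches min_r2.

    The default threshold is an arbitrary illustration, not a recommendation.
    """
    return [r for r in results if r.fit_r2 >= min_r2]


# Stand-in objects with made-up values; real results would come from
# corrdim.measure_texts(...).
demo_results = [
    SimpleNamespace(corrdim=3.2, fit_r2=0.99),
    SimpleNamespace(corrdim=1.1, fit_r2=0.80),
]

kept = keep_well_fitted(demo_results)
print([r.corrdim for r in kept])
```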

API overview

The most important entry points are:

  • measure_text / measure_texts for end-to-end text measurement
  • curve_from_text / curve_from_vectors when you want the curve first
  • estimate_dimension_from_curve when you already have saved curve data
  • progressive_curve_from_text for prefix-wise analysis
  • correlation_integral and related functions for lower-level tensor workflows

For full API details, signatures, return types, and backend behavior, see the documentation site.

CLI

CorrDim includes a corrdim command-line interface:

corrdim measure-text \
  --file data/sep60/chaos.txt \
  --model Qwen/Qwen2.5-1.5B

Additional CLI commands and options are documented at corrdim.readthedocs.io.

Backends

CorrDim supports multiple backends for correlation-integral computation:

  • cuda
  • triton
  • pytorch
  • pytorch_fast
  • auto

Set the default backend with:

export CORRDIM_CORRINT_BACKEND=pytorch

Or in Python:

import corrdim

print(corrdim.set_corrint_backend("auto"))
print(corrdim.available_corrint_backends())

Citation

@inproceedings{du2025correlation,
  title={Correlation Dimension of Auto-Regressive Large Language Models},
  author={Du, Xin and Tanaka-Ishii, Kumiko},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025},
  arxiv={2510.21258}
}

License

MIT License
