Correlation Dimension for LLMs - A library for computing correlation dimension of autoregressive large language models

CorrDim: Correlation Dimension for Language Models

CorrDim is a Python library for computing the correlation dimension of autoregressive language models from next-token log-probability vectors, based on the paper "Correlation Dimension of Auto-Regressive Large Language Models" (NeurIPS 2025).

Documentation

Full documentation is available at corrdim.readthedocs.io.

Use the docs site for:

installation details and backend notes
the full Python API reference
CLI documentation
examples and usage patterns

What CorrDim measures

Given a text and an autoregressive language model, CorrDim measures the text's global structural complexity as perceived by that model.

In practice:

repetitive or degenerate text tends to have a lower correlation dimension
ordinary fluent text tends to have a higher dimension
richer long-range structure can produce an even higher dimension

CorrDim is complementary to local metrics such as perplexity: it focuses on sequence-level geometry, not just token-level prediction quality.

How it works

At a high level, CorrDim:

converts text into a sequence of next-token log-probability vectors
optionally reduces the vocabulary dimension
computes a correlation-integral curve over epsilon thresholds
estimates the correlation dimension by fitting a line in log-log space
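The last two steps above can be sketched in plain Python. This is a minimal illustration of a Grassberger-Procaccia style estimate, not CorrDim's implementation: the helper names (`correlation_integral`, `fit_slope`, `estimate_dimension`) are hypothetical, the Euclidean distance is an assumption that may differ from the metric CorrDim uses, and the input is a toy point cloud rather than log-probability vectors.

```python
import math
import random


def correlation_integral(points, eps_values):
    """Fraction of distinct point pairs closer than each epsilon."""
    n = len(points)
    dists = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return [sum(d < eps for d in dists) / len(dists) for eps in eps_values]


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def estimate_dimension(points, eps_values):
    """Slope of log C(eps) against log eps, skipping empty thresholds."""
    curve = correlation_integral(points, eps_values)
    pairs = [(math.log(e), math.log(c)) for e, c in zip(eps_values, curve) if c > 0]
    xs, ys = zip(*pairs)
    return fit_slope(xs, ys)


# Toy data: 400 points on a straight segment embedded in 3D.
random.seed(0)
line = [(t, 2 * t, -t) for t in (random.random() for _ in range(400))]
eps_values = [0.02 * 2**k for k in range(5)]  # 0.02 up to 0.32
dim = estimate_dimension(line, eps_values)
print(f"estimated dimension: {dim:.2f}")  # close to 1 for a line
```

A one-dimensional set embedded in a higher-dimensional space still yields a slope near 1, which is the sense in which the estimate reflects intrinsic structure rather than the size of the ambient space.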

For the mathematical details, see the paper.

Installation

CorrDim requires Python 3.10 or newer. You may use pip or uv to install corrdim.

pip

Linux GPU users: By default, PyPI distributes CPU-only PyTorch on Linux. If you have an NVIDIA GPU, install CUDA PyTorch first. Choose based on your driver version:

CUDA version	Min driver	Install command
cu126 (default)	≥ 525	`pip install torch --index-url https://download.pytorch.org/whl/cu126`
cu130	≥ 580	`pip install torch --index-url https://download.pytorch.org/whl/cu130`

(For NVIDIA DGX Spark with GB10, use cu130.)
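If you script your environment setup, the driver requirements in the table can be encoded as a small lookup. The helper below is a hypothetical convenience, not part of CorrDim or PyTorch. It only restates the minimum driver versions listed above and assumes you already know your driver's major version (for example, from `nvidia-smi`).

```python
# Minimum NVIDIA driver major version per CUDA wheel, copied from the table.
MIN_DRIVER = {"cu126": 525, "cu130": 580}


def compatible_cuda_wheels(driver_major):
    """Return the CUDA wheel tags whose minimum driver is satisfied."""
    return [tag for tag, minimum in MIN_DRIVER.items() if driver_major >= minimum]


def index_url(tag):
    """Build the PyTorch wheel index URL for a CUDA tag."""
    return f"https://download.pytorch.org/whl/{tag}"


for tag in compatible_cuda_wheels(550):
    print(f"pip install torch --index-url {index_url(tag)}")
```

An empty list means neither wheel in the table is supported by the installed driver.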

Then, run

pip install corrdim

uv

If using uv, install PyTorch before installing corrdim.

Linux GPU users: By default, PyPI distributes CPU-only PyTorch on Linux. If you have an NVIDIA GPU, install CUDA PyTorch first. Choose based on your driver version:

CUDA version	Min driver	Install command
cu126 (default)	≥ 525	`uv add torch --index https://download.pytorch.org/whl/cu126`
cu130	≥ 580	`uv add torch --index https://download.pytorch.org/whl/cu130`

(For NVIDIA DGX Spark with GB10, use cu130.)

Then, run

uv add corrdim

Quick start

import torch
import corrdim

result = corrdim.measure_text(
    "Your text here...",
    model="Qwen/Qwen3-0.6B",
    precision=torch.float16,
)

print("corrdim:", result.corrdim)
print("fit_r2:", result.fit_r2)
print("linear_region_bounds:", result.linear_region_bounds)

For batched input:

import torch
import corrdim

results = corrdim.measure_texts(
    [
        "Short sample A...",
        "Short sample B...",
    ],
    model="Qwen/Qwen3-0.6B",
    precision=torch.float16,
)

for result in results:
    print(result.corrdim, result.fit_r2)

Progressive dimension along the sequence

To fit correlation dimension at multiple prefix lengths without re-running the model for each prefix, use measure_text_progressive. It calls progressive_curve_from_text once, then subsamples prefix indices:

skip_prefix_tokens: first prefix index to include (shorter prefixes are skipped)
measure_every_tokens: stride between measured indices, or None (default) to choose from length: fewer than 100 tokens → 1, fewer than 1000 → 10, otherwise 100

The return value is a ProgressiveDimensionResult: by_prefix maps prefix index to a full DimensionResult; corrdims maps index to the fitted scalar only.

import torch
import corrdim

prog_dims = corrdim.measure_text_progressive(
    long_text,
    model="Qwen/Qwen3-0.6B",
    precision=torch.float16,
    skip_prefix_tokens=100,
)

for prefix_len, d in sorted(prog_dims.corrdims.items()):
    print(prefix_len, d)
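The default stride rule described for measure_every_tokens=None can be written out as a small function. This is a sketch of the rule as documented above, not the library's source. The name `default_stride` is hypothetical, and how CorrDim counts tokens internally is not shown here.

```python
def default_stride(num_tokens):
    """Stride between measured prefix indices when none is given.

    Mirrors the documented rule: fewer than 100 tokens -> 1,
    fewer than 1000 -> 10, otherwise 100.
    """
    if num_tokens < 100:
        return 1
    if num_tokens < 1000:
        return 10
    return 100


for n in (50, 500, 5000):
    print(n, default_stride(n))
```

Pass an explicit measure_every_tokens value when you need a finer or coarser grid than this default.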

API overview

The most important entry points are:

measure_text / measure_texts for end-to-end text measurement
measure_text_progressive for multiple fitted dimensions along sequence prefixes (one model pass)
curve_from_text / curve_from_vectors when you want the curve first
estimate_dimension_from_curve when you already have saved curve data
progressive_curve_from_text for prefix-wise analysis
correlation_integral and related functions for lower-level tensor workflows

For full API details, signatures, return types, and backend behavior, see the documentation site.

CLI

CorrDim includes a corrdim command-line interface:

corrdim measure-text \
  --file data/sep60/chaos.txt \
  --model Qwen/Qwen3-0.6B

Additional CLI commands and options are documented at corrdim.readthedocs.io.

Backends

CorrDim supports multiple backends for correlation-integral computation:

triton
pytorch
pytorch_fast
auto

Set the default backend with:

export CORRDIM_CORRINT_BACKEND=pytorch

Or in Python:

import corrdim

print(corrdim.set_corrint_backend("auto"))
print(corrdim.available_corrint_backends())

Tips for systems with limited GPU RAM (e.g., <10GB)

If you run into out-of-memory errors, reduce block_size (default 512) to lower the peak memory usage during correlation-integral computation:

result = corrdim.measure_text(
    text,
    model="Qwen/Qwen3-0.6B",
    block_size=128,
)
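To see why a smaller block lowers peak memory, consider counting close pairs one slab of rows at a time instead of building the full pairwise distance matrix. The sketch below is a pure-Python illustration of that general idea, under the assumption that block_size plays a similar role in CorrDim. It is not the library's kernel, and `count_close_pairs_blocked` is a hypothetical name.

```python
import math
import random


def count_close_pairs_blocked(points, eps, block_size):
    """Count pairs (i < j) with distance < eps, one block of rows at a time."""
    n = len(points)
    count = 0
    for start in range(0, n, block_size):
        block = points[start:start + block_size]
        # Only a (block_size x n) slab of distances exists at any moment,
        # so peak memory scales with block_size rather than with n.
        slab = [[math.dist(p, q) for q in points] for p in block]
        for offset, row in enumerate(slab):
            i = start + offset
            count += sum(d < eps for d in row[i + 1:])
    return count


random.seed(1)
pts = [(random.random(), random.random()) for _ in range(200)]
# The result does not depend on the block size, only the memory profile does.
print(count_close_pairs_blocked(pts, 0.1, block_size=16))
print(count_close_pairs_blocked(pts, 0.1, block_size=128))
```

The trade-off is more loop iterations for a smaller working set, so a smaller block is typically slower but less likely to exhaust memory.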

You can also set forward_chunk_size to control how many tokens are processed per forward pass. On systems with limited GPU RAM, reduce it (for example, to 128):

result = corrdim.measure_text(
    text,
    model="Qwen/Qwen3-0.6B",
    block_size=128,
    forward_chunk_size=128,
)

Citation

@inproceedings{du2025correlation,
  title={Correlation Dimension of Auto-Regressive Large Language Models},
  author={Du, Xin and Tanaka-Ishii, Kumiko},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025},
  arxiv={2510.21258}
}

License

MIT License
