Correlation Dimension for LLMs - A library for computing correlation dimension of autoregressive large language models
CorrDim: Correlation Dimension for Language Models
CorrDim is a Python library for computing the correlation dimension of autoregressive language models from next-token log-probability vectors, based on the paper "Correlation Dimension of Auto-Regressive Large Language Models" (NeurIPS 2025).
Documentation
Full documentation is available at corrdim.readthedocs.io.
Use the docs site for:
- installation details and backend notes
- the full Python API reference
- CLI documentation
- examples and usage patterns
What CorrDim measures
Given a text and an autoregressive language model, CorrDim measures the text's global structural complexity as perceived by that model.
In practice:
- repetitive or degenerate text tends to have a lower correlation dimension
- ordinary fluent text tends to have a higher dimension
- richer long-range structure can produce an even higher dimension
CorrDim is complementary to local metrics such as perplexity: it focuses on sequence-level geometry, not just token-level prediction quality.
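To make the contrast with perplexity concrete, here is a minimal sketch of how perplexity is computed from per-token log-probabilities. The `perplexity` helper and the toy values are illustrative and not part of CorrDim. Because the score is an average over tokens, reordering the tokens leaves it unchanged, which is one sense in which it is a local metric.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability of the observed tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy values: log-probabilities a model assigned to each observed token.
uniform_guess = [math.log(0.25)] * 4             # model is guessing among 4 options
mixed = [math.log(p) for p in (0.9, 0.1, 0.5, 0.7)]

ppl_uniform = perplexity(uniform_guess)
ppl_mixed = perplexity(mixed)
ppl_mixed_reversed = perplexity(list(reversed(mixed)))  # same tokens, other order

print(ppl_uniform, ppl_mixed, ppl_mixed_reversed)
```

Sequence-level measures such as the correlation dimension instead depend on how the per-position vectors relate to one another across the whole text.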
How it works
At a high level, CorrDim:
- converts text into a sequence of next-token log-probability vectors
- optionally reduces the vocabulary dimension
- computes a correlation-integral curve over epsilon thresholds
- estimates the correlation dimension by fitting a line in log-log space
For the mathematical details, see the paper.
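The last two steps can be illustrated with a small NumPy sketch of a Grassberger-Procaccia style estimate. This is not CorrDim's implementation: it assumes Euclidean distance, uses toy point clouds instead of log-probability vectors, and fits the whole epsilon range rather than selecting a linear region. The function names are local to this example.

```python
import numpy as np

def correlation_integral(points, epsilons):
    """Fraction of distinct point pairs closer than each epsilon."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)    # each unordered pair once
    pair_dists = dists[upper]
    return np.array([(pair_dists < eps).mean() for eps in epsilons])

def fit_dimension(epsilons, c_values):
    """Slope of log C(eps) against log eps, ignoring empty bins."""
    mask = c_values > 0
    slope, _intercept = np.polyfit(np.log(epsilons[mask]), np.log(c_values[mask]), 1)
    return float(slope)

rng = np.random.default_rng(0)
epsilons = np.logspace(-2, -0.5, 20)

# Points on a 1-D line embedded in 3-D space.
t = rng.uniform(0.0, 1.0, size=500)
line_points = np.stack([t, 2.0 * t, -t], axis=1)
dim_line = fit_dimension(epsilons, correlation_integral(line_points, epsilons))

# Points filling a 2-D square embedded in 3-D space.
uv = rng.uniform(0.0, 1.0, size=(500, 2))
plane_points = np.concatenate([uv, np.zeros((500, 1))], axis=1)
dim_plane = fit_dimension(epsilons, correlation_integral(plane_points, epsilons))

print(dim_line, dim_plane)
```

The fitted slope should come out near 1 for the line and near 2 for the square, regardless of the 3-D embedding. That is the property the correlation dimension is meant to capture.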
Installation
CorrDim requires Python 3.10 or newer.
pip install corrdim
Linux GPU users: PyPI distributes CPU-only PyTorch. Install CUDA PyTorch first:
pip install torch --index-url https://download.pytorch.org/whl/cu126
For local development:
pip install "corrdim[dev,docs]"
To compile the CUDA extension during installation:
CORRDIM_BUILD_CUDA=1 pip install .
Quick start
import torch
import corrdim
result = corrdim.measure_text(
    "Your text here...",
    model="Qwen/Qwen2.5-1.5B",
    precision=torch.float16,
)
print("corrdim:", result.corrdim)
print("fit_r2:", result.fit_r2)
print("linear_region_bounds:", result.linear_region_bounds)
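As a rough guide to reading the fit quality, the sketch below assumes fit_r2 is the usual coefficient of determination of the straight-line fit in log-log space; this README does not define it, so treat that as an assumption. The `loglog_fit_r2` helper and the synthetic curves are illustrative only.

```python
import numpy as np

def loglog_fit_r2(epsilons, c_values):
    """Fit log C = slope * log eps + intercept and return (slope, R^2)."""
    x = np.log(np.asarray(epsilons, dtype=float))
    y = np.log(np.asarray(c_values, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = float((residuals ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return float(slope), 1.0 - ss_res / ss_tot

eps = np.logspace(-2, 0, 10)

# A clean power law: the log-log curve is exactly a line with slope 1.7.
slope_clean, r2_clean = loglog_fit_r2(eps, 0.5 * eps ** 1.7)

# A curve that saturates at 1 for large eps: one line fits it poorly.
saturated = np.minimum(50.0 * eps ** 1.7, 1.0)
slope_sat, r2_sat = loglog_fit_r2(eps, saturated)

print(slope_clean, r2_clean)
print(slope_sat, r2_sat)
```

A value close to 1 indicates the curve is well described by a single power law over the fitted range. A noticeably lower value suggests the fitted range includes a saturated or noisy part of the curve.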
For batched input:
import torch
import corrdim
results = corrdim.measure_texts(
    [
        "Short sample A...",
        "Short sample B...",
    ],
    model="Qwen/Qwen2.5-1.5B",
    precision=torch.float16,
)
for result in results:
    print(result.corrdim, result.fit_r2)
Progressive dimension along the sequence
To fit the correlation dimension at multiple prefix lengths without re-running the model for each prefix, use measure_text_progressive. It calls progressive_curve_from_text once, then subsamples prefix indices according to two parameters:
- skip_prefix_tokens: first prefix index to include (shorter prefixes are skipped)
- measure_every_tokens: stride between measured indices, or None (default) to choose from length: fewer than 100 tokens → 1, fewer than 1000 → 10, otherwise 100
The return value is a ProgressiveDimensionResult: by_prefix maps prefix index to a full DimensionResult; corrdims maps index to the fitted scalar only.
import torch
import corrdim
# long_text is a str holding the document to analyze
prog_dims = corrdim.measure_text_progressive(
    long_text,
    model="Qwen/Qwen2.5-1.5B",
    precision=torch.float16,
    skip_prefix_tokens=100,
)
for prefix_len, d in sorted(prog_dims.corrdims.items()):
    print(prefix_len, d)
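The default stride rule described above can be sketched in plain Python. The helper names are hypothetical and not part of CorrDim. The sketch also assumes that "length" means the total token count and that measured indices run from skip_prefix_tokens up to and including the full length; the library may select indices differently.

```python
def default_stride(num_tokens):
    """Stride used when measure_every_tokens is None, per the rule above."""
    if num_tokens < 100:
        return 1
    if num_tokens < 1000:
        return 10
    return 100

def measured_prefixes(num_tokens, skip_prefix_tokens=0, measure_every_tokens=None):
    """Prefix indices to measure (assumed: inclusive of the full length)."""
    if measure_every_tokens is None:
        stride = default_stride(num_tokens)
    else:
        stride = measure_every_tokens
    return list(range(skip_prefix_tokens, num_tokens + 1, stride))

prefixes = measured_prefixes(1500, skip_prefix_tokens=100)
print(len(prefixes), prefixes[0], prefixes[-1])
```

Under these assumptions, a 1500-token text with skip_prefix_tokens=100 is measured at 15 prefixes rather than 1400, all from the same model pass.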
API overview
The most important entry points are:
- measure_text / measure_texts for end-to-end text measurement
- measure_text_progressive for multiple fitted dimensions along sequence prefixes (one model pass)
- curve_from_text / curve_from_vectors when you want the curve first
- estimate_dimension_from_curve when you already have saved curve data
- progressive_curve_from_text for prefix-wise analysis
- correlation_integral and related functions for lower-level tensor workflows
For full API details, signatures, return types, and backend behavior, see the documentation site.
CLI
CorrDim includes a corrdim command-line interface:
corrdim measure-text \
    --file data/sep60/chaos.txt \
    --model Qwen/Qwen2.5-1.5B
Additional CLI commands and options are documented at corrdim.readthedocs.io.
Backends
CorrDim supports multiple backends for correlation-integral computation:
- cuda
- triton
- pytorch
- pytorch_fast
- auto
Set the default backend with:
export CORRDIM_CORRINT_BACKEND=pytorch
Or in Python:
import corrdim
print(corrdim.set_corrint_backend("auto"))
print(corrdim.available_corrint_backends())
Tips for low-VRAM systems
If you run into out-of-memory errors, reduce block_size (default 512) to lower the peak memory usage during correlation-integral computation:
result = corrdim.measure_text(
    text,
    model="Qwen/Qwen2.5-1.5B",
    block_size=128,
)
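To see why a smaller block size lowers peak memory, here is a NumPy sketch of counting close pairs tile by tile instead of building the full pairwise-distance matrix. It is an illustration under stated assumptions, not CorrDim's kernel: it uses Euclidean distance on random vectors, and the function names are local to this example.

```python
import numpy as np

def count_close_pairs_full(points, eps):
    """Reference: materialize the full n x n distance matrix at once."""
    diffs = points[:, None, :] - points[None, :, :]
    close = np.sqrt((diffs ** 2).sum(axis=-1)) < eps
    # Drop the n self-pairs on the diagonal; every other pair appears twice.
    return (int(close.sum()) - len(points)) // 2

def count_close_pairs_blocked(points, eps, block_size=128):
    """Same count, but only one block_size x block_size tile lives in memory."""
    n = len(points)
    total = 0
    for i in range(0, n, block_size):
        rows = points[i:i + block_size]
        for j in range(i, n, block_size):        # upper-triangular tiles only
            cols = points[j:j + block_size]
            diffs = rows[:, None, :] - cols[None, :, :]
            close = np.sqrt((diffs ** 2).sum(axis=-1)) < eps
            if i == j:
                # Diagonal tile: count each pair once and skip self-pairs.
                total += int(np.triu(close, k=1).sum())
            else:
                total += int(close.sum())
    return total

rng = np.random.default_rng(0)
points = rng.normal(size=(300, 8))
full = count_close_pairs_full(points, eps=3.0)
blocked_64 = count_close_pairs_blocked(points, eps=3.0, block_size=64)
blocked_128 = count_close_pairs_blocked(points, eps=3.0, block_size=128)
print(full, blocked_64, blocked_128)
```

The result does not depend on the block size; only the size of the largest temporary array does, at the cost of more loop iterations.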
You can also set forward_chunk_size to control how many tokens are processed per forward pass. On systems with limited VRAM, reduce this value (for example, to 128):
result = corrdim.measure_text(
    text,
    model="Qwen/Qwen2.5-1.5B",
    block_size=128,
    forward_chunk_size=128,
)
Citation
@inproceedings{du2025correlation,
title={Correlation Dimension of Auto-Regressive Large Language Models},
author={Du, Xin and Tanaka-Ishii, Kumiko},
booktitle={Advances in Neural Information Processing Systems},
year={2025},
arxiv={2510.21258}
}
Links
- Documentation: https://corrdim.readthedocs.io
- Paper: https://arxiv.org/abs/2510.21258
- Repository: https://github.com/kduxin/corrdim
License
MIT License
File details
Details for the file corrdim-0.3.0.tar.gz.
File metadata
- Download URL: corrdim-0.3.0.tar.gz
- Upload date:
- Size: 44.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73d7bcac0272d3faa81cc752fd2f7623caf073636d77e82244793341955d63ff |
| MD5 | 32fb434ea58d0d2a0a6302e211b84346 |
| BLAKE2b-256 | 2f07f9c24a82077094379a5368c854f6c5b083cc1314531055d0608e83ca3367 |
File details
Details for the file corrdim-0.3.0-py3-none-any.whl.
File metadata
- Download URL: corrdim-0.3.0-py3-none-any.whl
- Upload date:
- Size: 35.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6ac689a7bd440efb61cd7ca29df5a907c98815446444380572e96fd684cc125 |
| MD5 | 8f824a626c3d0129e5bc1ab28236d2fb |
| BLAKE2b-256 | 685ec807c5135ee55322e68a269418cd9ce1eb4da40ea35fcd3d6030603518df |