A Python package for explaining text similarity
XPLAINSIM: A Toolkit for Explaining Text Similarity
A research toolkit for decomposing and explaining text similarity across neural, structured, and symbolic levels. It's designed for interpretability research, controlled embedding and metric alignment, and hybrid neural-symbolic text analysis.
The toolkit is modular: each explanation paradigm can be used independently or combined in hybrid setups.
Conceptual Overview
XPLAINSIM currently provides three complementary explanation paradigms:
| Module | Explanation Level | What it Does |
|---|---|---|
| Attribution | Token level | Explain which tokens drive similarity |
| SpaceShaping | Embedding space | Shape features to encode custom aspects |
| Symbolic | Graph level | Explain which semantic roles/aspects align |
Installation
You can install via pip:

```shell
pip install xplainsim
```

That's it. Only the Symbolic module with the default parser requires a small extra installation step (see the Symbolic section below).
Attributions
Idea
Token-level attribution decomposes embedding similarity into fine-grained token interactions between two texts.
Given a neural embedding model and two texts we trace the similarity back to interactions of individual input tokens.
The explanation is a matrix over the tokens from each input (the sum of this matrix approximates the similarity of the embeddings).
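The decomposition intuition can be seen in a toy setting: with mean pooling and a plain dot product, the similarity of two pooled embeddings is exactly the sum of a matrix of scaled token-pair dot products. The library's attributions generalize this to real transformer encoders and cosine similarity; this sketch is only for intuition:

```python
import numpy as np

rng = np.random.default_rng(0)
Ta = rng.normal(size=(4, 8))  # toy token embeddings for text A (4 tokens)
Tb = rng.normal(size=(5, 8))  # toy token embeddings for text B (5 tokens)

# mean-pooled "sentence embeddings"
a, b = Ta.mean(axis=0), Tb.mean(axis=0)

# token-pair contribution matrix: A[i, j] is the scaled dot product of token i and token j
A = Ta @ Tb.T / (Ta.shape[0] * Tb.shape[0])

print(np.isclose(A.sum(), a @ b))  # True: the matrix sums to the pooled similarity
```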
Example
Show Currently Available Models
```python
from xplain.attribution import ModelFactory

print(ModelFactory.show_options())  # shows available model names; use one in build() below
```
Compute Attributions
```python
from xplain.attribution import ModelFactory

model = ModelFactory.build("sentence-transformers/all-mpnet-base-v2")  # see ModelFactory.show_options() for others
texta = 'The dog runs after the kitten in the yard.'
textb = 'Outside in the garden the cat is chased by the dog.'
A, tokens_a, tokens_b = model.explain_similarity(texta, textb, move_to_cpu=True, sim_measure='cosine')
```
Example output structure:
- `A`: token-level contribution matrix
- `tokens_a`: token list for text A
- `tokens_b`: token list for text B
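Once you have `A`, you can inspect which token pairs contribute most to the similarity. A minimal sketch using a toy matrix and token lists in place of real model output (the real `A` may be a torch tensor, hence the `np.asarray` conversion):

```python
import numpy as np

# stand-ins for real explain_similarity output
A = np.array([[0.05, 0.40, 0.01],
              [0.30, 0.02, 0.03]])
tokens_a = ["dog", "runs"]
tokens_b = ["hund", "rennt", "im"]

A = np.asarray(A)  # real attributions may be a torch tensor; convert first
top_pairs = sorted(((tokens_a[i], tokens_b[j], float(A[i, j]))
                    for i in range(A.shape[0]) for j in range(A.shape[1])),
                   key=lambda t: t[2], reverse=True)
print(top_pairs[:3])  # strongest token interactions first
```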
Expansion: Token Alignment
```python
# same as above, then
A, tokens_a, tokens_b = model.postprocess_attributions(A, tokens_a, tokens_b, sparsification_method="FlowAlign")
```
Expansion: Cross-Linguality
```python
from xplain.attribution import ModelFactory

model = ModelFactory.build("Alibaba-NLP/gte-multilingual-base")  # see ModelFactory.show_options() for others
texta = 'The dog runs after the kitten in the yard.'
textb = 'Im Garten rennt der Hund der Katze hinterher.'
A, tokens_a, tokens_b = model.explain_similarity(texta, textb, move_to_cpu=True, sim_measure='cosine')
```
Space Shaping
Idea
Space Shaping enforces interpretable structure inside embedding spaces.
Instead of learning a monolithic embedding, the vector is partitioned into dedicated subspaces, each trained to reflect a predefined interpretable metric (e.g., bag-of-words overlap, named entity similarity, sentiment, etc.).
This enables:
- Controllable similarity decomposition
- Feature-aligned embeddings
- Hybrid symbolic–neural objectives
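Conceptually, each subspace of a partitioned embedding can be compared on its own. The sketch below only illustrates the idea of slicing a vector into named subspaces and scoring each separately; it is not the library's internal implementation:

```python
import numpy as np

def sliced_cosines(u, v, dims):
    """Cosine similarity per consecutive subspace; dims like [32, 32, ...]."""
    out, start = [], 0
    for d in dims:
        a, b = u[start:start + d], v[start:start + d]
        out.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
        start += d
    return out

u = np.array([1.0, 0.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 1.0, 1.0])
print(sliced_cosines(u, v, [2, 2]))  # one similarity score per subspace
```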
```python
from sentence_transformers import InputExample
from xplain.spaceshaping import PartitionedSentenceTransformer

examples = []
# compute the training/partitioning target
for x, y in zip(list_with_strings, other_list_with_strings):
    similarities = []
    # metrics/aspects that should be reflected in the embedding space
    for metric in my_metrics:
        similarities.append(metric.score(x, y))
    examples.append(InputExample(texts=[x, y], label=similarities))

# instantiate model and train
pt = PartitionedSentenceTransformer(feature_names, feature_dims)
pt.train_model(examples)
```
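The loop above assumes each entry of `my_metrics` exposes a `score(x, y)` method returning a similarity in [0, 1]. A minimal example of such a metric (a toy Jaccard word-overlap metric; any object with this interface would do):

```python
class JaccardMetric:
    """Toy metric with the score(x, y) interface assumed above."""

    def score(self, x, y):
        a, b = set(x.lower().split()), set(y.lower().split())
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

my_metrics = [JaccardMetric()]
print(my_metrics[0].score("the cat sleeps", "the cat runs"))  # 0.5 (2 shared of 4 words)
```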
Space Partitioning Example
Here's a very simple example for training and inferring with a custom model.
Concretely, we partition the embedding into three features/parts
- Bag-of-words: Learns to reflect bag-of-words distance
- Named entity similarity: Learns to reflect similarity of named entities
- Residual (not explicitly trained): features that capture the remaining semantic similarity, i.e. "the rest"
Note that this is only toy code and training uses little data; even so, the feature partitioning will already have some effect.
```python
from scipy.stats import pearsonr
from xplain.spaceshaping import PartitionedSentenceTransformer
from sentence_transformers import InputExample
from datasets import load_dataset

# We will later use this to create a custom "Named Entity" metric
import spacy
nlp = spacy.load("en_core_web_sm")

# let's first load a toy train dataset of sentence pairs
ds = load_dataset("mteb/stsbenchmark-sts")
some_pairs = list(zip([dic["sentence1"] for dic in ds["train"]], [dic["sentence2"] for dic in ds["train"]]))

# dev dataset of sentence pairs
some_pairs_dev = list(zip([dic["sentence1"] for dic in ds["validation"]], [dic["sentence2"] for dic in ds["validation"]]))

# let's build our target metrics that should be reflected within the embedding space
def bow_sim(x1, x2):
    x1, x2 = set(x1.split()), set(x2.split())
    inter, union = x1.intersection(x2), x1.union(x2)
    return len(inter) / len(union)

def ner_sim(doc1, doc2):
    x1_ner = " ".join([ne.text for ne in doc1.ents])
    x2_ner = " ".join([ne.text for ne in doc2.ents])
    if not x1_ner and not x2_ner:
        return 1.0
    return bow_sim(x1_ner, x2_ner)

# we create training examples
docs1, docs2 = [nlp(x) for x, _ in some_pairs], [nlp(y) for _, y in some_pairs]
target = [[bow_sim(x1, x2), ner_sim(docs1[i], docs2[i])] for i, (x1, x2) in enumerate(some_pairs)]
some_examples = [InputExample(texts=[x1, x2], label=target[i]) for i, (x1, x2) in enumerate(some_pairs)]

# some development examples
docs1_dev, docs2_dev = [nlp(x) for x, _ in some_pairs_dev], [nlp(y) for _, y in some_pairs_dev]
target_dev = [[bow_sim(x1, x2), ner_sim(docs1_dev[i], docs2_dev[i])] for i, (x1, x2) in enumerate(some_pairs_dev)]
some_examples_dev = [InputExample(texts=[x1, x2], label=target_dev[i]) for i, (x1, x2) in enumerate(some_pairs_dev)]

# initialize model
pt = PartitionedSentenceTransformer(feature_names=["bow", "ner"], feature_dims=[32, 32])

# explanations can be computed before training, but they are meaningless; we only keep them for later comparison
decomposed_predictions = pt.explain_similarity([x for x, y in some_pairs_dev], [y for x, y in some_pairs_dev])

def feature_correlation(feature_name, preds):
    return pearsonr([dic[feature_name] for dic in preds],
                    [ex.label[pt.feature_names.index(feature_name)] for ex in some_examples_dev])[0]

pearsonr_before_training = [feature_correlation(name, decomposed_predictions) for name in pt.feature_names]

# print a toy example before training
print("Text before training:", pt.explain_similarity(["The kitten drinks milk"], ["A cat slurps something"]))

# train
pt.train_model(some_examples, some_examples_dev)

# evaluate correlation to the custom metrics after training
decomposed_predictions = pt.explain_similarity([x for x, y in some_pairs_dev], [y for x, y in some_pairs_dev])
pearsonr_after_training = [feature_correlation(name, decomposed_predictions) for name in pt.feature_names]
for index, pr in enumerate(pearsonr_after_training):
    print(f"Correlation delta for {pt.feature_names[index]}: {pr - pearsonr_before_training[index]}")

# print a toy example after training
print("Text after training:", pt.explain_similarity(["The kitten drinks milk"], ["A cat slurps something"]))
```
Symbolic
Idea
Unlike pure neural similarity, this approach decomposes similarity along semantic roles (Agent, Patient, Negation, etc.), enabling aspect-level semantic comparison.
This is based on comparing AMR graphs of texts. Abstract Meaning Representation (AMR) encodes sentence meaning as a graph of concepts and semantic roles.
Using the Symbolic module with the default parser requires a few extra installation steps:

```shell
pip install amrlib
xplain-install-amr
pip install "transformers[torch]==4.49.0"
```

The last line ensures that an older transformers version (transformers<5) is installed, as the default AMR parser is not yet compatible with version 5.
Example
Explaining Similarity
The approach consists roughly in two steps:
- Parse each input text to an AMR Graph that expresses the text semantics in a symbolic way
- Match those Meaning Graphs with Graph Similarity Metrics to elicit meaning similarity aspects (e.g., Agent, Patient, Negation,...)
```python
from xplain.symbolic.model import AMRSimilarity

explainer = AMRSimilarity()
sents1 = ["Barack Obama holds a talk"]
sents2 = ["Hillary Clinton holds a talk"]
exp = explainer.explain_similarity(sents1, sents2)
print(exp)
```
This prints a JSON-style dictionary with aspectual graph matching scores.
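The exact keys depend on the metric configuration; assuming a flat mapping from aspect name to score, the output can be post-processed, e.g. to rank aspects by alignment. The aspect names and values below are purely illustrative stand-ins, not real output:

```python
# purely illustrative stand-in for the returned dictionary
aspect_scores = {"AGENT": 0.55, "PATIENT": 0.92, "NEGATION": 1.00}

ranked = sorted(aspect_scores.items(), key=lambda kv: kv[1])
print("least aligned aspect:", ranked[0][0])  # AGENT in this toy dictionary
```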
Return AMR Graphs and Graph Alignments
To also return the graphs and aspectual subgraphs (including node alignments), pass `return_graphs=True` to `explain_similarity`.
FAQ
Citation