
A Python package for multi-modal entity resolution using the Fusion algorithm.


Fusion: Flexible Unification of Structured Intermodal Object Networks


Fusion is a Python package that addresses the problem of entity resolution in multimodal graphs. It implements the Fusion algorithm from the paper "Fusion: Flexible Unification of Structured Intermodal Object Networks" by Yoel Ashkenazi and Yoram Louzoun.

Installation

To install the package and its dependencies, run:

pip install Fusion

Make sure you have Python 3.8+ installed.


Quick Tour

Directory Tree

project_root/
├── Fusion/
│   └── main.py
├── Entity_detection/
│   ├── my_algorithm.py
│   └── Record_linkage/
│       └── RL_test.py
├── evaluate.py
├── utils.py
├── data/
│   └── graph.gpickle
├── output/
│   ├── DatasetName_results.json
│   └── DatasetName_colored_graph.gpickle
├── requirements.txt
└── config.json
  1. Fusion/main.py: Main entry point.
  2. Entity_detection/: Contains model and record linkage code.
  3. evaluate.py: Evaluation functions.
  4. utils.py: Utility functions (drawing, graph manipulation).
  5. data/: Place your .gpickle graph files here.
  6. output/: Results and colored graphs are saved here.

Running the Fusion Model or Record Linkage Test

Use the main script to run the fusion process or the record linkage test:

python Fusion/main.py --config path/to/config.json --output path/to/output_folder
  • --config: Path to your configuration file (see Configuration File Example).
  • --output: Directory where results and colored graphs will be saved.

After running the model, results are saved as a pickle file in your output directory.

Evaluation

To evaluate the partition, use the get_truth_values function from evaluate.py:

from evaluate import get_truth_values

TP, FP, TN, FN = get_truth_values(graph, true_graph, partition, true_entities)

Explanation of Metrics:

  1. True Positives (TP): Vertices correctly grouped.
  2. False Positives (FP): Vertices incorrectly grouped (placed in a group they do not belong to).
  3. True Negatives (TN): Vertices correctly not grouped.
  4. False Negatives (FN): Vertices that should have been grouped but were not.
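These four counts are typically computed over pairs of vertices: a pair counts as positive when both vertices are placed in the same group. As a rough illustration of that pairwise logic (pairwise_confusion is a hypothetical helper written for this sketch, not the package's get_truth_values implementation):

```python
from itertools import combinations

def pairwise_confusion(partition, true_entities):
    """Count pairwise TP/FP/TN/FN between a predicted partition and
    ground-truth entity labels (both dicts mapping node -> cluster id)."""
    TP = FP = TN = FN = 0
    for u, v in combinations(sorted(partition), 2):
        same_pred = partition[u] == partition[v]
        same_true = true_entities[u] == true_entities[v]
        if same_pred and same_true:
            TP += 1          # correctly merged pair
        elif same_pred:
            FP += 1          # merged, but belongs to different entities
        elif same_true:
            FN += 1          # same entity, but left separate
        else:
            TN += 1          # correctly kept separate

    return TP, FP, TN, FN

partition     = {"a": 0, "b": 0, "c": 1, "d": 1}  # predicted groups
true_entities = {"a": 0, "b": 0, "c": 0, "d": 1}  # ground truth
print(pairwise_confusion(partition, true_entities))  # (1, 1, 2, 2)
```

The returned tuple plugs directly into the precision/recall formulas below.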

Calculating Performance Metrics: Using the above values, you can calculate:

  • Precision: TP / (TP + FP) - Measures the accuracy of positive predictions.
  • Recall: TP / (TP + FN) - Measures the ability to find all positive instances.
  • F1-Score: 2 * (Precision * Recall) / (Precision + Recall) - Harmonic mean of precision and recall.

Example:

precision = TP / (TP + FP) if (TP + FP) > 0 else 0
recall = TP / (TP + FN) if (TP + FN) > 0 else 0
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1-Score: {f1_score}")

Saving Results:

The results, including metrics, are saved as a JSON file in the output directory. For example:

import json

results = {
    'TP': TP,
    'FP': FP,
    'TN': TN,
    'FN': FN,
    'precision': precision,
    'recall': recall,
    'F1_score': f1_score,
}

with open('output/results.json', 'w') as f:
    json.dump(results, f, indent=4)

This ensures you can review and analyze the evaluation metrics later.


Plotting Graphs

To visualize the partitioned graph, use the draw function from utils.py. Below are examples of how to use it:

Example 1: Plotting a Colored Graph

import utils
import networkx as nx

# Load the true graph and partition
true_graph = utils.load_dataset("data/graph.gpickle")
partition = {"node1": 0, "node2": 1, "node3": 0}  # Example partition

# Color the graph by partition
colored_graph = utils.color_by_partition(true_graph, partition)

# Plot the graph
utils.draw(colored_graph)

This will display the graph with nodes colored according to their partition.

Example 2: Saving the Colored Graph

import pickle as pkl

# Save the colored graph
with open("output/colored_graph.gpickle", "wb") as f:
    pkl.dump(colored_graph, f, protocol=pkl.HIGHEST_PROTOCOL)

print("Colored graph saved to output/colored_graph.gpickle")

Example 3: Plotting with Custom Layout

import matplotlib.pyplot as plt

# Use a spring layout for better visualization
pos = nx.spring_layout(colored_graph)

# Draw the graph with the custom layout
utils.draw(colored_graph, pos=pos)

# Show the plot
plt.show()

Configuration File Example

Below is an example of a configuration file (config.json).

Note:

  1. graph_path must point to a .gpickle file.
  2. Parameters like blue_in, red_out, C, etc., affect the model.
  3. Parameters like test type, add_num, and remove_num control test execution.
  4. The // comments below are explanatory only; standard JSON does not allow comments, so remove them from your actual config.json.
{
    "verbosity_level": 1,           // int: Logging level (0 = silent, 1 = basic info, 2 = detailed debug info)
    "draw": false,                  // bool: Whether to plot graphs during execution
    "blue_in": 1.0,                 // float: Weight for blue intra-cluster edges
    "blue_out": 1.0,                // float: Weight for blue inter-cluster edges
    "red_in": 1.0,                  // float: Weight for red intra-cluster edges
    "red_out": 1.0,                 // float: Weight for red inter-cluster edges
    "C": 1.0,                       // float: Regularization parameter for the model
    "epsilon": 1e-6,                // float: Convergence threshold for iterative algorithms
    "history": true,                // bool: Whether to keep a history of iterations
    "type_dist": null,              // null or str: Type of distance metric (e.g., "euclidean", "cosine")
    "quality_type": "adjusted_OOE", // str: Quality metric for evaluating partitions (e.g., "adjusted_OOE", "NMI")
    "amplitude": 5.0,               // float: Amplitude parameter for edge weight adjustments
    "update_factor": 0.1,           // float: Factor for updating weights during iterations
    "ddelta": 0.1,                  // float: Step size for parameter updates
    "iterator": false,              // bool: Whether to use an iterative approach
    "decompose": false,             // bool: Whether to decompose the graph into subgraphs
    "graph_path": "data/graph.gpickle", // str: Path to the input graph file (must be a .gpickle file)
    "name": "DatasetName",          // str: Name for the dataset (used in output file naming)
    "test type": "GM",              // str: Test type ("GM" for Fusion, "RL" for Record Linkage)
    "add_num": 100,                 // int: Number of false identity edges to add to the graph
    "remove_num": 100,              // int: Number of identity edges to remove from the graph
    "removal_chance": 0.2           // float: Probability of removing an edge during preprocessing
}

Key Notes:

  • graph_path: Ensure this points to a valid .gpickle file containing the graph data.

  • test type: Use "GM" for running the Fusion model or "RL" for Record Linkage tests.

  • Model Parameters: Parameters like blue_in, red_out, C, etc., directly affect the behavior of the Fusion model.

  • Execution Parameters: Parameters like add_num, remove_num, and removal_chance control preprocessing and execution behavior.
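Because standard JSON parsers reject // comments, an annotated config like the example above must have its comments stripped before parsing. One simple option is sketched below (strip_comments is a hypothetical helper, not part of the package, and assumes "//" never appears inside a string value):

```python
import json
import re

# Annotated config in the style of the example above (// comments).
ANNOTATED = '''{
    "verbosity_level": 1,                 // int: logging level
    "draw": false,                        // bool: plot graphs during execution
    "graph_path": "data/graph.gpickle"    // str: input graph (.gpickle)
}'''

def strip_comments(text):
    # Remove everything from "//" to the end of each line.
    # Assumes "//" never occurs inside a JSON string value.
    return re.sub(r'\s*//.*', '', text)

config = json.loads(strip_comments(ANNOTATED))
print(config["verbosity_level"], config["graph_path"])
```

Alternatively, simply keep the real config.json comment-free and load it with json.load directly.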


Main Git Repository

For further information, updates, and example material, please refer to the main Git repository: GitHub Repository
