A utility to find potential join keys (matching columns) across multiple pandas DataFrames.
FindMyJoint
A Python utility to analyze and compare columns across multiple pandas DataFrames, suggesting potential join keys and visualizing the relationships.
When working with multiple disparate datasets, finding common columns to join them on is a tedious manual task. findmyjoint automates this by:
- Profiling each DataFrame's columns (dtype, uniqueness, nulls).
- Comparing all possible column pairs across datasets.
- Scoring pairs based on name similarity (using rapidfuzz) and content similarity (using Jaccard index).
- Suggesting join confidence levels.
- Visualizing the connections as an interactive network graph (using pyvis).
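The two scores in the list above can be sketched with the standard library alone. This is a minimal illustration, not findmyjoint's implementation: `difflib.SequenceMatcher` stands in for rapidfuzz, the helper names `name_similarity` and `jaccard` are hypothetical, and comparing values as strings is an assumption made here so that `21` and `'21'` count as the same value.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive name similarity in [0, 1] (difflib stand-in for rapidfuzz)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(left, right) -> float:
    """Jaccard index |A & B| / |A | B| over the distinct values, compared as strings."""
    a = {str(v) for v in left}
    b = {str(v) for v in right}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Column names that differ only by case score as identical.
print(name_similarity("age", "Age"))  # 1.0

# Same values stored as ints and as strings overlap fully once stringified.
print(jaccard([21, 25, 30, 45], ["21", "25", "30", "45"]))  # 1.0

# Zero-padded IDs do not match plain integers without extra normalization.
print(jaccard(["001", "002", "003", "004"], [1, 2, 3, 4]))  # 0.0
```

The last case shows why a content score alone can miss a real key: formatting differences such as zero padding need to be normalized before the sets are compared.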
Installation
Install the package from PyPI with pip:
pip install findmyjoint
Quickstart
You can get a comparison matrix or an interactive graph with a single line of code.
import pandas as pd
import findmyjoint as fmj  # alias assumed from the `fmj.` calls below

# 1. Create toy datasets
df1 = pd.DataFrame({
    'age': [21, 25, 30, 45],
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'user_id': ['001', '002', '003', '004']
})
df2 = pd.DataFrame({
    'Age': ['21', '25', '30', '45'],
    'full_name': ['Alice', 'Bob', 'Charlie', 'David'],
    'customer_id': [1, 2, 3, 4]
})
df3 = pd.DataFrame({
    'client_identifier': ['001', '002', '003', '004'],
    'location': ['USA', 'CAN', 'USA', 'MEX'],
    'years_old': [21, 25, 30, 45]
})
datasets = [df1, df2, df3]
names = ['hr', 'crm', 'finance']

# 2. Get the comparison matrix
print("--- Comparison Matrix ---")
matrix = fmj.compare(datasets, names=names, name_threshold=0.6)
print(matrix.head())
# 3. Generate the interactive network graph
print("\n--- Generating Network Graph ---")
# This will create and automatically open 'joint_graph.html'
fmj.network(datasets, names=names, threshold=0.6)
print("Graph 'joint_graph.html' created.")
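Once a candidate pair has been identified, the join itself is plain pandas. The sketch below assumes that `user_id` in the `hr` frame and `customer_id` in the `crm` frame were picked as the key pair; that choice is made here for illustration and is not taken from findmyjoint's output. Because one column holds zero-padded strings and the other integers, one side is cast before merging.

```python
import pandas as pd

hr = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David"],
    "user_id": ["001", "002", "003", "004"],
})
crm = pd.DataFrame({
    "full_name": ["Alice", "Bob", "Charlie", "David"],
    "customer_id": [1, 2, 3, 4],
})

# '001' and 1 never compare equal, so derive an integer key on the string side.
hr_keyed = hr.assign(user_id_int=hr["user_id"].astype(int))

# validate="one_to_one" raises if either key column contains duplicates.
joined = hr_keyed.merge(
    crm,
    left_on="user_id_int",
    right_on="customer_id",
    how="inner",
    validate="one_to_one",
)
print(joined[["name", "full_name", "customer_id"]])
```

Keeping the original `user_id` column and adding a derived key leaves the source data unchanged, which makes it easier to check the join against the raw values afterwards.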
File details
Details for the file findmyjoint-0.0.1.tar.gz.
File metadata
- Download URL: findmyjoint-0.0.1.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 934c35de577372e7ecde9a79887c9b0f670a3ab41f4fef725a7921ffa11c2de8 |
| MD5 | 1c9708d0781f808bbb79d11fc2eaa3c3 |
| BLAKE2b-256 | 502f1b165f0e406f7476be6a01abee5ed4510586090bbd39d50ee57874a26209 |
File details
Details for the file findmyjoint-0.0.1-py3-none-any.whl.
File metadata
- Download URL: findmyjoint-0.0.1-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 274a487fcff92aba126a7a88672ace6d292562d9ce0cb03ea0eb8d7545c08184 |
| MD5 | 4f9667a5dc3d2cf13c9e3feca71fab8f |
| BLAKE2b-256 | 792355996270f6be1f5e83bdf643f2e2664da1e2ecc15f5f6c0350a8c1246434 |