Deterministic key and join discovery for structured datasets.
Stop guessing how your tables connect
smartjoin helps you understand how unfamiliar datasets fit together — without schema docs, manual SQL detective work, or opaque guesses.
It scans structured data, profiles columns, discovers likely keys, infers candidate joins, and generates an interactive explorer so you can inspect the results.
Supports .csv, .xlsx, .json, and .parquet input files.
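To make the profiling and key-discovery steps concrete, here is a minimal sketch. It is not smartjoin's actual implementation. It assumes that a column looks like a key when its non-null values are all distinct, and that a join is a candidate when most distinct source values also appear in the target column. The function names `uniqueness` and `containment` and the toy data are hypothetical.

```python
def uniqueness(values):
    """Fraction of non-null values that are distinct (1.0 suggests a key)."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return len(set(present)) / len(present)


def containment(source_values, target_values):
    """Fraction of distinct non-null source values that also occur in the target."""
    source = {v for v in source_values if v is not None}
    if not source:
        return 0.0
    target = set(target_values)
    return len(source & target) / len(source)


# Toy columns standing in for orders.customer_id and customers.id (illustrative data).
orders_customer_id = [1, 2, 2, 3, None, 4]
customers_id = [1, 2, 3, 5]

key_score = uniqueness(customers_id)
join_score = containment(orders_customer_id, customers_id)
print(f"customers.id uniqueness: {key_score:.2f}")
print(f"orders.customer_id -> customers.id containment: {join_score:.2f}")
```

A real profiler would likely combine several such signals (names, types, value distributions) rather than rely on a single ratio.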
Example
Given a folder like:
- orders.csv
- customers.xlsx
- payments.parquet
- shipments.json
smartjoin can infer relationships such as:
| Source | Target | Type | Confidence | Origin |
|---|---|---|---|---|
| orders.customer_id | customers.id | many_to_one | 98% | Direct |
| payments.order_id | orders.order_id | many_to_one | 95% | Derived |
| shipments.order_ref | orders.order_id | one_to_one | 89% | Direct |
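One way to read the Type column is as a statement about which side of the join has unique values. The sketch below illustrates that reading under stated assumptions; it is not taken from smartjoin's source. The function name `relationship_type` is hypothetical, and the labels `one_to_many` and `many_to_many` are assumed by analogy with the two labels shown in the table.

```python
def relationship_type(source_values, target_values):
    """Classify a candidate join by whether each side's non-null values are unique."""

    def is_unique(values):
        present = [v for v in values if v is not None]
        return len(present) == len(set(present))

    source_unique = is_unique(source_values)
    target_unique = is_unique(target_values)
    if source_unique and target_unique:
        return "one_to_one"
    if target_unique:
        return "many_to_one"
    if source_unique:
        return "one_to_many"  # assumed label, not shown in the example table
    return "many_to_many"  # assumed label, not shown in the example table


# Illustrative data: repeated customer ids on the orders side, unique ids on the customers side.
print(relationship_type([1, 2, 2, 3], [1, 2, 3]))
# Illustrative data: each shipment references a distinct order.
print(relationship_type([10, 11], [10, 11, 12]))
```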
Quickstart
Installation
pip install smartjoin-py
Run
smartjoin run <path> <out_dir>
This analyzes the structured datasets in <path> and writes results to <out_dir>.
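If you want to trigger the analysis from a Python script, one option is to call the CLI through `subprocess`. This is a sketch that assumes the `smartjoin` executable is on your PATH after installation; the helper names `build_command` and `run_smartjoin` and the `data` and `out` paths are hypothetical.

```python
import shutil
import subprocess


def build_command(path, out_dir):
    """Return the argument list for `smartjoin run <path> <out_dir>`."""
    return ["smartjoin", "run", str(path), str(out_dir)]


def run_smartjoin(path, out_dir):
    """Run the CLI if it is on PATH; raise if it is missing or exits non-zero."""
    if shutil.which("smartjoin") is None:
        raise FileNotFoundError("smartjoin executable not found on PATH")
    subprocess.run(build_command(path, out_dir), check=True)


# Show the command that would be executed for hypothetical input and output folders.
print(build_command("data", "out"))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.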
Outputs
- report.json: full structured analysis output
- relationships.csv: flat table of discovered joins and scoring signals
- explorer/index.html: interactive explorer UI
- explorer/data.json: explorer payload
Generate demo datasets
To explore smartjoin on deterministic synthetic data:
smartjoin generate-test-datasets --output-dir <output-dir>
Limitations
smartjoin identifies candidate relationships across structured datasets. It does not guarantee semantic correctness.
Always review inferred joins before using them downstream. Domain meaning may still require human interpretation, and output quality depends on the structure and consistency of the input data.
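As one example of such a review, you can list source values that have no match in the target column before relying on a join. This is a minimal sketch of a manual check, not a feature of smartjoin; the function name `orphan_values` and the sample values are hypothetical.

```python
def orphan_values(source_values, target_values):
    """Return sorted distinct non-null source values with no match in the target."""
    target = set(target_values)
    return sorted({v for v in source_values if v is not None and v not in target})


# Illustrative stand-ins for payments.order_id and orders.order_id.
payments_order_id = [100, 101, 101, 104]
orders_order_id = [100, 101, 102, 103]

missing = orphan_values(payments_order_id, orders_order_id)
print(f"payments rows reference unknown orders: {missing}")
```

A non-empty result does not necessarily mean the join is wrong, but it is a cue to check whether the data is incomplete or the columns mean different things.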
Roadmap
Future development may include:
- stronger semantic matching across columns and tables
- optional AI-assisted reasoning and scoring
- improved explorer and debugging capabilities
- broader support for real-world edge cases and heterogeneous datasets
Contributing
See CONTRIBUTING.md.
License
Licensed under the MIT License.