Deterministic relationship discovery for structured datasets.
smartjoin: data relationship discovery in seconds
Stop guessing how your tables connect. smartjoin automatically discovers relationships between structured datasets, with no schema, no docs, and no manual SQL detective work.
When working with unfamiliar datasets, one of the hardest problems is understanding how files relate to each other.
smartjoin helps by scanning structured datasets, identifying candidate relationships, producing explainable outputs instead of opaque guesses, and giving you an explorer to inspect and review the results.
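To make "candidate relationship" concrete, here is a minimal sketch of one common heuristic: a column is treated as a foreign-key candidate when nearly all of its distinct values appear in a unique column of another table. This is an illustration only and is not smartjoin's actual algorithm; the function names, the in-memory table layout, and the 0.9 threshold are all assumptions made for the example.

```python
from itertools import permutations


def containment(child_values, parent_values):
    """Fraction of distinct non-null child values that also appear in the parent."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def candidate_relationships(tables, threshold=0.9):
    """Return sorted (child_table, child_col, parent_table, parent_col, score) tuples.

    `tables` maps table name -> {column name -> list of values}.
    Illustrative heuristic only; NOT how smartjoin itself works.
    """
    results = []
    for (child_name, child_cols), (parent_name, parent_cols) in permutations(tables.items(), 2):
        for parent_col, parent_vals in parent_cols.items():
            non_null = [v for v in parent_vals if v is not None]
            # A parent column must be key-like: non-empty and free of duplicates.
            if not non_null or len(set(non_null)) != len(non_null):
                continue
            for child_col, child_vals in child_cols.items():
                score = containment(child_vals, parent_vals)
                if score >= threshold:
                    results.append((child_name, child_col, parent_name, parent_col, score))
    return sorted(results)


# Hypothetical sample data: orders.customer_id points at customers.id.
tables = {
    "customers": {"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, 2]},
}

for rel in candidate_relationships(tables):
    print(rel)
```

A score like this is explainable (it is just a ratio of matched values), but it says nothing about meaning: two unrelated integer columns can overlap by chance, which is why candidates still need review.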
Quickstart
Installation
pip install smartjoin-py
Run
smartjoin run <path> <out_dir>
This analyzes the structured datasets in <path> and writes results to <out_dir>.
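If you want to drive the command from a script, the following sketch wraps it with Python's standard library and then lists whatever files ended up in the output directory. It only uses the `smartjoin run <path> <out_dir>` form shown above; the helper names are hypothetical, and no assumption is made about the names or formats of the files smartjoin writes.

```python
import shutil
import subprocess
from pathlib import Path


def build_run_command(path, out_dir):
    """Return the argv list for `smartjoin run <path> <out_dir>`."""
    return ["smartjoin", "run", str(path), str(out_dir)]


def list_outputs(out_dir):
    """Return every file under out_dir as a sorted list of relative POSIX paths."""
    root = Path(out_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def run_and_list(path, out_dir):
    """Run the CLI, then report what it wrote. Raises if the CLI is missing or fails."""
    if shutil.which("smartjoin") is None:
        raise RuntimeError("smartjoin not found on PATH; try `pip install smartjoin-py`")
    subprocess.run(build_run_command(path, out_dir), check=True)
    return list_outputs(out_dir)
```

Calling `run_and_list("data", "results")` would analyze the hypothetical `data` directory and return the relative paths of the files found in `results` afterwards.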
Generate test datasets
To explore how smartjoin works, you can generate synthetic test datasets:
smartjoin generate-test-datasets --output-dir <output-dir>
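A quick way to try the tool end to end is to generate the test datasets and then analyze them. The sketch below chains the two documented commands from Python. The `datasets` and `results` directory names are hypothetical, and it assumes the generated directory can be passed directly as `<path>` to `smartjoin run`, which the description above does not confirm.

```python
import subprocess
from pathlib import Path


def build_generate_command(output_dir):
    """Return the argv list for `smartjoin generate-test-datasets --output-dir <dir>`."""
    return ["smartjoin", "generate-test-datasets", "--output-dir", str(output_dir)]


def build_run_command(path, out_dir):
    """Return the argv list for `smartjoin run <path> <out_dir>`."""
    return ["smartjoin", "run", str(path), str(out_dir)]


def plan_demo(workdir):
    """Return the two commands, in order, for a generate-then-analyze demo."""
    workdir = Path(workdir)
    datasets = workdir / "datasets"  # hypothetical directory name
    results = workdir / "results"    # hypothetical directory name
    return [build_generate_command(datasets), build_run_command(datasets, results)]


def run_demo(workdir):
    """Execute the plan; requires the smartjoin CLI to be installed."""
    for command in plan_demo(workdir):
        # check=True stops the demo at the first command that fails.
        subprocess.run(command, check=True)
```

Keeping the command construction separate from execution makes it easy to print the plan first and confirm the paths before anything is written to disk.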
Explorer
In addition to the output files, smartjoin generates an interactive HTML-based explorer that helps you inspect detected relationships visually.
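The explorer's file name and location are not stated here, so the following sketch simply looks for HTML files under the output directory and opens one in the default browser. Both the assumption that the explorer is a `.html` file inside `<out_dir>` and the helper names are hypothetical.

```python
import webbrowser
from pathlib import Path


def find_html_files(out_dir):
    """Return all .html files under out_dir, sorted by path."""
    root = Path(out_dir)
    return sorted(p for p in root.rglob("*.html") if p.is_file())


def open_in_browser(html_path):
    """Open a local HTML file in the default browser; returns True on success."""
    # as_uri() needs an absolute path, hence resolve().
    return webbrowser.open(Path(html_path).resolve().as_uri())
```

For example, `find_html_files("results")` returns the candidates, and passing one of them to `open_in_browser` displays it.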
Limitations
smartjoin identifies candidate relationships across structured datasets. It does not guarantee semantic correctness.
Please keep in mind:
- inferred relationships should be reviewed before being relied on downstream
- domain-specific meaning may still require human interpretation
- output quality depends on the quality, consistency, and structure of the input data
- the tool is intended for structured dataset analysis, not as a general-purpose data processing platform
Currently supported input formats include: .csv, .xlsx, .json, .parquet.
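Before running an analysis, it can help to check which files in a directory fall inside the listed formats. The sketch below partitions a directory's files by extension. It is a pre-flight check written for this example, not part of smartjoin: the case-insensitive matching and the non-recursive scan are assumptions, and smartjoin's own file discovery may behave differently.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".json", ".parquet"}


def partition_inputs(path):
    """Split the files directly inside `path` into (supported, other) name lists."""
    supported, other = [], []
    for entry in sorted(Path(path).iterdir()):
        if not entry.is_file():
            continue  # subdirectories are skipped in this sketch
        bucket = supported if entry.suffix.lower() in SUPPORTED_EXTENSIONS else other
        bucket.append(entry.name)
    return supported, other
```

The second list is a quick way to spot inputs that may need converting before analysis.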
Roadmap
Future development may include:
- stronger semantic matching across columns and tables
- optional AI-assisted reasoning and scoring
- improved explorer and debugging capabilities
- broader support for real-world edge cases and heterogeneous datasets
Contributing
See CONTRIBUTING.md.
License
Licensed under the MIT License.