Deterministic key and join discovery for structured datasets.

Stop guessing how your tables connect

Python 3.11+. Code style: Ruff. Formats: CSV, XLSX, JSON, Parquet.

smartjoin helps you understand how unfamiliar datasets fit together, without schema docs, manual SQL detective work, or opaque guesses.

It scans structured data, profiles columns, discovers likely keys, infers candidate joins, and generates an interactive explorer so you can inspect the results.
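To make the idea of key and join discovery concrete, here is a minimal sketch of two signals such a tool could compute: how unique a column is (a hint that it may be a key) and how much of one column's values appear in another (a hint that it may be a join). This illustrates the general technique only. It is not smartjoin's actual algorithm, and the function names `uniqueness_ratio` and `containment` are hypothetical.

```python
from collections.abc import Sequence


def uniqueness_ratio(values: Sequence[object]) -> float:
    """Share of non-null values that are distinct (1.0 suggests a candidate key)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    return len(set(non_null)) / len(non_null)


def containment(source: Sequence[object], target: Sequence[object]) -> float:
    """Share of distinct non-null source values that also appear in target."""
    src = {v for v in source if v is not None}
    if not src:
        return 0.0
    tgt = {v for v in target if v is not None}
    return len(src & tgt) / len(src)


# Hypothetical sample columns, invented for illustration.
customers_id = [1, 2, 3, 4]
orders_customer_id = [1, 1, 2, 4, 4, None]

print(uniqueness_ratio(customers_id))               # every value distinct
print(uniqueness_ratio(orders_customer_id))         # repeated values lower the ratio
print(containment(orders_customer_id, customers_id))
```

A column with a uniqueness ratio near 1.0 that fully contains another column's values is a plausible join target, though value overlap alone does not prove the relationship is meaningful.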

Supported input formats: .csv, .xlsx, .json, and .parquet.

Example

Given a folder like:

  • orders.csv
  • customers.xlsx
  • payments.parquet
  • shipments.json

smartjoin can infer relationships such as:

Source               Target            Type         Confidence  Origin
orders.customer_id   customers.id      many_to_one  98%         Direct
payments.order_id    orders.order_id   many_to_one  95%         Derived
shipments.order_ref  orders.order_id   one_to_one   89%         Direct
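The Type column describes the cardinality of each relationship. As a rough sketch of how such a label could be derived from the data, the function below checks whether each side's key values are unique. The function name `classify_cardinality` is hypothetical, the labels `one_to_many` and `many_to_many` are assumptions that do not appear in the example above, and smartjoin's own logic may differ.

```python
from collections.abc import Sequence


def classify_cardinality(source: Sequence[object], target: Sequence[object]) -> str:
    """Label a candidate join by whether each side's non-null keys are unique."""
    src = [v for v in source if v is not None]
    tgt = [v for v in target if v is not None]
    src_unique = len(set(src)) == len(src)
    tgt_unique = len(set(tgt)) == len(tgt)

    if src_unique and tgt_unique:
        return "one_to_one"
    if tgt_unique:
        return "many_to_one"   # many source rows point at one target row
    if src_unique:
        return "one_to_many"   # assumed label, not shown in the table above
    return "many_to_many"      # assumed label, not shown in the table above


# Hypothetical sample values, invented for illustration.
print(classify_cardinality([10, 10, 11], [10, 11, 12]))
print(classify_cardinality(["a", "b"], ["a", "b", "c"]))
```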

Quickstart

Installation

pip install smartjoin-py

Run

smartjoin run <path> <out_dir>

This analyzes the structured datasets in <path> and writes results to <out_dir>.

Outputs

  • report.json — full structured analysis output
  • relationships.csv — flat table of discovered joins and scoring signals
  • explorer/index.html — interactive explorer UI
  • explorer/data.json — explorer payload

Generate demo datasets

To explore smartjoin on deterministic synthetic data:

smartjoin generate-test-datasets --output-dir <output-dir>

Limitations

smartjoin identifies candidate relationships across structured datasets. It does not guarantee semantic correctness.

Always review inferred joins before using them downstream. Domain meaning may still require human interpretation, and output quality depends on the structure and consistency of the input data.
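One way to review an inferred join before relying on it is to check the key columns yourself. The sketch below reports the share of source rows that find a match, the source keys with no match, and the target keys that occur more than once (which would multiply rows in a join). It is an independent illustration and not part of smartjoin; the name `review_join` and the sample values are hypothetical.

```python
from collections import Counter
from collections.abc import Sequence


def review_join(source_keys: Sequence[object], target_keys: Sequence[object]) -> dict:
    """Summarize how well source keys match target keys, ignoring nulls."""
    src = [k for k in source_keys if k is not None]
    tgt_counts = Counter(k for k in target_keys if k is not None)

    matched = sum(1 for k in src if k in tgt_counts)
    # sorted() assumes the keys are mutually comparable (e.g. all strings).
    orphans = sorted({k for k in src if k not in tgt_counts})
    duplicated = sorted(k for k, n in tgt_counts.items() if n > 1)

    return {
        "match_rate": matched / len(src) if src else 0.0,
        "orphans": orphans,
        "duplicate_target_keys": duplicated,
    }


# Hypothetical key values, invented for illustration.
report = review_join(["A1", "A2", "A9", None], ["A1", "A2", "A2", "A3"])
print(report)
```

A low match rate or a non-empty list of duplicated target keys is a signal to inspect the data more closely, even when the reported confidence is high.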

Roadmap

Future development may include:

  • stronger semantic matching across columns and tables
  • optional AI-assisted reasoning and scoring
  • improved explorer and debugging capabilities
  • broader support for real-world edge cases and heterogeneous datasets

Contributing

See CONTRIBUTING.md.

License

Licensed under the MIT License.
